iiDrak Data Platform Setup

A modern Data Lakehouse solution: an open, unified data processing platform for both the data lake and the data warehouse.

Get Started: iiDrak Data Platform Setup

This article helps you kickstart your iiDrak Data Platform journey. The steps include:

Prerequisite

Configure connectors

Connectors let you create and store reusable connection components. They can be shared across lakehouse creation, data pipelines, AI Studio, and so on. Steps to create a connector:

  1. Navigate to Settings -> Connectors Configuration tab and click the [+ Configuration] button.
  2. Select Source: choose the type of connector. The connection properties are displayed automatically based on the selected source. This example uses an S3 connection.
  3. Enter a connector name (for example, lhstorage1).
  4. Enter the Access Key, Secret Key, Region, and Bucket Name, then click Create.

Create your Lakehouse

Creating a lakehouse is the first step in setting up your environment. With just a few clicks, you should have a working database and tables up and running. The steps are:

  1. Enter a readable name for the Lakehouse.
  2. Select storage location: this is where the data is stored and accessed. It can be S3, ABFS, or Shared Storage. Select S3 for this example.
  3. S3: select a configured connector (choose lhstorage1 from the drop-down).
  4. Providers: depending on the deployment configuration, executors can be selected from Azure, AWS, or GCP.
  5. Executors: create new executors or use an existing cluster. For a shared cluster, an approval notification is sent to the creator of the cluster. If the cluster was created by the same user who is sharing it, no further action is needed. Once the owner of the cluster approves, the resource can be shared across multiple Lakehouse clusters. Sharing a single cluster across at most 5 Lakehouse clusters is recommended.
  6. Once dedicated cluster is selected, user will be prompted to select
  7. Create

Create Tables

In this example, we highlight the flexibility of the platform and the ease with which data and executors from different cloud providers can work together. We use an S3 bucket as the storage path while using executors (Spark cluster) from Azure.

Once the lakehouse is online, simply navigate to SQL -> SQL Lab and execute a create table query.

create table awscheck.hybrid.demotable (
  name string,
  id int
)

Refresh the Catalogs and you can see the newly created table (along with its namespace).
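
The table can also be checked from the SQL editor itself. The statements below are a minimal sketch that assumes SQL Lab accepts standard Spark SQL (the executors in this example are a Spark cluster, but the exact dialect is not confirmed here). The inserted row is made-up sample data.

```sql
-- List the tables in the namespace used in the create table query
SHOW TABLES IN awscheck.hybrid;

-- Show the columns of the new table (name string, id int)
DESCRIBE TABLE awscheck.hybrid.demotable;

-- Insert one illustrative row and read it back (values are placeholders)
INSERT INTO awscheck.hybrid.demotable VALUES ('example', 1);
SELECT name, id FROM awscheck.hybrid.demotable;
```

If the first statement lists demotable, the catalog, namespace, and table were all created as expected.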

Load and query raw data

Raw data such as CSV or JSON files stored in object storage can also be queried directly from the SQL editor. The local object browser is an upcoming feature.

INSERT OR IGNORE INTO awscheck.aws.catalog_test
SELECT data, id
FROM json."s3a://iceberg-s3test/*.json";
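
Before loading, it may help to preview the raw files, and afterwards to confirm that rows arrived. The sketch below reuses the path syntax from the statement above; the LIMIT value and the row_count alias are arbitrary choices, and it assumes the editor supports plain SELECT on a file path in the same way as the load statement does.

```sql
-- Preview a few records straight from the JSON files in the bucket
SELECT data, id
FROM json."s3a://iceberg-s3test/*.json"
LIMIT 10;

-- After the load, count the rows in the target table
SELECT COUNT(*) AS row_count
FROM awscheck.aws.catalog_test;
```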