To set up a Databricks account, first sign up using the link below:

https://databricks.com/try-databricks

After signing up, complete the registration process and log in to Databricks at:

https://community.cloud.databricks.com/

Databricks provides a Unified Analytics Platform that is available on both the AWS and Azure cloud platforms. By default, a Databricks account is set up on AWS; on Azure it is available as a managed service called Azure Databricks.


After logging in to Databricks, let's create a cluster from the UI:

Create_Cluster


Databricks clusters can also be created programmatically, for example via the Databricks CLI or REST API, using the following JSON:

{
    "num_workers": 0,
    "cluster_name": "dataengineeringe2e",
    "spark_version": "7.0.x-scala2.12",
    "spark_conf": {},
    "aws_attributes": {
        "first_on_demand": 0,
        "availability": "ON_DEMAND",
        "zone_id": "us-west-2c",
        "spot_bid_price_percent": 100,
        "ebs_volume_count": 0
    },
    "node_type_id": "dev-tier-node",
    "ssh_public_keys": [],
    "custom_tags": {},
    "spark_env_vars": {},
    "autotermination_minutes": 120,
    "init_scripts": []
}

As the aws_attributes block in the JSON shows, Databricks uses AWS underneath to spin up the free Community Edition cluster. It is a single-node cluster (num_workers is 0) with 15 GB of free memory.

Because this is a Community Edition account, we cannot use the Databricks CLI or REST API, since access tokens cannot be generated.
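For reference, on a full (non-Community) workspace where a personal access token is available, the cluster JSON above could be submitted to the Clusters API roughly as follows. This is a minimal sketch, not a definitive implementation: the helper name build_create_cluster_request, the host example.cloud.databricks.com, and the token value are hypothetical placeholders, and the endpoint path /api/2.0/clusters/create is assumed to be the one your workspace exposes. The request is only built here, not sent.

```python
import json
import urllib.request

# Cluster specification copied from the JSON shown earlier in this guide.
CLUSTER_SPEC = {
    "num_workers": 0,
    "cluster_name": "dataengineeringe2e",
    "spark_version": "7.0.x-scala2.12",
    "spark_conf": {},
    "aws_attributes": {
        "first_on_demand": 0,
        "availability": "ON_DEMAND",
        "zone_id": "us-west-2c",
        "spot_bid_price_percent": 100,
        "ebs_volume_count": 0,
    },
    "node_type_id": "dev-tier-node",
    "ssh_public_keys": [],
    "custom_tags": {},
    "spark_env_vars": {},
    "autotermination_minutes": 120,
    "init_scripts": [],
}


def build_create_cluster_request(host, token, spec):
    """Build (but do not send) a POST request for the Clusters API.

    host and token are supplied by the caller; this helper is hypothetical
    and only illustrates the shape of the call.
    """
    url = host.rstrip("/") + "/api/2.0/clusters/create"
    body = json.dumps(spec).encode("utf-8")
    headers = {
        "Authorization": "Bearer " + token,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    # Placeholder host and token (assumptions); replace with real values.
    request = build_create_cluster_request(
        "https://example.cloud.databricks.com", "my-access-token", CLUSTER_SPEC
    )
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL, headers, and payload before anything reaches the workspace.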


Once the Databricks cluster is up, it appears in the list of running clusters:

Cluster_State


Finally, we can create a notebook, attach it to our cluster, and start exploring data:

Notebook


Now start exploring the world of Databricks.