Configure Cloud Object Storage (GCS, S3, Azure) for Data Lake

Ilum allows you to link GCS, S3, WASBS, and HDFS storage to your clusters. Once a storage is linked, Ilum automatically configures all your jobs to use your cloud data lake, so you do not need to set Spark parameters manually.

Supported Storage Providers

Provider | Type | Description
Google Cloud Storage | GCS | Native integration for GCP projects.
Amazon S3 | S3 | Standard S3 and S3-compatible storage support.
Azure Blob Storage | WASBS/ABFS | Integration for Azure data lakes.
HDFS | HDFS | Connect to existing Hadoop Distributed File Systems.
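
The storage types in the table correspond to the URI schemes used in Spark paths later in this guide (gs://, s3a://, wasbs://). A minimal sketch of that mapping follows. The `StorageSchemes` object and `schemeFor` helper are hypothetical names for illustration, not part of Ilum, and the `abfs` and `hdfs` schemes are the usual Hadoop ones rather than something this guide states.

```scala
// Hypothetical helper: maps a storage type from the table above to the
// URI scheme typically used in Spark paths. Not part of Ilum's API.
object StorageSchemes {
  val schemeByType: Map[String, String] = Map(
    "GCS"   -> "gs",
    "S3"    -> "s3a",
    "WASBS" -> "wasbs",
    "ABFS"  -> "abfs",  // assumption: standard Hadoop scheme
    "HDFS"  -> "hdfs"   // assumption: standard Hadoop scheme
  )

  // Returns the scheme for a storage type, ignoring case and surrounding
  // whitespace; None if the type is not in the table.
  def schemeFor(storageType: String): Option[String] =
    schemeByType.get(storageType.trim.toUpperCase)
}
```

For example, `StorageSchemes.schemeFor("gcs")` returns `Some("gs")`, and an unknown type returns `None`.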

Google Cloud Storage (GCS)

Step 1: Create a GCS Bucket


  1. Create a Google Cloud Project

    • Open Google Cloud Console and go to Project Selector / Manage Resources.
    • Click New Project / Create Project.
    • Enter a Project name, then choose an Organization and Location.
  2. Create a GCS Bucket

    • In the Console, navigate to Cloud Storage > Buckets.
    • Click Create.
    • Enter a globally unique Bucket name (e.g., my-ilum-bucket) and select your Region.
    note

    Remember the bucket name you created - you will need it when adding this storage to Ilum.

  3. Create a Service Account and JSON Key

    • Go to IAM & Admin > Service Accounts.
    • Click Create Service Account, fill in the details, and grant the Storage Admin role.
    • Click the created email, go to the Keys tab, and Create new key (JSON).
    • Save the downloaded JSON file securely.
    important

    Organization Policy Update: In new organizations, creating service account keys might be disabled by default. Contact your administrator if you cannot create keys.
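
Because the bucket name from step 2 is reused later in Ilum and in Spark paths, it can help to sanity-check it first. The sketch below is a simplified check based on the general GCS naming rules (3 to 63 characters; lowercase letters, digits, dashes, underscores, and dots; starting and ending with a letter or digit). It does not cover every rule, such as reserved prefixes or longer dotted names, and `BucketNames` is a hypothetical helper, not an Ilum or Google API.

```scala
// Hypothetical helper: a simplified GCS bucket-name check.
object BucketNames {
  // First and last characters must be a lowercase letter or digit; the
  // characters in between may also include '.', '_' and '-'.
  // Total length: 3 to 63 characters.
  private val NameRegex = "^[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]$".r

  def looksValid(name: String): Boolean =
    NameRegex.pattern.matcher(name).matches()
}
```

With this check, `BucketNames.looksValid("my-ilum-bucket")` is true, while a name containing uppercase letters is rejected.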

Step 2: Add GCS to Ilum Cluster


  1. Navigate to Workloads > Clusters > Edit > Storage > Add Storage.

  2. Configure General Settings:

Parameter | Value Example | Description
Name | my-gcs-storage | Unique name for this storage config.
Type | GCS | Select GCS provider.
Spark Bucket | my-ilum-bucket | Bucket for Spark logs/events.
Data Bucket | my-ilum-bucket | Bucket for your data.
  3. Configure GCS Authorization: Open your JSON key file and copy the values:
Parameter | Source Key | Description
Client Email | client_email | Service account email address.
Private Key | private_key | Full key including -----BEGIN....
Private Key ID | private_key_id | Key ID string.
  4. Click Submit to save.
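
For comparison, the manual Spark configuration that linking a storage replaces might look roughly like the sketch below. It builds a map of properties from the three key-file values. Assumptions: the property names follow the 2.x series of the GCS Hadoop connector and should be checked against the connector version you actually run; this guide does not state which properties Ilum sets internally; `GcsKey` and `GcsSparkConf` are hypothetical names.

```scala
// Hypothetical container for the three values copied from the JSON key file.
final case class GcsKey(clientEmail: String, privateKey: String, privateKeyId: String)

object GcsSparkConf {
  // Builds Spark properties for the GCS Hadoop connector.
  // Assumption: 2.x-style property names; these are not necessarily the
  // properties Ilum generates for a linked storage.
  def properties(key: GcsKey): Map[String, String] = Map(
    "spark.hadoop.fs.gs.impl" ->
      "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
    "spark.hadoop.fs.AbstractFileSystem.gs.impl" ->
      "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS",
    "spark.hadoop.fs.gs.auth.service.account.enable"         -> "true",
    "spark.hadoop.fs.gs.auth.service.account.email"          -> key.clientEmail,
    "spark.hadoop.fs.gs.auth.service.account.private.key.id" -> key.privateKeyId,
    "spark.hadoop.fs.gs.auth.service.account.private.key"    -> key.privateKey
  )
}
```

Each entry would otherwise have to be supplied per job, for example as a `--conf` argument, which is the repetition a linked storage is meant to remove.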

Step 3: Verify Connection

To ensure your storage is correctly configured, run a simple Spark job.

  1. Create a Code Service:

    • Go to Workloads > Services > New Service +.
    • Select Type: Code, Language: Scala, and your Cluster.
  2. Execute Test Code: Paste and run the following Scala code:

    Test Storage Connection
    // Write test data
    val data = Seq(("Alice", 34), ("Bob", 45))
    val df = spark.createDataFrame(data).toDF("name", "age")

    // Replace with your bucket path (e.g., gs://..., s3a://..., wasbs://...)
    val path = "gs://my-ilum-bucket/output/"

    df.write.mode("overwrite").format("csv").save(path)

    // Read back data
    spark.read.format("csv").load(path).show()
  3. Check Results: If the job completes and displays the data table, your storage connection is active.
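
If you would rather have a pass/fail result than inspect the printed table, the same write-then-read test can be wrapped in a function that compares what was read with what was written. This is a sketch, not an Ilum utility: `StorageRoundTrip` is a hypothetical name, and it assumes Spark is on the classpath and that you pass in an active `SparkSession`.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper: round-trip check against a storage path.
object StorageRoundTrip {
  // Writes two rows as CSV to `path`, reads them back, and reports whether
  // the rows read match the rows written (row order is ignored).
  def verify(spark: SparkSession, path: String): Boolean = {
    import spark.implicits._

    val expected = Seq(("Alice", 34), ("Bob", 45))
    expected.toDF("name", "age")
      .write.mode("overwrite").option("header", "true").format("csv").save(path)

    val actual = spark.read
      .option("header", "true")
      .option("inferSchema", "true") // so that age is read back as an integer
      .format("csv")
      .load(path)
      .collect()
      .map(row => (row.getString(0), row.getInt(1)))

    actual.toSet == expected.toSet
  }
}
```

In a code service where `spark` is already defined, as in the test code above, a call such as `StorageRoundTrip.verify(spark, "gs://my-ilum-bucket/output/")` should return true when the storage connection works.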


Common Issues & FAQ

Why do I get a "Permission Denied" error?

Cause: The Service Account or User doesn't have permissions to access the bucket. Solution:

  1. Go to your cloud provider's console (e.g., Google Cloud Console).
  2. Navigate to the bucket's Permissions tab.
  3. Grant your service account the Storage Admin or Storage Object Admin role.

Why does it say "Bucket does not exist"?

Cause: The bucket name in your code doesn't match the actual bucket name, or the region is incorrect. Solution:

  1. Verify the bucket exists in your cloud console.
  2. Check that the bucket name in your code matches exactly (names are often case-sensitive).
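
A quick way to rule out a mismatch is to extract the bucket from the path used in your code and compare it with the expected name. The sketch below uses `java.net.URI` for this; `StoragePaths` is a hypothetical helper, and for Azure paths the extracted part includes the container and account together (`container@account...`).

```scala
import java.net.URI
import scala.util.Try

// Hypothetical helper: compares the bucket in a storage path with a name.
object StoragePaths {
  // Extracts the bucket (the URI authority) from a path such as
  // gs://bucket/dir/. Returns None when the string cannot be parsed as a
  // URI or has no authority part.
  def bucketOf(path: String): Option[String] =
    Try(new URI(path)).toOption.flatMap(uri => Option(uri.getAuthority))

  // True only when the bucket in the path matches `expected` exactly,
  // including letter case.
  def usesBucket(path: String, expected: String): Boolean =
    bucketOf(path).contains(expected)
}
```

For example, `StoragePaths.usesBucket("gs://my-ilum-bucket/output/", "my-ilum-bucket")` is true, while the same path checked against `"My-Ilum-Bucket"` is false.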

Why do I get "Invalid credentials"?

Cause: The keys (JSON or Access Keys) were not copied correctly. Solution:

  1. Re-open your key file.
  2. Carefully copy the values again. For GCS, ensure you include the -----BEGIN PRIVATE KEY----- and -----END PRIVATE KEY----- lines.
  3. Re-save the storage configuration in Ilum.
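
The marker check in step 2 can also be done programmatically before pasting the key. The following sketch reports which of the two marker lines is missing from a copied `private_key` value. `PrivateKeyCheck` is a hypothetical helper; it only checks the markers and does not validate the key itself.

```scala
// Hypothetical helper: checks a pasted private_key value for PEM markers.
object PrivateKeyCheck {
  private val Begin = "-----BEGIN PRIVATE KEY-----"
  private val End   = "-----END PRIVATE KEY-----"

  // Returns the problems found in the value; an empty list means both
  // marker lines are present.
  def problems(privateKey: String): List[String] = {
    // In the raw JSON file, line breaks appear as the two characters '\'
    // and 'n'; treat those like real newlines before checking.
    val normalized = privateKey.replace("\\n", "\n").trim
    List(
      if (normalized.startsWith(Begin)) None else Some("missing BEGIN PRIVATE KEY line"),
      if (normalized.endsWith(End)) None else Some("missing END PRIVATE KEY line")
    ).flatten
  }
}
```

An empty result from `PrivateKeyCheck.problems(value)` means the value at least starts and ends with the expected lines; a non-empty result names what was lost while copying.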