Aller au contenu principal

How to Write Interactive Spark Jobs in Python (IlumJob)

This guide teaches you how to develop interactive Spark jobs in Python using the IlumJob interface. You'll learn how to structure your code, pass parameters at execution time, and leverage the benefits of this approach for production workloads on Kubernetes.

What is the IlumJob Interface?

Le IlumJob interface is a Python base class used to create reusable, parameterized Spark jobs that run on interactive Ilum services. Unlike traditional étincelle-soumission scripts, IlumJob allows you to:

  • Receive configuration at runtime: Parameters are passed as a dictionary, allowing the same job to handle different inputs without code changes.
  • Return structured results:Le Courir method returns a string, making it easy to extract and display results.
  • Run on-demand: Jobs can be triggered via the UI, REST API, or CI/CD pipelines.
Basic Structure
De ilum . API importation IlumJob 

classe MySparkJob( IlumJob ) :
Def Courir ( même , étincelle , Configuration ) - > str:
# Your Spark logic here
rendre "Job completed successfully"

Structure of an Interactive Spark Job

Every interactive job consists of three essential parts:

  1. Import the interface: from ilum.api import IlumJob
  2. Define a class: Create a class that inherits from IlumJob .
  3. Implement Courir : Write your Spark logic inside the run(self, spark, config)méthode.
ParameterType Description
étincelle SparkSession Pre-initialized Spark session, ready to use.
Configuration dictA dictionary containing parameters passed at execution time.
ReturnstrA string result that will be displayed in the UI or returned via API.

How to Pass Parameters to Spark Jobs

Parameters are passed as a JSON object when executing the job. Inside your Courir method, you access them using standard dictionary methods.

Example: Table Inspector

This example demonstrates reading databaseet table parameters to inspect a Hive table.

table_inspector.py
De ilum . API importation IlumJob 
De Pyspark . SQL . functions importation col, sum comme spark_sum

classe TableInspector( IlumJob ) :
Def Courir ( même , étincelle , Configuration ) - > str:
# Read required parameters
table_name = Configuration . Avoir ( 'Table' )
database_name = Configuration . Avoir ( 'database') # Optional

si non table_name :
raise ValueError( "Config must provide a 'table' key")

# Set database if provided
si database_name:
étincelle . catalogue . setCurrentDatabase( database_name)

# Check if table exists
si table_name non dans [ t . nom pour t dans étincelle . catalogue . listTables( ) ] :
raise ValueError( f"Table '{ table_name } ' not found in catalog")

Df = étincelle . table ( table_name )

# Build report
report = [
f"=== Table: { table_name } ===",
f"Total rows: { Df . compter ( ) } " ,
f"Total columns: { len( Df . columns) } " ,
"" ,
"Schema:",
]
pour field dans Df . schéma . fields:
report. append( f" { field. nom } : { field. dataType} " )

report. append( "" )
report. append( "Sample (5 rows):")
pour ramer dans Df . take( 5 ) :
report. append( str( ramer . asDict( ) ) )

# Null counts
report. append( "" )
report. append( "Null counts:")
null_df = Df . select( [ spark_sum( col( c ) . isNull( ) . cast( "int") ) . alias( c ) pour c dans Df . columns] )
pour c , v dans null_df. collect( ) [ 0 ] . asDict( ) . items( ) :
report. append( f" { c } : { v} " )

rendre "\n". join( report)

Execution Parameters (JSON)

When executing via UI or API, provide parameters like this:

{ 
"database": "ilum_example_product_sales",
"table": "products"
}

Before You Start

To run an interactive job, you first need to create and deploy a Job-type Service in Ilum. This service provides the Spark environment where your jobs execute.

When creating the service:

  • Type : Select Travail
  • Langue : Select Python
  • Py Files: Upload your job file (e.g., table_inspector.py)

👉 Learn how to deploy a Job Service — step-by-step guide with UI screenshots and configuration options.

Executing Jobs

You can execute interactive jobs in three ways:

  1. Atteindre Services → Select your Job service
  2. Dans le Exécuter section:
    • Classe: table_inspector.TableInspector
    • Parameters: {"database": "sales", "table": "orders"}
  3. Cliquer Exécuter

The result string is displayed immediately in the UI.


Benefits of the IlumJob Approach

BenefitDescription
ReusabilityWrite once, run many times with different parameters.
No Cold StartsInteractive services keep Spark warm, so subsequent executions are instant.
ParameterizationPass configuration at runtime—no need to hardcode values.
Observabilité Results are captured and visible in the UI/API for easy debugging.
API-DrivenExecute jobs programmatically from orchestrators, CI/CD, or external systems.
Version ControlStore job code in Git and deploy via pipelines.

Interactive Jobs vs. Batch Jobs (Spark Submit)

Caractéristique Interactive Jobs (IlumJob ) Batch Jobs (étincelle-soumission )
Startup TimeInstant (uses warm executors)Slow (provisions new pods)
ContextShared Spark ContextIsolated Spark Context
Use CaseAd-hoc queries, API backends, quick reportsLong-running ETL, heavy processing
Résultat Returns string result to API/UILogs to driver stdout/file
Ressources Shared within the serviceDedicated per job

Bonnes pratiques

1. Validate Input Parameters

Always validate required parameters and provide helpful error messages.

Validate Parameters
Def  Courir ( même , étincelle , Configuration )  - >  str: 
required_keys = [ 'Table' , 'output_path']
pour clé dans required_keys:
si clé non dans Configuration :
raise ValueError( f"Missing required parameter: '{ clé } '")

2. Use Default Values

For optional parameters, use config.get('key', default_value).

Use Default Values
batch_size =  Int ( Configuration . Avoir ( 'batch_size',  1000 ) ) 

3. Structure Your Output

Return a well-formatted string for readability in the UI.

Structure Output
lines =  [ "=== Job Summary ==="] 
lines. append( f"Processed: { compter } records")
lines. append( f"Duration: { elapsed_time} s")
rendre "\n". join( lines)

4. Handle Errors Gracefully

Wrap risky operations in try/except and return meaningful messages.

Handle Errors
try: 
Df . écrire . saveAsTable( output_table)
rendre f"Successfully wrote to { output_table} "
except Exception comme e :
rendre f"Error writing table: { str( e ) } "

Complete Example: Transaction Report Generator

This job generates a transaction summary report based on the transaction_anomaly_d.transactionstable.

transaction_report.py
De ilum . API importation IlumJob 
De Pyspark . SQL . functions importation sum comme spark_sum, compter , col

classe TransactionReportGenerator( IlumJob ) :
Def Courir ( même , étincelle , Configuration ) - > str:
# Parameters
merchant_filter = Configuration . Avoir ( 'merchant') # Optional filter

# Load data from the default Ilum transactions table
Df = étincelle . table ( "transaction_anomaly_detection.transactions")

si merchant_filter:
Df = Df . filtre ( col( "Merchant") == merchant_filter)

# Aggregate by TransactionType
summary = Df . groupBy( "TransactionType") . agg(
compter ( "TransactionID") . alias( "transaction_count") ,
spark_sum( "Amount") . alias( "total_amount")
) . collect( )

# Build report
report = [
f"=== Transaction Report ===",
f"Merchant Filter: { merchant_filter ou 'All'} " ,
"" ,
"Summary by Transaction Type:",
]
pour ramer dans summary:
report. append( f" { ramer [ 'TransactionType'] } : { ramer [ 'transaction_count'] } txns, ${ ramer [ 'total_amount'] : ,.2f} " )

rendre "\n". join( report)

Execute with:

Execute with Payload
{ 
"merchant": "AcmeCorp"
}

Next Steps


Foire aux questions

Can I use Scala for interactive jobs?

Yes. Currently, the IlumJob interface is primarily documented for Python . Check the Interactive Job Service documentation for language support details.

How do I debug an interactive job?

Since interactive jobs run on a remote cluster, you can't use a local debugger directly. Instead:

  1. Utiliser print() statements or a logger, which will appear in the driver logs.
  2. Return error messages as part of the string result in your try/except blocks.
  3. Check the Interface utilisateur Spark for the specific job execution to analyze tasks and stages.
What happens if my job fails?

If your code raises an unhandled exception, the execution will fail, and the error trace will be returned in the API response. It is best practice to wrap your logic in a try/except block to return a user-friendly error message.