

Oozie is a workflow engine for Hadoop. If you have never worked with Oozie, you should give it a shot. We do a lot of Hadoop work, while most of our frontend is still .NET running on WCF services, so this post describes a simple .NET API built on top of the Oozie REST API to start and stop Oozie workflows.

Oozie exposes a REST API, and we connect to it using .NET's HttpWebRequest class. You can download the project from GitHub.

var connection = new OozieConnection("hadoop1", 11000);
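
The real OozieConnection class is in the GitHub project and is not reproduced here. As a rough sketch of what a thin wrapper over HttpWebRequest could look like, assuming plain HTTP, no authentication, and Get/Post methods that take a path relative to the server root (the class name OozieConnectionSketch and its internals are hypothetical):

```csharp
using System.IO;
using System.Net;
using System.Text;

// Hypothetical stand-in for the project's OozieConnection; the real class may differ.
public class OozieConnectionSketch
{
    private readonly string baseUrl;

    public OozieConnectionSketch(string host, int port)
    {
        // Assumes plain HTTP with no authentication.
        baseUrl = "http://" + host + ":" + port + "/";
    }

    // GET a relative path such as "oozie/v1/job/<id>?show=info" and return the response body.
    public string Get(string relativePath)
    {
        var request = (HttpWebRequest)WebRequest.Create(baseUrl + relativePath);
        request.Method = "GET";
        return ReadBody(request);
    }

    // POST an XML payload to a relative path and return the response body.
    public string Post(string relativePath, string xml)
    {
        var request = (HttpWebRequest)WebRequest.Create(baseUrl + relativePath);
        request.Method = "POST";
        request.ContentType = "application/xml;charset=UTF-8";

        byte[] payload = Encoding.UTF8.GetBytes(xml);
        request.ContentLength = payload.Length;
        using (Stream body = request.GetRequestStream())
        {
            body.Write(payload, 0, payload.Length);
        }
        return ReadBody(request);
    }

    // GetResponse throws a WebException for non-success status codes.
    private static string ReadBody(HttpWebRequest request)
    {
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Because GetResponse throws a WebException on error responses, callers would probably want to wrap these calls in a try/catch and log the response body Oozie sends back.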

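We have to post the job configuration as XML to the Oozie REST endpoint. The XML is held in a string, xmlData, with {0} and {1} as placeholders for the table and database names.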

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
 <!-- MapReduce queue name -->
 <property> <name>mapred.job.queue.name</name> <value>default</value> </property>

 <!-- User name to run the job under -->
 <property> <name>user.name</name> <value>root</value> </property>

 <!-- Set this to true if you are using MapReduce or Pig jobs -->
 <property> <name>oozie.use.system.libpath</name> <value>true</value> </property>

 <!-- Properties for the job. The job creates the table {tableName} in the {databaseName} database -->
 <property> <name>tableName</name> <value>{0}</value> </property>
 <property> <name>databaseName</name> <value>{1}</value> </property>

 <!-- Location of the job.properties and the workflow. This must be an HDFS path. -->
 <property> <name>oozie.wf.application.path</name>
 <value>hdfs://hadoop1.allegiance.local:8020/user/root/hiveoozie</value> </property>
</configuration>

The query string action=start starts the workflow immediately.

var result = connection.Post("oozie/v1/jobs?action=start", String.Format(xmlData, tableName, databaseName));

Deserialize the response to get the job id.

var serializer = new JsonNetSerializer();
var id = serializer.Deserialize(result).id;

We get the job status and keep polling the API until the job has finished. Adjust the sleep time to suit how long your job runs.

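var statusinfo = connection.Get("oozie/v1/job/" + id + "?show=info");
var status = serializer.Deserialize(statusinfo).Status;

// Stop on any terminal state; a failed or killed job never reaches SUCCEEDED
while (status != "SUCCEEDED" && status != "FAILED" && status != "KILLED")
{
  //TODO : Record the status in a shared dictionary for web UI to poll the status
  Thread.Sleep(5000); // needs using System.Threading; adjust to suit the job
  statusinfo = connection.Get("oozie/v1/job/" + id + "?show=info");
  status = serializer.Deserialize(statusinfo).Status;
}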
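
Stopping a workflow is not shown above. The Oozie REST API kills a job with a PUT to oozie/v1/job/<id>?action=kill. The sketch below issues that request directly with HttpWebRequest, assuming plain HTTP and no authentication; the OozieJobControl class and its method names are hypothetical and not part of the project.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of the project on GitHub.
public static class OozieJobControl
{
    // Builds (but does not send) the PUT request that asks Oozie to kill a job.
    public static HttpWebRequest BuildKillRequest(string host, int port, string jobId)
    {
        var url = "http://" + host + ":" + port
                  + "/oozie/v1/job/" + Uri.EscapeDataString(jobId) + "?action=kill";
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "PUT";
        request.ContentLength = 0; // the kill action carries no request body
        return request;
    }

    // Sends the kill request; GetResponse throws a WebException on an error status.
    public static void Kill(string host, int port, string jobId)
    {
        using (var response = (HttpWebResponse)BuildKillRequest(host, port, jobId).GetResponse())
        {
            // A success status means Oozie accepted the request; poll the job
            // status afterwards to confirm that it reached KILLED.
        }
    }
}
```

For example, OozieJobControl.Kill("hadoop1", 11000, id) would request a kill for the job id returned by the submit call.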

If you have questions or get stuck, you can reach me at @abhishek376 on Twitter.