Thursday, October 8, 2015

Building a runnable Spark application package using SBT and IBM BigInsights 4



As a continuation of my earlier blog post about building a standalone Spark job and running it (see link below):
Running spark application in a standalone mode

I have decided to write another post about how to build and run your Spark application on a commercial, YARN-enabled Hadoop distribution, since most of us will not configure Hadoop from scratch but will use some kind of commercial distribution. For this purpose I used the IBM BigInsights 4.0 Quick Start Edition (now called IBM IOP for Hadoop).

This post covers the following steps:

Step 1: install sbt on the target machine (Ubuntu Linux)
Step 2: code the simple program (yarn-client compatible)
Step 3: copy the file onto the sbt-enabled system
Step 4: create and edit the simpleCluster.sbt
Step 5: create the mkDirStructure.sh script to automate the directory creation
Step 6: run the mkDirStructure.sh script
Step 7: package the Spark application
Step 8: create the input on the BigInsights system
Step 9: move the jar to the BigInsights driver machine
Step 10: run the Spark application on the BigInsights machine
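
To give a feel for step 4, here is a minimal sketch of what a simpleCluster.sbt build definition might look like. The project name, Scala version, and Spark version below are assumptions for illustration, not values taken from the post; adjust them to match the Spark release shipped with your BigInsights cluster.

```scala
// simpleCluster.sbt -- minimal sketch of an sbt build definition for a
// Spark application. All names and version numbers here are assumptions.
name := "SimpleCluster"

version := "1.0"

// Scala version must match the one your Spark distribution was built against.
scalaVersion := "2.10.4"

// Marked "provided" because the BigInsights cluster supplies Spark jars
// at runtime, so they should not be bundled into the packaged jar.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.1" % "provided"
```

Running `sbt package` from the project root then produces a jar under `target/scala-2.10/`, which is what gets copied to the BigInsights driver machine and submitted with `spark-submit --master yarn-client` in the later steps.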

You can download the full document from here:
Building a Spark runnable application package using SBT and IBM BigInsighst 4.pdf