Thursday, September 17, 2015

Building a runnable standalone Spark application package with sbt on Ubuntu Linux


As everyone in the world of computing knows, Apache Spark is one of the most interesting and talked-about projects in today's open source community. Despite all the attention, it is still a long way from being a "user friendly" application, and one of the areas where it is particularly lacking is the build process.

When building a standalone Java application you would typically use something like Apache Ant, or the built-in tools of your IDE, to generate the required jar file or any other artifact you need. Both approaches are covered in detail in plenty of books and documentation.
Apache Spark applications, on the other hand, are typically built with a tool called "sbt", short for "simple build tool".

This build tool is mainly used within the Scala ecosystem, and in some cases can become quite "not simple". You can read more about the tool here: http://www.scala-sbt.org/
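To give a feel for what an sbt build definition looks like, here is a minimal sketch of the kind of simple.sbt file created in Step 5 below. The Scala and Spark version numbers are my assumptions (they match the Spark 1.x era) and should be adjusted to whatever you have installed:

    // simple.sbt -- a minimal sbt build definition.
    // The version numbers below are assumptions; match them to your setup.
    name := "Simple Project"

    version := "1.0"

    scalaVersion := "2.10.4"

    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.1"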

When I started working with Apache Spark I ran into some major issues with sbt and its integration with Spark, so to spare others the same problems, I decided to post this hands-on tutorial.


This tutorial covers the following steps (sketches of the key files appear right after the list):
  • Step 1: install sbt on the target machine
  • Step 2: code the simple program
  • Step 3: copy the file onto the sbt-enabled system
  • Step 4: create the input text file at /home/spark/input.txt
  • Step 5: create and edit the simple.sbt file
  • Step 6: create the mkDirStructure.sh script to automate the directory creation
  • Step 7: run mkDirStructure.sh
  • Step 8: package the Spark application
  • Step 9: run the Spark application
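To make the steps concrete, here is a sketch of the kind of simple program Step 2 refers to. The SimpleApp object and the letter-counting logic are illustrative assumptions (in the style of the official Spark quick start); only the input path comes from Step 4:

    /* SimpleApp.scala -- a sketch of the "simple program" from Step 2.
       The SimpleApp name and the counting logic are assumptions;
       the input path is the one created in Step 4. */
    import org.apache.spark.{SparkConf, SparkContext}

    object SimpleApp {
      def main(args: Array[String]) {
        val conf = new SparkConf().setAppName("Simple Application")
        val sc = new SparkContext(conf)

        // Read the input file created in Step 4 and cache it in memory.
        val logData = sc.textFile("/home/spark/input.txt").cache()

        // Count the lines containing the letter "a" and the letter "b".
        val numAs = logData.filter(line => line.contains("a")).count()
        val numBs = logData.filter(line => line.contains("b")).count()
        println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))

        sc.stop()
      }
    }

Steps 6 and 7 exist because sbt expects a standard directory layout: sources under src/main/scala, and the .sbt file at the project root. Here is a sketch of what mkDirStructure.sh might contain; the project root ~/simple-project is an assumption:

    #!/bin/bash
    # mkDirStructure.sh -- sketch only; ~/simple-project is an assumed project root.
    # sbt looks for sources under src/main/scala and the .sbt file at the top level.
    mkdir -p ~/simple-project/src/main/scala
    cp SimpleApp.scala ~/simple-project/src/main/scala/
    cp simple.sbt ~/simple-project/

With that layout in place, Step 8 is just running "sbt package" from the project root, which (with the assumed settings above) produces a jar under target/scala-2.10/, and Step 9 submits the resulting jar to Spark with something like:

    spark-submit --class "SimpleApp" --master local[4] \
      target/scala-2.10/simple-project_2.10-1.0.jar

(The jar name follows from the assumed project name and version numbers, so adjust it to match your own simple.sbt.)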


