Compiling Hadoop code

After wandering around for a month and a half and pulling my hair out, it feels good when the work finally starts heading in a definite direction. Understanding how Hadoop works and actually coding for it are miles apart.

One of the problems I faced was that I couldn't compile any of the Hadoop code I wrote, not even the examples given in the books. The error looked something like this:

xyz.java:5: package org.apache.hadoop.fs does not exist

import org.apache.hadoop.fs.Path;

^

xyz.java:6: package org.apache.hadoop.io does not exist

import org.apache.hadoop.io.*;

^

and so on…

The basic problem is the classpath. The Hadoop library JARs are not on javac's classpath by default, so we need to point the compiler at them explicitly so they can be referenced during compilation. This can be done with:

$ javac -classpath hadoop-common-0.21.0.jar <filename.java>

You can add the -verbose option to the command line to see what is actually going on during compilation.

Though I did this on Linux, the OS doesn't really matter; the same syntax applies on Windows as well. The one difference to watch for: when you put multiple JARs on the classpath, Linux separates the entries with a colon while Windows uses a semicolon.
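
For instance, with two JARs on the classpath (the paths here are only illustrative; point them at wherever the JARs live in your installation):

$ javac -classpath hadoop-common-0.21.0.jar:lib/commons-cli-1.2.jar WordCount.java

and on Windows:

> javac -classpath hadoop-common-0.21.0.jar;lib\commons-cli-1.2.jar WordCount.java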

With this, the compilation of your Hadoop code is done. Jar your class files and then execute them.
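
For example, here is a minimal sketch of the whole cycle; the class name org.myorg.WordCount, the wordcount_classes directory, and the input/output paths are placeholders for whatever your own code uses:

$ javac -classpath hadoop-common-0.21.0.jar -d wordcount_classes WordCount.java
$ jar -cvf wordcount.jar -C wordcount_classes/ .
$ hadoop jar wordcount.jar org.myorg.WordCount input_dir output_dir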


Published by

Harsh

Developer at Microsoft by day, a wannabe physicist by night.

33 thoughts on “Compiling Hadoop code”

  1. Hey, I'm getting the same kind of errors.
    How do I handle these? I have tried all sorts of things. I also used the -verbose option, but still can't figure it out.

    1. Well, the syntax is
      “javac -classpath <path-to>/hadoop-common-0.21.0.jar <filename.java>”
      Did you do this?
      If it still gives an error, paste your error along with the commands you executed to compile your code.

  2. Here is the command I used:

    javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar:${HADOOP_HOME}/lib/hadoop-commons-cli-1.2.jar:${HADOOP_HOME}/hadoop-core*.jar -d wordcount_classes WordCount.java WordCountMapper.java WordCountReducer.java

    1. Are you using the example codes?
      You can straightaway execute the example code with syntax like
      “hadoop jar hadoop-mapred-examples-${version}.jar wordcount input_file.txt output_file”
      To be very honest, even I am not quite sure which jar files are necessary to compile the sample code, if it is indeed the example code that you are trying to execute.

      1. No, I am not running the example codes; I have written my own functions. By the way, I tried the command you gave in your previous comment and it's giving a runtime error. Could you give the correct syntax?

      2. Well, WordPress didn't parse the angle brackets I used in the previous comment, so the “input_file” and “output_file” parameters didn't appear.
        Try the updated syntax.

  3. Commands used:

    mkdir wordcount_classes

    javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar:${HADOOP_HOME}/lib/hadoop-commons-cli-1.2.jar:${HADOOP_HOME}/hadoop-core*.jar -d wordcount_classes WordCount.java WordCountMapper.java WordCountReducer.java

    I have tried various variations of the above command….

  4. Harsh :
    Well, WordPress didn't parse the angle brackets I used in the previous comment, so the “input_file” and “output_file” parameters didn't appear.
    Try the updated syntax.

    Here is the command I ran to run the example, but it's giving this error:

    [cloudera@localhost ~]$ hadoop jar hadoop-examples.jar org.myorg.WordCount /userdata/ankit/input/Input /home/cloudera/use-case/Output
    Exception in thread "main" java.io.IOException: Error opening job jar: hadoop-examples.jar
    at org.apache.hadoop.util.RunJar.main(RunJar.java:124)
    Caused by: java.util.zip.ZipException: error in opening zip file
    at java.util.zip.ZipFile.open(Native Method)
    at java.util.zip.ZipFile.<init>(ZipFile.java:114)
    at java.util.jar.JarFile.<init>(JarFile.java:135)
    at java.util.jar.JarFile.<init>(JarFile.java:72)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:122)

      1. /userdata/ankit/input/Input is the input directory, which contains 3 input files.

        Even if I change it to /userdata/ankit/input/Input/*.txt or /userdata/ankit/input/Input/pg4300.txt, it gives the same error.

        pg4300.txt is a file in the Input directory.

    1. The path to access your file on the DFS is an HDFS URL. It would be something like “hdfs://localhost/user/ankit/pg4300.txt”, had you created a user directory and stored your file inside it, or it may be like “hdfs://localhost/pg4300.txt”.
      Check the location of your files using “hadoop fs -lsr /” and then use the URL accordingly; this command recursively lists all your directories and files starting from the root. Also note that the ZipException in your trace says Hadoop couldn't open hadoop-examples.jar itself, so make sure you give the correct (full) local path to the jar when running the command.
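      For instance, a rough sketch with hypothetical paths (adjust them to your actual jar location and HDFS layout):
      $ hadoop fs -lsr /
      $ hadoop jar /home/cloudera/hadoop-examples.jar org.myorg.WordCount hdfs://localhost/userdata/ankit/input/Input hdfs://localhost/userdata/ankit/output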

      1. I checked the HDFS file system. All the input files are in HDFS, but the jar file is on the local file system. And I think the jar file ought to be on the local file system only; correct me if I am wrong…
        My problem is still not solved…

      2. Sorry then, man! That's the best I can do for you (at least from my system, though I guess I wouldn't be of much greater help even if I were at yours).

  5. Ankit Sambyal :
    I checked the HDFS file system. All the input files are in HDFS, but the jar file is on the local file system. And I think the jar file ought to be on the local file system only; correct me if I am wrong…
    My problem is still not solved…

    I am stuck with the same problem too…
    I am also getting those 48 errors because the compiler is not able to find the packages.

      1. Yes… the command that I used was:
        javac -classpath /.freespace/user/anireddy/hadoop/hadoop-core-0.20.203.0.jar:/.freespace/user/anireddy/hadoop/lib/commons-cli-1.2.jar -d wordcount_classes WordCount.java

  6. Hi,

    I am trying to compile Hadoop release 1.0.0.
    The command I am giving to compile is:

    sudo ant -Djava5.home=/home/tools/java5/jdk1.5.0_22 -Dforrest.home=/home/tools/forrest/apache-forrest-0.9 -Dfindbugs.home=/home/tools/findbugs/findbugs-2.0.0/latest compile-core tar

    The error we are getting is:

    BUILD FAILED
    java.lang.ClassNotFoundException: org.apache.tools.ant.taskdefs.optional.TraXLiaison
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:169)
    at org.apache.tools.ant.taskdefs.XSLTProcess.loadClass(XSLTProcess.java:548)
    at org.apache.tools.ant.taskdefs.XSLTProcess.resolveProcessor(XSLTProcess.java:533)
    at org.apache.tools.ant.taskdefs.XSLTProcess.getLiaison(XSLTProcess.java:785)
    at org.apache.tools.ant.taskdefs.XSLTProcess.execute(XSLTProcess.java:300)
    at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:288)
    at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
    at org.apache.tools.ant.Task.perform(Task.java:348)
    at org.apache.tools.ant.Target.execute(Target.java:357)
    at org.apache.tools.ant.Target.performTasks(Target.java:385)
    at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1337)
    at org.apache.tools.ant.Project.executeTarget(Project.java:1306)
    at org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41)
    at org.apache.tools.ant.Project.executeTargets(Project.java:1189)
    at org.apache.tools.ant.Main.runBuild(Main.java:758)
    at org.apache.tools.ant.Main.startAnt(Main.java:217)
    at org.apache.tools.ant.launch.Launcher.run(Launcher.java:257)
    at org.apache.tools.ant.launch.Launcher.main(Launcher.java:104)

    Could you please help us out.

    Thank you

  7. Hi, I'm trying to compile the wordcount program with small modifications to the given example wordcount program. When I run this command, the error below comes up. What does it mean?
    javac -classpath /home/user/hadoop/hadoop-core-1.1.2.jar:/home/user/hadoop/commons-cli-1.2.jar -d wordcount_classes WordCount.java
    WordCount.java:53: error: cannot access Options
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    ^
    class file for org.apache.commons.cli.Options not found
    1 error

    1. That error means the compiler can't find the commons-cli classes that GenericOptionsParser depends on: check the commons-cli-1.2.jar path in your -classpath, since in a Hadoop 1.1.2 installation it normally sits under the lib/ directory rather than at the top level. Also, once it compiles, the sample WordCount still expects command-line parameters when you run it; you are supposed to provide the input file and the output directory for the code to run.
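      A hedged guess at a corrected compile command, assuming your Hadoop installation is under /home/user/hadoop with the commons-cli jar in its lib/ directory:
      javac -classpath /home/user/hadoop/hadoop-core-1.1.2.jar:/home/user/hadoop/lib/commons-cli-1.2.jar -d wordcount_classes WordCount.java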

  8. I modified “common” and then tried to compile “HDFS”. However, it seems that HDFS is not aware of those modifications to “common”. Can you tell me how to compile “HDFS” against a modified “common”? Thank you!
