We have a ton of data in relational databases that we are looking to migrate onto our Big Data platform. We took an initial look around and decided Sqoop might be worth a try. I ran into some trouble getting Sqoop up and running. Herein lies that story...
The main problem is the documentation (and Google). It appears Sqoop changed its install process between minor dot releases. Google will likely land you on this documentation:
http://sqoop.apache.org/docs/1.99.1/Installation.html
That documentation mentions a shell script, ./bin/addtowar.sh. That script no longer exists in Sqoop version 1.99.3. Instead, you should reference this documentation:
http://sqoop.apache.org/docs/1.99.3/Installation.html
In that documentation, they mention the common.loader property in server/conf/catalina.properties. If you haven't been following the Tomcat scene, that is the new property that lets you add jar files to the classpath without dropping them into $TOMCAT/lib or your war file. (yuck)
To get Sqoop running, you'll need all of the Hadoop jar files (and their transitive dependencies) on the CLASSPATH when Sqoop/Tomcat starts up. Unless you add all of the Hadoop jar files to this property, you will end up with any or all of the following CNFE/NCDFE exceptions in your log file (found in server/logs/localhost*.log):
java.lang.ClassNotFoundException: org.apache.commons.logging.LogFactory
java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/JobClient
java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
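When one of those exceptions shows up, the first question is which jar provides the missing class. A hypothetical helper (not from the post): since zip entries store file names as plain strings, a binary grep over the jars is enough to locate the class. HADOOP_HOME here is an assumption -- point it at your own install (the post uses /Users/bone/tools/hadoop).

```shell
# Hypothetical helper: find which Hadoop jar bundles a missing class.
# Jar files are zips, and zip entries store file names as plain strings,
# so grep -l over the jar binaries is enough.
# HADOOP_HOME is an assumption -- substitute your own install root.
HADOOP_HOME="${HADOOP_HOME:-/Users/bone/tools/hadoop}"
CLASS="org/apache/commons/logging/LogFactory"

# List every jar under the Hadoop share tree whose entry table mentions
# the class; empty output means no bundled jar provides it.
grep -l "$CLASS" "$HADOOP_HOME"/share/hadoop/*/*.jar \
                 "$HADOOP_HOME"/share/hadoop/*/lib/*.jar 2>/dev/null || true
```

Whichever directory the matching jar lives in is a directory your common.loader property needs to cover.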
Through trial and error, I found all of the paths needed for the common.loader property. I ended up with the following in my catalina.properties:
common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,${catalina.home}/../lib/*.jar,/Users/bone/tools/hadoop/share/hadoop/common/*.jar,/Users/bone/tools/hadoop/share/hadoop/yarn/lib/*.jar,/Users/bone/tools/hadoop/share/hadoop/mapreduce/*.jar,/Users/bone/tools/hadoop/share/hadoop/tools/lib/*.jar,/Users/bone/tools/hadoop/share/hadoop/common/lib/*.jar
That got me past all of the classpath issues. Note: in my case, /Users/bone/tools/hadoop was a complete install of Hadoop 2.4.0.
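That one-liner is easier to digest if you build it up: the stock Tomcat entries, plus one wildcard per Hadoop jar directory. A sketch of how it breaks down, with HADOOP_HOME as an assumption for your own install root:

```shell
# Sketch: assemble the common.loader value from a Hadoop install root.
# HADOOP_HOME is an assumption -- substitute your own path (the post
# uses /Users/bone/tools/hadoop). Note the single quotes: the
# ${catalina.*} placeholders must reach Tomcat unexpanded.
HADOOP_HOME="${HADOOP_HOME:-/Users/bone/tools/hadoop}"

# Tomcat's default entries come first.
LOADER='${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,${catalina.home}/../lib/*.jar'

# Then one wildcard entry per Hadoop jar directory that Sqoop needs.
for dir in common common/lib yarn/lib mapreduce tools/lib; do
  LOADER="$LOADER,$HADOOP_HOME/share/hadoop/$dir/*.jar"
done

echo "common.loader=$LOADER"
```

The echoed line is what replaces the stock common.loader entry in server/conf/catalina.properties.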
I also ran into this exception:
Caused by: org.apache.sqoop.common.SqoopException: MAPREDUCE_0002:Failure on submission engine initialization - Invalid Hadoop configuration directory (not a directory or permission issues): /etc/hadoop/conf/
That path has to point to your Hadoop conf directory. You can find this setting in server/conf/sqoop.properties. I updated mine to:
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/Users/bone/tools/hadoop/etc/hadoop
(Again, /Users/bone/tools/hadoop is the directory of my Hadoop installation.)
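Since the MAPREDUCE_0002 message blames "not a directory or permission issues", it is worth checking both before restarting the server. A minimal sketch; the throwaway directory stands in so the example is self-contained -- use your real conf directory (e.g. /Users/bone/tools/hadoop/etc/hadoop):

```shell
# Sanity check for the sqoop.properties value: Sqoop raises
# MAPREDUCE_0002 when the path is not a readable directory.
# mktemp -d is a stand-in here so the sketch runs anywhere; substitute
# your actual Hadoop conf directory.
CONF_DIR="$(mktemp -d)"
if [ -d "$CONF_DIR" ] && [ -r "$CONF_DIR" ]; then
  STATUS="ok"
else
  STATUS="bad"   # expect MAPREDUCE_0002 at server start
fi
echo "conf dir check: $STATUS"
rmdir "$CONF_DIR"
```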
OK --- Now, you should be good to go!
Start the server with:
bin/sqoop.sh server start
Then, the client should work! (as shown below)
bin/sqoop.sh client
...
sqoop:000> set server --host localhost --port 12000 --webapp sqoop
Server is set successfully
sqoop:000> show version --all
client version:
  Sqoop 1.99.3 revision 2404393160301df16a94716a3034e31b03e27b0b
  Compiled by mengweid on Fri Oct 18 14:15:53 EDT 2013
server version:
  Sqoop 1.99.3 revision 2404393160301df16a94716a3034e31b03e27b0b
...
From there, you can follow the Sqoop 5 Minute Demo:
http://sqoop.apache.org/docs/1.99.3/Sqoop5MinutesDemo.html
Happy sqooping all.