We have a ton of data in relational databases that we are looking to migrate onto our Big Data platform. We took an initial look around and decided Sqoop might be worth a try. I ran into some trouble getting Sqoop up and running. Herein lies that story...
The main problem is the documentation (and Google). It appears as though Sqoop changed its install process between minor dot releases. Google will likely land you on this documentation:
http://sqoop.apache.org/docs/1.99.1/Installation.html
That documentation mentions a shell script, ./bin/addtowar.sh, which no longer exists in Sqoop 1.99.3. Instead, you should reference this documentation:
http://sqoop.apache.org/docs/1.99.3/Installation.html
In that documentation, they mention the common.loader property in server/conf/catalina.properties. If you haven't been following the Tomcat scene, that is the property that lets you load jar files onto your classpath without dropping them into $TOMCAT/lib or into your war file (yuck).
To get Sqoop running, you'll need all of the Hadoop jar files (and their transitive dependencies) on the classpath when Sqoop/Tomcat starts up. Unless you add all of the Hadoop jar files to this property, you will end up with any or all of the following CNFE/NCDFE exceptions in your log file (found in server/logs/localhost*.log):
java.lang.ClassNotFoundException: org.apache.commons.logging.LogFactory
java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/JobClient
java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
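Each of these errors names a class that lives in some jar under the Hadoop install, so one way to shorten the trial and error is to search the install for the jar that contains the missing class. The following is a minimal sketch, not part of Sqoop or Hadoop; the function names `class_entry` and `find_jars_with_class` are hypothetical, and it assumes only that jars are standard zip archives. It accepts either the dotted name from a ClassNotFoundException or the slashed name from a NoClassDefFoundError.

```python
import os
import sys
import zipfile


def class_entry(class_name):
    """Turn 'org.apache.commons.logging.LogFactory' or
    'org/apache/hadoop/mapred/JobClient' into the entry name used inside a jar."""
    name = class_name.strip()
    if name.endswith(".class"):
        name = name[: -len(".class")]
    return name.replace(".", "/") + ".class"


def find_jars_with_class(root, class_name):
    """Walk `root` and return every *.jar that contains the given class, sorted."""
    entry = class_entry(class_name)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(".jar"):
                continue
            path = os.path.join(dirpath, filename)
            try:
                with zipfile.ZipFile(path) as jar:
                    if entry in jar.namelist():
                        hits.append(path)
            except zipfile.BadZipFile:
                continue  # skip files that end in .jar but are not valid zip archives
    return sorted(hits)


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python find_class.py /Users/bone/tools/hadoop org/apache/hadoop/mapred/JobClient
    for jar_path in find_jars_with_class(sys.argv[1], sys.argv[2]):
        print(jar_path)
```

The directory of each jar it prints is the one that needs a `/*.jar` entry in common.loader.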
Through trial and error, I found all of the paths needed for the common.loader property. I ended up with the following in my catalina.properties:
common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,${catalina.home}/../lib/*.jar,/Users/bone/tools/hadoop/share/hadoop/common/*.jar,/Users/bone/tools/hadoop/share/hadoop/yarn/lib/*.jar,/Users/bone/tools/hadoop/share/hadoop/mapreduce/*.jar,/Users/bone/tools/hadoop/share/hadoop/tools/lib/*.jar,/Users/bone/tools/hadoop/share/hadoop/common/lib/*.jar
That got me past all of the classpath issues. Note that in my case, /Users/bone/tools/hadoop was a complete install of Hadoop 2.4.0.
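Since only the Hadoop install path changes from machine to machine, the common.loader line above can be generated rather than hand-edited. This is a hypothetical helper sketch: it reproduces the exact entries from the line above and assumes the Hadoop 2.4.0 directory layout; other Hadoop versions may put their jars elsewhere, which is why it also warns about directories it can't find.

```python
import os
import sys

# Tomcat's default entries, kept from the common.loader line shown above.
TOMCAT_ENTRIES = [
    "${catalina.base}/lib",
    "${catalina.base}/lib/*.jar",
    "${catalina.home}/lib",
    "${catalina.home}/lib/*.jar",
    "${catalina.home}/../lib/*.jar",
]

# Hadoop jar directories that worked for a Hadoop 2.4.0 install
# (assumption: other Hadoop versions may use a different layout).
HADOOP_JAR_DIRS = [
    "share/hadoop/common",
    "share/hadoop/yarn/lib",
    "share/hadoop/mapreduce",
    "share/hadoop/tools/lib",
    "share/hadoop/common/lib",
]


def common_loader(hadoop_home):
    """Build the common.loader line for catalina.properties."""
    home = hadoop_home.rstrip("/")
    hadoop_entries = [f"{home}/{sub}/*.jar" for sub in HADOOP_JAR_DIRS]
    return "common.loader=" + ",".join(TOMCAT_ENTRIES + hadoop_entries)


def missing_dirs(hadoop_home):
    """List the Hadoop jar directories that do not exist under hadoop_home."""
    return [sub for sub in HADOOP_JAR_DIRS
            if not os.path.isdir(os.path.join(hadoop_home, sub))]


if __name__ == "__main__" and len(sys.argv) == 2:
    # e.g. python make_loader.py /Users/bone/tools/hadoop
    for sub in missing_dirs(sys.argv[1]):
        print(f"warning: {sub} not found", file=sys.stderr)
    print(common_loader(sys.argv[1]))
```

Pointing it at /Users/bone/tools/hadoop produces the same common.loader line shown above, ready to paste into catalina.properties.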
I also ran into this exception:
Caused by: org.apache.sqoop.common.SqoopException: MAPREDUCE_0002:Failure on submission engine initialization - Invalid Hadoop configuration directory (not a directory or permission issues): /etc/hadoop/conf/
That path has to point to your Hadoop conf directory. You can find this setting in server/conf/sqoop.properties. I updated mine to:
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/Users/bone/tools/hadoop/etc/hadoop
(Again, /Users/bone/tools/hadoop is the directory of my Hadoop installation.)
OK, now you should be good to go!
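The MAPREDUCE_0002 message names two causes: the path is not a directory, or there is a permission problem. Before restarting the server, you can check both from the same sqoop.properties file. This is a hedged sketch with hypothetical helper names; its properties reader only handles simple `key=value` lines (no `:` separators, continuations, or escapes), and `os.access` checks the permissions of the user running the script, which may differ from the user Tomcat runs as.

```python
import os
import sys

CONF_DIR_KEY = "org.apache.sqoop.submission.engine.mapreduce.configuration.directory"


def read_properties(path):
    """Very small key=value reader: skips blank lines and #/! comments.
    It does not handle ':' separators, line continuations, or escapes."""
    props = {}
    # Java .properties files are traditionally ISO-8859-1 encoded.
    with open(path, encoding="latin-1") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith(("#", "!")):
                continue
            key, sep, value = line.partition("=")
            if sep:
                props[key.strip()] = value.strip()
    return props


def conf_dir_problems(conf_dir):
    """Mirror the two causes named in MAPREDUCE_0002: not a directory, or no permission."""
    if not os.path.isdir(conf_dir):
        return [f"{conf_dir} is not a directory"]
    if not os.access(conf_dir, os.R_OK | os.X_OK):
        return [f"{conf_dir} is not readable by the current user"]
    return []


if __name__ == "__main__" and len(sys.argv) == 2:
    # e.g. python check_conf.py server/conf/sqoop.properties
    conf_dir = read_properties(sys.argv[1]).get(CONF_DIR_KEY)
    if conf_dir is None:
        sys.exit(f"{CONF_DIR_KEY} is not set in {sys.argv[1]}")
    problems = conf_dir_problems(conf_dir)
    for problem in problems:
        print(problem)
    sys.exit(1 if problems else 0)
```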
Start the server with:
bin/sqoop.sh server start
Then the client should work, as shown below:
bin/sqoop.sh client
...
sqoop:000> set server --host localhost --port 12000 --webapp sqoop
Server is set successfully
sqoop:000> show version --all
client version:
Sqoop 1.99.3 revision 2404393160301df16a94716a3034e31b03e27b0b
Compiled by mengweid on Fri Oct 18 14:15:53 EDT 2013
server version:
Sqoop 1.99.3 revision 2404393160301df16a94716a3034e31b03e27b0b
...
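If the client can't reach the server, it helps to know whether the problem is the server or the client setup. One way is to request the server's version resource directly over HTTP. This sketch assumes the Sqoop 2 REST API serves it at /sqoop/version on the host, port, and webapp from the `set server` command above; treat that path as an assumption to verify against the REST API documentation for your Sqoop version. The helper names are hypothetical.

```python
import json
import urllib.request


def version_url(host="localhost", port=12000, webapp="sqoop"):
    """URL of the server's version resource (assumed path: /<webapp>/version)."""
    return f"http://{host}:{port}/{webapp}/version"


def fetch_server_version(host="localhost", port=12000, webapp="sqoop", timeout=5.0):
    """GET the version resource and return the decoded JSON body."""
    with urllib.request.urlopen(version_url(host, port, webapp), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        print(json.dumps(fetch_server_version(), indent=2))
    except OSError as err:  # URLError is an OSError: refused connection, timeout, HTTP error
        print(f"Sqoop server not reachable at {version_url()}: {err}")
    except ValueError as err:  # body was not valid JSON
        print(f"Unexpected response from {version_url()}: {err}")
```

A refused connection points at the server (check server/logs), while a JSON response means the server is up and the problem is on the client side.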
From there, follow this:
http://sqoop.apache.org/docs/1.99.3/Sqoop5MinutesDemo.html
Happy sqooping, all.