We haven't had cause to write a Hadoop job against Cassandra since the old days of Thrift (that is, since we introduced Elasticsearch into our system). But this week, we found ourselves needing to get some metrics on data stored in the actual C* tables.
I went to the documentation and found this page:
http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configHadoop.html
That page references:
"CQL partition input format: ColumnFamilyInputFormat class"
I was familiar with the ColumnFamilyInputFormat class from the old Thrift days, and I was pretty sure that a newer InputFormat that used CQL was available. I headed over to the code, dropped down to the 2.0 branch and found this:
https://github.com/apache/cassandra/blob/cassandra-2.0/examples/hadoop_cql3_word_count/src/WordCount.java
Notice that WordCount.java imports:
import org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat;
I went happily along my way and implemented the MapReduce code using this InputFormat, but the compiler kept complaining that CqlPagingInputFormat could not be found. After some investigation, it looks like that class was removed from cassandra-all sometime between 2.0.3 and 2.0.11; the 2.0.11 jar only contains CqlInputFormat. See below:
➜ tusk unzip -l /Users/bone/.m2/repository/org/apache/cassandra/cassandra-all/2.0.11/cassandra-all-2.0.11.jar | grep Cql | grep Input
2882 10-21-14 16:31 org/apache/cassandra/hadoop/cql3/CqlInputFormat.class
➜ tusk unzip -l /Users/bone/.m2/repository/org/apache/cassandra/cassandra-all/2.0.3/cassandra-all-2.0.3.jar | grep Cql | grep Input
1359 11-22-13 08:56 org/apache/cassandra/hadoop/cql3/CqlPagingInputFormat$1.class
2875 11-22-13 08:56 org/apache/cassandra/hadoop/cql3/CqlPagingInputFormat.class
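If you would rather check from code than by unzipping jars, here is a minimal sketch that probes the classpath for either class name. The class name InputFormatProbe and the method firstAvailable are made up for illustration; only the two org.apache.cassandra.hadoop.cql3 names come from the listings above. It assumes you run it with the same classpath as your job.

```java
public class InputFormatProbe {
    // Candidate class names, taken from the two jar listings above.
    static final String[] CANDIDATES = {
        "org.apache.cassandra.hadoop.cql3.CqlInputFormat",
        "org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat"
    };

    /** Returns the first class name that can be loaded, or null if none can. */
    public static String firstAvailable(String... classNames) {
        ClassLoader loader = InputFormatProbe.class.getClassLoader();
        for (String name : classNames) {
            try {
                // initialize=false: we only care whether the class is present,
                // not about running its static initializers.
                Class.forName(name, false, loader);
                return name;
            } catch (ClassNotFoundException | LinkageError e) {
                // Not usable on this classpath; try the next candidate.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String found = firstAvailable(CANDIDATES);
        System.out.println(found == null
            ? "No CQL input format found on the classpath"
            : "Found: " + found);
    }
}
```

Run against the cassandra-all version your build actually resolves, this should tell you which of the two imports will compile.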
It looks like the crew is already addressing it: https://github.com/apache/cassandra/commit/e550ea60212e933f3849a11717ba4ef916fc4aa3
Hopefully no one else runs into this. ;)