Friday, January 23, 2015

Hadoop for Cassandra: CqlInputFormat != CqlPagingInputFormat != ColumnFamilyInputFormat


We haven't had cause to write a Hadoop job against Cassandra since the old days of Thrift (that is, since we introduced Elasticsearch into our system). But this week, we found ourselves needing to get some metrics on data stored in the actual C* tables.

I went to the documentation and found this page:
http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configHadoop.html

That page references:
"CQL partition input format: ColumnFamilyInputFormat class"

I was familiar with the ColumnFamilyInputFormat class from the old Thrift days, and I was pretty sure that a newer, CQL-based InputFormat was available. I headed over to the code, dropped down to the 2.0 branch, and found this:
https://github.com/apache/cassandra/blob/cassandra-2.0/examples/hadoop_cql3_word_count/src/WordCount.java

Notice that WordCount.java imports:
import org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat;

I went happily along my way and implemented the MapReduce code using this InputFormat, but the compiler kept complaining that CqlPagingInputFormat could not be found. After some investigation, it looks like that class was removed from cassandra-all sometime between 2.0.3 and 2.0.11: the 2.0.3 jar contains CqlPagingInputFormat, while the 2.0.11 jar contains only CqlInputFormat. See below:

➜  tusk  unzip -l /Users/bone/.m2/repository/org/apache/cassandra/cassandra-all/2.0.11/cassandra-all-2.0.11.jar | grep Cql | grep Input
     2882  10-21-14 16:31   org/apache/cassandra/hadoop/cql3/CqlInputFormat.class
➜  tusk  unzip -l /Users/bone/.m2/repository/org/apache/cassandra/cassandra-all/2.0.3/cassandra-all-2.0.3.jar | grep Cql | grep Input
     1359  11-22-13 08:56   org/apache/cassandra/hadoop/cql3/CqlPagingInputFormat$1.class
     2875  11-22-13 08:56   org/apache/cassandra/hadoop/cql3/CqlPagingInputFormat.class
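
The same check can be scripted in Java if you want to run it across several jars. This is a minimal sketch using only the JDK; the class name JarClassGrep and its grep method are hypothetical names chosen for illustration, and the jar path is whatever you pass on the command line.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Rough Java equivalent of: unzip -l some.jar | grep Cql | grep Input */
final class JarClassGrep {

    /** Returns the names of all jar entries that contain every given substring. */
    static List<String> grep(String jarPath, String... needles) throws IOException {
        List<String> matches = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                boolean matchesAll = true;
                for (String needle : needles) {
                    if (!name.contains(needle)) {
                        matchesAll = false;
                        break;
                    }
                }
                if (matchesAll) {
                    matches.add(name);
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a cassandra-all jar, e.g. one from your local Maven repository
        for (String name : grep(args[0], "Cql", "Input")) {
            System.out.println(name);
        }
    }
}
```

Run it once per cassandra-all version and compare the output to see which input format classes each jar actually ships.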

It looks like the crew is already addressing it: https://github.com/apache/cassandra/commit/e550ea60212e933f3849a11717ba4ef916fc4aa3
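
Since the available class depends on which cassandra-all version ends up on the classpath, it can help to check before a job is submitted. The following is a minimal sketch using only the JDK that reports which of the three input formats from the title can be loaded. InputFormatProbe and its methods are hypothetical names for illustration, and the package for ColumnFamilyInputFormat (org.apache.cassandra.hadoop) is an assumption not stated in this post.

```java
import java.util.ArrayList;
import java.util.List;

/** Reports which Cassandra Hadoop input format classes can be loaded. */
final class InputFormatProbe {

    // Whether each of these exists depends on the cassandra-all version in use.
    static final String[] CANDIDATES = {
        "org.apache.cassandra.hadoop.cql3.CqlInputFormat",
        "org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat",
        "org.apache.cassandra.hadoop.ColumnFamilyInputFormat" // assumed package
    };

    /** Returns true if the class can be located; the class is not initialized. */
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, InputFormatProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Returns the subset of the given class names that can be loaded. */
    static List<String> available(String... classNames) {
        List<String> found = new ArrayList<>();
        for (String name : classNames) {
            if (isPresent(name)) {
                found.add(name);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        for (String name : CANDIDATES) {
            System.out.println((isPresent(name) ? "found    " : "missing  ") + name);
        }
    }
}
```

Run it with the same classpath as the Hadoop job; a "missing" line for CqlPagingInputFormat points to the situation described above.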

Hopefully no one else runs into this. ;)

17 comments:

Sam BESSALAH said...

Actually, I ran into this last year while using Spark to do metrics aggregations, just like you. By the time I had rolled back to using CqlInputFormat (which wasn't handy for me), they had fortunately open-sourced the Cassandra Spark connector.
