Friday, September 25, 2015

Druid : Vagrant Up (and Tranquility!)


We've been shown the light.  

After a great meeting with Fangjin and Giam, we are headed down the Tranquility route for real-time ingestion from Kinesis to Druid.  For me, that means getting comfortable with the expanded set of node types in the Druid cluster.

There is no better way to get comfortable with the cluster and the new node types than to spin up a complete environment and kick the tires a bit.  And there is no better way to do that than "vagrant up"!

So, this morning I set out to create a complete environment suitable for Tranquility, including middle managers (which are typically omitted from a standalone "simple" cluster).  I started with Quantily's vagrant config for Druid 0.6.160, forked the repo, and went to work.

If you are impatient, you can find my fork here and get going.  If you have the patience...

It's important to understand the anatomy of a Druid cluster.  First off, Druid relies on Zookeeper and MySQL.  The install.sh script installs vanilla versions of these, and creates a druid database and user in MySQL.  (Druid itself populates the schema at startup.)

The following is a list of the server types in a Druid cluster.  On our vagrant server, each server type occupies a different port (listed below) and has its own configuration.

Overlord: (port 8080)

The overlord is responsible for task management on the Druid cluster.  Since Tranquility is largely an orchestration engine, doling out tasks to create segments as they form in real-time, the overlord is the entry point into the Druid cluster for Tranquility. 


Coordinator: (port 8081)

The coordinator is responsible for segment management, telling nodes to load/drop segments.


Broker: (port 8090)

The Broker is the query proxy in the system.  It receives the query, knows which nodes have which segments, and proxies the query to the respective nodes.
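
To make the Broker's role concrete, here is a minimal sketch of sending a native Druid query to it over HTTP. It assumes the Vagrant box is reachable as localhost (your network settings may differ) and uses a hypothetical datasource name; `timeBoundary` is simply one of the smallest native query types to try.

```python
import json
import urllib.request

# Assumption: the Vagrant box is reachable as localhost; the port is the
# Broker port listed above. Native queries are POSTed to /druid/v2/.
BROKER_URL = "http://localhost:8090/druid/v2/"


def build_time_boundary_query(datasource):
    """Return a timeBoundary query (earliest/latest event time) for a datasource."""
    return {"queryType": "timeBoundary", "dataSource": datasource}


def build_request(query, url=BROKER_URL):
    """Wrap a query dict in an HTTP POST request with a JSON body."""
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query, url=BROKER_URL, timeout=10):
    """Send the query to the Broker and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(query, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example (needs a running cluster; "events" is a hypothetical datasource):
# print(run_query(build_time_boundary_query("events")))
```

The Broker, not the caller, works out which nodes hold the relevant segments, so the client only ever needs this one address.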


Historical: (port 8082)

Historical nodes are the beasts that load the segments and respond to queries for "static" data.  Once a segment has been committed to deep storage, it is loaded by a historical node, which responds to queries.


MiddleManager: (port 8100)

MiddleManagers are exactly that. =)  They push tasks to Peons, which they spawn on the same node.  Right now, one Peon works on one task, which produces one segment.  (That may change.)
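
As a quick sanity check once the box is up, the port layout above can be captured in a small table and probed. This is only a sketch: it assumes the servers are reachable on localhost, and it checks nothing more than whether something accepts TCP connections on each port.

```python
import socket

# Ports as listed above for the vagrant server.
NODE_PORTS = {
    "overlord": 8080,
    "coordinator": 8081,
    "historical": 8082,
    "broker": 8090,
    "middleManager": 8100,
}


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def cluster_report(host="localhost", probe=is_listening):
    """Map each node type to whether its port accepts connections.

    `host` defaults to localhost, which is an assumption about how the
    Vagrant box is networked. `probe` can be swapped out for testing.
    """
    return {name: probe(host, port) for name, port in NODE_PORTS.items()}


# Example (needs the cluster running):
# for name, up in cluster_report().items():
#     print(name, "up" if up else "down")
```

An open port does not prove the node is healthy, but a closed one is a fast way to spot a server that failed to start.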


Special Notes about Real-Time nodes:
Per my previous blog, we were debating the use of a real-time (RT) node.   But with Tranquility, you don't need RT nodes.  They are replaced with transient peon instances that run a temporary firehose ingesting the events for that segment directly from the Tranquility client.  

Druid is targeting highly-available, fully-replicated, transactional real-time ingestion.  Since RT nodes share nothing, they are unable to coordinate and unable to replicate data, which means they don't fit well into the strategic vision for the project.  Tranquility, and its coordinated ingestion model, may eventually make RT nodes obsolete and replace them entirely. (See #1642 for more information.)

Getting Started

OK -- down to business.

To fire up your own cluster, simply clone the repository and "vagrant up".  Once things are up and running, you can:

Hit the Overlord at:

Hit the Coordinator at:
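
As a rough sketch, assuming the box is reachable as localhost (an assumption; your Vagrant network settings may differ) and the ports listed earlier, the base addresses can be built and checked like this. The `/status` path is the generic status endpoint that Druid nodes expose.

```python
import json
import urllib.request

HOST = "localhost"  # assumption: adjust to your VM's address
PORTS = {"overlord": 8080, "coordinator": 8081}


def base_url(node, host=HOST):
    """Return the base HTTP address for the given node type."""
    return "http://{}:{}".format(host, PORTS[node])


def status_url(node, host=HOST):
    """Return the address of the node's status endpoint."""
    return base_url(node, host) + "/status"


def fetch_status(node, host=HOST, timeout=5):
    """Fetch and decode the node's status document (needs a running cluster)."""
    with urllib.request.urlopen(status_url(node, host), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example (needs the cluster running):
# print(fetch_status("overlord"))
```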

In my next blog, I'll detail the client-side of integrating Tranquility using the direct Finagle API.




Comments:

Oleg Zastavnyi said...

Hi Brian, thanks for useful post.

I wanted to ask whether you have had a chance to configure Spark Streaming + Druid processing running 24/7.
We have a problem: when the segment period ends, Druid stops receiving any data.
We are using this example https://github.com/druid-io/tranquility/blob/master/docs/spark.md.
Thanks, Oleg.
