Kinesis pricing is based on two dimensions: hourly price for each shard (which dictates overall concurrency/throughput), and a unit price per 1M messages. As I alluded to in my previous post, we process a *ton* of events, and if we simply pushed those events onto Kinesis streams 1:1, we would quickly hand over all of our money to Amazon and go out of business.
However, with the recent release of the Kinesis Producer Library (KPL), Amazon exposed a native capability to aggregate multiple messages/events into a single PUT unit. The maximum size of a PUT unit is 25 KB. If you have a message size of 150 bytes, you can cram around 165 messages into a single PUT unit (and save some serious dough)!
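To make the packing idea concrete, here is a toy sketch of grouping small messages into 25 KB units. It is not the KPL's actual aggregation code: `PutUnitPacker` is a hypothetical name, the unit size is assumed to be 25,000 bytes, and it ignores whatever framing overhead the real aggregated format adds.

```java
import java.util.ArrayList;
import java.util.List;

final class PutUnitPacker {
    // Assumption: one PUT unit holds 25,000 bytes (25 KB, decimal).
    static final int PUT_UNIT_BYTES = 25_000;

    // Greedily groups messages so that each group's total size stays within maxBytes.
    static List<List<byte[]>> pack(List<byte[]> messages, int maxBytes) {
        List<List<byte[]>> units = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        int currentBytes = 0;
        for (byte[] message : messages) {
            if (message.length > maxBytes) {
                throw new IllegalArgumentException("message larger than one unit");
            }
            // Start a new unit when the next message would overflow the current one.
            if (currentBytes + message.length > maxBytes) {
                units.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(message);
            currentBytes += message.length;
        }
        if (!current.isEmpty()) {
            units.add(current);
        }
        return units;
    }
}
```

Under those assumptions, 1,000 messages of 150 bytes each pack into 7 units, with 166 messages in every full unit.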
Here is the math, straight from the Kinesis Pricing Calculator:
Without the KPL,
100K m / s * (150 bytes / m) = 100 shards and 263,520M PUT units = $4,787.28 / month
* Note: each shard can only process 1K messages / s, which is why we end up with 100 shards.
With the KPL, aggregating 100 messages into each PUT unit, we would reduce our required throughput (messages / second) down to 1K / s.
Each PUT unit would then be 15 KB in size (100 * 150 bytes).
This gives us:
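1K m / s * (15 KB / m) = 15 shards and 2,635.2M PUT units = $201.60 / month
* Note: here bandwidth is the limit. Each shard accepts 1 MB/s, and 1K m / s * 15 KB = 15 MB/s, which is why we end up with 15 shards.
That is a savings of ~24x!!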
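The arithmetic above can be reproduced with a few lines of Java. The rates in this sketch are assumptions, inferred because they reproduce the totals quoted above, and are not current AWS prices: $0.015 per shard-hour, $0.014 per million PUT units, and a 732-hour (30.5-day) month. The per-shard limits of 1,000 records/s and 1 MB/s, and the class name `KinesisCostEstimate`, are likewise illustrative.

```java
import java.util.Locale;

final class KinesisCostEstimate {
    // Assumed rates and limits (see the text above); adjust to the current price list.
    static final double SHARD_HOUR_USD = 0.015;
    static final double PER_MILLION_PUT_UNITS_USD = 0.014;
    static final double HOURS_PER_MONTH = 732;            // 30.5 days
    static final double SHARD_RECORDS_PER_SEC = 1_000;
    static final double SHARD_BYTES_PER_SEC = 1_000_000;  // 1 MB/s
    static final double PUT_UNIT_BYTES = 25_000;          // 25 KB

    // Shards needed: whichever of the record-count or bandwidth limits is hit first.
    static long shards(double recordsPerSec, double bytesPerRecord) {
        double byCount = recordsPerSec / SHARD_RECORDS_PER_SEC;
        double byBytes = recordsPerSec * bytesPerRecord / SHARD_BYTES_PER_SEC;
        return (long) Math.ceil(Math.max(byCount, byBytes));
    }

    // PUT units per month, in millions; a record larger than 25 KB counts as several units.
    static double millionsOfPutUnits(double recordsPerSec, double bytesPerRecord) {
        double unitsPerRecord = Math.ceil(bytesPerRecord / PUT_UNIT_BYTES);
        return recordsPerSec * unitsPerRecord * HOURS_PER_MONTH * 3600 / 1_000_000;
    }

    static double monthlyUsd(double recordsPerSec, double bytesPerRecord) {
        return shards(recordsPerSec, bytesPerRecord) * SHARD_HOUR_USD * HOURS_PER_MONTH
                + millionsOfPutUnits(recordsPerSec, bytesPerRecord) * PER_MILLION_PUT_UNITS_USD;
    }

    public static void main(String[] args) {
        double plain = monthlyUsd(100_000, 150);       // one message per PUT unit
        double aggregated = monthlyUsd(1_000, 15_000); // 100 messages per PUT unit
        System.out.printf(Locale.ROOT, "without KPL: $%.2f / month%n", plain);
        System.out.printf(Locale.ROOT, "with KPL:    $%.2f / month%n", aggregated);
        System.out.printf(Locale.ROOT, "ratio:       %.1fx%n", plain / aggregated);
    }
}
```

Most of the savings come from the PUT unit charge, which falls by a factor of 100, while the shard charge only falls from 100 shards to 15.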
So, let's look at what that means architecturally...
First, you are going to want to look at the KPL recommended usage matrix:
My naive interpretation of that chart is: Use Java!
Ironically, the producer is actually a natively built micro-service/binary, and the KPL is a Java wrapper around that native binary that uses interprocess communication to delegate work to the micro-service.
In Java, it is straightforward to integrate the KPL. Here are the key components.
First, you need to configure the producer:
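A minimal configuration sketch, assuming the 0.x KPL Java API (package `com.amazonaws.services.kinesis.producer`). The `ProducerFactory` class name, the region, the numeric settings, and the binary path are placeholders chosen for illustration, not values from the original setup:

```java
import com.amazonaws.services.kinesis.producer.KinesisProducer;
import com.amazonaws.services.kinesis.producer.KinesisProducerConfiguration;

final class ProducerFactory {
    // Hypothetical helper: every value below is a placeholder to tune for your own workload.
    static KinesisProducer create() {
        KinesisProducerConfiguration config = new KinesisProducerConfiguration();
        config.setRegion("us-east-1");            // assumed region
        config.setMaxConnections(1);
        config.setRequestTimeout(60_000);         // milliseconds
        // How long a record may sit in the internal buffer waiting to be aggregated.
        config.setRecordMaxBufferedTime(15_000);  // milliseconds
        // Optional: point at a self-built native binary instead of the bundled one.
        config.setNativeExecutable("/path/to/kinesis_producer");
        return new KinesisProducer(config);
    }
}
```

With no credentials provider set, the sketch relies on the library's default credential lookup.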
Notice that the KPL actually sends the messages asynchronously, buffering/aggregating those messages internally (which makes for challenging guarantees around message delivery, but we'll defer that problem for another day).
Also notice that I built my own native binary and specified its location. You don't necessarily need to do that: the binary is actually bundled into the JAR file and extracted into a temporary location at runtime. (yeah, crazy eh?)
Next, you need to send some data:
This takes just one line of code:
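Roughly like this, assuming the KPL's `addUserRecord(stream, partitionKey, data)` call. The `EventSender` wrapper and the timestamp used as a partition key are illustrative choices, not necessarily what the original code did:

```java
import com.amazonaws.services.kinesis.producer.KinesisProducer;
import com.amazonaws.services.kinesis.producer.UserRecordResult;
import com.google.common.util.concurrent.ListenableFuture;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class EventSender {
    // Hypothetical wrapper around the single call that hands a record to the KPL.
    static ListenableFuture<UserRecordResult> send(
            KinesisProducer producer, String streamName, String event) {
        ByteBuffer data = ByteBuffer.wrap(event.getBytes(StandardCharsets.UTF_8));
        // Assumption: any well-distributed string works as a partition key.
        String partitionKey = Long.toString(System.currentTimeMillis());
        // Returns immediately; the record is buffered and sent later by the native process.
        return producer.addUserRecord(streamName, partitionKey, data);
    }
}
```

The call only enqueues the record, so the returned future is the only handle on its eventual outcome.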
Finally, you need to find out what happened to your message:
For this one, we'll add a callback to the future:
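A sketch of that registration, assuming the future is the Guava `ListenableFuture` returned by `addUserRecord`. `ResultWatcher` is a hypothetical helper name, and the direct executor is simply the smallest choice that compiles:

```java
import com.amazonaws.services.kinesis.producer.UserRecordResult;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.MoreExecutors;

final class ResultWatcher {
    // Attaches the callback so it runs once the record has succeeded or failed.
    static void watch(ListenableFuture<UserRecordResult> future,
                      FutureCallback<UserRecordResult> callback) {
        // directExecutor() runs the callback on whichever thread completes the future;
        // pass a dedicated executor instead if the callback does slow work.
        Futures.addCallback(future, callback, MoreExecutors.directExecutor());
    }
}
```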
Where callback is a handler that will be invoked once the record has been processed. Here is an example:
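One possible callback, assuming Guava's `FutureCallback` interface and the KPL's `UserRecordResult`. The `LoggingCallback` name and the log format are made up for illustration:

```java
import com.amazonaws.services.kinesis.producer.UserRecordResult;
import com.google.common.util.concurrent.FutureCallback;

final class LoggingCallback implements FutureCallback<UserRecordResult> {
    @Override
    public void onSuccess(UserRecordResult result) {
        // Called once the record has been accepted by a shard.
        System.out.println("Delivered to shard " + result.getShardId()
                + " with sequence number " + result.getSequenceNumber());
    }

    @Override
    public void onFailure(Throwable t) {
        // Called when the KPL gives up on the record; real code would retry or alert here.
        System.err.println("Record failed: " + t);
    }
}
```

Because sends are buffered, a failure can surface well after the call that enqueued the record, so this is the place where any delivery accounting would need to live.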
And there you have it, KPL in a nutshell.
For us, it was a huge win, and actually made Kinesis a viable alternative.
Now, we just need to figure out how to plumb it into everything else. =)
Kinesis Firehose to the rescue! (stay tuned)