Can Kafka be used as a database?

I've heard Kafka is good for real-time event streaming, but can it store data as well?



A lot of people assume the answer is no, and I think they're wrong. Kafka is an amazing database solution if you ask me. Because Kafka retains messages, you can reread them as many times as you like.


I've seen multiple use cases where Kafka is used as a change data capture (CDC) mechanism. Traditionally this is handled by the database itself, so I don't see the issue with using Kafka as a database.

Of course this will take some configuration, since Kafka's default retention isn't infinite (out of the box, brokers keep data for 168 hours, i.e. seven days) :)
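To make the retention point concrete, here is a minimal sketch of the topic settings involved. `retention.ms` and `retention.bytes` are Kafka topic configs, and `-1` means "no limit" for both. The helper functions and their names are hypothetical, made up for illustration, and nothing here talks to a real cluster.

```python
# Broker default retention: 168 hours, expressed in milliseconds.
DEFAULT_RETENTION_MS = 7 * 24 * 60 * 60 * 1000


def retention_config(keep_forever: bool, retention_ms: int = DEFAULT_RETENTION_MS) -> dict:
    """Build topic config overrides for a retention choice (hypothetical helper)."""
    if keep_forever:
        # retention.ms=-1 disables time-based deletion;
        # retention.bytes=-1 means there is no size cap either.
        return {"retention.ms": "-1", "retention.bytes": "-1"}
    if retention_ms <= 0:
        raise ValueError("retention_ms must be positive unless keep_forever is set")
    return {"retention.ms": str(retention_ms), "retention.bytes": "-1"}


def as_cli_arg(config: dict) -> str:
    """Render the overrides as a comma-separated key=value list."""
    return ",".join(f"{key}={value}" for key, value in sorted(config.items()))


infinite = as_cli_arg(retention_config(keep_forever=True))
```

The resulting string is in the shape that `kafka-configs.sh --alter --add-config` accepts for a topic, so you could paste it there rather than hand-typing the keys.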


While it's possible, I'd consider this a rather unconventional way of going about things. Kafka really shines as a real-time messaging platform. Using Kafka Streams you can transform millions of messages per second efficiently, and it supports exactly-once processing if you enable it.

Rather than using Kafka as your database, I would recommend using Kafka to get data into other storage engines like MongoDB or HDFS (Hadoop).


The answer is a resounding YES. Kafka can definitely be used as a database because of how it retains messages. Since different consumers read from the same topic in parallel, these messages must persist as per the retention policy you place on them. And there is no problem with setting a retention policy of infinity if you ask me. Companies have proven they can store petabytes of data on Kafka without any issues.

This is honestly a huge misconception with Kafka because of its reputation as a traditional "message queue". Unlike RabbitMQ or JMS, Kafka implements a "pull" approach to reading messages. Different consumers independently read from a persisted topic and mark their offset so they can later reread messages if need be. None of this would be possible if Kafka didn't persist the data it stores.
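If the pull model is hard to picture, here is a toy in-memory sketch of it. This is not the Kafka client API: `TopicLog` and `Consumer` are made-up names standing in for one topic partition and a consumer. The point is only that the log never deletes on read, and each consumer keeps its own offset.

```python
class TopicLog:
    """In-memory stand-in for one topic partition: an append-only list."""

    def __init__(self):
        self._records = []

    def append(self, value):
        """Store a record and return the offset it was written at."""
        self._records.append(value)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records starting at offset; nothing is deleted."""
        return self._records[offset:offset + max_records]


class Consumer:
    """Tracks its own position; the log does not know who has read what."""

    def __init__(self, log):
        self._log = log
        self.offset = 0

    def poll(self, max_records=10):
        """Pull the next batch and advance this consumer's offset."""
        batch = self._log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        """Move the position so earlier records can be read again."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        self.offset = offset


log = TopicLog()
for event in ["created", "updated", "deleted"]:
    log.append(event)

fast, slow = Consumer(log), Consumer(log)
first_pass = fast.poll()            # reads all three records
partial = slow.poll(max_records=1)  # reads only the first record
fast.seek(0)
second_pass = fast.poll()           # the same three records again
```

The two consumers never interfere with each other, and rewinding is just resetting a number, which is why rereads are cheap as long as the data is still within retention.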

Of course this is all dependent on the hardware Kafka is running on. You still need the physical space on a machine to store the data; it's not like Kafka provides a magic compression algorithm. Your producers serialize the messages, and they can optionally compress batches with gzip, snappy, lz4, or zstd, but compression is off by default.
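For a rough feel of what serialization plus compression buys you, here is a small sketch using only the Python standard library. It does not use Kafka at all: the record shape and the `encode_batch` / `decode_batch` helpers are invented for illustration, and real savings depend entirely on how repetitive your data is.

```python
import gzip
import json


def encode_batch(records):
    """Serialize records to JSON lines, then gzip the whole batch."""
    raw = "\n".join(json.dumps(r, sort_keys=True) for r in records).encode("utf-8")
    return raw, gzip.compress(raw)


def decode_batch(compressed):
    """Reverse encode_batch: gunzip, then parse each non-empty line."""
    text = gzip.decompress(compressed).decode("utf-8")
    return [json.loads(line) for line in text.split("\n") if line]


# Hypothetical, highly repetitive events (the kind that compress well).
records = [{"user_id": i % 10, "action": "page_view", "path": "/home"} for i in range(1000)]
raw, compressed = encode_batch(records)
restored = decode_batch(compressed)
```

Compressing a whole batch rather than each message is deliberate here, since repeated field names across records are where most of the savings come from. Either way the bytes still have to live on a disk somewhere.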

With that being said, Kafka won't give you the relational benefits of something like MySQL, so if that is what you're after then definitely set up a relational database and integrate it with your Kafka cluster.
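As a sketch of what that integration gives you, the snippet below loads a few events into a relational table and runs an aggregate over them. To keep it runnable anywhere I'm assuming SQLite (from the Python standard library) as a stand-in for MySQL, and a hard-coded list as a stand-in for records consumed from a topic. The `orders` schema and the events are invented.

```python
import sqlite3

# Hypothetical order events, standing in for records consumed from a topic.
events = [
    {"order_id": 1, "customer": "alice", "amount": 30.0},
    {"order_id": 2, "customer": "bob", "amount": 12.5},
    {"order_id": 1, "customer": "alice", "amount": 35.0},  # later update to order 1
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)

for event in events:
    # INSERT OR REPLACE keeps only the latest event per primary key.
    conn.execute(
        "INSERT OR REPLACE INTO orders (order_id, customer, amount) "
        "VALUES (:order_id, :customer, :amount)",
        event,
    )
conn.commit()

# The kind of query a plain topic cannot answer without replaying everything.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
```

Kafka keeps the full history of events, while the table holds the current state in a form you can join, filter, and aggregate with SQL.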


It can.