Kafka for Beginners

Ankit Bourasi
6 min read · Sep 8, 2024

--

Hello guys! 👋 If you’re reading this, you’re probably curious about Apache Kafka. Don’t worry if you’ve never heard of it before — we’re going to start from scratch and make this journey fun and easy to understand.

I'm Ankit, and today you're learning Kafka. Let's dive in!

Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

In simpler terms, Apache Kafka is a tool that helps move and process large amounts of data quickly and reliably between different parts of a computer system.

Let's understand this with a simple example: Twitter.

  1. A user performs the following actions:
  • a user posts a tweet
  • another user likes the tweet
  • a third user retweets it

All of these actions act as events in the application, and they are sent to Kafka. With roughly 368 million active users, Twitter generates far too many events to process and manage directly, which is why companies use Kafka for high volumes of data.

  • Each of these actions generates an “event” that gets sent to Kafka
  • Kafka organizes these events into “topics” (e.g., “new_tweets”, “likes”, “retweets”)

The Twitter Components

  • Tweet Storage Service: Saves new tweets to a database
  • Notification Service: Sends notifications to users
  • Analytics Service: Tracks trending topics and user engagement
  • Timeline Service: Updates users’ timelines

2. How it works:

  • These services “subscribe” to relevant Kafka topics
  • When an event occurs, Kafka quickly distributes it to all subscribed services
  • Each service processes the event as needed

For example, when a tweet is posted:

  • Tweet Storage Service saves the tweet
  • Notification Service notifies followers
  • Analytics Service updates trending topics
  • Timeline Service adds the tweet to followers’ timelines

Kafka allows all of this to happen simultaneously and efficiently, even with millions of users and tweets. It ensures that no events are lost, even if a service temporarily goes down, and can handle massive amounts of data in real-time.
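The fan-out described above can be sketched with a toy in-memory publish/subscribe model in Python. This is only an illustration of the pattern, not real Kafka client code; the `Broker` class and handler names are made up for this sketch:

```python
from collections import defaultdict

class Broker:
    """Toy in-memory stand-in for Kafka: each topic maps to a list of subscribers."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        # Kafka delivers each event to every service subscribed to the topic.
        for handler in self.subscribers[topic]:
            handler(event)

broker = Broker()
log = []

# Each "service" subscribes to the topics it cares about.
broker.subscribe("new_tweets", lambda e: log.append(f"storage saved {e}"))
broker.subscribe("new_tweets", lambda e: log.append(f"timeline updated with {e}"))

broker.publish("new_tweets", "tweet#1")
print(log)  # both services received the same event
```

Real Kafka adds durability, ordering, and replay on top of this basic idea: events are persisted to disk, so a service that was down can catch up later instead of missing events.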

Understanding Kafka Terminology

  1. Kafka Cluster: A Kafka Cluster is a group of Kafka servers (or machines) that work together to manage and store streams of data
  2. Kafka Broker (or Kafka Server): A Kafka Broker is one individual server in the Kafka Cluster. It stores data and handles requests from producers and consumers.
  3. Kafka Topic: A Kafka Topic is a named category to which messages are sent; you can think of it like a table in a SQL database.

4. Partition: A Partition is a smaller piece of a topic. Each topic can be divided into several partitions to allow for parallel processing and better performance.

You can think of partitions as sections within a topic where messages are stored in a particular order.
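One way to picture how a message lands in a partition: Kafka's default partitioner hashes the message key and takes the result modulo the partition count, so all messages with the same key go to the same partition. (Kafka actually uses murmur2 hashing; `crc32` stands in for it in this sketch.)

```python
import zlib

NUM_PARTITIONS = 3

def pick_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Kafka's default partitioner is roughly murmur2(key) % num_partitions;
    # crc32 is used here only as a stand-in hash for illustration.
    return zlib.crc32(key.encode()) % num_partitions

# All events for the same key (e.g. a user id) map to the same partition,
# so their relative order is preserved within that partition.
p1 = pick_partition("user_42")
p2 = pick_partition("user_42")
print(p1, p2, p1 == p2)
```

This is why choosing a good key matters: events that must stay in order (all actions by one user, for example) should share a key.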

5. Segment: A Segment is a file within a partition that holds a portion of the messages. As new messages come in, they are appended to the active segment, which is like the latest chapter in a book.

When a segment grows too large (or too old), Kafka closes it and starts a new one.
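A rough sketch of how a partition rolls segments as messages arrive. The `SEGMENT_LIMIT` of 3 messages is made up for readability; real Kafka rolls segments by size (`log.segment.bytes`) or age (`log.segment.ms`):

```python
SEGMENT_LIMIT = 3  # messages per segment (illustrative; Kafka limits by bytes/time)

segments = [[]]  # the last list is the "active" segment

def append(message):
    if len(segments[-1]) >= SEGMENT_LIMIT:
        segments.append([])  # close the full segment and start a new one
    segments[-1].append(message)

for i in range(7):
    append(f"msg-{i}")

print([len(s) for s in segments])  # → [3, 3, 1]
```

Closed segments are immutable, which is what makes retention cheap: expiring old data is just deleting whole segment files.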

6. Producer: A Producer is an application or component that sends messages to a Kafka topic. Think of it as a sender or publisher that creates and pushes data into Kafka.

7. Consumer: A Consumer is an application or component that reads messages from a Kafka topic. It acts like a reader or subscriber that retrieves and processes data from Kafka.

8. Messages: Messages are the actual pieces of data sent between producers and consumers. They contain the information being transmitted, like a text message or a data record. Messages are organized and stored in topics and partitions.
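Conceptually, each message (record) carries a key, a value, and a timestamp, and once written it receives an offset marking its position in the partition. A simplified sketch (real Kafka records also carry headers and other metadata; the `Record` class here is invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class Record:
    key: str
    value: str
    timestamp: int

partition = []  # a partition is an ordered, append-only log

def append(record):
    offset = len(partition)  # offsets simply count up within a partition
    partition.append(record)
    return offset

o1 = append(Record("user_42", "posted a tweet", 1725753600))
o2 = append(Record("user_99", "liked a tweet", 1725753601))
print(o1, o2)  # → 0 1
```

Consumers track their position in each partition by offset, which is how Kafka lets a service that went down resume exactly where it left off.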

Putting It All Together:

  • Kafka Cluster: A collection of servers working together.
  • Kafka Broker: An individual server in the cluster.
  • Kafka Topic: A category for organizing messages.
  • Partition: A sub-section within a topic for organizing messages.
  • Segment: A file that stores a portion of messages in a partition.
  • Producer: The entity that sends messages to a topic.
  • Consumer: The entity that reads messages from a topic.
  • Messages: The actual data being sent and received.

Kafka Architecture

Kafka is a system that efficiently manages and processes streams of data across a network of servers, allowing different applications to publish and consume data in real-time.

Download Kafka

You need to download Kafka binaries from the official website.
Download Kafka:

  • Go to the Apache Kafka Downloads page.
  • Download the latest binary version.
  • Extract the downloaded ZIP or TGZ file to your desired location and rename the folder to kafka.

Note: To run Kafka, you need Java 8 or later installed on your system.

Start the Kafka server

  1. Go to your Kafka directory and open a terminal there (for example Command Prompt, PowerShell, or Bash).

2. Before starting the Kafka server, we need to start ZooKeeper.
Kafka uses ZooKeeper to manage the Kafka cluster. ZooKeeper is included with the Kafka download, but it needs to be started before Kafka can run.

You don’t need to install ZooKeeper separately as it’s bundled with Kafka.

Start ZooKeeper

Kafka requires ZooKeeper to run, so you need to start it first. Run the following command from the Kafka installation directory:

Kafka with ZooKeeper for Linux users

Run the following commands to start all services in the correct order:

# Start the ZooKeeper service                    
$ bin/zookeeper-server-start.sh config/zookeeper.properties

Open another terminal session and run:

# Start the Kafka broker service
$ bin/kafka-server-start.sh config/server.properties

Once all services have successfully launched, you will have a basic Kafka environment running and ready to use.

Kafka with ZooKeeper for Windows users

Run the following commands in CMD to start all services in the correct order:

# Start the ZooKeeper service
bin\windows\zookeeper-server-start.bat config\zookeeper.properties

Open another terminal session in CMD and run:

# Start the Kafka broker service
bin\windows\kafka-server-start.bat config\server.properties

Once all services have successfully launched, you will have a basic Kafka environment running and ready to use.

The zookeeper-server-start script starts ZooKeeper; zookeeper.properties is ZooKeeper's configuration file.

The kafka-server-start script starts the Kafka broker; server.properties is Kafka's configuration file.

You should see logs like these in the ZooKeeper terminal, indicating that ZooKeeper has started successfully.

Kafka server logs if it started successfully:

Congratulations! You now have a basic Kafka environment running and ready to use.

For deeper understanding and hands-on experience, visit this blog.



Ankit Bourasi

Software Engineer | Data Science | Story Teller | Content Writing | Travelling