Live data streaming project using Kafka — part 4

Subham Kumar Sahoo
6 min read · Oct 8, 2022

A Kafka project to stream bidding data from a Python Flask website. The data is fetched by multiple consumers in parallel and stored in a MySQL database, and a dashboard is built with the Power BI tool.

Previous part : https://medium.com/@subham-sahoo/live-data-streaming-project-using-kafka-part-3-91dda2856aa0

Today we will create another consumer application that fetches all the messages from the Kafka topic and writes them to a CSV file.

GitHub repository : https://github.com/sksgit7/Big-data-projects/tree/main/kafka%20project1

And there is a Bonus at the end!!

Step 1 : Create consumer application

Create a file_writer.py file in the “kafka project1” directory. Get the code from the GitHub repository.

The only change is that the “group.id” differs from the previous consumer codes. Since this consumer group has no committed offsets yet, it will fetch all the messages.

Code link : https://github.com/sksgit7/Big-data-projects/blob/main/kafka%20project1/file_writer.py
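The consumer loop follows the same confluent-kafka pattern as the earlier parts. Here is a minimal sketch; the topic name, group id, bootstrap server, and output path are my assumptions, so use the repo code for the exact version:

```python
import csv


def parse_bid(value: bytes) -> list[str]:
    """Split a raw message value like b"name,price,timestamp" into CSV fields."""
    return value.decode("utf-8").split(",")


def main() -> None:
    # confluent-kafka import kept inside main() so parse_bid stays usable
    # even without the library installed
    from confluent_kafka import Consumer

    conf = {
        "bootstrap.servers": "localhost:9092",  # assumption: adjust for Confluent Cloud
        "group.id": "file-writer-group",        # new group => no committed offsets
        "auto.offset.reset": "earliest",        # so it reads the topic from the start
    }
    consumer = Consumer(conf)
    consumer.subscribe(["bids"])  # assumed topic name

    try:
        with open("output.csv", "a", newline="") as f:
            writer = csv.writer(f)
            while True:
                msg = consumer.poll(1.0)
                if msg is None:
                    continue
                if msg.error():
                    print(msg.error())
                    continue
                writer.writerow(parse_bid(msg.value()))
                f.flush()  # so the file is readable while the consumer runs
    except KeyboardInterrupt:
        pass
    finally:
        consumer.close()


if __name__ == "__main__":
    main()
```

The new group.id plus auto.offset.reset=earliest is what makes this consumer start from the very first retained message instead of sharing work with the other group.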

Now let’s run the code with “python file_writer.py”. Before that, make sure you are in the correct directory and have activated the virtual environment.

Here we can see that the application fetched 27 messages within seconds. These are all the messages produced to the topic so far: this application belongs to a different consumer group (group.id), and the messages are still in the Kafka topic (retention is 1 week), so it reads everything from the beginning.

After we stop this program, we will see a file named output.csv.

Here we can see that the third consumer application has successfully written the messages to the CSV file.

Running everything together

Now we can start all the programs (one producer, three consumers) and produce 6 bids.

Consumer 1:

Consumer 2:

Consumer 3 (file writer):

Here we can see that consumers 1 and 2 have fetched 2 and 4 records respectively (sharing the records between them), while consumer 3 (the file writer) has fetched all 6 records and written them to the output.csv file.

The format of the timestamp column might look different when you open the file in Excel. If you open it in Notepad, it looks fine.

Hh,7196,2022-10-06 22:30:44
Bb,1484,2022-10-06 22:30:47
Dd,9561,2022-10-06 22:30:48
Ff,8426,2022-10-06 22:30:58
Ee,5155,2022-10-06 22:30:59
Ff,4693,2022-10-06 22:31:00
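To confirm that the timestamp text itself is fine and only Excel's display formatting differs, we can parse a few of the rows above with the standard library (the column layout name,price,timestamp is taken from the CSV output shown):

```python
import csv
import io
from datetime import datetime

# a few rows copied from output.csv
sample = """Hh,7196,2022-10-06 22:30:44
Bb,1484,2022-10-06 22:30:47
Dd,9561,2022-10-06 22:30:48
"""

rows = list(csv.reader(io.StringIO(sample)))
for name, price, ts in rows:
    # the third column is a plain "YYYY-MM-DD HH:MM:SS" string; Excel
    # re-renders it as a date cell, but the underlying text is intact
    parsed = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    print(name, int(price), parsed.isoformat())
```

If strptime succeeds for every row, the file contents are valid and any odd look in Excel is purely cosmetic.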

In the DB table:

Bonus section — Power BI

Now let’s build a cool dashboard over the data 😅.

The idea is just to build an end-to-end pipeline, even if it is a simple one. And an end-to-end workflow is incomplete without a cool dashboard!!

Readers who know Hindi: “Kuch jyada nahi ho raha..” (roughly, “isn’t this getting a bit much?”)

Let’s get to work. I will be using the Power BI application, but feel free to use any other tool you like.

Get Power BI Desktop here: https://www.microsoft.com/en-us/download/details.aspx

Open the Power BI application and click on Get data.

Click on “More”, select the MySQL database connector, then click Connect.

Enter the server and database details and click OK.

Then select the table and load the data. Generally we first click Transform, apply some transformations, and then Load. But since this data is simple and there are no transformations to apply, we can click Load directly.
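Loading the table is essentially just a SELECT over all its rows. To sanity-check what Power BI will see, here is a sketch using an in-memory SQLite database as a stand-in for the MySQL table; the table and column names (bids, name, price, created_at) are my assumptions based on the CSV layout, so match them to your actual schema:

```python
import sqlite3

# in-memory stand-in for the MySQL table Power BI connects to
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bids (name TEXT, price INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO bids VALUES (?, ?, ?)",
    [
        ("Hh", 7196, "2022-10-06 22:30:44"),
        ("Bb", 1484, "2022-10-06 22:30:47"),
        ("Dd", 9561, "2022-10-06 22:30:48"),
    ],
)

# roughly what "Load" does: pull every row from the table
rows = conn.execute("SELECT name, price, created_at FROM bids").fetchall()
print(rows)
```

Against the real database you would run the same query via a MySQL client to confirm the consumer has actually inserted the bids before pointing Power BI at it.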

On the right we can see the table and fields.

Then choose the “stacked column chart” visualization. Drag the name column into Axis and price into Values, then apply a Top N filter based on price.

This will generate a similar chart.

Now let’s create a “card” visualization.

After dragging price into Fields, click the down arrow and choose Maximum/Minimum to show the max/min bidding price.

Similarly we can create all other visuals.
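These visuals boil down to simple aggregations. As a sketch, here is what the chart (sum of price per name, Power BI's default aggregation, with a Top N filter) and the min/max cards compute over the six sample bids from the CSV earlier:

```python
# the six sample bids as (name, price) pairs, taken from output.csv above
bids = [("Hh", 7196), ("Bb", 1484), ("Dd", 9561),
        ("Ff", 8426), ("Ee", 5155), ("Ff", 4693)]

# stacked column chart: total price per name
totals: dict[str, int] = {}
for name, price in bids:
    totals[name] = totals.get(name, 0) + price

# Top N filter (N = 3) by total price, highest first
top3 = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:3]

# card visuals: maximum and minimum bid price
max_bid = max(price for _, price in bids)
min_bid = min(price for _, price in bids)

print(top3)              # [('Ff', 13119), ('Dd', 9561), ('Hh', 7196)]
print(max_bid, min_bid)  # 9561 1484
```

Note how “Ff” tops the chart because its two bids are summed, while the max card still shows the single highest bid.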

After a few modifications I was able to get this..

You can add more visuals if you have other attributes in your table, and we can even define relationships between tables. Check out some YouTube content and play around with this.

Done??

Just one more step..

Let’s produce some more bids and we will come back to this dashboard.

I added a few records, like (Bb, 11000), (Gg, 1000) and (Hh, 11500).

After the last entry, the page looked like this:

Then click Refresh in the Power BI tool.

Here we can see that the maximum bid has changed, and “Bb” has moved up to 2nd position. Hurray!!

Note: the maximum bid was actually showing as 12K (rounded off), so I changed the Display units to None under Data labels.

We can also publish this Power BI dashboard online. I am leaving that for you to explore.

Done!!

Disclaimer

While creating this simple but end-to-end ;) pipeline, I came across some issues and resolved them along the way. For a few of those I have added notes, and a few are covered in code comments. You may also run into new issues of your own, as there are several components involved.

So do not get demotivated and leave things halfway. Believe me, the answers to your problems are just one internet search away (maybe more than one 😉).

Improvements

There is scope for improvement in every component. You can add more data, or swap in other components: use Spark in place of Python, apply ML for real-time predictions, and so on. More data and attributes will also lead to a more sophisticated dashboard.

You can try out more configurations on Kafka cloud (paid), install Apache Kafka locally, or even move everything to the cloud. I will add updates if I work on any improvements to this project.

If you liked this project, kindly do clap. Feel free to share it with others.

👉Follow me on Medium for more such interesting content and projects.

Check out my other projects here : https://medium.com/@subham-sahoo/

Connect with me at LinkedIn. ✨

Thank you!!

