Data Storytelling: Tag Prediction
Data storytelling is an important part of data analysis. The first step is understanding the data; in our storytelling we then have to present what we have actually understood. Before jumping into code and building ML models, it is really important that we are clear about the data, its structure and the problem statement.
When we present our work and findings, some people in the audience may not be closely involved with data science or machine learning and may not understand much of our code. They will be interested only in the problem statement, the intuition we are able to draw out of the data and the solution we reach using the data. Here we have to be good data storytellers to convince others about our work.
In this post I have tried some storytelling. I hope you like it...
A few questions first...
Have you ever got stuck during a technical or programming task?
Where do you look for solutions to the errors you get while programming? Which site generally comes up at the top of your search?
Most of us are probably familiar with StackExchange. For those who are not aware, it is a network of Q&A websites. They use the same software but are geared towards different domains. One of them is StackOverflow, which deals with programming-related questions. On these platforms we can post our own queries and also answer questions posted by others.
Basically, we have a big community we can turn to for help when we get stuck.
So, a lot of questions are posted on these websites. They have different titles and content, and there are answers along with them. These websites do not just store all of this as it is. The data is processed before it is stored, and one of the important steps is to annotate the questions by adding tags to them.
Tags??
Tags are distinct words or short phrases that indicate a particular topic.
Without tags, it would be really difficult to divide the data into groups. What if a person wants to find questions on a certain topic? How would the algorithm know which questions and answers are related to that particular topic? So, the tags help to organize the content.
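To make the idea of organizing content by tags concrete, here is a minimal Python sketch. The sample questions, their tags and the build_tag_index helper are all made up for illustration; this is not how StackExchange stores its data, only one simple way to picture it.

```python
from collections import defaultdict

# Hypothetical sample questions; titles and tags are invented for illustration.
questions = [
    {"id": 1, "title": "How do I reverse a list in Python?", "tags": ["python", "list"]},
    {"id": 2, "title": "What is a segmentation fault?", "tags": ["c", "memory"]},
    {"id": 3, "title": "How do I sort a list of tuples?", "tags": ["python", "sorting", "list"]},
]


def build_tag_index(items):
    """Map each tag to the ids of the questions that carry it."""
    index = defaultdict(list)
    for item in items:
        for tag in item["tags"]:
            index[tag].append(item["id"])
    return dict(index)


tag_index = build_tag_index(questions)

# Look up every question on a given topic without reading any question text.
print(tag_index["python"])
```

With an index like this, finding all questions on a topic is a single dictionary lookup, which is exactly the kind of grouping that is hard to do when questions have no tags.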
Not only this, tagging also helps with SEO (Search Engine Optimization). When we tag all the topical headings throughout a post, we tell search engines exactly what our content is discussing. This can give our content more priority when someone searches for a related topic.
But manually assigning tags to questions is quite cumbersome. It would be a lot better if we could design an algorithm that predicts the tags from the title and content and adds them automatically.
As some of you may have already guessed, we can use Machine Learning for this. We can train an ML model on the content we already have, along with its tags, and use it to predict the tags for new content posted on the website. But before building the model, we need to pre-process the text data using NLP (Natural Language Processing) techniques.
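Here is a rough Python sketch of that two-step idea: clean the text first, then learn from tagged examples. It is only a toy word-count baseline standing in for a real ML model, not the approach of the upcoming project. The stop-word list, the sample questions and the function names preprocess, train and predict are all assumptions made for illustration.

```python
import html
import re
from collections import Counter, defaultdict

# A tiny hand-picked stop-word list (an assumption; real projects use a fuller one).
STOP_WORDS = {"a", "an", "the", "is", "in", "to", "of", "how", "do", "i",
              "what", "my", "and", "for", "on", "with"}


def preprocess(text):
    """Strip HTML tags, lowercase, split into tokens and drop stop words."""
    text = html.unescape(re.sub(r"<[^>]+>", " ", text))
    tokens = re.findall(r"[a-z0-9+#]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def train(examples):
    """Count, for every tag, how often each word appears in questions with that tag."""
    word_counts = defaultdict(Counter)
    for text, tags in examples:
        tokens = preprocess(text)
        for tag in tags:
            word_counts[tag].update(tokens)
    return word_counts


def predict(word_counts, text, top_k=2):
    """Rank tags by the share of each tag's word counts matched by the new text."""
    tokens = preprocess(text)
    scores = {}
    for tag, counts in word_counts.items():
        total = sum(counts.values())
        overlap = sum(counts[t] for t in tokens)
        if overlap:
            scores[tag] = overlap / total
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical training data: (question text, tags) pairs invented for illustration.
examples = [
    ("<p>How do I reverse a list in Python?</p>", ["python", "list"]),
    ("<p>Python dictionary lookup is slow</p>", ["python", "performance"]),
    ("<p>Segmentation fault when freeing a pointer in C</p>", ["c", "memory"]),
    ("<p>Memory leak with malloc in C</p>", ["c", "memory"]),
]

model = train(examples)
predicted = predict(model, "Why does my C program leak memory?")
print(predicted)
```

A real project would swap the word counts for proper features and a trained classifier, but the flow stays the same: pre-process the text, learn from questions that already have tags, then suggest tags for a new question.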
This can be an interesting project, right? Let’s work on it.
In my upcoming post, I will be taking you through the project.
Dataset: https://www.kaggle.com/c/transfer-learning-on-stack-exchange-tags/data
Stay tuned...
Please give your feedback in the comment section. You can share a link to your own storytelling post too. I will be really happy to read it.
Connect with me: www.linkedin.com/in/subham-kumar-sahoo-55563a136