Is your dataset ready for a Recommendation System?

In today’s hyper-connected digital world, personalization plays a huge role in what we buy (Amazon), what we watch (Netflix), what we listen to (Spotify) and what we wear (StitchFix). Behind the scenes, personalized preferences are modeled by an AI recommendation system. The quality of the recommendations on such a platform depends on the quality of the data provided.

Enterprises like Amazon, Netflix, Spotify, and StitchFix have teams of data scientists and engineers who first assess the quality and quantity of the data, and then develop recommender models that best fit that data. However, new enterprises and startups can get stuck for weeks or months mulling over the question: “Do we have the right data quality and quantity for an AI recommendation system?”

An immediate solution to assessing the right dataset for the recommendation system

The immediate solution is to hire a data science team to build an AI recommendation system that creates a personalized experience for your customers, or to learn machine learning and build it yourself.


Building a data science team is expensive and time-consuming. Also, machine learning has a steep learning curve.

Better alternative

A faster and better way to meet this need is to look for a platform specialized in AI recommendation systems that offers an end-to-end solution for delivering a personalized experience to your customers. One such platform is

At Leapfrog, we use our in-house application: Caboom. It is a platform that helps you build your first recommendation system POC within a few minutes. It takes care of the complicated AI tasks involved and gives you an idea of the quality of data you need to create your first AI recommendation system.

How does Caboom abstract your complicated AI tasks?

A. Data

The free version of Caboom accepts three types of comma-separated tabular datasets, as follows:

1. User data: Data that describes users, such as age, gender, and so on.

2. Item data: Data that describes items, such as item category, genre, and so on.

3. Interaction data: Data that records which users interacted with which items. For example, a product purchased by a user in an online store.
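As a rough illustration of how the three datasets relate to each other, the sketch below parses three small comma-separated samples and checks that every interaction refers to a known user and a known item. The column names (user_id, item_id, age, gender, category, genre, event) and the sample rows are assumptions made for this example; they are not the schema Caboom requires.

```python
import csv
import io

# Hypothetical sample files; column names and rows are assumptions for illustration.
USERS_CSV = """user_id,age,gender
u1,17,F
u2,34,M
"""

ITEMS_CSV = """item_id,category,genre
i1,book,fantasy
i2,movie,comedy
"""

INTERACTIONS_CSV = """user_id,item_id,event
u1,i1,purchase
u2,i2,purchase
u2,i9,purchase
"""


def read_rows(text):
    """Parse comma-separated text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def find_unknown_references(users, items, interactions):
    """Return interaction rows whose user_id or item_id is absent from the user or item data."""
    user_ids = {row["user_id"] for row in users}
    item_ids = {row["item_id"] for row in items}
    return [
        row
        for row in interactions
        if row["user_id"] not in user_ids or row["item_id"] not in item_ids
    ]


users = read_rows(USERS_CSV)
items = read_rows(ITEMS_CSV)
interactions = read_rows(INTERACTIONS_CSV)

# The third interaction points at item "i9", which is not in the item data.
unknown = find_unknown_references(users, items, interactions)
print(unknown)
```

A check like this is a cheap way to spot identifier mismatches between the files before any model is trained.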

For clients who do not have data ready, or who need help cleaning or preparing it, the Caboom customized service provides specialized data engineers to prepare data that fits your requirements.

B. Data quality and algorithm in Caboom

The free version of Caboom uses an algorithm that leverages interaction data to train and evaluate the AI model. In this version, user data and item data are used to map user and item identifiers to their respective names in the evaluation section.

However, in the customized Caboom service, we use that metadata to improve the quality of the recommendation model. The quality of recommendations in the free Caboom version depends on the interaction data. To assure the quality of the uploaded data, Caboom uses a special score called the richness score (a detailed blog post will follow), which tells users whether the uploaded interaction data is a good fit for building an AI recommendation model.

Fig 1: Distribution of interaction of users with items

The plot in Fig 1 is drawn using interaction data from one of the startups associated with Caboom. Here, the x-axis is the number of items a user has interacted with, and the y-axis is the percentage of users with that interaction count. The plot exhibits the problem of poor interaction history: more than 95% of the users have interacted with only two items.
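A distribution like the one in Fig 1 can be computed directly from interaction data. The following is a minimal sketch, assuming the interactions are available as (user_id, item_id) pairs; the function names and sample data are hypothetical, and this is not Caboom's richness score, only the per-user interaction count behind the plot.

```python
from collections import Counter


def interaction_distribution(interactions):
    """Map each interaction count to the percentage of users with that count.

    interactions: iterable of (user_id, item_id) pairs.
    """
    # Track distinct items per user so repeated events on one item count once.
    items_per_user = {}
    for user_id, item_id in interactions:
        items_per_user.setdefault(user_id, set()).add(item_id)

    counts = Counter(len(item_set) for item_set in items_per_user.values())
    total_users = len(items_per_user)
    return {n: 100.0 * num_users / total_users for n, num_users in sorted(counts.items())}


def share_at_or_below(distribution, threshold):
    """Percentage of users whose interaction count is at most `threshold`."""
    return sum(pct for n, pct in distribution.items() if n <= threshold)


# Hypothetical sample: u1 and u2 have two items each, u3 has one, u4 has four.
sample = [
    ("u1", "i1"), ("u1", "i2"),
    ("u2", "i1"), ("u2", "i3"),
    ("u3", "i2"), ("u3", "i2"),
    ("u4", "i1"), ("u4", "i2"), ("u4", "i3"), ("u4", "i4"),
]

dist = interaction_distribution(sample)
print(dist)                        # percentage of users per interaction count
print(share_at_or_below(dist, 2))  # share of users with two or fewer items
```

If the share of users with only one or two interactions is very high, as in Fig 1, the interaction data alone is unlikely to carry enough signal for personalization.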

A use case for a recommender dataset

For new enterprises and startups, it is common to have limited interaction history. For example, on a new online shopping site, few or no users have purchased or rated items. Activities such as rating or buying are examples of interaction data. If there is little or no interaction history, the AI recommendation model does not have enough data to learn users’ usage patterns and predict their preferences.

Generally, we refer to such conditions as the cold start problem, where the AI model does not know users’ preferences or how to recommend items to them. In such conditions, we can either recommend items randomly or train our model using user metadata and item metadata.

Randomly recommended items are less personalized than recommendations based on user or item metadata. For example, in an online shopping store, we may know the age group of the users. Here, the age group is user metadata. There is a high likelihood that items purchased by one age group, such as teenagers, are preferred by other teenage visitors to the store. If a visitor is a teenager and has no purchase history, we can recommend items purchased by other teenagers, as shown in Fig 2. Similarly, we can use item metadata to find similar items and recommend those to users based on their purchases.

Fig 2: Recommendation of items using user metadata (age group)
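The age-group idea can be sketched in a few lines. This is a simplified illustration, not the model used in Caboom: the age-group boundaries, function names, and sample data are all assumptions, and the fallback to overall popularity (when no one in the same age group has purchased anything) is a choice made for this example in place of random recommendation.

```python
from collections import Counter


def age_group(age):
    """Bucket an age into a coarse group; the boundaries are illustrative assumptions."""
    if age < 13:
        return "child"
    if age < 20:
        return "teen"
    if age < 40:
        return "adult"
    return "senior"


def recommend_for_new_user(new_user_age, user_ages, purchases, top_n=3):
    """Recommend the items most purchased by existing users in the same age group.

    user_ages: dict mapping user_id -> age (user metadata).
    purchases: list of (user_id, item_id) pairs (interaction data).
    Falls back to overall popularity when no peer in the group has purchases.
    """
    group = age_group(new_user_age)
    peer_counts = Counter(
        item
        for user, item in purchases
        if user in user_ages and age_group(user_ages[user]) == group
    )
    counts = peer_counts or Counter(item for _, item in purchases)
    return [item for item, _ in counts.most_common(top_n)]


# Hypothetical data: three teenagers and one adult with purchase history.
user_ages = {"u1": 16, "u2": 17, "u3": 35, "u4": 15}
purchases = [
    ("u1", "sneakers"), ("u2", "sneakers"), ("u4", "sneakers"),
    ("u2", "headphones"), ("u4", "headphones"),
    ("u1", "backpack"),
    ("u3", "blender"), ("u3", "coffee maker"),
]

# A new 16-year-old visitor with no purchase history.
teen_recs = recommend_for_new_user(16, user_ages, purchases, top_n=2)
print(teen_recs)
```

The same pattern applies to item metadata: instead of grouping users by age, group items by category or genre and recommend items that share metadata with what the user has already purchased.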

However, user metadata and item metadata need more cleaning and engineering before training the recommendation model. To help new startups and enterprises with the cold start problem, Caboom data scientists in the customized service work closely with customers to blend AI techniques with business context, enriching the user and item metadata to build well-performing AI models.

*Originally published at Caboom is an internal project of Leapfrog.

AI Team

This blog is written by the AI team at Leapfrog.
