Collaborative Filtering Fundamentals
I have been trying to learn about recommender systems, and it struck me that we all interact with recommenders in some form in every walk of life: asking a like-minded friend for a movie suggestion, seeing product recommendations on eCommerce sites like Amazon, and so on. Recommenders have been around forever.
With the advent of the likes of Netflix and Spotify, recommender systems have taken off to another level altogether. It makes sense for all of us to learn and understand the basics of such a ubiquitous application. Since the topic is so vast, in this article I am summarizing one of the most straightforward pieces of recommender systems: item-item collaborative filtering.
Item-item collaborative filtering shows up to end consumers as widgets where items are recommended based on the items that the user, or cohorts of similar users, have interacted with or rated. In a Netflix-like scenario, it is the widget that says: “Users who liked what you like also like…”. Let’s look at the core components needed to build this functionality.
These are the main steps for building out item-based filtering:
- Calculate similarity between the item to predict and all the rest
- Order items by similarity
- Select the Neighbourhood
- Calculate the predicted rating
- Use the predictions to calculate recommendations
Similarity calculations: this is mostly done offline in a batch, since calculating item similarity can be a memory-intensive process, especially if we are talking about all the SKUs in a product catalog. The main types of similarity functions are:
- Jaccard similarity: when we are calculating similarity on a unary or binary dataset, like bought (1) vs. not bought (0), we use the Jaccard similarity function.
- Cosine similarity: this is used mostly when there is quantitative data, like a product/movie rating (7/10). There is a derivative of this function called adjusted cosine, which subtracts each user’s average rating from their individual ratings to remove rating bias.
- Pearson similarity: this is also used for quantitative data; it helps identify likes, dislikes and unrelated items/users, also removes rating bias, and gives very similar results to adjusted cosine.
All these functions are statistical methods we learn in school. I’ll write another article with details on calculating similarity using the adjusted cosine function; a rough sketch of all three is below. For now, we will assume we were able to run code to generate the similarity matrix and store it in a table ranked in order from most similar to most dissimilar.
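To make these concrete before that deeper article, here is a minimal sketch of the three functions, assuming a NumPy user-by-item ratings matrix where 0 means “not rated” (the function names and that convention are my choices, not a standard API):

```python
import numpy as np

def jaccard_similarity(a, b):
    """Jaccard similarity for binary vectors, e.g. bought (1) vs. not bought (0)."""
    intersection = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return intersection / union if union > 0 else 0.0

def cosine_similarity(a, b):
    """Plain cosine similarity between two item rating vectors."""
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / norm) if norm > 0 else 0.0

def adjusted_cosine(ratings, i, j):
    """Adjusted cosine between item columns i and j of a user-by-item
    ratings matrix (0 = not rated). Each user's mean rating is subtracted
    first, which removes individual rating bias."""
    both = (ratings[:, i] > 0) & (ratings[:, j] > 0)   # users who rated both items
    if not both.any():
        return 0.0
    counts = np.maximum((ratings > 0).sum(axis=1), 1)  # guard against empty rows
    user_means = ratings.sum(axis=1) / counts          # mean over rated items only
    ci = ratings[both, i] - user_means[both]
    cj = ratings[both, j] - user_means[both]
    norm = np.linalg.norm(ci) * np.linalg.norm(cj)
    return float(np.dot(ci, cj) / norm) if norm > 0 else 0.0

# Toy example: rows are users, columns are items.
ratings = np.array([[5, 3, 0],
                    [4, 0, 4],
                    [1, 1, 5]], dtype=float)
print(adjusted_cosine(ratings, 0, 1))
```

Pearson correlation would look much like adjusted cosine, except that it centres each item on the co-raters’ mean for that item rather than on each user’s own mean.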
Clustering: this is usually done to make the recommender respond faster. Basically, with clustering we set a criterion that lets us look at fewer than all the items tagged as similar. There are various ways to do it, but for starters, and to keep things simple, we can go with the following:
- Top N: we tell the system that N items should be in the neighbourhood, so we treat an item as the centre and take the N items closest to it (i.e. most similar) as the neighbourhood.
- Threshold: in this method, we define a cut-off value in the similarity matrix for membership in the neighbourhood. E.g., if we want all the items with a similarity score above 0.7 (scores range from -1 to 1) to be part of the cluster, then no matter how many values fall in that range, they all become part of the cluster.
The choice between Top N and threshold is a trade-off between quantity and quality; the business use case should drive the choice, and either works well. A sketch of both strategies follows.
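Here is a quick sketch of both, assuming the similarity matrix from the earlier step is available as an item-by-item NumPy array (the names and default values are illustrative):

```python
import numpy as np

def top_n_neighbourhood(sim_row, n=10):
    """Top N: indices of the N most similar items. `sim_row` is one row of
    the precomputed similarity matrix, with the item's own entry already
    masked out (e.g. set to -inf) so it can't pick itself."""
    return np.argsort(sim_row)[::-1][:n]

def threshold_neighbourhood(sim_row, cutoff=0.7):
    """Threshold: every item scoring above the cut-off, however many
    (or few) that turns out to be."""
    return np.where(sim_row > cutoff)[0]
```

Note how the trade-off shows up in code: Top N always returns exactly n items, even if some are weak matches, while the threshold version can return anywhere from zero items to the whole catalog.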
Prediction: The most common ways to make predictions are regression and classification.
- Regression: this basically means you take an average (usually weighted by similarity) of the neighbourhood’s ratings and use that as the prediction. The same idea is used in real estate when coming up with a list price for a house based on comparables in the neighbourhood.
- Classification: in this method, you rank the ratings by frequency of occurrence and predict the most frequent one. For example, if an item is mostly rated 4, you predict a 4 for it too; that is a prediction made with the classification method.
Use one of the above methods to predict the rating of an item in the neighbourhood. To generate recommendations, we repeat the calculation for as many items as we need. Let’s say we need to send out the top 5 recommendations: we run the rating prediction on a set of candidate items, rank them from high to low, and send the top 5 back to the application, as in the sketch below.
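Putting it together, here is a minimal end-to-end sketch using the regression (weighted-average) approach with a Top-N neighbourhood; all the names and parameter values are my illustrative choices:

```python
import numpy as np

def predict_rating(user_ratings, sim_row, neighbourhood):
    """Regression-style prediction: a similarity-weighted average of the
    ratings this user has already given to items in the neighbourhood.
    `user_ratings` is a NumPy vector of one user's ratings (0 = unrated)."""
    rated = [j for j in neighbourhood if user_ratings[j] > 0]
    if not rated:
        return 0.0  # no signal; a real system might fall back to popularity
    weights = sim_row[rated]
    denom = np.abs(weights).sum()
    return float(weights @ user_ratings[rated] / denom) if denom > 0 else 0.0

def recommend(user_ratings, similarity_matrix, k=5, n_neighbours=10):
    """Predict a rating for every item the user hasn't rated, then return
    the k highest-scoring items."""
    scores = {}
    for item in range(len(user_ratings)):
        if user_ratings[item] > 0:
            continue  # don't re-recommend items already rated
        sim_row = similarity_matrix[item].astype(float)  # copy, don't mutate
        sim_row[item] = -np.inf  # mask self-similarity
        neighbourhood = np.argsort(sim_row)[::-1][:n_neighbours]
        scores[item] = predict_rating(user_ratings, sim_row, neighbourhood)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

For a demo this scores every unrated item, which is fine on a small catalog; at Netflix or Amazon scale the similarity matrix is precomputed offline in a batch (as described above) and only a candidate set is scored at request time.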
Some other details to know about collaborative filtering:
- Sparsity: if the items don’t have a lot of ratings, the recommendations will mostly fall back to the most popular items, since the dataset won’t be dense enough to produce a relevant recommendation.
- Gray sheep: this refers to users with such unusual tastes that it is impossible to find related users or items. This is possibly what happens when my kids watch Paw Patrol from my account while I mostly watch documentaries and crime series; the recommender systems must have put me in the gray sheep cohort. :-)
- Cold start: collaborative filtering needs data to make predictions. As a rule of thumb, a good collaborative filtering algorithm apparently needs up to 20 ratings on an item before it can run predictions, so what happens to new items? Applications either run a separate widget for new arrivals (I have seen this on Netflix and Amazon), or this could be a reason to use an explore/exploit method for introducing newer products.
- Popularity bias: collaborative filtering is content agnostic; it follows users’ behavioural trends, which again pushes the output towards more popular content.
The main positive about collaborative filtering is that it is content agnostic: only ratings and interactions with items are needed to get started, which makes it a great way to start learning about recommender systems.
Want to learn more about recommenders? Here are some places to find more information:
- “Item-based Collaborative Filtering Recommendation Algorithms” by Badrul Sarwar et al.: http://files.grouplens.org/papers/www10_sarwar.pdf
- Practical Recommender Systems by Kim Falk, available on O’Reilly online.