Skills : python,NLP,Machine Learning,Data Visualization

About the Project

In this I will be building a Movie Recommender System based on IMDB Dataset.Recommender System are quite useful and popular these.whether it be an E-commerce company or food Delivery Service,Everyone requires a recommender system.It helps them to understand their customers while it also provides a smooth user flow for the users to go through the website.Throughout this Notebook ,I will be discussing various Approaches to build a recommender system ,common algorithms to build a recommender system and also a small demo of how recommendation will be shown to the user.

Exploring and Visualization

The Dataset we are using for this project can be found here.The Dataset comprises of 2 csv files.Lets import our libraries and see what data we have in the csv file

 
# linear algebra import numpy as np 
# data processing, CSV file I/O (e.g. pd.read_csv) 
import pandas as pd 
# for visualizations and plotting graphs 
import matplotlib.pyplot as plt 
import seaborn as sns 
#setting the style of grids
sns.set_style('whitegrid')

There are two files tmdb_5000_movies.csv and tmdb_5000_credits.csv.Lets go through them one by one

 
#Load the First csv File 
data_movies = pd.read_csv('Data/tmdb_5000_movies.csv') 
#Load the second csv File 
data_credits = pd.read_csv('Data/tmdb_5000_credits.csv')

 
#Print the first 1 rows of the data_movies 
data_movies.head(1)

budget	genres	homepage	id	keywords	original_language	original_title	overview	popularity	production_companies	production_countries	release_date	revenue	runtime	spoken_languages	status	tagline	title	vote_average	vote_count
0	237000000	[{"id": 28, "name": "Action"}, {"id": 12, "nam...	http://www.avatarmovie.com/	19995	[{"id": 1463, "name": "culture clash"}, {"id":...	en	Avatar	In the 22nd century, a paraplegic Marine is di...	150.437577	[{"name": "Ingenious Film Partners", "id": 289...	[{"iso_3166_1": "US", "name": "United States o...	2009-12-10	2787965087	162.0	[{"iso_639_1": "en", "name": "English"}, {"iso...	Released	Enter the World of Pandora.	Avatar	7.2	11800

Now lets understand what each feature is

 data_movies.columns


RangeIndex: 4803 entries, 0 to 4802
Data columns (total 20 columns):
budget                  4803 non-null int64
genres                  4803 non-null object
homepage                1712 non-null object
id                      4803 non-null int64
keywords                4803 non-null object
original_language       4803 non-null object
original_title          4803 non-null object
overview                4800 non-null object
popularity              4803 non-null float64
production_companies    4803 non-null object
production_countries    4803 non-null object
release_date            4802 non-null object
revenue                 4803 non-null int64
runtime                 4801 non-null float64
spoken_languages        4803 non-null object
status                  4803 non-null object
tagline                 3959 non-null object
title                   4803 non-null object
vote_average            4803 non-null float64
vote_count              4803 non-null int64
dtypes: float64(3), int64(4), object(13)
memory usage: 750.5+ KB
None

Some feature columns are explained as below:

genre: states the genre of the film along with the genre id i.e comedy,drama
budget: budget of the movie
homepage: url of the movie website
keywords: indicates the plot of keywords which define the movies
genre: states the genre of the film along with the genre id i.e comedy,drama
original_languages: original language in which movie was produced
popularity : Popularity of movie
vote_avg : Average votes given to that movie
vote_count : total no of votes given to that movie

We will be exploring many of the columns later in the notebook and also understand what they

Exploratory Data Analysis

In this section we will be analyzing the feature columns , their distribution over the dataset and what insights do we get from the plots or charts

1.1 Original language

Lets find out what is most popular language in which films are produced

 
fig = plt.figure(figsize=(8,8)) 
sns.countplot(y=data_movies['original_language'],data=data_movies)

1.2 Popular Genres

In this section we will analyze which genre is most common among movies. i.e on which genre most of the movies are based.

 
import ast genres_movies = dict() 
for movies in data_movies['genres']: 
  movie = ast.literal_eval(movies) 
  for genre in movie: 
    if(genre['name'] not in genres_movies.keys()): 
      genres_movies[genre['name']] = 1 
    else: 
      genres_movies[genre['name']]= genres_movies[genre['name']]+1

Recommender Systems

Recommendation Systems are of mainly 3 different types.Each one offers a different level of personalization.Most recommender systems take either of two basic approaches: collaborative filtering or content-based filtering. Other approaches (such as hybrid approaches) also exist.

Collaborative Filtering

In Collaborative Filtering,recommendations are based on prior user experience.Collaborative filtering uses group knowledge to form a recommendation based on like users.
Content Based Filtering

Content-based filtering constructs a recommendation on the basis of a user's behavior.This definitely requires user browser history ,what kinds of content and blogs he is reading and recommends based on the History.Suppose we were to build a blog recommendation and if a particular user reads blogs about linux , the system would recommend blogs like operating system , ubuntu and other unix based operating.This is what user may like.
Hybrid Approach

In hybrid based Approach , that combines both the Collaborative and Content based approach is more complex as compared to both above.The Content based filtering could be used if the user data is not mature enough to be compared with other user data and give a cold start to the system but slight shift towards Collaborative approach when the user data grows enough.

Weighted Voting Approach

Weighted Voting Approach is simple common approach which doesn't takes any user activity into account. It simply uses a predefined ranking formula to rate movie based on their votes,views etc.

If we peek into our data,we have a lot of features like Vote Average, Vote Count,Popularity

.After a lot of search for IMDB genuine Approach to rate movies, i found one on a [Answer](https://www.quora.com/How-does-IMDbs-rating-system-work) written in Quora.I will be using that approach for movies rating for recommendation.

Building our Simple Engine

The rating formula for IMDB is :

weighted rating (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C
R : average for the movie (mean) = (Rating)
v:number of votes for the movie = (votes)
m: minimum votes required to be listed in the Top 250 (currently 25000)
C :the mean vote across the whole report (currently 7.0)

Now lets peek again into our movies data

 
data_movies.head(1)

	budget	genres	homepage	id	keywords	original_language	original_title	overview	popularity	production_companies	production_countries	release_date	revenue	runtime	spoken_languages	status	tagline	title	vote_average	vote_count
0	237000000	[{"id": 28, "name": "Action"}, {"id": 12, "nam...	http://www.avatarmovie.com/	19995	[{"id": 1463, "name": "culture clash"}, {"id":...	en	Avatar	In the 22nd century, a paraplegic Marine is di...	150.437577	[{"name": "Ingenious Film Partners", "id": 289...	[{"iso_3166_1": "US", "name": "United States o...	2009-12-10	2787965087	162.0	[{"iso_639_1": "en", "name": "English"}, {"iso...	Released	Enter the World of Pandora.	Avatar	7.2	11800

Lets calculate the mean vote,average vote

vote_counts = data_movies[data_movies['vote_count'].notnull()]['vote_count'].astype('int') 
# we will be using minimum votes as 95% percentile of the maximum votes received 
#calculating the average votes 
vote_averages = data_movies[data_movies['vote_average'].notnull()]['vote_average'].astype('int') 
# setting minimum vote requirements to be 85 percentile of the votes of the movies 
minimum_votes = votes_count.quantile(0.85) 
print 'minimum_votes :',minimum_votes

Now we have calculated the minimum vote requirements for the movies to be listed in our chart.Let's extract all the movies from our dataframe which qualifies above criteria

 
qualified = data_movies[(data_movies['vote_count'] >= m) & (data_movies['vote_count'].notnull()) & (data_movies['vote_average'].notnull())][['title', 'year', 'vote_count', 'vote_average', 'popularity', 'genres']] 
qualified['vote_count'] = qualified['vote_count'].astype('int')
qualified['vote_average'] = qualified['vote_average'].astype('int') 
qualified.shape

(721, 6)

According to our criteria for listing movies we have 721 movies ready to be recommended to our user

Now we will use the standard IMDB rating formula to calculate the weighted rating for all qualified movies.

 
def weighted_rating(x): 
  v = x['vote_count'] 
  R = x['vote_average'] 
  return (v/(v+m) * R) + (m/(m+v) * C)

we have defined our weighted_rating function which calculates weighted rating for movies.

 
qualified['wr'] = qualified.apply(weighted_rating, axis=1) 
qualified = qualified.sort_values('wr', ascending=False).head(100)

	index	title	year	vote_count	vote_average	popularity	genres	wr
90	96	Inception	2010	13752	8	167.583710	[{"id": 28, "name": "Action"}, {"id": 53, "nam...	7.740771
63	65	The Dark Knight	2008	12002	8	187.322927	[{"id": 18, "name": "Drama"}, {"id": 28, "name...	7.706669
89	95	Interstellar	2014	10867	8	724.247784	[{"id": 12, "name": "Adventure"}, {"id": 18, "...	7.679307
355	662	Fight Club	1999	9413	8	146.757391	[{"id": 18, "name": "Drama"}]	7.635784
209	262	The Lord of the Rings: The Fellowship of the Ring	2001	8705	8	138.049577	[{"id": 12, "name": "Adventure"}, {"id": 14, "...	7.610012
671	3232	Pulp Fiction	1994	8428	8	121.463076	[{"id": 53, "name": "Thriller"}, {"id": 80, "n...	7.598908
558	1881	The Shawshank Redemption	1994	8205	8	136.747729	[{"id": 18, "name": "Drama"}, {"id": 80, "name...	7.589499
241	329	The Lord of the Rings: The Return of the King	2003	8064	8	123.630332	[{"id": 12, "name": "Adventure"}, {"id": 14, "...	7.583318
388	809	Forrest Gump	1994	7927	8	138.133331	[{"id": 35, "name": "Comedy"}, {"id": 18, "nam...	7.577132
242	330	The Lord of the Rings: The Two Towers	2002	7487	8	106.914973	[{"id": 12, "name": "Adventure"}, {"id": 14, "...	7.555959
659	2912	Star Wars	1977	6624	8	126.393695	[{"id": 12, "name": "Adventure"}, {"id": 28, "...	7.507603
72	77	Inside Out	2015	6560	8	128.655964	[{"id": 18, "name": "Drama"}, {"id": 35, "name...	7.503594
593	2285	Back to the Future	1985	6079	8	76.603233	[{"id": 12, "name": "Adventure"}, {"id": 35, "...	7.471239
679	3337	The Godfather	1972	5893	8	143.659698	[{"id": 18, "name": "Drama"}, {"id": 80, "name...	7.457567
561	1990	The Empire Strikes Back	1980	5879	8	78.517830	[{"id": 12, "name": "Adventure"}, {"id": 28, "...	7.456509
510	1553	Se7en	1995	5765	8	79.579532	[{"id": 80, "name": "Crime"}, {"id": 9648, "na...	7.447740
621	2522	The Imitation Game	2014	5723	8	145.364591	[{"id": 36, "name": "History"}, {"id": 18, "na...	7.444438
302	494	The Lion King	1994	5376	8	90.457886	[{"id": 10751, "name": "Family"}, {"id": 16, "...	7.415565
506	1532	The Grand Budapest Hotel	2014	4519	8	74.417456	[{"id": 35, "name": "Comedy"}, {"id": 18, "nam...	7.329502
575	2091	The Silence of the Lambs	1991	4443	8	18.174804	[{"id": 80, "name": "Crime"}, {"id": 18, "name...	7.320630
466	1196	The Prestige	2006	4391	8	74.440708	[{"id": 18, "name": "Drama"}, {"id": 9648, "na...	7.314423
545	1818	Schindler's List	1993	4329	8	104.469351	[{"id": 18, "name": "Drama"}, {"id": 36, "name...	7.306872
699	3865	Whiplash	2014	4254	8	192.528841	[{"id": 18, "name": "Drama"}]	7.297514
362	690	The Green Mile	1999	4048	8	103.698022	[{"id": 14, "name": "Fantasy"}, {"id": 18, "na...	7.270458
688	3573	Memento	2000	4028	8	60.715151	[{"id": 9648, "name": "Mystery"}, {"id": 53, "...	7.267720
594	2294	Spirited Away	2001	3840	8	118.968562	[{"id": 14, "name": "Fantasy"}, {"id": 12, "na...	7.240940
592	2284	The Shining	1980	3757	8	78.699993	[{"id": 27, "name": "Horror"}, {"id": 53, "nam...	7.228483
712	4300	Reservoir Dogs	1992	3697	8	66.925866	[{"id": 80, "name": "Crime"}, {"id": 53, "name...	7.219221
638	2731	The Godfather: Part II	1974	3338	8	105.792936	[{"id": 18, "name": "Drama"}, {"id": 80, "name...	7.158794
684	3454	The Usual Suspects	1995	3254	8	64.025031	[{"id": 18, "name": "Drama"}, {"id": 80, "name...	7.143281
...	...	...	...	...	...	...	...	...
57	57	WALL·E	2008	6296	7	66.390712	[{"id": 16, "name": "Animation"}, {"id": 10751...	6.657562
483	1356	The Hangover	2009	6173	7	82.211660	[{"id": 35, "name": "Comedy"}]	6.651926
82	88	Big Hero 6	2014	6135	7	203.734590	[{"id": 12, "name": "Adventure"}, {"id": 10751...	6.650147
240	328	Finding Nemo	2003	6122	7	85.688789	[{"id": 16, "name": "Animation"}, {"id": 10751...	6.649535
46	46	X-Men: Days of Future Past	2014	6032	7	118.078691	[{"id": 28, "name": "Action"}, {"id": 12, "nam...	6.645233
190	231	Monsters, Inc.	2001	5996	7	106.815545	[{"id": 16, "name": "Animation"}, {"id": 35, "...	6.643483
152	182	Ant-Man	2015	5880	7	120.093610	[{"id": 878, "name": "Science Fiction"}, {"id"...	6.637723
160	191	Harry Potter and the Prisoner of Azkaban	2004	5877	7	79.679601	[{"id": 12, "name": "Adventure"}, {"id": 14, "...	6.637572
363	693	Gone Girl	2014	5862	7	143.041543	[{"id": 9648, "name": "Mystery"}, {"id": 53, "...	6.636813
216	276	Harry Potter and the Chamber of Secrets	2002	5815	7	132.397737	[{"id": 12, "name": "Adventure"}, {"id": 14, "...	6.634414
179	216	Life of Pi	2012	5797	7	51.328145	[{"id": 12, "name": "Adventure"}, {"id": 18, "...	6.633487
260	356	Sherlock Holmes	2009	5766	7	57.834787	[{"id": 28, "name": "Action"}, {"id": 12, "nam...	6.631879
79	85	Captain America: The Winter Soldier	2014	5764	7	72.225265	[{"id": 28, "name": "Action"}, {"id": 12, "nam...	6.631775
196	239	Gravity	2013	5751	7	110.153618	[{"id": 878, "name": "Science Fiction"}, {"id"...	6.631096
105	114	Harry Potter and the Goblet of Fire	2005	5608	7	101.250416	[{"id": 12, "name": "Adventure"}, {"id": 14, "...	6.623460
104	113	Harry Potter and the Order of the Phoenix	2007	5494	7	78.144395	[{"id": 12, "name": "Adventure"}, {"id": 14, "...	6.617143
315	519	Now You See Me	2013	5487	7	71.124686	[{"id": 53, "name": "Thriller"}, {"id": 80, "n...	6.616748
214	274	Gladiator	2000	5439	7	95.301296	[{"id": 28, "name": "Action"}, {"id": 18, "nam...	6.614018
499	1465	The Maze Runner	2014	5371	7	131.815575	[{"id": 28, "name": "Action"}, {"id": 9648, "n...	6.610084
115	124	Frozen	2013	5295	7	165.125366	[{"id": 16, "name": "Animation"}, {"id": 12, "...	6.605592
8	8	Harry Potter and the Half-Blood Prince	2009	5293	7	98.885637	[{"id": 12, "name": "Adventure"}, {"id": 14, "...	6.605472
508	1541	Toy Story	1995	5269	7	73.640445	[{"id": 16, "name": "Animation"}, {"id": 35, "...	6.604031
12	12	Pirates of the Caribbean: Dead Man's Chest	2006	5246	7	145.847379	[{"id": 12, "name": "Adventure"}, {"id": 14, "...	6.602639
94	101	X-Men: First Class	2011	5181	7	3.195174	[{"id": 28, "name": "Action"}, {"id": 878, "na...	6.598655
248	339	The Incredibles	2004	5152	7	77.817571	[{"id": 28, "name": "Action"}, {"id": 12, "nam...	6.596851
342	628	Saving Private Ryan	1998	5048	7	76.041867	[{"id": 18, "name": "Drama"}, {"id": 36, "name...	6.590247
359	687	300	2006	4997	7	65.197968	[{"id": 28, "name": "Action"}, {"id": 12, "nam...	6.586929
395	828	Kill Bill: Vol. 1	2003	4949	7	79.754966	[{"id": 28, "name": "Action"}, {"id": 80, "nam...	6.583756
70	74	Edge of Tomorrow	2014	4858	7	79.456485	[{"id": 28, "name": "Action"}, {"id": 878, "na...	6.577606
357	675	Jurassic Park	1993	4856	7	40.413191	[{"id": 12, "name": "Adventure"}, {"id": 878, ...	6.577468

Now we have implemented a basic rating system which gives pretty good recommendations based on various movies metrics.We see that three Christopher Nolan Films, Inception, The Dark Knight and Interstellar occur at the very top of our chart. The chart also indicates a strong bias of IMDB Users towards particular genres and directors.

Now lets build recommendations based on particular movie genres.85% of movie genre ratings will be used to build our movie chart

Genre Based Recommendation

As we have seen above,we calculated the mean_vote and vote percentile for rating and listing movies,we will be using the same strategy to build recommendation

 
# Its returns movies if movie contains the specified genre 
def genre_movies(x,genre): 
  if genre in x['movie_genres']: 
    return x

Now lets write a function that returns top movies belonging to particular genre

 
# it returns all movies of particular genre if they qualified movie criteria to be listed 
def genre_recommend(genre,percentile): 
  gen_movie = data_movies.apply(lambda x : ret_mov(x,genre),axis=1) 
  gen_movie = gen_movie.dropna() 
  vote_count= gen_movie[gen_movie['vote_count'].notnull()]['vote_count'].astype('int') 
  vote_averages = gen_movie[gen_movie['vote_average'].notnull()]['vote_average'].astype('int') 
  c = vote_averages.mean() 
  m = vote_counts.quantile(percentile) 
  gen_qualified = gen_movie[(gen_movie['vote_count'] >= m) & (gen_movie['vote_count'].notnull()) & (gen_movie['vote_average'].notnull())][['title', 'year', 'vote_count', 'vote_average', 'popularity', 'genres']] 
  gen_qualified['vote_count'] = gen_qualified['vote_count'].astype('int')
  gen_qualified['vote_average'] = gen_qualified['vote_average'].astype('int') 
  gen_qualified['wr'] = gen_qualified.apply(lambda x:weighted_rating(x),axis=1) 
  gen_qualified = gen_qualified.sort_values('wr',ascending=False) 
  return gen_qualified

Now lets check our recommendations.

 
genre_recommend('Science Fiction',0.85).head(10)

	title	year	vote_count	vote_average	popularity	genres	wr
96	Inception	2010	13752	8	167.583710	[{"id": 28, "name": "Action"}, {"id": 53, "nam...	7.740771
95	Interstellar	2014	10867	8	724.247784	[{"id": 12, "name": "Adventure"}, {"id": 18, "...	7.679307
2912	Star Wars	1977	6624	8	126.393695	[{"id": 12, "name": "Adventure"}, {"id": 28, "...	7.507603
2285	Back to the Future	1985	6079	8	76.603233	[{"id": 12, "name": "Adventure"}, {"id": 35, "...	7.471239
1990	The Empire Strikes Back	1980	5879	8	78.517830	[{"id": 12, "name": "Adventure"}, {"id": 28, "...	7.456509
0	Avatar	2009	11800	7	150.437577	[{"id": 28, "name": "Action"}, {"id": 12, "nam...	6.801430
16	The Avengers	2012	11776	7	144.448633	[{"id": 878, "name": "Science Fiction"}, {"id"...	6.801066
94	Guardians of the Galaxy	2014	9742	7	481.098624	[{"id": 28, "name": "Action"}, {"id": 878, "na...	6.764424
127	Mad Max: Fury Road	2015	9427	7	434.278564	[{"id": 28, "name": "Action"}, {"id": 12, "nam...	6.757506
634	The Matrix	1999	8907	7	104.309993	[{"id": 28, "name": "Action"}, {"id": 878, "na...	6.745153

Now our movie recommender system recommends movies based on a particular genre.this could serve as one of the strategy for recommending movies.But it provides no personalization for users and recommendations are purely genre based.Let's improve our system by using content like movies overview,tagline for recommendations

Content Based Filtering

As we have already seen a couple of techniques for building recommendation engines we will be targeting a content based recommendation engine. we will be utilizing the tagline and overview of the to recommend the Movies.Lets start building our content based recommender system

 # create a shallow copy so that we don't alter the original dataframe 
 content_movies = data_movies.copy(False)

As you can infer we have created a auxiliary dataframe to hold the overview and the tagline.Now lets merge these two columns and join them to create a description column in the content_movies dataframe.

 
# join overview and tagline content_movies
['description'] = content_movies['tagline'] + content_movies['overview'] 
#remove missing data 
content_movies['description'] = content_movies['description'].fillna('')

We have joined the feature columns and remove useless data.Now we will extract features from the description using Tfidf(Term frequency inverse document frequency)

 
from sklearn.feature_extraction.text import TfidfVectorizer 
tf = TfidfVectorizer(analyzer='word',ngram_range=(1, 2),min_df=0, stop_words='english') tfidf_matrix = tf.fit_transform(content_movies['description'])

Now lets see our tfidf matrix shape to analyze what what tfidf matrix really is

 
tfidf_matrix.shape


(4803, 128039)

we have got our Tfidf matrix of shape 4803x128039.the 4803 is total no of movies in our dataframe.128039 is size of feature vector i.e we have got 128039 features for a single movie.Now we have to predict how similar a movie is to other movies using its feature vector.

Cosine Similarity

Cosine Similarity is a measure of similarity between two non zero vectors.It measures the cosine angle between two non zero vectors. cosine 0represents maximum similarity with value of cosine 1. cosine 0 represents least similarity.

Cosine similarity between two vectors can be mathematically represented as

$Cosine Similarity$

In our case we have calculated the Tfidf matrix where each row represents a feature vector of a particular movie. Since we need to predict similarity between feature vectors of movies in Tfidf matrix ,we need to find the dot-product of tfidf matrix which will give us a cosine similarity score.This score denotes how similar movies are to each other.

We will be using sklearn's linear_kernel which is faster than dot product and gives us the similarity score

 
from sklearn.pairwise import linear_kernel 
cos_sim = linear_kernel(tfidf_matrix)

 
cos_sim[0]

array([ 1.        ,  0.00680476,  0.        , ...,  0.        ,
0.00344913,  0.        ])

Now lets build recommendation using our calculated similarity scores.Note that now movies will be recommended using the tagline and movie overview.This is a much better approach as compared to our previous approaches as now recommendations will much be user centric.

 
def improved_recommend(title): 
  idx = indices[title] 
  sim_scores = list(enumerate(cos_sim[idx])) 
  sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True) 
  sim_scores = sim_scores[1:31] movie_indices = [i[0] for i in sim_scores]
  return titles.iloc[movie_indices]

 
improved_recommend('The Shawshank Redemption').head(20)

637                     Les Misérables
3785                            Prison
609                        Escape Plan
2868                          Fortress
4727                      Penitentiary
434                   The Longest Yard
3978                          Amnesiac
1187                   Bridge of Spies
642                           Unbroken
891                    Man on the Moon
1779            The 40 Year Old Virgin
791                         Goosebumps
1171                Dumb and Dumber To
1166                          Get Hard
2570                 Ramona and Beezus
1597                          Southpaw
42                         Toy Story 3
2798    The Boy in the Striped Pyjamas
1860                Kiss of the Dragon
3871                 A Christmas Story
Name: title, dtype: object

It seems like recommendations have improved from previous approaches.Movies like The Prison and Escape Plan seems to have a similar story plot as The Shawshank Redemption.

Here we come to the End of the Project.We have build a movie recommendation system that recommends movies based on user choice and similarity of movies.

There are several approaches which could further improve the recommendations like using CastCrew Members,Director in the Description so that user will get much better recommendation.

Also hybrid approaches like using popularity as well as description of the movies to predict similarity .I might also try these approaches later.

Thankyou for reading !!

IMDB Movie Recommendation System

About the Project

Exploring and Visualization

Exploratory Data Analysis

1.1 Original language

1.2 Popular Genres

Recommender Systems

Collaborative Filtering

Content Based Filtering

Hybrid Approach

Weighted Voting Approach

Building our Simple Engine

Genre Based Recommendation

Content Based Filtering

Cosine Similarity