Algorithms
Gorse has implemented various types of recommendation algorithms, both non-personalized and personalized. These recommendation algorithms are building blocks that compose the recommendation workflow.
Math formulas are used to introduce complex algorithms. The math symbols used in this section are defined as follows:
| Symbol | Meaning |
|---|---|
| $n$ | <cache_size> |
| $R$ | The set of positive feedbacks. |
| $\not R$ | The set of negative feedbacks. |
| $I$ | The set of items. |
| $I_u$ | The set of favorite items of user $u$. |
| $I_l$ | The set of items with label $l$. |
| $U$ | The set of users. |
| $U_l$ | The set of users with label $l$. |
| $L_u$ | The labels of user $u$. |
| $L_i$ | The labels of item $i$. |
| $L_U$ | The labels used by all users. |
| $L_I$ | The labels used by all items. |
| $\mathcal{N}_u$ | The neighbors of user $u$. |
| $\mathcal{N}_i$ | The neighbors of item $i$. |
| $\mathbb{I}(p)$ | Equals 1 if the condition $p$ is satisfied, otherwise equals 0. |
Tips
The <cache_size> comes from the configuration file:
[recommend]
# The cache size for recommended/popular/latest items. The default value is 100.
cache_size = 100
Non-personalized Algorithms
Non-personalized algorithms recommend the same content to all users.
Latest Items
Showing the latest items to users according to timestamps allows a new item to be exposed to users in time. To enable the latest recommender in Gorse, you need to set timestamp information for the items. Without timestamps, Gorse will not generate a list of the latest items.
The latest items recommendation is equivalent to the following SQL:
select item_id from items order by time_stamp desc limit <cache_size>;
Popular Items
Many websites show recent popular items to users such as Twitter trending. The popular items recommendation is equivalent to the following SQL:
select item_id from (
select item_id, count(*) as feedback_count from feedback
where feedback_type in <positive_feedback_types>
and time_stamp >= NOW() - INTERVAL <popular_window>
group by item_id) t
order by feedback_count desc limit <cache_size>;
Tips
The <popular_window> in the configuration file corresponds to the time window of popular items.
[recommend.popular]
# The time window of popular items. The default value is 4320h.
popular_window = "720h"
Similarity Algorithms
In some scenarios, users like specific types of items, for example, gamers like to solve puzzles and young children like to watch cartoons.
Item Similarity
An item is similar to another item if they share more common users or labels than other items. The cosine similarity is ideal for capturing the similarity between two items. Since the importance of each user or label is different, we use IDF to calculate the weight of each user or label.
- Label weight: The label weight is defined as $w_l = \log\left(|I| / |I_l|\right)$. If a label is used by more items, this label is more generic and has a lower weight.
- User weight: The user weight is defined as $w_u = \log\left(|I| / |I_u|\right)$. If a user has more favorite items, this user has a wider preference and therefore a lower weight.
Based on label weights and user weights, Gorse implements three similarity algorithms:
- Similar: Calculates similarity based on label overlap between items.
- Related: Calculates similarity based on user overlap between items.
- Automatic: Calculates similarity based on both label overlap and user overlap between items.
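As an illustration, here is a minimal Go sketch of the label-based (similar) variant. It treats each item as a vector whose component for a label $l$ equals the IDF weight $w_l$, so the cosine numerator sums $w_l^2$ over shared labels; this weighting convention, the data layout (a map from item ID to its label list), and the function names are assumptions for the example, not Gorse's internal API.

```go
package main

import (
	"fmt"
	"math"
)

// labelWeight computes the IDF weight of every label: a label used by many
// items is generic and receives a low weight, w_l = log(|I| / |I_l|).
func labelWeight(items map[string][]string) map[string]float64 {
	count := make(map[string]int) // number of items carrying each label
	for _, labels := range items {
		for _, l := range labels {
			count[l]++
		}
	}
	w := make(map[string]float64)
	for l, c := range count {
		w[l] = math.Log(float64(len(items)) / float64(c))
	}
	return w
}

// similar computes the weighted cosine similarity between the label sets of
// two items, i.e. the "similar" variant described above.
func similar(a, b []string, w map[string]float64) float64 {
	inA := make(map[string]bool)
	for _, l := range a {
		inA[l] = true
	}
	var common, normA, normB float64
	for _, l := range a {
		normA += w[l] * w[l]
	}
	for _, l := range b {
		normB += w[l] * w[l]
		if inA[l] {
			common += w[l] * w[l]
		}
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return common / (math.Sqrt(normA) * math.Sqrt(normB))
}

func main() {
	items := map[string][]string{
		"1": {"puzzle", "casual"},
		"2": {"puzzle", "strategy"},
		"3": {"casual"},
	}
	w := labelWeight(items)
	fmt.Println(similar(items["1"], items["2"], w))
}
```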
For each item, the top $n$ items with the highest similarity are cached as the neighbors of this item. The algorithm for item similarity can be set in the configuration file.
[recommend.item_neighbors]
# The type of neighbors for items. There are three types:
# similar: Neighbors are found by number of common labels.
# related: Neighbors are found by number of common users.
# auto: Neighbors are found by number of common labels and users.
# The default value is "auto".
neighbor_type = "similar"
If items are attached with high-quality labels, similar is the best choice. If items have no labels, use related. For other situations, consider auto.
User Similarity
The computation of user similarity is similar to the computation of item similarity. First, we use IDF to calculate the weight of each item or label.
- Label weight: The label weight is defined as $w_l = \log\left(|U| / |U_l|\right)$. If a label is used by more users, this label is more generic and has a lower weight.
- Item weight: The item weight is defined as $w_i = \log\left(|U| / |U_i|\right)$, where $U_i$ is the set of users who favor item $i$. If an item has more users, this item is popular but has a lower weight for characterizing users' preferences.
Based on label weights and item weights, Gorse implements three similarity algorithms:
- Similar: Calculates similarity based on label overlap between users.
- Related: Calculates similarity based on item overlap between users.
- Automatic: Calculates similarity based on both label overlap and item overlap between users.
For each user, the top $n$ users with the highest similarity are cached as the neighbors of this user. The algorithm for user similarity can be set in the configuration file.
[recommend.user_neighbors]
# The type of neighbors for users. There are three types:
# similar: Neighbors are found by number of common labels.
# related: Neighbors are found by number of common favorite items.
# auto: Neighbors are found by number of common labels and favorite items.
# The default value is "auto".
neighbor_type = "similar"
If users are attached to high-quality labels, similar is the best choice. If users have no labels, use related. For other situations, consider auto.
Clustering Index
Gorse needs to generate neighbors for each user and item, but this is an expensive procedure. The complexity to generate neighbors for all items by brute force is $O(|I|^2)$ (for simplicity, assume the complexity of a single similarity calculation is constant). The clustering index[1] searches neighbors for each item more efficiently, with an acceptable loss of precision. The usage of the clustering index consists of two steps:
- Clustering: The spherical k-means algorithm is used to cluster items into $K$ clusters with centroids $\mathbf{c}_1, \dots, \mathbf{c}_K$. Then, each item $\mathbf{i}$ belongs to the $a_i$-th cluster:
  - $a_i \leftarrow \mathop{\arg\max}_k \mathbf{c}_k^T \mathbf{i}$
  - while $\mathbf{c}_k$ or $a_i$ changed at the previous step do
    - $\mathbf{c}_k \leftarrow$ the centroid of the items belonging to the $k$-th cluster
    - $a_i \leftarrow \mathop{\arg\max}_k \mathbf{c}_k^T \mathbf{i}$
  - end while
- Search: For searching the neighbors of item $\mathbf{i}$:
  - Step 1, find the nearest $L$ centroids to item $\mathbf{i}$.
  - Step 2, find the top-$n$ nearest items to item $\mathbf{i}$ within these $L$ clusters.
The time complexity is $O\left(|I|KT + |I|\left(K + L|I|/K\right)\right)$, where $T$ is the maximal number of iterations to stop the clustering algorithm. In the Gorse implementation, $K = \sqrt{|I|}$, so the time complexity becomes $O\left(|I|^{3/2}(T + L)\right)$. Hence, the clustering index is efficient if $T + L \ll \sqrt{|I|}$.
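The following Go sketch illustrates the two steps under the assumptions above: spherical k-means over normalized item vectors, then a search restricted to the $L$ clusters whose centroids are closest to the query. It is a toy illustration with naive seeding, not Gorse's implementation.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

type vec []float64

func dot(a, b vec) float64 {
	var s float64
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

func normalize(a vec) vec {
	n := math.Sqrt(dot(a, a))
	out := make(vec, len(a))
	for i := range a {
		out[i] = a[i] / n
	}
	return out
}

// sphericalKMeans assigns each (normalized) item vector to one of k clusters.
// Assignment maximizes the dot product with the cluster centroid, and
// centroids are re-normalized after every update, as in spherical k-means.
func sphericalKMeans(items []vec, k, maxIter int) (centroids []vec, assign []int) {
	centroids = make([]vec, k)
	for c := 0; c < k; c++ {
		centroids[c] = items[c%len(items)] // naive seeding for the sketch
	}
	assign = make([]int, len(items))
	for t := 0; t < maxIter; t++ {
		for i, v := range items { // assignment step
			best, bestDot := 0, math.Inf(-1)
			for c, ctr := range centroids {
				if d := dot(v, ctr); d > bestDot {
					best, bestDot = c, d
				}
			}
			assign[i] = best
		}
		for c := range centroids { // update step
			sum := make(vec, len(items[0]))
			for i, v := range items {
				if assign[i] == c {
					for d := range sum {
						sum[d] += v[d]
					}
				}
			}
			if dot(sum, sum) > 0 {
				centroids[c] = normalize(sum)
			}
		}
	}
	return
}

// search returns the ids of the n items closest to the query, restricted to
// the l clusters whose centroids are nearest to the query.
func search(query vec, items []vec, centroids []vec, assign []int, l, n int) []int {
	type scored struct {
		id    int
		score float64
	}
	// rank clusters by centroid similarity and keep the top l
	clusters := make([]scored, len(centroids))
	for c, ctr := range centroids {
		clusters[c] = scored{c, dot(query, ctr)}
	}
	sort.Slice(clusters, func(i, j int) bool { return clusters[i].score > clusters[j].score })
	keep := make(map[int]bool)
	for _, c := range clusters[:l] {
		keep[c.id] = true
	}
	// brute-force search inside the selected clusters only
	var cand []scored
	for i, v := range items {
		if keep[assign[i]] {
			cand = append(cand, scored{i, dot(query, v)})
		}
	}
	sort.Slice(cand, func(i, j int) bool { return cand[i].score > cand[j].score })
	if n > len(cand) {
		n = len(cand)
	}
	ids := make([]int, n)
	for i := range ids {
		ids[i] = cand[i].id
	}
	return ids
}

func main() {
	items := []vec{
		normalize(vec{1, 0}), normalize(vec{0.9, 0.1}),
		normalize(vec{0, 1}), normalize(vec{0.1, 0.9}),
	}
	centroids, assign := sphericalKMeans(items, 2, 10)
	fmt.Println(search(items[0], items, centroids, assign, 1, 2))
}
```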
The clustering index can be switched on and off by enable_index, which is turned on by default. The clustering index needs the parameter $L$, which is the number of clusters to query. An $L$ that is too small will cause the index to fail to reach the required recall, while an $L$ that is too large will reduce performance. The construction process tries to increase $L$: if the query recall reaches index_recall, or the growth epochs reach index_fit_epoch, the build process stops increasing $L$.
[recommend.item_neighbors]
# Enable approximate item neighbor searching using vector index. The default value is true.
enable_index = true
# Minimal recall for approximate item neighbor searching. The default value is 0.8.
index_recall = 0.8
# Maximal number of fit epochs for approximate item neighbor searching vector index. The default value is 3.
index_fit_epoch = 3
[recommend.user_neighbors]
# Enable approximate user neighbor searching using vector index. The default value is true.
enable_index = true
# Minimal recall for approximate user neighbor searching. The default value is 0.8.
index_recall = 0.8
# Maximal number of fit epochs for approximate user neighbor searching vector index. The default value is 3.
index_fit_epoch = 3
Personalized Algorithms
There are lots of fancy recommendation algorithms these days, and most of them are based on deep learning[2]. However, we believe traditional methods without deep learning are sufficient to achieve competitive recommendation performance.
Item Similarity-based Recommendation
For a user $u$ with favorite items $I_u$, if we know the similarity between any pair of items, the probability that the user likes an item $i$ is predicted by the sum of the similarities between this item and the favorite items:

$$\hat{y}_{ui} = \sum_{j \in I_u} s_{ij}$$

where $s_{ij}$ is the similarity between item $i$ and item $j$. For a user, the time complexity to search the top recommended items in the full item set is $O(|I_u||I|)$. In practice, the similarity is zero for most pairs of items. Hence, we can search for recommended items among the neighbors of the favorite items:

- Collect candidates from the neighbors of the favorite items: $C = \bigcup_{j \in I_u} \mathcal{N}_j$
- For each item $i$ in $C$, calculate the prediction score by

$$\hat{y}_{ui} = \sum_{j \in I_u} s_{ij}\,\mathbb{I}(i \in \mathcal{N}_j)$$

The indicator $\mathbb{I}(i \in \mathcal{N}_j)$ means the similarity is added to the prediction score only if $i$ is among the cached neighbors of item $j$. Since Gorse only caches the top $n$ neighbors of each item and their similarities, many similarities are missing from the cache. The time complexity of the optimized algorithm is $O(|I_u| n)$.
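A minimal Go sketch of the optimized algorithm, assuming the cached neighbors are available as a map from an item ID to its top-$n$ neighbors with similarities (the data layout and function name are hypothetical):

```go
package main

import (
	"fmt"
	"sort"
)

// recommendByItemNeighbors scores candidate items for one user by summing
// cached similarities from the user's favorite items, as described above.
// favorites holds the user's favorite item ids; neighbors maps an item id to
// its cached top-n neighbors and their similarities.
func recommendByItemNeighbors(favorites map[string]bool, neighbors map[string]map[string]float64, topN int) []string {
	scores := make(map[string]float64)
	for fav := range favorites {
		for item, sim := range neighbors[fav] {
			if !favorites[item] { // never recommend an item the user already has
				scores[item] += sim
			}
		}
	}
	ranked := make([]string, 0, len(scores))
	for item := range scores {
		ranked = append(ranked, item)
	}
	sort.Slice(ranked, func(i, j int) bool { return scores[ranked[i]] > scores[ranked[j]] })
	if topN < len(ranked) {
		ranked = ranked[:topN]
	}
	return ranked
}

func main() {
	favorites := map[string]bool{"a": true, "b": true}
	neighbors := map[string]map[string]float64{
		"a": {"b": 0.9, "c": 0.5},
		"b": {"a": 0.9, "d": 0.4, "c": 0.2},
	}
	fmt.Println(recommendByItemNeighbors(favorites, neighbors, 10)) // [c d]
}
```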
User Similarity-based Recommendation
The user similarity-based recommendation works in the same way as the item similarity-based recommendation:

- Collect candidates from the favorite items of the user's neighbors: $C = \bigcup_{v \in \mathcal{N}_u} I_v$
- For each item $i$ in $C$, calculate the prediction score by

$$\hat{y}_{ui} = \sum_{v \in \mathcal{N}_u} s_{uv}\,\mathbb{I}(i \in I_v)$$

where $s_{uv}$ is the similarity between user $u$ and user $v$. The indicator $\mathbb{I}(i \in I_v)$ means the similarity is added to the prediction score only if $i$ is favored by neighbor $v$. The time complexity is $O\left(\sum_{v \in \mathcal{N}_u} |I_v|\right)$, which is likewise far smaller than scanning the full item set.
Matrix Factorization
In matrix factorization models, users and items are represented by embedding vectors. The probability that user $u$ likes item $i$ is predicted by the dot product of the two embedding vectors:

$$\hat{y}_{ui} = \mathbf{p}_u^T \mathbf{q}_i$$

where $\mathbf{p}_u$ is the embedding vector of user $u$, and $\mathbf{q}_i$ is the embedding vector of item $i$. The matrix factorization model is simple, but there is more than one training algorithm.
BPR
BPR[3] (Bayesian Personalized Ranking) is a pairwise training algorithm. The training data for BPR consist of a set of triples:

$$D_S = \{(u, i, j) \mid i \in I_u \wedge j \in I \setminus I_u\}$$

The semantics of $(u, i, j) \in D_S$ is that user $u$ is assumed to prefer item $i$ over item $j$. Negative cases are regarded implicitly.
The Bayesian formulation of finding the correct personalized ranking for all items is to maximize the following posterior probability, where $\Theta$ represents the parameter vectors of the matrix factorization model:

$$p(\Theta \mid >_u) \propto p(>_u \mid \Theta)\, p(\Theta)$$

where $>_u$ is the desired but latent preference structure of user $u$. All users are presumed to act independently of each other. BPR also assumes the ordering of each pair of items for a specific user is independent of the ordering of every other pair. Hence, the above user-specific likelihood function can first be rewritten as a product of single densities and then be combined for all users $u \in U$:

$$\prod_{u \in U} p(>_u \mid \Theta) = \prod_{(u,i,j) \in D_S} p(i >_u j \mid \Theta)$$

where $p(i >_u j \mid \Theta) = \sigma\left(\hat{x}_{uij}(\Theta)\right)$, $\sigma(x) = \frac{1}{1 + e^{-x}}$, and $\hat{x}_{uij} = \hat{y}_{ui} - \hat{y}_{uj} = \mathbf{p}_u^T\mathbf{q}_i - \mathbf{p}_u^T\mathbf{q}_j$ for the matrix factorization model.
For the parameter vectors, BPR introduces a general prior density $p(\Theta)$, which is a normal distribution with zero mean and variance-covariance matrix $\Sigma_\Theta$:

$$p(\Theta) \sim N(0, \Sigma_\Theta)$$

where $\Sigma_\Theta = \lambda_\Theta I$. Then the optimization criterion for personalized ranking is

$$\text{BPR-Opt} = \ln p(\Theta \mid >_u) = \sum_{(u,i,j) \in D_S} \ln \sigma(\hat{x}_{uij}) - \lambda_\Theta \lVert\Theta\rVert^2$$

The gradient of BPR-Opt with respect to the model parameters is:

$$\frac{\partial\, \text{BPR-Opt}}{\partial \Theta} = \sum_{(u,i,j) \in D_S} \frac{\partial}{\partial \Theta}\ln \sigma(\hat{x}_{uij}) - \lambda_\Theta \frac{\partial \lVert\Theta\rVert^2}{\partial \Theta} \propto \sum_{(u,i,j) \in D_S} \frac{-e^{-\hat{x}_{uij}}}{1 + e^{-\hat{x}_{uij}}}\cdot\frac{\partial \hat{x}_{uij}}{\partial \Theta} - \lambda_\Theta \Theta$$
A stochastic gradient-descent algorithm based on bootstrap sampling of training triples is as follows:

- initialize $\Theta$
- repeat
  - draw $(u, i, j)$ from $D_S$
  - $\Theta \leftarrow \Theta + \alpha\left(\dfrac{e^{-\hat{x}_{uij}}}{1 + e^{-\hat{x}_{uij}}}\cdot\dfrac{\partial \hat{x}_{uij}}{\partial \Theta} - \lambda_\Theta\Theta\right)$
- until convergence
- return $\Theta$
where $\alpha$ is the learning rate. The derivatives of $\hat{x}_{uij}$ with respect to the embedding parameters are

$$\frac{\partial \hat{x}_{uij}}{\partial \theta} = \begin{cases} q_{if} - q_{jf} & \text{if } \theta = p_{uf} \\ p_{uf} & \text{if } \theta = q_{if} \\ -p_{uf} & \text{if } \theta = q_{jf} \\ 0 & \text{otherwise} \end{cases}$$
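A minimal Go sketch of the bootstrap SGD loop above for a matrix factorization model. The hyper-parameter values and the data layout (a map from user index to favorite item indices) are assumptions for the example; for brevity, the negative item $j$ is drawn uniformly from all items rather than strictly from $I \setminus I_u$.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// Hyper-parameters for the sketch; Gorse tunes values like these via random search.
const (
	factors = 8
	lr      = 0.05 // learning rate alpha
	reg     = 0.01 // regularization lambda
	steps   = 100000
)

func dot(a, b []float64) float64 {
	var s float64
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

func randomMatrix(rows, cols int) [][]float64 {
	m := make([][]float64, rows)
	for r := range m {
		m[r] = make([]float64, cols)
		for c := range m[r] {
			m[r][c] = rand.NormFloat64() * 0.1
		}
	}
	return m
}

// trainBPR runs bootstrap SGD on triples (u, i, j): user u is assumed to
// prefer the observed item i over the sampled item j.
func trainBPR(feedback map[int][]int, numUsers, numItems int) (P, Q [][]float64) {
	P, Q = randomMatrix(numUsers, factors), randomMatrix(numItems, factors)
	users := make([]int, 0, len(feedback))
	for u := range feedback {
		users = append(users, u)
	}
	for s := 0; s < steps; s++ {
		u := users[rand.Intn(len(users))]
		i := feedback[u][rand.Intn(len(feedback[u]))]
		// For brevity a random item serves as the negative example; a full
		// implementation would resample while j is one of u's favorites.
		j := rand.Intn(numItems)
		x := dot(P[u], Q[i]) - dot(P[u], Q[j])
		g := 1.0 / (1.0 + math.Exp(x)) // sigmoid(-x), the common gradient factor
		for f := 0; f < factors; f++ {
			pu, qi, qj := P[u][f], Q[i][f], Q[j][f]
			P[u][f] += lr * (g*(qi-qj) - reg*pu)
			Q[i][f] += lr * (g*pu - reg*qi)
			Q[j][f] += lr * (-g*pu - reg*qj)
		}
	}
	return
}

func main() {
	// toy implicit feedback: user index -> favorite item indices
	feedback := map[int][]int{0: {0, 1}, 1: {1, 2}, 2: {2, 3}}
	P, Q := trainBPR(feedback, 3, 4)
	fmt.Printf("score(u0, i1) = %.3f, score(u0, i3) = %.3f\n", dot(P[0], Q[1]), dot(P[0], Q[3]))
}
```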
eALS
eALS[4] is a point-wise training algorithm. For a pair of a user $u$ and an item $i$, the ground truth for training is

$$y_{ui} = \begin{cases} 1 & (u, i) \in R \\ 0 & (u, i) \notin R \end{cases}$$
Embedding vectors are optimized by minimizing the following weighted cost function[5]:

$$\mathcal{C} = \sum_{u \in U}\sum_{i \in I} w_{ui}\left(y_{ui} - \hat{y}_{ui}\right)^2 + \lambda\left(\sum_{u \in U}\lVert\mathbf{p}_u\rVert^2 + \sum_{i \in I}\lVert\mathbf{q}_i\rVert^2\right)$$

where $\lambda$ is the regularization strength and the weight of feedback is

$$w_{ui} = \begin{cases} 1 & (u, i) \in R \\ w_0 & (u, i) \notin R \end{cases}$$

($0 < w_0 < 1$) is the weight for negative feedbacks. The derivative of the objective function with respect to $p_{uf}$, the $f$-th component of $\mathbf{p}_u$, is

$$\frac{\partial \mathcal{C}}{\partial p_{uf}} = -2\sum_{i \in I} w_{ui}\left(y_{ui} - \hat{y}^f_{ui}\right)q_{if} + 2 p_{uf}\sum_{i \in I} w_{ui}\, q_{if}^2 + 2\lambda p_{uf}$$

where $\hat{y}^f_{ui} = \hat{y}_{ui} - p_{uf}q_{if}$ denotes the prediction without the $f$-th component.
By setting this derivative to 0, we obtain the solution for $p_{uf}$ (Eq 1):

$$p_{uf} = \frac{\sum_{i \in I_u}\left[y_{ui} - (1 - w_0)\,\hat{y}^f_{ui}\right]q_{if} - w_0\sum_{k \ne f} p_{uk}\, s^q_{kf}}{(1 - w_0)\sum_{i \in I_u} q_{if}^2 + w_0\, s^q_{ff} + \lambda}$$
where $s^q_{kf}$ denotes the $(k, f)$-th element of the cache $\mathbf{S}^q = \sum_{i \in I}\mathbf{q}_i\mathbf{q}_i^T$, defined as $s^q_{kf} = \sum_{i \in I} q_{ik} q_{if}$. Similarly, we get the solution for an item embedding vector (Eq 2):

$$q_{if} = \frac{\sum_{u \in U_i}\left[y_{ui} - (1 - w_0)\,\hat{y}^f_{ui}\right]p_{uf} - w_0\sum_{k \ne f} q_{ik}\, s^p_{kf}}{(1 - w_0)\sum_{u \in U_i} p_{uf}^2 + w_0\, s^p_{ff} + \lambda}$$
where $U_i$ is the set of users who favor item $i$, and $s^p_{kf}$ denotes the $(k, f)$-th element of the cache $\mathbf{S}^p = \sum_{u \in U}\mathbf{p}_u\mathbf{p}_u^T$, defined as $s^p_{kf} = \sum_{u \in U} p_{uk} p_{uf}$. The learning algorithm is summarized as follows, where $\mathbf{P}$ and $\mathbf{Q}$ are the user and item embedding matrices and $d$ is the dimension of the embedding vectors:
- Randomly initialize $\mathbf{P}$ and $\mathbf{Q}$
- for $(u, i) \in R$ do $\hat{y}_{ui} \leftarrow \mathbf{p}_u^T\mathbf{q}_i$
- while the stopping criteria is not met do
  - for $u \leftarrow 1$ to $|U|$ do
    - for $f \leftarrow 1$ to $d$ do
      - for $i \in I_u$ do $\hat{y}^f_{ui} \leftarrow \hat{y}_{ui} - p_{uf}q_{if}$
      - $p_{uf} \leftarrow$ Eq 1
      - for $i \in I_u$ do $\hat{y}_{ui} \leftarrow \hat{y}^f_{ui} + p_{uf}q_{if}$
    - end
  - end
  - for $i \leftarrow 1$ to $|I|$ do
    - for $f \leftarrow 1$ to $d$ do
      - for $u \in U_i$ do $\hat{y}^f_{ui} \leftarrow \hat{y}_{ui} - p_{uf}q_{if}$
      - $q_{if} \leftarrow$ Eq 2
      - for $u \in U_i$ do $\hat{y}_{ui} \leftarrow \hat{y}^f_{ui} + p_{uf}q_{if}$
    - end
  - end
- return $\mathbf{P}$ and $\mathbf{Q}$
Random Search for Hyper-parameters
There are hyper-parameters for model training, such as the learning rate and regularization strength. Gorse uses random search[6] to find the best hyper-parameters. The hyper-parameter optimization behavior is set by model_search_epoch and model_search_trials. Larger values might lead to better recommendations but cost more CPU time. The optimal model hyper-parameters are relatively stable unless the dataset changes dramatically.
[recommend.collaborative]
# The number of epochs for model searching. The default value is 100.
model_search_epoch = 100
# The number of trials for model searching. The default value is 10.
model_search_trials = 10
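The following Go sketch illustrates the idea of random search: draw a fixed number of random hyper-parameter points and keep the one with the best validation score. The search ranges, the params struct, and the evaluate callback are assumptions for the example; in Gorse the number of trials and training epochs correspond to model_search_trials and model_search_epoch.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// A hypothetical hyper-parameter point for matrix factorization training.
type params struct {
	lr, reg float64
	factors int
}

// randomSearch draws `trials` random hyper-parameter points and keeps the one
// with the best validation score. evaluate is assumed to train a model for a
// fixed number of epochs and return a validation metric (larger is better).
func randomSearch(trials int, evaluate func(params) float64) (best params, bestScore float64) {
	bestScore = math.Inf(-1)
	for t := 0; t < trials; t++ {
		p := params{
			// log-uniform samples are a common choice for rates and strengths
			lr:      math.Pow(10, -3+2*rand.Float64()), // 1e-3 .. 1e-1
			reg:     math.Pow(10, -4+3*rand.Float64()), // 1e-4 .. 1e-1
			factors: 8 << rand.Intn(4),                 // 8, 16, 32, or 64
		}
		if score := evaluate(p); score > bestScore {
			best, bestScore = p, score
		}
	}
	return
}

func main() {
	// A stand-in objective for the example; a real evaluate would train the
	// collaborative filtering model and measure recall/NDCG on held-out data.
	evaluate := func(p params) float64 {
		return -math.Abs(math.Log10(p.lr)+2) - math.Abs(math.Log10(p.reg)+3)
	}
	best, score := randomSearch(10, evaluate)
	fmt.Printf("best: %+v (score %.3f)\n", best, score)
}
```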
HNSW Index
The matrix factorization model in Gorse represents users and items as embedding vectors. For each user, the items whose embedding vectors have the largest dot products with the user's embedding vector are chosen as recommended items. Therefore, the most intuitive way to search for recommended items is to scan all items, calculate the dot product of the embedding vectors during the scan, and select the top items with the largest dot products as the recommendation result. Assuming there are $N$ users and $M$ items, the computational complexity to generate recommendation results for all users is $O(NM)$. However, if the numbers of items and users are large, the overall computation is unacceptable.
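For reference, here is a minimal Go sketch of this brute-force scan (the function name and data layout are assumptions for the example):

```go
package main

import (
	"fmt"
	"sort"
)

// bruteForceTopK is the exhaustive search the HNSW index approximates: scan
// every item embedding, compute its dot product with the user embedding, and
// keep the k items with the largest scores. Doing this for every user costs
// O(N*M) dot products.
func bruteForceTopK(user []float64, items [][]float64, k int) []int {
	ids := make([]int, len(items))
	scores := make([]float64, len(items))
	for i, item := range items {
		ids[i] = i
		for d := range user {
			scores[i] += user[d] * item[d]
		}
	}
	sort.Slice(ids, func(a, b int) bool { return scores[ids[a]] > scores[ids[b]] })
	if k > len(ids) {
		k = len(ids)
	}
	return ids[:k]
}

func main() {
	user := []float64{1, 0.5}
	items := [][]float64{{0.9, 0.1}, {0.2, 1}, {-1, 0}}
	fmt.Println(bruteForceTopK(user, items, 2)) // [0 1]
}
```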
A more efficient approach is to use the HNSW vector index[7]. The HNSW index creates a navigation graph over all item vectors. The results from HNSW are not exact, but the small loss in recall is worth the large performance gain. HNSW requires a parameter, ef_construction, to be set. An ef_construction that is too small will prevent the vector index from reaching the required recall, while one that is too large will reduce search performance. The build process tries to keep increasing ef_construction, and stops growing it if the recall reaches index_recall or if the number of epochs reaches index_fit_epoch.
[recommend.collaborative]
# Enable approximate collaborative filtering recommend using vector index. The default value is true.
enable_index = true
# Minimal recall for approximate collaborative filtering recommend. The default value is 0.9.
index_recall = 0.9
# Maximal number of fit epochs for approximate collaborative filtering recommend vector index. The default value is 3.
index_fit_epoch = 3
Note
The HNSW index is complex, and a full introduction is beyond the scope of this page. For more information, read the original paper: https://arxiv.org/abs/1603.09320
Factorization Machines
User labels and item labels are important information for personalized recommendation, but matrix factorization only handles user embeddings and item embeddings. Factorization machines generate recommendations with rich features such as user features and item features.
Unlike the training algorithms for matrix factorization, factorization machine training also uses negative feedbacks. The training dataset is constructed by

$$\mathcal{D} = \left\{(\mathbf{x}_{ui}, y_{ui}) \mid (u, i) \in R \cup \not R\right\}, \qquad y_{ui} = \begin{cases} 1 & (u, i) \in R \\ -1 & (u, i) \in \not R \end{cases}$$

The dimension of the input vectors is the sum of the numbers of items, users, item labels and user labels: $d = |I| + |U| + |L_I| + |L_U|$. Each element in $\mathbf{x}_{ui}$ for a pair $(u, i)$ is defined by a one-hot encoding of the item and the user concatenated with multi-hot encodings of the item labels and the user labels:

$$\mathbf{x}_{ui} = \Big[\underbrace{\mathbb{I}(i' = i)}_{i' \in I},\ \underbrace{\mathbb{I}(u' = u)}_{u' \in U},\ \underbrace{\mathbb{I}(l \in L_i)}_{l \in L_I},\ \underbrace{\mathbb{I}(l \in L_u)}_{l \in L_U}\Big]$$
The prediction output for an input vector $\mathbf{x}$ is

$$\hat{y}(\mathbf{x}) = w_0 + \sum_{k=1}^{d} w_k x_k + \sum_{k=1}^{d}\sum_{k'=k+1}^{d} \langle\mathbf{v}_k, \mathbf{v}_{k'}\rangle\, x_k x_{k'}$$

where the model parameters that have to be estimated are $w_0 \in \mathbb{R}$, $\mathbf{w} \in \mathbb{R}^d$ and $\mathbf{V} \in \mathbb{R}^{d \times F}$ ($F$ is the number of factors), and $\langle\cdot,\cdot\rangle$ is the dot product of two vectors. Parameters are optimized by the logit loss with SGD. The loss function is

$$\mathcal{L} = \sum_{(\mathbf{x}, y) \in \mathcal{D}} \ln\left(1 + e^{-y\,\hat{y}(\mathbf{x})}\right)$$
The gradient for each parameter is

$$\frac{\partial \mathcal{L}}{\partial \theta} = \sum_{(\mathbf{x}, y) \in \mathcal{D}} \frac{-y}{1 + e^{y\,\hat{y}(\mathbf{x})}}\cdot\frac{\partial \hat{y}(\mathbf{x})}{\partial \theta}, \qquad \frac{\partial \hat{y}(\mathbf{x})}{\partial \theta} = \begin{cases} 1 & \text{if } \theta = w_0 \\ x_k & \text{if } \theta = w_k \\ x_k\sum_{k'=1}^{d} v_{k',f}\, x_{k'} - v_{k,f}\, x_k^2 & \text{if } \theta = v_{k,f} \end{cases}$$
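A minimal Go sketch of the prediction function, using the standard reformulation of the pairwise term that evaluates in $O(|\mathbf{x}| \cdot F)$ time for a sparse input; the sparse map layout and function name are assumptions for the example:

```go
package main

import "fmt"

// fmPredict evaluates a factorization machine on a sparse input vector given
// as index -> value pairs (most entries of x are zero for a (user, item)
// pair). The pairwise interaction term uses the identity
// sum_{k<k'} <v_k, v_k'> x_k x_k' = 0.5 * sum_f [(sum_k v_kf x_k)^2 - sum_k v_kf^2 x_k^2].
func fmPredict(w0 float64, w []float64, v [][]float64, x map[int]float64) float64 {
	factors := len(v[0])
	y := w0
	for k, xk := range x {
		y += w[k] * xk // linear terms
	}
	for f := 0; f < factors; f++ {
		var sum, sumSq float64
		for k, xk := range x {
			d := v[k][f] * xk
			sum += d
			sumSq += d * d
		}
		y += 0.5 * (sum*sum - sumSq) // pairwise interactions for factor f
	}
	return y
}

func main() {
	// A toy model with 4 input dimensions and 2 factors; in Gorse the input
	// concatenates the item id, user id, item labels and user labels.
	w := []float64{0.1, -0.2, 0.3, 0.0}
	v := [][]float64{{0.1, 0.2}, {0.3, -0.1}, {-0.2, 0.4}, {0.0, 0.1}}
	x := map[int]float64{0: 1, 2: 1} // one active item index and one label
	fmt.Println(fmPredict(0.5, w, v, x))
}
```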
Hyper-parameters are optimized by random search, and the configuration recommend.collaborative is reused.
Auvolat, Alex, et al. "Clustering is efficient for approximate maximum inner product search." arXiv preprint arXiv:1507.05910 (2015). ↩︎
Zhang, Shuai, et al. "Deep learning based recommender system: A survey and new perspectives." ACM Computing Surveys (CSUR) 52.1 (2019): 1-38. ↩︎
Rendle, Steffen, et al. "BPR: Bayesian personalized ranking from implicit feedback." Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence. 2009. ↩︎
He, Xiangnan, et al. "Fast matrix factorization for online recommendation with implicit feedback." Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval. 2016. ↩︎
Hu, Yifan, Yehuda Koren, and Chris Volinsky. "Collaborative filtering for implicit feedback datasets." 2008 Eighth IEEE International Conference on Data Mining. IEEE, 2008. ↩︎
Bergstra, James, and Yoshua Bengio. "Random search for hyper-parameter optimization." Journal of machine learning research 13.2 (2012). ↩︎
Malkov, Yu A., and Dmitry A. Yashunin. "Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs." IEEE transactions on pattern analysis and machine intelligence 42.4 (2018): 824-836. ↩︎