Item-to-Item Recommenders
Recommending items similar to the ones a user already prefers is a simple and effective recommendation strategy. Gorse implements item-to-item recommenders that support embedding similarity, tags similarity, and users similarity.
Configuration
The new item-to-item recommenders need to be explicitly configured. The following three fields need to be filled in:
- name is the name of the recommender.
- type is the similarity type, and the following values are supported:
  - embedding is the Euclidean distance between embedding vectors.
  - tags is the similarity based on the number of common tags.
  - users is the similarity based on the number of common users.
- column is the field used by the recommender to calculate the similarity, expressed in the Expr language. Suppose the embedding vectors of the README of GitHub repositories are stored in the embedding field of Labels, and the tags are stored in the topics field:
{
    "ItemId": "gorse-io:gorse",
    "IsHidden": false,
    "Categories": [],
    "Timestamp": "2022-10-23T03:50:24Z",
    "Labels": {
        "embedding": [0.0017246103, -0.009725488, 0.005806058, -0.0187753, -0.015343021, ...],
        "topics": ["machine-learning", "service", "recommender", "go", "recommender-system", "knn", "collaborative-filtering"]
    },
    "Comment": "An open source recommender system service written in Go"
}
If embedding similarity is used, the value of column should be item.Labels.embedding; if tags similarity is used, the value of column should be item.Labels.topics. When type is embedding or tags, column cannot be empty. However, when type is users, column must be empty.
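The rules for type and column can be checked before a configuration is deployed. The following is a minimal sketch, not part of Gorse: validate_recommender is a hypothetical helper, and it assumes each configuration entry has already been parsed into a dictionary with the keys name, type, and column.

```python
def validate_recommender(entry: dict) -> None:
    """Raise ValueError if an item-to-item entry breaks the rules described above.

    Hypothetical helper for illustration; Gorse performs its own validation.
    """
    name = entry.get("name", "")
    kind = entry.get("type", "")
    column = entry.get("column", "")
    if not name:
        raise ValueError("name must not be empty")
    if kind not in ("embedding", "tags", "users"):
        raise ValueError(f"unsupported type: {kind!r}")
    # embedding and tags need an expression that selects the field to compare.
    if kind in ("embedding", "tags") and not column:
        raise ValueError(f"column is required when type is {kind!r}")
    # users similarity is computed from feedback, so no column is allowed.
    if kind == "users" and column:
        raise ValueError("column must be empty when type is 'users'")


validate_recommender(
    {"name": "similar_embedding", "type": "embedding", "column": "item.Labels.embedding"}
)
validate_recommender({"name": "similar_users", "type": "users"})
```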
Algorithms
Embedding Similarity
Given $n$-dimensional embedding vectors $\mathbf{p}$ and $\mathbf{q}$ for two items, the embedding similarity between them is the Euclidean distance

$$
s = \sqrt{\sum_{k=1}^{n} (p_k - q_k)^2}
$$
The dimension of embedding vectors is usually relatively large. For example, the text-embedding-3-small model of OpenAI generates embedding vectors with 1536 dimensions, so embedding vectors require more disk space and memory than tags. In return, encoding text and images into embedding vectors eliminates the cost of manual tag maintenance or automatic tag generation, and the similarity calculated from embedding vectors is more accurate than tags similarity. Embedding vectors can be generated from APIs provided by AI providers such as OpenAI and Anthropic, or self-deployed projects like Ollama and CLIP-as-service.
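As an illustration of the Euclidean distance between two embedding vectors, here is a minimal sketch in plain Python. The function name euclidean_distance and the two-dimensional sample vectors are made up for the example; real embeddings would have far more dimensions.

```python
import math


def euclidean_distance(p, q):
    """Return the Euclidean distance between two equal-length vectors."""
    if len(p) != len(q):
        raise ValueError("embedding vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Toy 2-dimensional vectors: a 3-4-5 right triangle.
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```

A smaller distance means the two vectors are closer together in the embedding space.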
Tags Similarity
Tags similarity holds that the more overlapping tags there are between items, the more similar they are. First, calculate the TF-IDF of each tag

$$
w_t = -\log\left(\frac{|I_t|}{|I|}\right)
$$

where $|I|$ represents the total number of items, and $|I_t|$ represents the number of items labeled with tag $t$. If a tag is used by more items, it is more general and thus has a lower weight. Then, calculate the tags similarity between items

$$
s_{ij} = \frac{\sum_{t \in T_i \cap T_j} w_t}{\sqrt{\sum_{t \in T_i} w_t^2}\,\sqrt{\sum_{t \in T_j} w_t^2}}
$$

where $T_i$ and $T_j$ represent the tag sets of item $i$ and item $j$ respectively. Although tags similarity is not as accurate as embedding similarity, it can help users discover content that is related to an item without being nearly identical to it.
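The two steps of tags similarity can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Gorse implementation: the helper names and the toy catalogue are invented, each tag is weighted by the negative log of the share of items that carry it, and the normalization (summed weights of shared tags divided by the product of each item's weight norm) is an assumption.

```python
import math


def tag_weights(item_tags):
    """Weight each tag by -log(items carrying the tag / total items)."""
    total = len(item_tags)
    counts = {}
    for tags in item_tags.values():
        for tag in set(tags):
            counts[tag] = counts.get(tag, 0) + 1
    return {tag: -math.log(n / total) for tag, n in counts.items()}


def tags_similarity(tags_a, tags_b, weights):
    """Summed weight of shared tags, normalized by each item's weight norm."""
    a, b = set(tags_a), set(tags_b)
    common = sum(weights[t] for t in a & b)
    norm_a = math.sqrt(sum(weights[t] ** 2 for t in a))
    norm_b = math.sqrt(sum(weights[t] ** 2 for t in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # every tag on the item is carried by all items
    return common / (norm_a * norm_b)


# Hypothetical catalogue: "go" and "ml" are common, the rest are rare.
catalogue = {
    "a": ["go", "ml"],
    "b": ["go", "web"],
    "c": ["ml", "knn"],
    "d": ["rust"],
}
weights = tag_weights(catalogue)
print(weights["go"] < weights["web"])  # common tags weigh less than rare ones
print(tags_similarity(catalogue["a"], catalogue["b"], weights))
print(tags_similarity(catalogue["a"], catalogue["d"], weights))
```

Items a and b share one tag and get a positive score, while a and d share none and score zero.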
Users Similarity
Users similarity holds that the more overlapping users there are between items, the more similar they are. As with tags, first calculate the TF-IDF of each user

$$
w_u = -\log\left(\frac{|I_u|}{|I|}\right)
$$

where $|I|$ represents the total number of items, and $|I_u|$ represents the number of items liked by user $u$. Then, calculate the users similarity between items

$$
s_{ij} = \frac{\sum_{u \in U_i \cap U_j} w_u}{\sqrt{\sum_{u \in U_i} w_u^2}\,\sqrt{\sum_{u \in U_j} w_u^2}}
$$

where $U_i$ and $U_j$ represent the user sets of item $i$ and item $j$ respectively. One drawback of users similarity is that it tends to recommend popular items, because popular items always have a high overlap of users with other items.
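The popularity drawback is easy to reproduce on toy data. The sketch below is illustrative only: users_similarity and the feedback pairs are hypothetical, the total number of items is taken to be the number of items that appear in the feedback, and the normalization by each item's weight norm is an assumption rather than a statement about Gorse internals.

```python
import math
from collections import defaultdict


def users_similarity(feedback, item_x, item_y):
    """Score two items by the weighted overlap of the users who liked them.

    feedback is a list of (user, item) pairs meaning "user liked item".
    """
    items_by_user = defaultdict(set)
    users_by_item = defaultdict(set)
    for user, item in feedback:
        items_by_user[user].add(item)
        users_by_item[item].add(user)
    total_items = len(users_by_item)  # assumption: only items with feedback count
    # A user who likes many items says less about any single one of them.
    weight = {
        user: -math.log(len(items) / total_items)
        for user, items in items_by_user.items()
    }
    users_x = users_by_item.get(item_x, set())
    users_y = users_by_item.get(item_y, set())
    common = sum(weight[u] for u in users_x & users_y)
    norm_x = math.sqrt(sum(weight[u] ** 2 for u in users_x))
    norm_y = math.sqrt(sum(weight[u] ** 2 for u in users_y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return common / (norm_x * norm_y)


# Every user liked "popular" plus one niche item of their own.
feedback = [
    ("u1", "popular"), ("u1", "a"),
    ("u2", "popular"), ("u2", "b"),
    ("u3", "popular"), ("u3", "c"),
]
for niche in ("a", "b", "c"):
    print(niche, users_similarity(feedback, "popular", niche))
print("a-b", users_similarity(feedback, "a", "b"))
```

The popular item scores above zero against every niche item, while the niche items have no shared users and score zero against each other.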
API
You can access item-to-item recommendations through the following API:
curl http://localhost:8087/api/item-to-item/<name>/<item-id>
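The same request can be issued from Python with only the standard library. This is a sketch under assumptions: item_to_item_url and fetch_similar_items are hypothetical helpers, the base URL is the localhost address from the curl example, and the response body is assumed to be JSON and is returned without further interpretation.

```python
import json
import urllib.parse
import urllib.request


def item_to_item_url(base_url, name, item_id):
    """Build the item-to-item endpoint URL, percent-encoding both path parts."""
    return "{}/api/item-to-item/{}/{}".format(
        base_url.rstrip("/"),
        urllib.parse.quote(name, safe=""),
        urllib.parse.quote(item_id, safe=""),
    )


def fetch_similar_items(base_url, name, item_id, timeout=5.0):
    """Call the endpoint and decode the body (assumed to be JSON)."""
    url = item_to_item_url(base_url, name, item_id)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


print(item_to_item_url("http://localhost:8087", "similar_topics", "gorse-io:gorse"))
# http://localhost:8087/api/item-to-item/similar_topics/gorse-io%3Agorse
```

Percent-encoding the item ID keeps IDs that contain characters such as a slash from being read as extra path segments.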
Examples
Take the dataset of the demo project GitRec as an example. The embedding vectors of the README file of GitHub projects are stored in the embedding field, and the tags are saved in the topics field.
{
    "ItemId": "gorse-io:gorse",
    "IsHidden": false,
    "Categories": [],
    "Timestamp": "2022-10-23T03:50:24Z",
    "Labels": {
        "embedding": [0.0017246103, -0.009725488, 0.005806058, -0.0187753, -0.015343021, ...],
        "topics": ["machine-learning", "service", "recommender", "go", "recommender-system", "knn", "collaborative-filtering"]
    },
    "Comment": "An open source recommender system service written in Go"
}
The configuration entry of an embedding similarity based item-to-item recommender:
[[recommend.item-to-item]]
name = "similar_embedding"
type = "embedding"
column = "item.Labels.embedding"
The configuration entry of a tags similarity based item-to-item recommender:
[[recommend.item-to-item]]
name = "similar_topics"
type = "tags"
column = "item.Labels.topics"