Unlike common frontend, backend, and AI open-source projects, it is difficult for potential users to directly perceive the utility of an open-source recommender system. Therefore, the authors of Gorse built GitRec, a recommender system for GitHub repositories. This project not only demonstrates the basic capabilities of Gorse but also helps users discover interesting and useful repositories among the vast number of open-source projects.
After a year and a half of development, Gorse v0.5 has been released. This release includes breaking changes, so it is worth writing an article to introduce the new features and provide an upgrade guide.
New Features
New Data Schema
- The Subscribe field of User has been removed. In the early design, the Subscribe field was used to store tags subscribed to by users, but it was never actually used. It is recommended to use External Recommenders to implement recommendations driven by business logic, such as subscriptions.
- The Labels field of User and Item supports arbitrary JSON objects. On this basis, some recommenders can specify nested fields to use. For example, in item-to-item recommenders, you can specify that the embedding vector in Labels.embedding is used to calculate item similarity.
- The Value field has been added to feedback. The Value field represents the weight of feedback, such as the percentage of a video watched or the rating of a product. On the one hand, positive and negative feedback can be distinguished by setting thresholds on the Value field; on the other hand, future updates will utilize the Value field to better train models. When inserting feedback, you can choose to accumulate (POST) the Value or overwrite (PUT) the original Value.
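The accumulate and overwrite semantics of Value, together with threshold-based classification, can be sketched in a few lines of Go. This is only an in-memory illustration of the behaviour described above, not Gorse's implementation: the Feedback struct is a simplified stand-in, and the Store type, its methods, the IsPositive helper, and the 0.8 threshold are all hypothetical names and values chosen for the example.

```go
package main

import "fmt"

// Feedback is a simplified, hypothetical stand-in for a feedback record.
// Only the existence of the Value field is taken from the release notes.
type Feedback struct {
	FeedbackType string
	UserId       string
	ItemId       string
	Value        float64
}

// key identifies one (feedback type, user, item) triple.
type key struct{ feedbackType, userId, itemId string }

// Store is an in-memory illustration, not Gorse's storage layer.
type Store map[key]float64

// Accumulate mimics the POST behaviour: the new Value is added to the stored one.
func (s Store) Accumulate(f Feedback) {
	s[key{f.FeedbackType, f.UserId, f.ItemId}] += f.Value
}

// Overwrite mimics the PUT behaviour: the new Value replaces the stored one.
func (s Store) Overwrite(f Feedback) {
	s[key{f.FeedbackType, f.UserId, f.ItemId}] = f.Value
}

// IsPositive separates positive from negative feedback with a threshold on Value.
func IsPositive(value, threshold float64) bool {
	return value >= threshold
}

func main() {
	s := Store{}
	k := key{"watch", "alice", "video-1"}
	watch := Feedback{FeedbackType: "watch", UserId: "alice", ItemId: "video-1", Value: 0.25}

	// Two accumulated events: a quarter of the video watched, twice.
	s.Accumulate(watch)
	s.Accumulate(watch)
	fmt.Println(s[k], IsPositive(s[k], 0.8)) // 0.5 false

	// An overwrite discards the accumulated total.
	watch.Value = 0.9
	s.Overwrite(watch)
	fmt.Println(s[k], IsPositive(s[k], 0.8)) // 0.9 true
}
```

Accumulation suits signals that arrive in increments (such as watch progress reported in chunks), while overwriting suits signals that are restated in full (such as a revised rating).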
Thanks to the Bianbu Cloud RISC-V cloud computing instances provided by SpacemiT, Gorse has now completed support for the RISC-V architecture.
Embedding models encode multimodal information such as images and text into high-dimensional vectors, enabling the calculation of relationships between multimodal information by measuring the distance between embedding vectors in online services like search engines and recommender systems. Text embeddings are the most widely used. Major AI service providers offer text embedding APIs to their users, and there are also many open-source text embedding models available for self-hosting. The current mainstream evaluation standard for text embedding models is MTEB. However, MTEB does not assess the capabilities of text embedding models in recommender systems, and this post will attempt to evaluate the performance of text embedding models in recommender systems.
GitHub Actions provides various continuous integration environments for projects hosted on GitHub, including three operating systems (Linux, macOS, and Windows) and two architectures (AMD64 and ARM64). These environments are sufficient for most projects, but RISC-V developers may find it challenging to run RISC-V workflows on GitHub Actions. Commercial companies can use self-hosted runners (refer to Supporting runners on 64bit RISC-V) or service providers (RISC-V Runners and Cloud-V), but for individual developers, this represents a significant expense.
In the era of large language models, low-precision floating-point numbers are no strangers to developers, with BF16 being one of the most widely supported low-precision floating-point formats. This article will introduce how to use BF16 in the Go programming language.
Introduction to BF16
AVX512 is the latest generation of SIMD instructions released by Intel, which can process 512 bits of data with a single instruction, equivalent to 16 single-precision floating point numbers or 8 double-precision floating point numbers. The training and inference process of recommendation models in Gorse requires a lot of vector computation, and AVX512 can in theory speed it up. Unfortunately, the Go compiler does not automatically generate SIMD instructions.
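As a point of reference, the kind of vector computation in question looks like the dot product below. This is a plain scalar sketch, not Gorse's code: the Dot function and the lane constants are hypothetical names. Since the Go compiler does not vectorize this loop, it handles one pair of elements per iteration, whereas a 512-bit register could hold 16 float32 values at once.

```go
package main

import "fmt"

const (
	registerBits = 512               // width of one AVX512 register
	lanesF32     = registerBits / 32 // float32 values per register
	lanesF64     = registerBits / 64 // float64 values per register
)

// Dot is a scalar baseline: one multiply-add per loop iteration.
// If the lengths differ, only the common prefix is used.
func Dot(a, b []float32) float32 {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	var sum float32
	for i := 0; i < n; i++ {
		sum += a[i] * b[i]
	}
	return sum
}

func main() {
	fmt.Println(lanesF32, lanesF64) // 16 8
	fmt.Println(Dot([]float32{1, 2, 3}, []float32{4, 5, 6})) // 32
}
```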
The ability to compile into a single binary is a great feature of the Go programming language, avoiding dependency management at deployment time. However, if the project contains front-end code, we need to find a way to embed the front-end artifact into the Go binary at compile time. The compilation process is as follows.
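One common way to do this since Go 1.16 is the standard embed package, which exposes embedded files as an fs.FS that net/http can serve directly. The sketch below shows the serving side under that assumption; it is not necessarily the approach the article takes. The dist directory name, the NewHandler and fetch helpers, and the file contents are hypothetical, and fstest.MapFS stands in for the embedded file system so the example runs without any build artifacts.

```go
package main

import (
	"fmt"
	"io"
	"io/fs"
	"net/http"
	"net/http/httptest"
	"testing/fstest"
)

// In a real build the file system would come from the embed package:
//
//	//go:embed dist
//	var assets embed.FS
//
// "dist" is a hypothetical output directory of the front-end build.
// fstest.MapFS is used here only so the sketch runs without those files.
var assets fs.FS = fstest.MapFS{
	"dist/index.html": {Data: []byte("<h1>dashboard</h1>")},
	"dist/app.js":     {Data: []byte("console.log('hi')")},
}

// NewHandler serves the files under dist/ at the URL root.
func NewHandler(fsys fs.FS) (http.Handler, error) {
	sub, err := fs.Sub(fsys, "dist")
	if err != nil {
		return nil, err
	}
	return http.FileServer(http.FS(sub)), nil
}

// fetch issues an in-process GET request and returns the status code and body.
func fetch(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	body, _ := io.ReadAll(rec.Result().Body)
	return rec.Code, string(body)
}

func main() {
	h, err := NewHandler(assets)
	if err != nil {
		panic(err)
	}
	code, body := fetch(h, "/app.js")
	fmt.Println(code, body)
}
```

Because the handler only depends on fs.FS, the same code works unchanged once the MapFS is replaced by an embed.FS populated at compile time.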
