User labels and item labels carry important information for personalized recommendation, but matrix factorization only learns user embeddings and item embeddings and cannot use such side information. Factorization machines [3] generate recommendations from rich features, including user labels and item labels in addition to user and item identities.
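Unlike the learning algorithms for matrix factorization, factorization machine training uses negative feedback in addition to positive feedback. The training dataset is constructed as

$$
\mathcal{D} = \{(\mathbf{x}_{u,i}, 1) \mid \text{user } u \text{ gave positive feedback to item } i\} \cup \{(\mathbf{x}_{u,i}, 0) \mid \text{user } u \text{ gave negative feedback to item } i\}
$$

where $\mathbf{x}_{u,i}$ is the input vector for the pair $(u,i)$.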
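The dimension $n$ of the input vector $\mathbf{x}$ is the total number of items, users, item labels and user labels: $n = |I| + |U| + |L_I| + |L_U|$. For a pair $(u,i)$, each element of $\mathbf{x}$ is defined by

$$
x_j =
\begin{cases}
1, & \text{if dimension } j \text{ represents item } i,\ \text{user } u,\ \text{a label of item } i \text{ or a label of user } u \\
0, & \text{otherwise}
\end{cases}
$$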
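The model predicts

$$
\hat y = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle x_i x_j
$$

where the model parameters to be estimated are $w_0 \in \mathbb{R}$, $\mathbf{w} \in \mathbb{R}^n$ and $\mathbf{V} \in \mathbb{R}^{n \times k}$, $\mathbf{v}_i$ is the $i$-th row of $\mathbf{V}$, and $\langle \cdot, \cdot \rangle$ is the dot product of two vectors. The parameters are learned by minimizing the logistic (log) loss with stochastic gradient descent (SGD). The loss function is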
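$$
C = \sum_{(\mathbf{x}, y) \in \mathcal{D}} -y \log \sigma(\hat y) - (1 - y) \log\bigl(1 - \sigma(\hat y)\bigr)
$$

where $\sigma(z) = 1 / (1 + e^{-z})$ maps the raw model output $\hat y$ to a probability. For a single sample, the chain rule gives $\partial C / \partial \theta = (\sigma(\hat y) - y)\, \partial \hat y / \partial \theta$.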
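The encoding, model and loss described here can be sketched in plain Python, from a $(u,i)$ pair to the summed loss. This is a minimal sketch under stated assumptions, not the library's implementation: `make_index`, `encode`, `fm_predict` and `log_loss` are hypothetical names; inputs are stored as sparse dicts from dimension index to value; every active feature has value 1; dimensions are ordered items, users, item labels, user labels; and the loss is applied to $\sigma(\hat y)$. The pairwise term uses the standard rewrite $\sum_{i<j} \langle \mathbf{v}_i, \mathbf{v}_j \rangle x_i x_j = \frac{1}{2} \sum_f \bigl[(\sum_i v_{i,f} x_i)^2 - \sum_i v_{i,f}^2 x_i^2\bigr]$.

```python
import math
from typing import Dict, List, Sequence, Tuple


def make_index(items: Sequence[str], users: Sequence[str],
               item_labels: Sequence[str], user_labels: Sequence[str]) -> Dict[Tuple[str, str], int]:
    """Assign one input dimension per item, user, item label and user label (in that order)."""
    index: Dict[Tuple[str, str], int] = {}
    for kind, names in (("item", items), ("user", users),
                        ("item_label", item_labels), ("user_label", user_labels)):
        for name in names:
            index[(kind, name)] = len(index)
    return index


def encode(index: Dict[Tuple[str, str], int], user: str, item: str,
           item_labels: Sequence[str], user_labels: Sequence[str]) -> Dict[int, float]:
    """Sparse multi-hot vector x for the pair (user, item): nonzero index -> 1.0 (assumed value)."""
    x = {index[("item", item)]: 1.0, index[("user", user)]: 1.0}
    for label in item_labels:
        x[index[("item_label", label)]] = 1.0
    for label in user_labels:
        x[index[("user_label", label)]] = 1.0
    return x


def fm_predict(w0: float, w: List[float], V: List[List[float]], x: Dict[int, float]) -> float:
    """Raw FM output y_hat, with the pairwise term computed in O(k * nnz(x))."""
    k = len(V[0]) if V else 0
    score = w0 + sum(w[i] * xi for i, xi in x.items())
    for f in range(k):
        s = sum(V[i][f] * xi for i, xi in x.items())
        s2 = sum((V[i][f] * xi) ** 2 for i, xi in x.items())
        score += 0.5 * (s * s - s2)
    return score


def sigmoid(z: float) -> float:
    # Numerically stable logistic function sigma(z) = 1 / (1 + e^{-z}).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_loss(w0: float, w: List[float], V: List[List[float]],
             dataset: Sequence[Tuple[Dict[int, float], int]]) -> float:
    """Sum of -y log p - (1 - y) log(1 - p) over (x, y) pairs, with p = sigmoid(y_hat)."""
    total = 0.0
    for x, y in dataset:
        # Clip p away from 0 and 1 so the logarithms stay finite.
        p = min(max(sigmoid(fm_predict(w0, w, V, x)), 1e-12), 1.0 - 1e-12)
        total += -y * math.log(p) - (1 - y) * math.log(1.0 - p)
    return total
```

The sparse form matters because each $\mathbf{x}$ has only one item, one user and a few labels set, so the cost of a prediction grows with the number of nonzero features rather than with $n$.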
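The gradient of the model output $\hat y$ with respect to each parameter is

$$
\frac{\partial \hat y}{\partial \theta} =
\begin{cases}
1, & \text{if } \theta \text{ is } w_0 \\
x_i, & \text{if } \theta \text{ is } w_i \\
x_i \sum_{j=1}^{n} v_{j,f}\, x_j - v_{i,f}\, x_i^2, & \text{if } \theta \text{ is } v_{i,f}
\end{cases}
$$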
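One SGD update combines these cases with the loss through the chain rule: for logistic loss applied to $\sigma(\hat y)$, $\partial C / \partial \hat y = \sigma(\hat y) - y$, which scales every case of $\partial \hat y / \partial \theta$. The sketch below assumes sparse dict inputs, a fixed learning rate `lr` and no regularization term; `fm_score` and `sgd_step` are hypothetical names, and the learning rate value is illustrative.

```python
import math
from typing import Dict, List


def sigmoid(z: float) -> float:
    # Numerically stable logistic function sigma(z) = 1 / (1 + e^{-z}).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fm_score(w0: float, w: List[float], V: List[List[float]], x: Dict[int, float]) -> float:
    """Raw FM output y_hat, computed with the plain pairwise double sum over nonzero features."""
    idx = sorted(x)
    score = w0 + sum(w[i] * x[i] for i in idx)
    for a, i in enumerate(idx):
        for j in idx[a + 1:]:
            score += sum(vi * vj for vi, vj in zip(V[i], V[j])) * x[i] * x[j]
    return score


def sgd_step(w0: float, w: List[float], V: List[List[float]],
             x: Dict[int, float], y: int, lr: float = 0.01) -> float:
    """One plain SGD step on the logistic loss of one sample (x, y), without regularization.

    Updates w and V in place and returns the new w0.
    """
    k = len(V[0])
    # s_f = sum_j v_{j,f} x_j, computed before any update so all gradients use the old parameters.
    s = [sum(V[j][f] * xj for j, xj in x.items()) for f in range(k)]
    g = sigmoid(fm_score(w0, w, V, x)) - y  # dC/dy_hat = sigma(y_hat) - y
    w0 -= lr * g  # dy_hat/dw0 = 1
    for i, xi in x.items():  # parameters of features with x_i = 0 get a zero gradient
        w[i] -= lr * g * xi  # dy_hat/dw_i = x_i
        for f in range(k):
            # dy_hat/dv_{i,f} = x_i * s_f - v_{i,f} * x_i^2; the right side reads the old v_{i,f}
            V[i][f] -= lr * g * (xi * s[f] - V[i][f] * xi * xi)
    return w0
```

Precomputing $s_f$ keeps the parameter updates linear in the number of nonzero features; the naive `fm_score` is used here only for brevity.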
Hyperparameters are tuned by random search, reusing the `recommend.collaborative` configuration.
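Random search draws each hyperparameter independently from a range, trains and scores one model per draw, and keeps the best configuration. The sketch below is illustrative only: the `evaluate` callback, the hyperparameter names (`lr`, `reg`, `n_factors`) and their ranges are hypothetical and are not the contents of `recommend.collaborative`, which this section does not show.

```python
import random
from typing import Any, Callable, Dict, Optional, Tuple

# A sampler draws one value for a hyperparameter from a seeded random generator.
Sampler = Callable[[random.Random], Any]


def random_search(evaluate: Callable[[Dict[str, Any]], float],
                  space: Dict[str, Sampler],
                  trials: int,
                  seed: int = 0) -> Tuple[Optional[Dict[str, Any]], float]:
    """Sample `trials` configurations independently and return the best one and its score."""
    rng = random.Random(seed)
    best_params: Optional[Dict[str, Any]] = None
    best_score = float("-inf")
    for _ in range(trials):
        params = {name: sample(rng) for name, sample in space.items()}
        score = evaluate(params)  # e.g. a validation metric where higher is better
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space for a factorization machine; the ranges are illustrative only.
fm_space: Dict[str, Sampler] = {
    "lr": lambda r: 10 ** r.uniform(-3, -1),   # log-uniform learning rate
    "reg": lambda r: 10 ** r.uniform(-4, -1),  # log-uniform regularization strength
    "n_factors": lambda r: r.choice([8, 16, 32, 64]),
}
```

Sampling the learning rate and regularization on a log scale spreads trials evenly across orders of magnitude, which is the usual choice for such parameters.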