Learning recommender systems

Cost function (theory)

$\min\ J(w^{(j)}, b^{(j)}) = \frac{1}{2m^{(j)}} \sum_{i:r(i, j) = 1}(w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i, j)})^2$

With regularization:

$\min\ J(w^{(j)}, b^{(j)}) = \frac{1}{2m^{(j)}} \sum_{i:r(i, j) = 1}(w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i, j)})^2 + \frac{\lambda}{2m^{(j)}}\sum_{k=1}^n(w_k^{(j)})^2$

Generalizing to all users
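
A sketch of the all-users objective, assuming the same convention as the later formulas in this post: the per-user factor $1/m^{(j)}$ is dropped (each user's parameters are optimized independently here, so rescaling one user's objective by a constant should not move its minimizer), and $n_u$ denotes the number of users:

$\min\ J(w^{(1)}, b^{(1)}, ..., w^{(n_u)}, b^{(n_u)}) = \frac{1}{2}\sum_{j=1}^{n_u}\sum_{i:r(i,j)=1}(w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i, j)})^2 + \frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^n(w_k^{(j)})^2$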

Collaborative Filtering Algorithm

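Introduction and concepts

Consider the problem in reverse:

If we already have values for the parameters w and b, we can work backwards and learn the features x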

$J(x^{(1)}, x^{(2)}, ..., x^{(n_m)}) = \frac{1}{2}\sum_{i=1}^{n_m}\sum_{j:r(i,j)=1}(w^{(j)} \cdot x^{(i)} +b^{(j)} - y^{(i, j)})^2 + \frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^n(x_k^{(i)})^2$

The two objectives can be combined:

$J(w, b, x) = \frac{1}{2}\sum_{(i, j):r(i, j) = 1}(w^{(j)} \cdot x^{(i)} +b^{(j)} - y^{(i, j)})^2 + \frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^n(w_k^{(j)})^2 + \frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^n(x_k^{(i)})^2$

The cost is now a function of all three: w, b and x

So gradient descent updates all three simultaneously:

$w_i^{(j)} = w_i^{(j)} - \alpha \frac{\partial}{\partial w_i^{(j)}}J(w, b, x)$ $b^{(j)} = b^{(j)} - \alpha\frac{\partial}{\partial b^{(j)}}J(w, b, x)$ $x_k^{(i)} = x_k^{(i)} - \alpha\frac{\partial}{\partial x_k^{(i)}}J(w, b, x)$
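
The simultaneous update can be sketched in NumPy with hand-derived gradients. This is only an illustration under assumptions: the toy sizes, the random data and the helper names `cofi_cost` and `cofi_gradients` are made up for this example and are not part of the original notes.

```python
import numpy as np

def cofi_cost(X, W, b, Y, R, lam):
    """J(w, b, x) with the 1/2 convention of the combined formula."""
    err = (X @ W.T + b - Y) * R          # only rated entries contribute
    return 0.5 * np.sum(err ** 2) + 0.5 * lam * (np.sum(W ** 2) + np.sum(X ** 2))

def cofi_gradients(X, W, b, Y, R, lam):
    """Analytic partial derivatives of J with respect to X, W and b."""
    err = (X @ W.T + b - Y) * R          # shape (n_m, n_u)
    dX = err @ W + lam * X               # dJ/dx_k^(i)
    dW = err.T @ X + lam * W             # dJ/dw_k^(j)
    db = err.sum(axis=0, keepdims=True)  # dJ/db^(j)
    return dX, dW, db

rng = np.random.default_rng(0)
n_m, n_u, n = 5, 4, 3                    # toy sizes (hypothetical)
Y = rng.integers(1, 6, size=(n_m, n_u)).astype(float)
R = (rng.random((n_m, n_u)) < 0.7).astype(float)
X = rng.normal(size=(n_m, n))
W = rng.normal(size=(n_u, n))
b = np.zeros((1, n_u))

alpha, lam = 0.01, 1.0
costs = [cofi_cost(X, W, b, Y, R, lam)]
for _ in range(200):
    dX, dW, db = cofi_gradients(X, W, b, Y, R, lam)
    # Simultaneous update: every gradient was computed from the old values
    X, W, b = X - alpha * dX, W - alpha * dW, b - alpha * db
    costs.append(cofi_cost(X, W, b, Y, R, lam))
```

Computing all three gradients before touching any parameter is what makes the update simultaneous; updating `X` first and then using the new `X` to compute `dW` would be a different algorithm.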

Variant of the formulation for binary labels:
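
The formula that belonged here is missing from these notes. A sketch of the usual logistic variant, which is an assumption about what was meant: the prediction passes through the sigmoid $g(z) = \frac{1}{1 + e^{-z}}$, and the squared error is replaced by the binary cross-entropy loss.

$f_{(w, b, x)}(x) = g(w^{(j)} \cdot x^{(i)} + b^{(j)})$

$L(f_{(w, b, x)}(x), y^{(i, j)}) = -y^{(i, j)}\log(f_{(w, b, x)}(x)) - (1 - y^{(i, j)})\log(1 - f_{(w, b, x)}(x))$

$J(w, b, x) = \sum_{(i, j):r(i, j) = 1}L(f_{(w, b, x)}(x), y^{(i, j)})$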

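Mean normalization

In short: subtract the mean first, then add it back.

Subtract each movie's mean rating $\mu_i$ from its ratings $y^{(i, j)}$ before training, then add $\mu_i$ back when predicting: $w^{(j)} \cdot x^{(i)} + b^{(j)} + \mu_i$

Advantages:

- Training the recommender system runs a little faster
- Predictions for new users with no or very few ratings are more reasonable (they fall back to the mean)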
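
A small worked example of the second point, with made-up ratings. It assumes that a user with no ratings ends up with parameters at (or near) zero, which is what regularization pushes `w` towards; the feature matrix `X` is a placeholder.

```python
import numpy as np

# Toy ratings: rows are movies, columns are users; the last user rated nothing.
Y = np.array([[5.0, 4.0, 0.0],
              [1.0, 0.0, 0.0],
              [4.0, 5.0, 0.0]])
R = np.array([[1, 1, 0],
              [1, 0, 0],
              [1, 1, 0]])

# Per-movie mean over rated entries only (guard against movies with no ratings).
mu = (Y * R).sum(axis=1, keepdims=True) / np.maximum(R.sum(axis=1, keepdims=True), 1)
Y_norm = (Y - mu) * R                    # unrated entries stay at 0

# Assumption: the new user's parameters are exactly zero after training.
w_new, b_new = np.zeros(2), 0.0
X = np.ones((3, 2))                      # placeholder movie features (hypothetical)
pred_new_user = X @ w_new + b_new + mu[:, 0]
```

Without the `+ mu` term the new user would be predicted to rate every movie 0; with it, the prediction is each movie's average rating.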

Implementing collaborative filtering in TensorFlow

TensorFlow's built-in gradient tape computes the partial derivatives for us, which is all gradient descent needs:

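import tensorflow as tf

w = tf.Variable(3.0)
x = 1.0
y = 1.0 # target value
alpha = 0.01

# Example: J = (w*x - 1)^2

iterations = 30
for iter in range(iterations):
    # Record the operations used to compute the cost
    with tf.GradientTape() as tape:
        fwb = w * x
        costJ = (fwb - y) ** 2

    # Automatic differentiation: dJ/dw
    [dJdw] = tape.gradient(costJ, [w])

    # Gradient descent step; a tf.Variable is updated in place with assign_add
    w.assign_add(-alpha * dJdw)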
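
For comparison, the same toy problem with the derivative written by hand, in plain Python. This is only a sanity check of what the tape computes: for $J = (wx - y)^2$ the derivative is $2(wx - y)x$.

```python
# Same setup as the GradientTape example, but with the derivative written out.
w, x, y, alpha = 3.0, 1.0, 1.0, 0.01

for _ in range(30):
    dJdw = 2.0 * (w * x - y) * x   # d/dw (w*x - y)^2
    w -= alpha * dJdw
```

Each step shrinks the gap `w - 1` by a factor of 0.98, so after 30 steps `w` is about 2.09 and still moving towards the minimizer `w = 1`.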

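Collaborative filtering implementation:

import tensorflow as tf
from tensorflow import keras

# Adam optimizer
optimizer = keras.optimizers.Adam(learning_rate=1e-1)

# X, W, b are tf.Variable objects; cofiCostFuncV, Ynorm, R, num_users,
# num_movies and lambda_ are assumed to be defined beforehand.
iterations = 200
for iter in range(iterations):
    with tf.GradientTape() as tape:
        # Ynorm: mean-normalized ratings y
        cost_value = cofiCostFuncV(X, W, b, Ynorm, R, num_users, num_movies, lambda_)

    # Take the gradients outside the tape context
    grads = tape.gradient(cost_value, [X, W, b])
    # One Adam update step
    optimizer.apply_gradients(zip(grads, [X, W, b]))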

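Finding related items

How do we judge how similar two items are?

We can use the distance between the learned feature vectors: item $k$ is similar to item $i$ when $\|x^{(k)} - x^{(i)}\|^2$ is small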
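
A minimal sketch of that lookup in NumPy. The function name `most_similar_items` and the four toy feature vectors are hypothetical; in practice `X` would be the learned feature matrix.

```python
import numpy as np

def most_similar_items(X, i, top_k=3):
    """Return indices of the top_k rows of X closest to row i (squared Euclidean distance)."""
    dist = np.sum((X - X[i]) ** 2, axis=1)
    dist[i] = np.inf                      # exclude the item itself
    return np.argsort(dist)[:top_k]

X = np.array([[0.9, 0.1],
              [1.0, 0.0],
              [0.0, 1.0],
              [0.1, 0.9]])
neighbours = most_similar_items(X, 0, top_k=2)   # items 1 and 3, in that order
```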

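Limitations of the current algorithm

Main issues:

Cold start problem
- How do we handle new items that have few ratings?
- How do we handle new users who have rated few items?
Making use of side information about items or users (collaborative filtering has no natural way to do this)
- Items: genre, etc.
- Users: gender, age, location, preferences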

Code implementation

Computing the cost function

$J(w, b, x) = \frac{1}{2}\sum_{(i, j):r(i, j) = 1}(w^{(j)} \cdot x^{(i)} +b^{(j)} - y^{(i, j)})^2 + \frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^n(w_k^{(j)})^2 + \frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^n(x_k^{(i)})^2$

Naive loop-based implementation:

import numpy as np

def cofi_cost_func(X, W, b, Y, R, lambda_):
    """
    Returns the cost for collaborative filtering
    Args:
      X (ndarray (num_movies,num_features)): matrix of item features
      W (ndarray (num_users,num_features)) : matrix of user parameters
      b (ndarray (1, num_users))           : vector of user parameters
      Y (ndarray (num_movies,num_users))   : matrix of user ratings of movies
      R (ndarray (num_movies,num_users))   : matrix, where R(i, j) = 1 if the i-th movie was rated by the j-th user
      lambda_ (float): regularization parameter
    Returns:
      J (float) : Cost
    """
    nm, nu = Y.shape
    J = 0

    for j in range(nu):
        w = W[j, :]
        b_j = b[0,j]

        for i in range(nm):
            x = X[i, :]
            y = Y[i, j]
            r = R[i, j]
            # r is 0 or 1, so unrated entries contribute nothing
            J += np.square(r * (np.dot(w, x) - y + b_j))

    # Regularization
    J += lambda_ * (np.sum(np.square(W)) + np.sum(np.square(X)))

    # Shared factor 1/2 for both the error term and the regularization term
    J /= 2

    return J

Vectorized implementation:

import tensorflow as tf

def cofi_cost_func_v(X, W, b, Y, R, lambda_):
    """
    Returns the cost for collaborative filtering
    Vectorized for speed. Uses tensorflow operations to be compatible with custom training loop.
    Args:
      X (ndarray (num_movies,num_features)): matrix of item features
      W (ndarray (num_users,num_features)) : matrix of user parameters
      b (ndarray (1, num_users))           : vector of user parameters
      Y (ndarray (num_movies,num_users))   : matrix of user ratings of movies
      R (ndarray (num_movies,num_users))   : matrix, where R(i, j) = 1 if the i-th movie was rated by the j-th user
      lambda_ (float): regularization parameter
    Returns:
      J (float) : Cost
    """
    # Prediction error for every (movie, user) pair, masked to rated entries
    j = (tf.linalg.matmul(X, tf.transpose(W)) + b - Y)*R
    J = 0.5 * tf.reduce_sum(j**2) + (lambda_/2) * (tf.reduce_sum(X**2) + tf.reduce_sum(W**2))
    return J
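
One way to gain confidence in a vectorized cost is to compare it with the loop version on random data. The sketch below does this with NumPy-only re-implementations (`cost_loop` and `cost_vec` are stand-in names, and the data is random), so it runs without TensorFlow.

```python
import numpy as np

def cost_loop(X, W, b, Y, R, lambda_):
    """Loop version: sum squared errors over rated (movie, user) pairs."""
    nm, nu = Y.shape
    J = 0.0
    for j in range(nu):
        for i in range(nm):
            if R[i, j]:
                J += (np.dot(W[j], X[i]) + b[0, j] - Y[i, j]) ** 2
    J += lambda_ * (np.sum(W ** 2) + np.sum(X ** 2))
    return J / 2

def cost_vec(X, W, b, Y, R, lambda_):
    """Vectorized version: one matrix product, masked by R."""
    err = (X @ W.T + b - Y) * R
    return 0.5 * np.sum(err ** 2) + (lambda_ / 2) * (np.sum(X ** 2) + np.sum(W ** 2))

rng = np.random.default_rng(42)
num_movies, num_users, num_features = 6, 5, 3
X = rng.normal(size=(num_movies, num_features))
W = rng.normal(size=(num_users, num_features))
b = rng.normal(size=(1, num_users))
Y = rng.integers(0, 6, size=(num_movies, num_users)).astype(float)
R = (Y > 0).astype(int)

J_loop = cost_loop(X, W, b, Y, R, 1.5)
J_vec = cost_vec(X, W, b, Y, R, 1.5)
```

The two values should agree up to floating-point rounding; a mismatch usually points to a broadcasting mistake with `b` or a missing mask.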

Simulating a new user with a few ratings:

movieList, movieList_df = load_Movie_List_pd()

my_ratings = np.zeros(num_movies)          #  Initialize my ratings

# Check the file small_movie_list.csv for id of each movie in our dataset
# For example, Toy Story 3 (2010) has ID 2700, so to rate it "5", you can set
my_ratings[2700] = 5 

# Or suppose you did not enjoy Persuasion (2007), you can set
my_ratings[2609] = 2

# We have selected a few movies we liked / did not like and the ratings we
# gave are as follows:
my_ratings[929]  = 5   # Lord of the Rings: The Return of the King, The
my_ratings[246]  = 5   # Shrek (2001)
my_ratings[2716] = 3   # Inception
my_ratings[1150] = 5   # Incredibles, The (2004)
my_ratings[382]  = 2   # Amelie (Fabuleux destin d'Amélie Poulain, Le)
my_ratings[366]  = 5   # Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)
my_ratings[622]  = 5   # Harry Potter and the Chamber of Secrets (2002)
my_ratings[988]  = 3   # Eternal Sunshine of the Spotless Mind (2004)
my_ratings[2925] = 1   # Louis Theroux: Law & Disorder (2008)
my_ratings[2937] = 1   # Nothing to Declare (Rien à déclarer)
my_ratings[793]  = 5   # Pirates of the Caribbean: The Curse of the Black Pearl (2003)
my_rated = [i for i in range(len(my_ratings)) if my_ratings[i] > 0]

print('\nNew user ratings:\n')
for i in range(len(my_ratings)):
    if my_ratings[i] > 0 :
        print(f'Rated {my_ratings[i]} for  {movieList_df.loc[i,"title"]}')

Adding the new user to the dataset and normalizing:

# Reload ratings and add new ratings
Y, R = load_ratings_small()
Y    = np.c_[my_ratings, Y]
R    = np.c_[(my_ratings != 0).astype(int), R]

# Normalize the Dataset
Ynorm, Ymean = normalizeRatings(Y, R)

Normalization function:

def normalizeRatings(Y, R):
    """
    Preprocess data by subtracting mean rating for every movie (every row).
    Only include real ratings R(i,j)=1.
    [Ynorm, Ymean] = normalizeRatings(Y, R) normalizes Y so that each movie
    has a rating of 0 on average. Unrated movies then have a mean rating (0)
    Returns the mean rating in Ymean.
    """
    Ymean = (np.sum(Y*R,axis=1)/(np.sum(R, axis=1)+1e-12)).reshape(-1,1)
    Ynorm = Y - np.multiply(Ymean, R) 
    return(Ynorm, Ymean)

Initializing the parameters:

#  Useful Values
num_movies, num_users = Y.shape
num_features = 100

# Set Initial Parameters (W, X), use tf.Variable to track these variables
tf.random.set_seed(1234) # for consistent results
W = tf.Variable(tf.random.normal((num_users,  num_features),dtype=tf.float64),  name='W')
X = tf.Variable(tf.random.normal((num_movies, num_features),dtype=tf.float64),  name='X')
b = tf.Variable(tf.random.normal((1,          num_users),   dtype=tf.float64),  name='b')

# Instantiate an optimizer.
optimizer = keras.optimizers.Adam(learning_rate=1e-1)

Training the model:

iterations = 200
lambda_ = 1
for iter in range(iterations):
    # Use TensorFlow’s GradientTape
    # to record the operations used to compute the cost 
    with tf.GradientTape() as tape:

        # Compute the cost (forward pass included in cost)
        cost_value = cofi_cost_func_v(X, W, b, Ynorm, R, lambda_)

    # Use the gradient tape to automatically retrieve
    # the gradients of the trainable variables with respect to the loss
    grads = tape.gradient( cost_value, [X,W,b] )

    # Run one step of gradient descent by updating
    # the value of the variables to minimize the loss.
    optimizer.apply_gradients( zip(grads, [X,W,b]) )

    # Log periodically.
    if iter % 20 == 0:
        print(f"Training loss at iteration {iter}: {cost_value:0.1f}")

Making recommendations and inspecting the results:

# Make a prediction using trained weights and biases
p = np.matmul(X.numpy(), np.transpose(W.numpy())) + b.numpy()

#restore the mean
pm = p + Ymean

my_predictions = pm[:,0]

# sort predictions
ix = tf.argsort(my_predictions, direction='DESCENDING')

for i in range(17):
    j = ix[i]
    if j not in my_rated:
        print(f'Predicting rating {my_predictions[j]:0.2f} for movie {movieList[j]}')

print('\n\nOriginal vs Predicted ratings:\n')
for i in range(len(my_ratings)):
    if my_ratings[i] > 0:
        print(f'Original {my_ratings[i]}, Predicted {my_predictions[i]:0.2f} for {movieList[i]}')

Tabular view:

filter=(movieList_df["number of ratings"] > 20)
movieList_df["pred"] = my_predictions
movieList_df = movieList_df.reindex(columns=["pred", "mean rating", "number of ratings", "title"])
movieList_df.loc[ix[:300]].loc[filter].sort_values("mean rating", ascending=False)

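Content-based filtering

Collaborative filtering: recommends items to you based on the ratings of users who rated things similarly to you

Content-based filtering: recommends items to you based on features of both the user and the item, looking for a good match

Neural network architecture

How do we map the user features $X_u$ and the item features $X_m$ to $V_u$ and $V_m$? With neural networks.

$V_u$ and $V_m$ must have the same dimension because the prediction is their dot product, so the output layers of the two networks need the same number of units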
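
A NumPy sketch of the forward pass through two such networks. Everything here is illustrative: the layer sizes, the random weights and the helper names `dense`, `tower` and `init` are assumptions, and no training happens. The point is only that the inputs may have different sizes while the outputs must match.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, W, b, relu=True):
    """One fully connected layer, optionally followed by ReLU."""
    z = x @ W + b
    return np.maximum(z, 0.0) if relu else z

def tower(x, params):
    """Tiny two-layer network: hidden ReLU layer, then a linear output layer."""
    (W1, b1), (W2, b2) = params
    return dense(dense(x, W1, b1), W2, b2, relu=False)

def init(n_in, n_hidden, n_out):
    """Random weights and zero biases for the two layers."""
    return [(rng.normal(size=(n_in, n_hidden)), np.zeros(n_hidden)),
            (rng.normal(size=(n_hidden, n_out)), np.zeros(n_out))]

# Hypothetical sizes: input widths differ, output width is shared.
n_user_feat, n_item_feat, n_out = 6, 4, 3
user_params = init(n_user_feat, 8, n_out)
item_params = init(n_item_feat, 8, n_out)

x_u = rng.normal(size=(1, n_user_feat))   # one user's features X_u
x_m = rng.normal(size=(1, n_item_feat))   # one item's features X_m
v_u = tower(x_u, user_params)             # V_u
v_m = tower(x_m, item_params)             # V_m
prediction = np.sum(v_u * v_m, axis=1)    # dot product V_u . V_m
```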

Cost function
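
The formula for this section is missing from these notes. A sketch of the squared-error cost usually paired with this architecture, stated as an assumption; it matches the mean squared error loss used in the TensorFlow code of this post, with $v_u^{(j)}$ the vector for user $j$ and $v_m^{(i)}$ the vector for item $i$:

$J = \sum_{(i, j):r(i, j) = 1}(v_u^{(j)} \cdot v_m^{(i)} - y^{(i, j)})^2 + \text{NN regularization term}$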

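Applications

- Movie recommendation
- Product recommendation (items most likely to be purchased)
- Ad recommendation (ads most likely to be clicked)
- High-margin product recommendation (the user's needs do not necessarily come first)
- Video stickiness: maximizing how long users keep watching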

Implementing content-based filtering in TensorFlow

import tensorflow as tf
from tensorflow import keras

num_outputs = 32
tf.random.set_seed(1)
# User network: maps user features to a num_outputs-dimensional vector
user_NN = tf.keras.models.Sequential([
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(num_outputs),
])

# Item network: same output size so the two vectors can be dotted together
item_NN = tf.keras.models.Sequential([
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(num_outputs),
])

# create the user input and point to the base network
# (shape is a tuple, hence the trailing comma)
input_user = tf.keras.layers.Input(shape=(num_user_features,))
vu = user_NN(input_user)
vu = tf.linalg.l2_normalize(vu, axis=1)

# create the item input and point to the base network
input_item = tf.keras.layers.Input(shape=(num_item_features,))
vm = item_NN(input_item)
vm = tf.linalg.l2_normalize(vm, axis=1)

# compute the dot product of the two vectors vu and vm
output = tf.keras.layers.Dot(axes=1)([vu, vm])

# specify the inputs and output of the model
model = tf.keras.Model([input_user, input_item], output)

model.summary()

tf.random.set_seed(1)
cost_fn = tf.keras.losses.MeanSquaredError()
opt = keras.optimizers.Adam(learning_rate=0.01)
model.compile(optimizer=opt, loss=cost_fn)

tf.random.set_seed(1)
model.fit([user_train[:, u_s:], item_train[:, i_s:]], ynorm_train, epochs=30)

model.evaluate([user_test[:, u_s:], item_test[:, i_s:]], ynorm_test)

PS:

To make the recommender system more efficient, $V_m$ can be computed ahead of time, before the user comes online
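
A sketch of that split between offline and online work, in NumPy. The item vectors below are random stand-ins for the output of the item network, and `recommend` is a hypothetical helper; only the ranking step has to run when a user shows up.

```python
import numpy as np

rng = np.random.default_rng(1)

def l2_normalize(v):
    """Scale each row to unit length."""
    return v / np.linalg.norm(v, axis=1, keepdims=True)

# Offline: item vectors V_m computed once and stored (random stand-ins here).
item_vectors = l2_normalize(rng.normal(size=(1000, 32)))

def recommend(user_vector, item_vectors, top_k=5):
    """Online: score every item with one dot product and return the best top_k indices."""
    scores = item_vectors @ user_vector
    return np.argsort(-scores)[:top_k]

# Online: only the user vector V_u is computed at request time.
user_vector = l2_normalize(rng.normal(size=(1, 32)))[0]
top = recommend(user_vector, item_vectors)
```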

Polaris6G's blog

Machine Learning-06