
Machine Learning - 01

Starting the formal study of machine learning~

  • Learning framework overview
  • Supervised learning
    • Regression models
    • Classification (logistic regression)
  • Handling overfitting
  • Regularization
  • Supervised learning
  • Unsupervised learning
  • Reinforcement learning

Applications:

Supervised Learning

Learn from input/output pairs and build a mapping x -> y.

  • Regression: predict a number, e.g. house price prediction

  • Classification: predict one of a finite set of output categories, e.g. tumor diagnosis (benign or malignant)

Unsupervised Learning

Find structure, patterns, or anything interesting in unlabeled data.

  • Clustering (grouping related articles, genetic clustering, market segmentation)
  • Anomaly detection: find unusual data points
  • Dimensionality reduction: compress the data

Regression Model

Linear Regression

Training set: the examples used for training, written as pairs $(x^{(i)}, y^{(i)})$, $i = 1, \dots, m$, where $x$ is the input feature, $y$ is the target, and $m$ is the number of examples.

Model representation: $f_{w,b}(x) = wx + b$, with parameters $w$ (weight) and $b$ (bias).

Cost Function

The mean squared error cost function:

$$J(w,b) = \frac{1}{2m}\sum_{i=1}^{m}\bigl(f_{w,b}(x^{(i)}) - y^{(i)}\bigr)^2$$

Gradient Descent

Gradient descent finds a local minimum of the cost function.

Update rule (repeat until convergence):

$$w := w - \alpha\,\frac{\partial J(w,b)}{\partial w},\qquad b := b - \alpha\,\frac{\partial J(w,b)}{\partial b}$$

Strictly speaking, this must be a simultaneous update: compute both partial derivatives before changing either parameter.

  1. The derivative (partial derivative) gives the slope of the tangent line, which tells us which direction to step.
  2. $\alpha$ is the learning rate.

If $\alpha$ is too small, convergence is too slow.

If $\alpha$ is too large, the steps can overshoot and never reach the minimum; gradient descent may fail to converge or even diverge.

Note: even with a fixed learning rate, gradient descent can still reach a local minimum, because the gradient (and therefore the step size) shrinks as it approaches the minimum.

Deriving the gradients with calculus (chain rule):
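One way to fill in the chain-rule step, using the cost $J(w,b)$ and model $f_{w,b}(x) = wx + b$ defined above:

$$\frac{\partial J(w,b)}{\partial w} = \frac{\partial}{\partial w}\,\frac{1}{2m}\sum_{i=1}^{m}\bigl(wx^{(i)}+b-y^{(i)}\bigr)^2 = \frac{1}{2m}\sum_{i=1}^{m} 2\bigl(f_{w,b}(x^{(i)})-y^{(i)}\bigr)\,x^{(i)} = \frac{1}{m}\sum_{i=1}^{m}\bigl(f_{w,b}(x^{(i)})-y^{(i)}\bigr)\,x^{(i)}$$

$$\frac{\partial J(w,b)}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}\bigl(f_{w,b}(x^{(i)})-y^{(i)}\bigr)$$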

Summary:

The squared-error cost of linear regression is convex, so it has only one minimum;

therefore the local minimum is also the global minimum.
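To make the update rule and the simultaneous update concrete, here is a minimal Python sketch of batch gradient descent for univariate linear regression (the function and the tiny dataset are made up for illustration; they are not the course lab helpers):

import numpy as np

def gradient_descent(x, y, alpha=0.01, num_iters=1000):
    """Minimal batch gradient descent for f_wb(x) = w*x + b (one feature)."""
    m = x.shape[0]
    w, b = 0.0, 0.0
    for _ in range(num_iters):
        f_wb = w * x + b
        # partial derivatives of the squared-error cost J(w, b)
        dj_dw = np.sum((f_wb - y) * x) / m
        dj_db = np.sum(f_wb - y) / m
        # simultaneous update: both gradients are computed before either parameter changes
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
    return w, b

# tiny made-up dataset: y = 200*x + 100
x = np.array([1.0, 2.0, 3.0])
y = np.array([300.0, 500.0, 700.0])
w, b = gradient_descent(x, y, alpha=0.1, num_iters=10000)
print(f"w = {w:.2f}, b = {b:.2f}")   # should approach w = 200, b = 100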

Multiple Linear Regression

Multiple features -> vectorize the computation (treat the parameters and features as vectors $\vec w$ and $\vec x$)

Vector dot product

The dot product multiplies the corresponding entries of two vectors and sums the results; the result is a scalar.

import numpy as np

w = np.array([1.0, 2.5, -3.3])
b = 4
x = np.array([10, 20, 30])

# element-by-element loop
f = 0
for j in range(0, x.shape[0]):
    f = f + w[j] * x[j]
f = f + b

# same result, but much faster
f = np.dot(w, x) + b

Hardware parallelism is what makes the vectorized version run faster.
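A rough way to see this on your own machine (the array size here is arbitrary and the timings will vary with hardware):

import time
import numpy as np

n = 1_000_000
w = np.random.rand(n)
x = np.random.rand(n)
b = 4.0

# explicit Python loop
start = time.perf_counter()
f = 0.0
for j in range(n):
    f = f + w[j] * x[j]
f = f + b
loop_time = time.perf_counter() - start

# vectorized with np.dot
start = time.perf_counter()
f_vec = np.dot(w, x) + b
vec_time = time.perf_counter() - start

print(f"loop: {loop_time:.4f}s, np.dot: {vec_time:.6f}s")
print(f"results match: {np.isclose(f, f_vec)}")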

Gradient descent for multiple linear regression
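In the notation used by the code later in these notes (and ignoring regularization for now), the updates, repeated until convergence, are:

$$w_j := w_j - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\bigl(f_{\vec w,b}(\vec x^{(i)}) - y^{(i)}\bigr)x_j^{(i)},\qquad j = 1,\dots,n$$

$$b := b - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\bigl(f_{\vec w,b}(\vec x^{(i)}) - y^{(i)}\bigr)$$

where $f_{\vec w,b}(\vec x) = \vec w \cdot \vec x + b$. This is what compute_gradient_linear_reg further below computes, minus the regularization term.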

Feature Scaling

When the features have very different value ranges, gradient descent can run slowly. Rescaling the features so that they all take comparable ranges of values speeds it up.

Normalization

  • Mean normalization: $x_j' = \dfrac{x_j - \mu_j}{\max_j - \min_j}$

  • Z-score normalization: $x_j' = \dfrac{x_j - \mu_j}{\sigma_j}$, i.e. rescale each feature as if it followed a standard normal distribution.
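A minimal sketch of what a z-score normalization helper might look like (the polynomial regression example below calls a lab helper named zscore_normalize_features; this simplified version returns only the normalized matrix):

import numpy as np

def zscore_normalize_features(X):
    """Z-score normalize each column: subtract the column mean, divide by the column std."""
    mu = np.mean(X, axis=0)       # per-feature mean, shape (n,)
    sigma = np.std(X, axis=0)     # per-feature standard deviation, shape (n,)
    return (X - mu) / sigma

# made-up example: two features with very different ranges
X = np.array([[2104, 5], [1416, 3], [852, 2]], dtype=float)
print(zscore_normalize_features(X))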

Tips:

  1. Ways to check that gradient descent is working and to judge convergence:

    1. Plot the learning curve (the cost $J$ versus the number of iterations); $J$ should decrease on every iteration.

    2. Use an automatic convergence test: declare convergence once $J$ decreases by less than some small threshold $\varepsilon$ in one iteration.

  2. How to choose a good learning rate:

Try a range of values such as 0.001, 0.01, 0.1, 1,

increasing by roughly 3x between candidates (0.001, 0.003, 0.01, 0.03, ...); see the sketch after this list.
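A small sketch of both checks (the epsilon threshold and the cost values below are made up):

# candidate learning rates, spaced roughly 3x apart
alphas = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]

def converged(cost_history, epsilon=1e-3):
    """Automatic convergence test: stop once the cost decreases by less than epsilon in one iteration."""
    return len(cost_history) >= 2 and (cost_history[-2] - cost_history[-1]) < epsilon

# usage idea: record J(w, b) once per iteration while running gradient descent and check after each one;
# if the cost ever increases, the learning rate is probably too large
cost_history = [10.0, 4.0, 3.2, 3.1995]
print(converged(cost_history))    # True: the last iteration improved J by less than 1e-3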

Feature Engineering

Use intuition or domain knowledge to design new features, by transforming or combining the original features.

Polynomial Regression

Adding polynomial terms can itself be viewed as feature engineering.

Fitting with a long list of polynomial features is something of a brute-force approach:

import numpy as np
import matplotlib.pyplot as plt
# zscore_normalize_features and run_gradient_descent_feng are helper functions from the course lab code (not defined here)

x = np.arange(0, 20, 1)
y = np.cos(x/2)

X = np.c_[x, x**2, x**3, x**4, x**5, x**6, x**7, x**8, x**9, x**10, x**11, x**12, x**13]
X = zscore_normalize_features(X)

model_w, model_b = run_gradient_descent_feng(X, y, iterations=1000000, alpha=1e-1)

plt.scatter(x, y, marker='x', c='r', label="Actual Value"); plt.title("Normalized x x**2, x**3 feature")
plt.plot(x, X@model_w + model_b, label="Predicted Value"); plt.xlabel("x"); plt.ylabel("y"); plt.legend(); plt.show()

Scikit-Learn

Using the linear regression API as a black box

import numpy as np
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.preprocessing import StandardScaler

X_train, y_train = load_data()   # lab helper that loads the training data (not defined here)

# normalize the features
scaler = StandardScaler()
X_norm = scaler.fit_transform(X_train)

# define the regression model
sgdr = SGDRegressor(max_iter=1000)

# train
sgdr.fit(X_norm, y_train)

print(f"number of iterations completed: {sgdr.n_iter_}, number of weight updates: {sgdr.t_}")

# inspect the learned parameters
b_norm = sgdr.intercept_
w_norm = sgdr.coef_

# predict
# make a prediction using sgdr.predict()
y_pred_sgd = sgdr.predict(X_norm)
# make a prediction using w,b.
y_pred = np.dot(X_norm, w_norm) + b_norm
print(f"prediction using np.dot() and sgdr.predict match: {(y_pred == y_pred_sgd).all()}")  # True

SGDRegressor is a linear regression model in scikit-learn that fits the data with stochastic gradient descent (SGD). It supports different loss functions and regularization terms, including L1, L2, and elastic-net regularization.

During training, SGDRegressor updates the parameters with stochastic gradient descent: it performs an update per training sample rather than one update over the entire training set, as traditional batch gradient descent does. This makes SGDRegressor well suited to large datasets, because it can be trained with limited memory.

Example (using LinearRegression()):

X_train = np.array([1.0, 2.0])   # features
y_train = np.array([300, 500])   # target values

linear_model = LinearRegression()
# X must be a 2-D matrix
linear_model.fit(X_train.reshape(-1, 1), y_train)

b = linear_model.intercept_
w = linear_model.coef_
print(f"w = {w:}, b = {b:0.2f}")
print(f"'manual' prediction: f_wb = wx+b : {1200*w + b}")

y_pred = linear_model.predict(X_train.reshape(-1, 1))

print("Prediction on training set:", y_pred)

X_test = np.array([[1200]])
print(f"Prediction for 1200 sqft house: ${linear_model.predict(X_test)[0]:0.2f}")

Classification

Logistic Regression

Even though it is called "regression", logistic regression is a classification method.

Binary classification

Decision boundary

Fitting a classification dataset with plain linear regression works poorly (a single extreme example can shift the decision threshold and misclassify points).

The logistic function (sigmoid):

$$g(z) = \frac{1}{1 + e^{-z}},\qquad 0 < g(z) < 1$$

Applying the sigmoid to the output of the linear model,

$$f_{\vec w,b}(\vec x) = g(\vec w \cdot \vec x + b),$$

turns the classification problem into predicting the probability that the label is 1 (rather than 0).

def sigmoid(z):
    g = 1 / (1 + np.exp(-z))
    return g

Decision Boundary

Examples:

  • Linear boundary: the set of points where $z = \vec w \cdot \vec x + b = 0$ (see the plotting sketch after this list).

  • Nonlinear boundary: with polynomial features, e.g. $z = w_1 x_1^2 + w_2 x_2^2 + b$, the boundary can be a curve such as a circle.

Polynomial and higher-order features can make the decision boundary closed (no longer linear).
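As an illustration, a small sketch that plots the linear decision boundary of a hypothetical two-feature logistic regression model (the parameter values are made up):

import numpy as np
import matplotlib.pyplot as plt

# hypothetical learned parameters for a model with two features x1, x2
w = np.array([1.0, 1.0])
b = -3.0

# the linear decision boundary is the set of points where z = w1*x1 + w2*x2 + b = 0,
# i.e. x2 = -(w1*x1 + b) / w2
x1 = np.linspace(0, 4, 50)
x2 = -(w[0] * x1 + b) / w[1]

plt.plot(x1, x2, label="decision boundary (z = 0)")
plt.xlabel("x1"); plt.ylabel("x2"); plt.legend(); plt.show()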

Loss Function

The logistic loss function:

$$L\bigl(f_{\vec w,b}(\vec x^{(i)}), y^{(i)}\bigr) =
\begin{cases}
-\log\bigl(f_{\vec w,b}(\vec x^{(i)})\bigr) & \text{if } y^{(i)} = 1\\
-\log\bigl(1 - f_{\vec w,b}(\vec x^{(i)})\bigr) & \text{if } y^{(i)} = 0
\end{cases}$$

With this loss, the overall cost function is convex (the squared-error cost would not be convex for logistic regression).

Function graph: $-\log(f)$ penalizes predictions near 0 when $y=1$, and $-\log(1-f)$ penalizes predictions near 1 when $y=0$.

Summary: the loss is small when the prediction agrees with the label and grows without bound as the prediction approaches the wrong extreme.

Simplified Loss Function

Simplifying the logistic loss gives the binary cross-entropy loss.

Key point: $y$ can only be 1 or 0, so the two cases can be written as a single expression.

Loss function:

$$L\bigl(f_{\vec w,b}(\vec x^{(i)}), y^{(i)}\bigr) = -\,y^{(i)}\log\bigl(f_{\vec w,b}(\vec x^{(i)})\bigr) - \bigl(1 - y^{(i)}\bigr)\log\bigl(1 - f_{\vec w,b}(\vec x^{(i)})\bigr)$$

So the overall cost function becomes:

$$J(\vec w,b) = \frac{1}{m}\sum_{i=1}^{m} L\bigl(f_{\vec w,b}(\vec x^{(i)}), y^{(i)}\bigr)$$

This form comes from the statistical principle of maximum likelihood estimation.

Code implementation:

def compute_cost_logistic(X, y, w, b):
    """
    Computes cost

    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters
      b (scalar)       : model parameter

    Returns:
      cost (scalar): cost
    """

    m = X.shape[0]
    cost = 0.0
    for i in range(m):
        z_i = np.dot(X[i], w) + b
        f_wb_i = sigmoid(z_i)
        cost += -y[i]*np.log(f_wb_i) - (1-y[i])*np.log(1-f_wb_i)

    cost = cost / m
    return cost

Derivative of the cross-entropy loss

Proof:
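A compact version of the argument, with $f = g(z)$, $z = \vec w \cdot \vec x + b$, and $g$ the sigmoid:

$$g'(z) = g(z)\bigl(1 - g(z)\bigr) = f(1-f)$$

$$\frac{\partial L}{\partial f} = -\frac{y}{f} + \frac{1-y}{1-f}$$

$$\frac{\partial L}{\partial z} = \frac{\partial L}{\partial f}\,\frac{\partial f}{\partial z} = \Bigl(-\frac{y}{f} + \frac{1-y}{1-f}\Bigr)f(1-f) = -y(1-f) + (1-y)f = f - y$$

$$\frac{\partial L}{\partial w_j} = \frac{\partial L}{\partial z}\,\frac{\partial z}{\partial w_j} = (f - y)\,x_j,\qquad \frac{\partial L}{\partial b} = f - y$$

Averaging over the $m$ training examples gives the gradients used in the update below.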

So the gradient descent update for logistic regression is:

$$w_j := w_j - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\bigl(f_{\vec w,b}(\vec x^{(i)}) - y^{(i)}\bigr)x_j^{(i)},\qquad b := b - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\bigl(f_{\vec w,b}(\vec x^{(i)}) - y^{(i)}\bigr)$$

It looks like the linear regression update, but it is not the same: here $f_{\vec w,b}(\vec x) = g(\vec w \cdot \vec x + b)$ is the sigmoid of the linear function, not the linear function itself.

Scikit-Learn

Using the logistic regression API as a black box

Dataset

import numpy as np

X = np.array([[0.5, 1.5], [1, 1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])
y = np.array([0, 0, 0, 1, 1, 1])

Fit

from sklearn.linear_model import LogisticRegression

lr_model = LogisticRegression()
lr_model.fit(X, y)

Predictions

y_pred = lr_model.predict(X)

print("Prediction on training set:", y_pred)

Calculate accuracy

print("Accuracy on training set:", lr_model.score(X, y))

Overfitting

Concepts

Overfitting: high variance (the fitted model is highly variable).

Underfitting: high bias.

Fixes

  • Collect more training examples.

  • Feature selection: select a subset of the most relevant features.

    • Drawback: some useful information may be discarded.

  • Regularization: keep all the features, but prevent any of them from having an excessively large effect,

    by shrinking the parameters $w_j$.

Regularization

Regularization gives a simpler model that is less likely to overfit.

Task: minimize the following cost (linear regression as the example):

$$J(\vec w,b) = \frac{1}{2m}\sum_{i=1}^{m}\bigl(f_{\vec w,b}(\vec x^{(i)}) - y^{(i)}\bigr)^2 + \frac{\lambda}{2m}\sum_{j=1}^{n} w_j^2$$

i.e. mean squared error + regularization term.

Here $\lambda$ is called the regularization parameter.

If $\lambda$ is too small, the model may still overfit (it is as if there were no regularization).

If $\lambda$ is too large, the model may underfit (effectively only $b$ is left).

As $\lambda$ increases, the magnitudes of the $w_j$ shrink.

Regularized Linear Regression

The essence of regularization:

in each iteration, $w$ is multiplied by a number slightly smaller than 1, which has a shrinking effect.
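To see why, write out the regularized update for $w_j$ and rearrange it:

$$w_j := w_j - \alpha\Bigl[\frac{1}{m}\sum_{i=1}^{m}\bigl(f_{\vec w,b}(\vec x^{(i)}) - y^{(i)}\bigr)x_j^{(i)} + \frac{\lambda}{m}\,w_j\Bigr] = w_j\Bigl(1 - \alpha\frac{\lambda}{m}\Bigr) - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\bigl(f_{\vec w,b}(\vec x^{(i)}) - y^{(i)}\bigr)x_j^{(i)}$$

Since $\alpha\lambda/m$ is a small positive number, the factor $1 - \alpha\lambda/m$ is slightly below 1, so every iteration shrinks $w_j$ a little before applying the usual gradient step ($b$ is not regularized).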

Cost function:

def compute_cost_linear_reg(X, y, w, b, lambda_ = 1):
    """
    Computes the cost over all examples
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)): target values
      w (ndarray (n,)): model parameters
      b (scalar)      : model parameter
      lambda_ (scalar): Controls amount of regularization
    Returns:
      total_cost (scalar): cost
    """

    m = X.shape[0]
    n = len(w)
    cost = 0.
    for i in range(m):
        f_wb_i = np.dot(X[i], w) + b       # (n,)(n,) = scalar, see np.dot
        cost = cost + (f_wb_i - y[i])**2   # scalar
    cost = cost / (2 * m)                  # scalar

    reg_cost = 0
    for j in range(n):
        reg_cost += (w[j]**2)              # scalar
    reg_cost = (lambda_/(2*m)) * reg_cost  # scalar

    total_cost = cost + reg_cost           # scalar
    return total_cost                      # scalar

Gradient descent:

def compute_gradient_linear_reg(X, y, w, b, lambda_):
    """
    Computes the gradient for linear regression
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)): target values
      w (ndarray (n,)): model parameters
      b (scalar)      : model parameter
      lambda_ (scalar): Controls amount of regularization

    Returns:
      dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w.
      dj_db (scalar):       The gradient of the cost w.r.t. the parameter b.
    """
    m, n = X.shape   # (number of examples, number of features)
    dj_dw = np.zeros((n,))
    dj_db = 0.

    for i in range(m):
        err = (np.dot(X[i], w) + b) - y[i]
        for j in range(n):
            dj_dw[j] = dj_dw[j] + err * X[i, j]
        dj_db = dj_db + err
    dj_dw = dj_dw / m
    dj_db = dj_db / m

    for j in range(n):
        dj_dw[j] = dj_dw[j] + (lambda_/m) * w[j]

    return dj_db, dj_dw

Regularized Logistic Regression

Cost function:

def compute_cost_logistic_reg(X, y, w, b, lambda_ = 1):
    """
    Computes the cost over all examples
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)): target values
      w (ndarray (n,)): model parameters
      b (scalar)      : model parameter
      lambda_ (scalar): Controls amount of regularization
    Returns:
      total_cost (scalar): cost
    """

    m, n = X.shape
    cost = 0.
    for i in range(m):
        z_i = np.dot(X[i], w) + b                                  # (n,)(n,) = scalar, see np.dot
        f_wb_i = sigmoid(z_i)                                      # scalar
        cost += -y[i]*np.log(f_wb_i) - (1-y[i])*np.log(1-f_wb_i)   # scalar

    cost = cost/m                                                  # scalar

    reg_cost = 0
    for j in range(n):
        reg_cost += (w[j]**2)                                      # scalar
    reg_cost = (lambda_/(2*m)) * reg_cost                          # scalar

    total_cost = cost + reg_cost                                   # scalar
    return total_cost                                              # scalar

Gradient descent:

def compute_gradient_logistic_reg(X, y, w, b, lambda_):
    """
    Computes the gradient for regularized logistic regression

    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)): target values
      w (ndarray (n,)): model parameters
      b (scalar)      : model parameter
      lambda_ (scalar): Controls amount of regularization
    Returns:
      dj_dw (ndarray Shape (n,)): The gradient of the cost w.r.t. the parameters w.
      dj_db (scalar)            : The gradient of the cost w.r.t. the parameter b.
    """
    m, n = X.shape
    dj_dw = np.zeros((n,))                        # (n,)
    dj_db = 0.0                                   # scalar

    for i in range(m):
        f_wb_i = sigmoid(np.dot(X[i], w) + b)     # (n,)(n,) = scalar
        err_i = f_wb_i - y[i]                     # scalar
        for j in range(n):
            dj_dw[j] = dj_dw[j] + err_i * X[i, j] # scalar
        dj_db = dj_db + err_i
    dj_dw = dj_dw/m                               # (n,)
    dj_db = dj_db/m                               # scalar

    for j in range(n):
        dj_dw[j] = dj_dw[j] + (lambda_/m) * w[j]

    return dj_db, dj_dw