Regression Analysis

EaborH

What Is Regression Analysis?

Regression analysis: based on data, determine the quantitative dependence between two or more variables, and express it as a function.

Linear Regression

Linear regression: regression analysis in which the independent variable and the dependent variable have a linear relationship.
Function expression: y = ax + b
e.g. distance = speed × time (S = vt)

Solving the Regression Problem

How do we find the best-fitting parameters a and b? Suppose x is the input variable, y is the corresponding observed result, and ŷ is the model's output. We want ŷ to be as close to y as possible; a common way to measure closeness is the mean squared error between ŷ and y, and the best parameters are those that minimize this loss.
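To make "as close as possible" concrete, the snippet below is a minimal sketch of the mean squared error; the helper name `mse_loss` is our own, not from the post:

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_pred - y_true) ** 2))

# Toy data: if the model output matches y exactly, the loss is zero.
print(mse_loss([2.0, 4.0, 6.0], [2.0, 4.0, 6.0]))  # 0.0
# If every prediction is off by 1, every squared residual is 1, so the loss is 1.
print(mse_loss([2.0, 4.0, 6.0], [3.0, 5.0, 7.0]))  # 1.0
```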

Gradient descent: a method for finding a local minimum of a function. Starting from the current point, iteratively step a specified distance (the learning rate) in the direction opposite to the gradient (or an approximate gradient) at that point, and repeat until the search converges at a minimum.

The iterates gradually approach the minimum point.
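The update rule described above can be sketched for the line y = ax + b. The learning rate and step count below are illustrative choices, not values from the post:

```python
import numpy as np

def fit_line_gd(x, y, lr=0.01, steps=5000):
    """Fit y ~ a*x + b by gradient descent on the mean squared error."""
    a, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        y_hat = a * x + b
        # Gradients of (1/n) * sum((y_hat - y)^2) with respect to a and b
        grad_a = (2.0 / n) * np.sum((y_hat - y) * x)
        grad_b = (2.0 / n) * np.sum(y_hat - y)
        # Step opposite to the gradient, scaled by the learning rate
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3 * x + 1          # true line: a = 3, b = 1 (no noise)
a, b = fit_line_gd(x, y)
print(a, b)            # converges close to 3 and 1
```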

Solving Linear Regression with Scikit-learn

```python
import pandas as pd
import numpy as np
import matplotlib
matplotlib.use('TkAgg')
from matplotlib import pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the data
data = pd.read_csv('1.csv')
#print(type(data), data.shape)

# Assign x and y
x = data.loc[:, 'x']
y = data.loc[:, 'y']

# Preview the raw data
#plt.figure()
#plt.scatter(x, y)
#plt.show()

# scikit-learn expects 2-D feature arrays: reshape to (n_samples, 1)
x = np.array(x).reshape(-1, 1)
y = np.array(y).reshape(-1, 1)

# Set up and train the model
lr_model = LinearRegression()
lr_model.fit(x, y)

y_pred = lr_model.predict(x)
print(y_pred)
y_3 = lr_model.predict([[3.5]])  # predict y at x = 3.5
print(y_3)

# Fitted coefficients: y = a*x + b
a = lr_model.coef_
b = lr_model.intercept_
print(a, b)

print("-------------------------------------------")
# Evaluate: mean squared error (MSE) and R^2 score
MSE = mean_squared_error(y, y_pred)
R2 = r2_score(y, y_pred)
print(MSE, R2)

# Scatter the data and overlay the fitted line
plt.scatter(x, y)
plt.plot(x, y_pred, 'r')
plt.show()
```
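Since `1.csv` is not included with the post, the same workflow can be checked on synthetic data, where the true coefficients are known in advance (the generating line 2.5x + 4 is our own choice, not from the post):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Generate points on y = 2.5x + 4 with a little Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(50, 1))
y = 2.5 * x + 4 + rng.normal(0, 0.1, size=(50, 1))

model = LinearRegression().fit(x, y)
y_pred = model.predict(x)

# coef_ and intercept_ should recover ~2.5 and ~4
print(model.coef_, model.intercept_)
# Near-zero MSE and an R^2 close to 1 confirm a good fit
print(mean_squared_error(y, y_pred), r2_score(y, y_pred))
```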
"""  

数据说明:
Avg.AreaIncome:区域平均收入
Avg.Area House Age:平均房屋年龄
Avg.Area Number of Rooms:平均房间数量
Area Population:区域人口
size:尺寸
Price:价格

"""

#load the data
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
matplotlib.use('TkAgg')
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error,r2_score

data = pd.read_csv('usa_house_price.csv')
data.head() #数据快速预览
fig = plt.figure(figsize = (10,10))
fig1 = plt.subplot(231)
plt.scatter(data['AreaIncome'], data['Price'])
# 与上者等价 plt.scatter(data.loc[:,'Avg.AreaIncome'], data.loc[:,'Price'])plt.title('Price VS AreaIncome')
plt.show()

#define x and y
x = data.loc[:,'size']
y = data.loc[:,'Price']
x = np.array(x).reshape(-1,1)

#set up the linear regression model
LR1 = LinearRegression()
#train the model
LR1.fit(x,y)

#calculate the price vs size
#均方误差 (MSE) 和 决定系数 (R² Score)y_pred_1 = LR1.predict(x)
print(y_pred_1)
mean_squared_error_1 = mean_squared_error(y,y_pred_1)
r2_score_1 = r2_score(y,y_pred_1)
print(mean_squared_error_1,r2_score_1)
fig2 = plt.figure(figsize=(8,5))
plt.scatter(x,y)
plt.plot(x,y_pred_1,'r')
plt.show()

#define x_multi
x_multi = data.drop(['Price'],axis=1)

#set up 2nd linear model
LR_multi = LinearRegression()

#train the model
LR_multi.fit(x_multi,y)

#make prediction
y_pred_multi = LR_multi.predict(x_multi)
print(y_pred_multi)
mean_squared_error_multi = mean_squared_error(y,y_pred_multi)
r2_score_multi = r2_score(y,y_pred_multi)
print(mean_squared_error_multi,r2_score_multi)
fig3 = plt.figure(figsize=(8,5))
plt.scatter(y,y_pred_multi)
plt.show()

x_text = [65000,5,5,30000,200]
x_text = np.array(x_text).reshape(-1,1)
print(x_text)
y_text_pred = LR_multi.predict(x_text)
print(y_text_pred)
  • Title: 回归分析
  • Author: EaborH
  • Created at: 2025-03-09 00:00:00
  • Updated at: 2025-03-18 17:48:26
  • Link: https://eabor.xyz/2025/03/09/回归分析/
  • License: This work is licensed under CC BY-NC-SA 4.0.