Linear Regression
2021. 8. 10. 17:04 · Machine Learning
Linear regression is a supervised learning method: it learns from examples whose target values are known.
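For reference, the one-feature model is usually written as below, with the parameters chosen by ordinary least squares. The symbols w (weight), b (bias), and n (number of samples) are notation assumed here; the post itself only shows code.

```latex
\hat{y} = w x + b, \qquad
\min_{w,\,b} \; \sum_{i=1}^{n} \bigl( y_i - (w x_i + b) \bigr)^2
```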
The simplest form: a simple linear regression model
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
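# On Matplotlib < 3.6 this style is named 'seaborn-whitegrid'.
plt.style.use(['seaborn-v0_8-whitegrid'])
# Uniform noise in [0, 1); it is added to both x and y.
noise = np.random.rand(100)
x = np.sort(10 * np.random.rand(100)) + noise
y = 2*x + 3 + noise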
plt.scatter(x,y)
plt.show()
Linear regression analysis with scikit-learn
X = x[:,np.newaxis]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)
model = LinearRegression()
model.fit(X_train, y_train)
print("Linear regression weight: {}".format(model.coef_))
print("Linear regression bias: {}".format(model.intercept_))
Linear regression weight: [2.00862109]
Linear regression bias: 3.4807009996258973
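The weight and bias can be cross-checked without scikit-learn. With one feature, least squares has a closed form: w = cov(x, y) / var(x) and b = mean(y) - w * mean(x). The following is a minimal sketch, assuming the same data recipe as above. The fixed seed and the use of all 100 points (rather than the 90% training split) are assumptions, so the digits will not match the output above.

```python
import numpy as np

# Assumption: a fixed seed so the sketch is repeatable; the post uses unseeded np.random.
rng = np.random.default_rng(0)
noise = rng.random(100)
x = np.sort(10 * rng.random(100)) + noise
y = 2 * x + 3 + noise

# Closed-form least squares for a single feature.
x_mean, y_mean = x.mean(), y.mean()
w = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b = y_mean - w * x_mean

# Cross-check against NumPy's degree-1 polynomial fit (returns slope, then intercept).
slope_ref, intercept_ref = np.polyfit(x, y, 1)

print("weight:", w, "bias:", b)
```

The bias lands roughly half a unit above 3 because the `np.random.rand` noise has mean 0.5 rather than 0, which matches the bias of about 3.48 reported above.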
print("Training data score: {}".format(model.score(X_train, y_train)))
print("Test data score: {}".format(model.score(X_test, y_test)))
Training data score: 0.9970998987721312
Test data score: 0.9975946975084359
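For a regressor, `model.score` returns the coefficient of determination R², which is why both values sit just below 1. As a rough sketch of that number, the helper below computes R² by hand. The name `r2_manual` and the toy arrays are made up for illustration.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # variation the model fails to explain
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    return 1.0 - ss_res / ss_tot

y_true = np.array([3.0, 5.0, 7.0, 9.0])
perfect = r2_manual(y_true, y_true)                                   # exact predictions
baseline = r2_manual(y_true, np.full_like(y_true, y_true.mean()))     # always predict the mean
print(perfect, baseline)
```

Exact predictions score 1, and a model that always predicts the mean scores 0, so R² reads as the share of variation explained beyond that baseline.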
predict = model.predict(X_test)
plt.scatter(X, y)
plt.plot(X_test, predict, '--r')
Boston housing price data
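import pandas as pd
# load_boston was removed in scikit-learn 1.2; this code needs an older release.
from sklearn.datasets import load_boston
boston = load_boston()
boston_df = pd.DataFrame(boston.data, columns=boston.feature_names)
boston_df['MEDV'] = boston.target
boston_df.head()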
X_train, X_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.2)
# The normalize parameter was deprecated in scikit-learn 1.0 and removed in 1.2.
model = LinearRegression(normalize=True)
model.fit(X_train, y_train)
print('y= ' + str(round(model.intercept_, 2)) + ' ')
for i, c in enumerate(model.coef_):
    print(str(round(c, 2)) + '*x' + str(i))
As above, this gives the coefficient for each feature.
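To see where per-feature coefficients come from, here is a minimal sketch that fits a multi-feature linear model with NumPy's least-squares solver. It does not use the Boston data: the three features, their true coefficients, and the noise level are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: synthetic data with three made-up features and known coefficients.
true_coef = np.array([1.5, -2.0, 0.5])
true_intercept = 4.0
X = rng.random((200, 3))
y = X @ true_coef + true_intercept + 0.01 * rng.standard_normal(200)

# Append a column of ones so the intercept is estimated together with the coefficients.
A = np.column_stack([X, np.ones(len(X))])
solution, *_ = np.linalg.lstsq(A, y, rcond=None)
coef, intercept = solution[:-1], solution[-1]

# Print the fitted equation in the same spirit as the loop above.
terms = ["{:+.2f}*x{}".format(c, i) for i, c in enumerate(coef)]
equation = "y = {:.2f} ".format(intercept) + " ".join(terms)
print(equation)
```

Each coefficient is the change in the prediction for a one-unit change in that feature with the others held fixed, so their sizes are only comparable when the features share a scale.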
from sklearn.metrics import mean_squared_error, r2_score
y_train_predict = model.predict(X_train)
rmse = (np.sqrt(mean_squared_error(y_train, y_train_predict)))
r2 = r2_score(y_train, y_train_predict)
print('RMSE: {}'.format(rmse))
print('R2: {}'.format(r2))
RMSE: 4.602242365338021
R2: 0.7462536181880832
y_test_predict = model.predict(X_test)
rmse = (np.sqrt(mean_squared_error(y_test, y_test_predict)))
r2 = r2_score(y_test, y_test_predict)
print('RMSE: {}'.format(rmse))
print('R2: {}'.format(r2))
RMSE: 5.080409785414186
R2: 0.7054686365434318
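RMSE is the square root of the mean squared error, so it is in the same unit as the target. A minimal sketch of the calculation follows; the helper name `rmse_manual` and the toy numbers are made up for illustration.

```python
import numpy as np

def rmse_manual(y_true, y_pred):
    """Root mean squared error (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Two predictions that are off by 3 and 4: sqrt((9 + 16) / 2).
example = rmse_manual([0.0, 0.0], [3.0, 4.0])
print(example)
```

Since the prices are in $1,000s, the test RMSE of about 5.08 above means a typical error of roughly $5,000. The test metrics being slightly worse than the training ones is the usual pattern.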
def plot_boston_prices(expected, predicted):
    plt.figure(figsize=(8,4))
    plt.scatter(expected, predicted)
    plt.plot([5,50],[5,50],'--r')
    plt.xlabel('True price ($1,000s)')
    plt.ylabel('Predicted price ($1,000s)')
    plt.tight_layout()
predicted = model.predict(X_test)
expected = y_test
plot_boston_prices(expected, predicted)