2. Binary Classification Problem 2: Graduate School Admission

---
layout: single
title: "Converting a Jupyter notebook!"
categories: coding
tag: [python, blog, jekyll]
toc: true
author_profile: false
---

Graduate school admission: a binary classification prediction problem.
# Suppress warnings to keep the notebook output clean
import warnings
warnings.filterwarnings(action='ignore')
1. scikit-learn is one of the best Python packages for learning machine learning.
Write and run code that imports scikit-learn under the alias sk.
# Write your answer code here.
import sklearn as sk
2. Pandas is a Python library widely used for data analysis.
Import Pandas under the alias pd.
# Write your answer code here.
import pandas as pd
3. We want to read in the data file that will be analyzed and preprocessed for modeling.
Write code that reads the data file with a Pandas function and assigns it to a DataFrame variable named df.
path = 'https://raw.githubusercontent.com/khw11044/csv_dataset/master/admission_simple.csv'
df = pd.read_csv(path)
df.head()
|   | GRE | TOEFL | RANK | SOP | LOR | GPA | RESEARCH | ADMIT |
|---|-----|-------|------|-----|-----|-----|----------|-------|
| 0 | 337 | 118 | 4 | 4.5 | 4.5 | 9.65 | 1 | 1 |
| 1 | 324 | 107 | 4 | 4.0 | 4.5 | 8.87 | 1 | 1 |
| 2 | 316 | 104 | 3 | 3.0 | 3.5 | 8.00 | 1 | 0 |
| 3 | 322 | 110 | 3 | 3.5 | 2.5 | 8.67 | 1 | 1 |
| 4 | 314 | 103 | 2 | 2.0 | 3.0 | 8.21 | 0 | 0 |
Data Description
GRE: GRE Scores (out of 340)
TOEFL: TOEFL Scores (out of 120)
RANK: University Rating (out of 5)
SOP: Statement of Purpose Strength (out of 5)
LOR: Letter of Recommendation Strength (out of 5)
GPA: Undergraduate GPA (out of 10)
RESEARCH: Research Experience (either 0 or 1)
ADMIT: Chance of Admit (either 0 or 1)
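To confirm that the loaded data matches this description, a quick inspection sketch (output omitted):

```python
# Check column types, non-null counts, and value ranges against the description
df.info()
print(df.describe())
```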
Run the code below before working through the next questions.
# Write your answer code here.
import seaborn as sns
import matplotlib.pyplot as plt
plt.rcParams['font.family'] = 'Malgun Gothic'  # Korean-capable font for plot labels
4. Let's look at the distribution of each column.
Plot the distribution of the columns below.
Use Seaborn.
Use the countplot function.
The first plot is ADMIT;
the rest cover RESEARCH, LOR, SOP, and RANK.
plt.figure(figsize=(16,4))
plt.subplot(1,5,1)
sns.countplot(x='ADMIT', data=df)
plt.title('ADMIT')
plt.subplot(1,5,2)
sns.countplot(x='RESEARCH', data=df)
plt.title('RESEARCH')
plt.subplot(1,5,3)
sns.countplot(x='LOR', data=df)
plt.title('LOR')
plt.subplot(1,5,4)
sns.countplot(x='SOP', data=df)
plt.title('SOP')
plt.subplot(1,5,5)
sns.countplot(x='RANK', data=df)
plt.title('RANK')
plt.show()
5. Let's look at the number of ADMIT outcomes for each RANK.
Draw bar graphs of the counts by RANK, split by whether the applicant was admitted.
Use Seaborn.
Use the countplot function.
The first plot is ADMIT.
plt.figure(figsize=(16,8))
plt.subplot(2,2,1)
sns.countplot(x='ADMIT', data=df[df['ADMIT']==1])
plt.title('ADMIT')
plt.subplot(2,2,2)
sns.countplot(x='RANK', data=df[df['ADMIT']==1])
plt.title('rank')
plt.subplot(2,2,3)
sns.countplot(x='ADMIT', data=df[df['ADMIT']==0])
plt.title('UNADMIT')
plt.subplot(2,2,4)
sns.countplot(x='RANK', data=df[df['ADMIT']==0])
plt.title('rank')
plt.tight_layout()
plt.show()
# Admitted applicants tend to come from higher-rated universities (RANK) than non-admitted applicants.
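The same comparison fits in a single axes by passing the target column to countplot's hue parameter; a minimal sketch, assuming the df loaded above:

```python
# One axes: count of applicants per RANK, split by admission outcome
plt.figure(figsize=(8, 4))
sns.countplot(x='RANK', hue='ADMIT', data=df)
plt.title('ADMIT count by RANK')
plt.show()
```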
plt.figure(figsize=(16,8))
plt.subplot(2,2,1)
sns.countplot(x='ADMIT', data=df[df['ADMIT']==1])
plt.title('ADMIT')
plt.subplot(2,2,2)
sns.countplot(x='SOP', data=df[df['ADMIT']==1])
plt.title('SOP')
plt.subplot(2,2,3)
sns.countplot(x='ADMIT', data=df[df['ADMIT']==0])
plt.title('UNADMIT')
plt.subplot(2,2,4)
sns.countplot(x='SOP', data=df[df['ADMIT']==0])
plt.title('SOP')
plt.tight_layout()
plt.show()
# Admitted applicants tend to have stronger statements of purpose (SOP).
plt.figure(figsize=(16,8))
plt.subplot(2,2,1)
sns.countplot(x='ADMIT', data=df[df['ADMIT']==1])
plt.title('ADMIT')
plt.subplot(2,2,2)
sns.countplot(x='LOR', data=df[df['ADMIT']==1])
plt.title('LOR')
plt.subplot(2,2,3)
sns.countplot(x='ADMIT', data=df[df['ADMIT']==0])
plt.title('UNADMIT')
plt.subplot(2,2,4)
sns.countplot(x='LOR', data=df[df['ADMIT']==0])
plt.title('LOR')
plt.tight_layout()
plt.show()
# Admitted applicants tend to have stronger letters of recommendation (LOR).
plt.figure(figsize=(16,8))
plt.subplot(2,2,1)
sns.countplot(x='ADMIT', data=df[df['ADMIT']==1])
plt.title('ADMIT')
plt.subplot(2,2,2)
sns.countplot(x='RESEARCH', data=df[df['ADMIT']==1])
plt.title('RESEARCH')
plt.subplot(2,2,3)
sns.countplot(x='ADMIT', data=df[df['ADMIT']==0])
plt.title('UNADMIT')
plt.subplot(2,2,4)
sns.countplot(x='RESEARCH', data=df[df['ADMIT']==0])
plt.title('RESEARCH')
plt.tight_layout()
plt.show()
# Applicants with research experience appear more likely to be admitted.
6. We want to see how GRE and GPA are distributed by ADMIT outcome.
Plot the distributions of GRE and GPA with respect to ADMIT.
Use Seaborn.
Use the histplot function.
The first plot is GRE for all applicants,
the second is GRE for admitted applicants,
the third is GRE for non-admitted applicants.
Do the same for GPA.
plt.figure(figsize=(16,4))
plt.subplot(1,3,1)
sns.histplot(x='GRE', data = df, bins=30, kde = True)
plt.title('GRE')
plt.subplot(1,3,2)
sns.histplot(x='GRE', data = df[df['ADMIT']==1], bins=30, kde = True)
plt.title('admit_GRE')
plt.subplot(1,3,3)
sns.histplot(x='GRE', data = df[df['ADMIT']==0], bins=30, kde = True)
plt.title('Unadmit_GRE')
plt.show()
# Admitted applicants tend to have higher GRE scores.
plt.figure(figsize=(16,4))
plt.subplot(1,3,1)
sns.histplot(x='GPA', data = df, bins=30, kde = True)
plt.title('GPA')
plt.subplot(1,3,2)
sns.histplot(x='GPA', data = df[df['ADMIT']==1], bins=30, kde = True)
plt.title('admit_GPA')
plt.subplot(1,3,3)
sns.histplot(x='GPA', data = df[df['ADMIT']==0], bins=30, kde = True)
plt.title('Unadmit_GPA')
plt.show()
# Admitted applicants tend to have higher GPAs as well.
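As an alternative, histplot's hue parameter overlays the admitted and non-admitted distributions in a single axes; a minimal sketch, assuming the df loaded above:

```python
# Overlay the class-conditional GRE and GPA distributions
plt.figure(figsize=(16, 4))
plt.subplot(1, 2, 1)
sns.histplot(data=df, x='GRE', hue='ADMIT', bins=30, kde=True)
plt.title('GRE by ADMIT')
plt.subplot(1, 2, 2)
sns.histplot(data=df, x='GPA', hue='ADMIT', bins=30, kde=True)
plt.title('GPA by ADMIT')
plt.show()
```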
7. Let's look at the correlations between the columns. Correlation can help identify columns that add little and can be removed.
Draw a correlation heatmap.
Use numeric_only so that only numeric columns are included.
Set cmap to 'Blues'.
Do not show the color bar (cbar).
Display values to three decimal places.
# Write your answer code here.
# Visualize the correlation matrix
plt.figure(figsize=(6,6))
sns.heatmap(df.corr(numeric_only=True),
            annot=True,
            cmap='Blues',
            cbar=False,       # hide the color bar
            square=True,
            fmt='.3f',        # three decimal places
            annot_kws={'size':9})
plt.show()
# Every feature shows a positive correlation with ADMIT, so no columns are dropped here.
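For a quick numeric view of the same information, the target column of the correlation matrix can be sorted directly; a minimal sketch:

```python
# Correlation of each numeric feature with the ADMIT target, strongest first
corr_with_target = df.corr(numeric_only=True)['ADMIT'].drop('ADMIT')
print(corr_with_target.sort_values(ascending=False))
```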
8. Handling missing values is essential to get reliable model performance.
Handle missing values according to the guide below.
Target DataFrame: df
Write code that checks for missing values.
Drop any rows that contain missing values.
Store the preprocessed result in a new DataFrame variable named df_na.
# Check for NaN values
df.isnull().sum()
GRE         0
TOEFL       0
RANK        0
SOP         0
LOR         0
GPA         0
RESEARCH    0
ADMIT       0
dtype: int64
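This dataset has no missing values, but to satisfy the guide the row-dropping step would look like this (a sketch; df_na ends up identical to df here):

```python
# Drop rows containing any missing value, keeping the result in df_na
df_na = df.dropna(axis=0)
print(df_na.shape)  # same shape as df, since no NaNs exist
```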
9. One-hot encoding converts a categorical variable into binary vectors of 1s and 0s.
One-hot encode the column data that matches the conditions below.
Target DataFrame: df
One-hot encoding target: 'RANK'
Function to use: pandas get_dummies
Store the result with this preprocessing applied in a DataFrame variable named df_preset.
df['RANK'].value_counts()
RANK
3    162
2    126
4    105
5     73
1     34
Name: count, dtype: int64
# One-hot encoding target: RANK
dumm_cols = ['RANK']
# Encode, dropping the first dummy column to avoid redundancy
df_preset = pd.get_dummies(df, columns=dumm_cols, drop_first=True, dtype=int)
# Check the result
df_preset.head()
|   | GRE | TOEFL | SOP | LOR | GPA | RESEARCH | ADMIT | RANK_2 | RANK_3 | RANK_4 | RANK_5 |
|---|-----|-------|-----|-----|-----|----------|-------|--------|--------|--------|--------|
| 0 | 337 | 118 | 4.5 | 4.5 | 9.65 | 1 | 1 | 0 | 0 | 1 | 0 |
| 1 | 324 | 107 | 4.0 | 4.5 | 8.87 | 1 | 1 | 0 | 0 | 1 | 0 |
| 2 | 316 | 104 | 3.0 | 3.5 | 8.00 | 1 | 0 | 0 | 1 | 0 | 0 |
| 3 | 322 | 110 | 3.5 | 2.5 | 8.67 | 1 | 1 | 0 | 1 | 0 | 0 |
| 4 | 314 | 103 | 2.0 | 3.0 | 8.21 | 0 | 0 | 1 | 0 | 0 | 0 |
10. We want to split the data into training and validation sets.
Assign the ADMIT (admission) column as the label y and the remaining columns as the features X, then split them into training and validation sets.
Target DataFrame: df_preset
Split into training and validation sets:
Training labels: y_train, training features: X_train
Validation labels: y_valid, validation features: X_valid
Training-to-validation ratio: 80:20
random_state: 42
Use scikit-learn's train_test_split function.
Perform scaling:
Use MinMaxScaler from sklearn.preprocessing.
Scale the training features with MinMaxScaler's fit_transform and assign the result to X_train.
Scale the validation features with transform and assign the result to X_valid.
# Write your answer code here.
from sklearn.model_selection import train_test_split
target = 'ADMIT'
x = df_preset.drop(target, axis=1)
y = df_preset[target]
X_train, X_valid, y_train, y_valid = train_test_split(x,y, test_size=0.2, random_state=42)
print(X_train.shape, X_valid.shape, y_train.shape, y_valid.shape)
(400, 10) (100, 10) (400,) (100,)
from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler

scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)  # fit the scaler on the training data only
X_valid = scaler.transform(X_valid)      # reuse the training min/max to avoid data leakage
11. We want to build machine learning models that predict ADMIT (admission).
Build and train the following models according to the guide below.
Store the name of the best-performing model in a variable named 답안11.
- e.g., 답안11 = 'KNeighborsClassifier' or 'DecisionTreeClassifier' or 'LogisticRegression' or 'RandomForestClassifier', etc.
# Step 1: import the models and metrics
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from sklearn.metrics import confusion_matrix, classification_report, roc_auc_score
models = {
    "KNeighborsClassifier": {"model": KNeighborsClassifier(n_neighbors=5)},
    "DecisionTreeClassifier": {"model": DecisionTreeClassifier()},
    "LogisticRegression": {"model": LogisticRegression()},
    "RandomForestClassifier": {"model": RandomForestClassifier()},
    "XGBClassifier": {"model": XGBClassifier()},
    "LGBMClassifier": {"model": LGBMClassifier(verbose=-1)}
}
from time import perf_counter

# Train each model, recording training time and train/validation accuracy
for name, item in models.items():
    model = item['model']
    start = perf_counter()
    model.fit(X_train, y_train)
    duration = round(perf_counter() - start, 2)
    models[name]['perf'] = duration
    train_score = round(model.score(X_train, y_train), 4)
    val_score = round(model.score(X_valid, y_valid), 4)
    models[name]['train_score'] = train_score
    models[name]['val_score'] = val_score
    print(f"{name:20} trained in {duration} sec, train_score: {train_score}. val_score: {val_score}")
# Collect the results into a DataFrame
models_result = []
for name, v in models.items():
    models_result.append([name, v['val_score'], v['perf']])

df_results = pd.DataFrame(models_result,
                          columns=['model', 'val_score', 'Training time (sec)'])
df_results.sort_values(by='val_score', ascending=False, inplace=True)
df_results.reset_index(inplace=True, drop=True)
df_results
KNeighborsClassifier   trained in 0.0 sec, train_score: 0.9075. val_score: 0.89
DecisionTreeClassifier trained in 0.0 sec, train_score: 1.0. val_score: 0.81
LogisticRegression     trained in 0.0 sec, train_score: 0.8825. val_score: 0.84
RandomForestClassifier trained in 0.11 sec, train_score: 1.0. val_score: 0.9
XGBClassifier          trained in 0.04 sec, train_score: 1.0. val_score: 0.85
LGBMClassifier         trained in 0.02 sec, train_score: 1.0. val_score: 0.88
|   | model | val_score | Training time (sec) |
|---|-------|-----------|---------------------|
| 0 | RandomForestClassifier | 0.90 | 0.11 |
| 1 | KNeighborsClassifier | 0.89 | 0.00 |
| 2 | LGBMClassifier | 0.88 | 0.02 |
| 3 | XGBClassifier | 0.85 | 0.04 |
| 4 | LogisticRegression | 0.84 | 0.00 |
| 5 | DecisionTreeClassifier | 0.81 | 0.00 |
def check_performance_for_model(df_results):
    plt.figure(figsize=(15, 5))
    sns.barplot(x='model', y='val_score', data=df_results)
    plt.title('Accuracy on the validation set', fontsize=15)
    plt.ylim(0, 1.2)
    plt.xticks(rotation=90)
    plt.show()

check_performance_for_model(df_results)
# Note: RandomForestClassifier has the highest val_score (0.90) but a perfect train_score (1.0),
# a sign of overfitting; KNN (0.89) shows a much smaller train/validation gap.
답안11 = 'KNeighborsClassifier'
from sklearn.metrics import roc_auc_score, accuracy_score
from sklearn.model_selection import cross_val_score
model = KNeighborsClassifier()
# Step 3: train the model
model.fit(X_train, y_train)

# Validate performance with 10-fold cross-validation
cv_score = cross_val_score(model, X_train, y_train, cv=10)
print('cv_score :', cv_score)
print('mean cv_score :', cv_score.mean())

# Step 4: predict
y_pred = model.predict(X_valid)

# Step 5: evaluate
print(classification_report(y_valid, y_pred))
print('Acc Score :', accuracy_score(y_valid, y_pred))
print('AUC Score :', roc_auc_score(y_valid, y_pred))
              precision    recall  f1-score   support

           0       0.95      0.88      0.91        64
           1       0.80      0.92      0.86        36

    accuracy                           0.89       100
   macro avg       0.88      0.90      0.88       100
weighted avg       0.90      0.89      0.89       100

Acc Score : 0.89
AUC Score : 0.8958333333333333
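The AUC above is computed from hard 0/1 labels. Computing it from predicted probabilities is the more standard choice and usually gives a slightly different, more informative value; a minimal sketch with the fitted KNN model:

```python
# Use class-1 probabilities rather than hard labels for the ROC AUC
y_proba = model.predict_proba(X_valid)[:, 1]
print('AUC (probabilities):', roc_auc_score(y_valid, y_proba))
```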
12. We want to build a deep learning model that predicts ADMIT (admission).
Build and train the model according to the guide below.
Use the TensorFlow framework.
Use at least two hidden layers.
Use binary_crossentropy as the loss function.
Set the hyperparameters epochs: 100 and batch_size: 16.
Use X_valid and y_valid as the data for evaluating loss and metrics at each epoch.
Store the training information in a variable named history.
import tensorflow as tf
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense, Activation, Dropout, BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.utils import to_categorical
# Import regularization functions (L1/L2 penalties)
from tensorflow.keras.regularizers import l1, l2
tf.random.set_seed(1)
nfeatures = X_train.shape[1]
nfeatures
# Build a Sequential model with two hidden layers and L1 regularization
model = Sequential()
model.add(Dense(32, activation='relu', input_shape=(nfeatures,), kernel_regularizer=l1(0.01)))
model.add(Dense(16, activation='relu', kernel_regularizer=l1(0.01)))
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])

# es = EarlyStopping(monitor='val_loss', patience=4, mode='min', verbose=1)
history = model.fit(X_train, y_train,
                    batch_size=16,
                    epochs=100,
                    # callbacks=[es],
                    validation_data=(X_valid, y_valid),
                    verbose=1).history
Epoch 1/100   - acc: 0.5903 - loss: 2.1358 - val_acc: 0.6300 - val_loss: 2.0082
Epoch 2/100   - acc: 0.7896 - loss: 1.9416 - val_acc: 0.8000 - val_loss: 1.8115
Epoch 3/100   - acc: 0.8187 - loss: 1.7621 - val_acc: 0.8300 - val_loss: 1.6269
...
Epoch 99/100  - acc: 0.8855 - loss: 0.3947 - val_acc: 0.8700 - val_loss: 0.3991
Epoch 100/100 - acc: 0.8855 - loss: 0.3936 - val_acc: 0.8700 - val_loss: 0.3985
# Wrap the training-curve plotting in a reusable function
def dl_history_plot(history):
    plt.figure(figsize=(16, 4))
    plt.subplot(1, 2, 1)
    plt.plot(history['loss'], label='loss', marker='.')
    plt.plot(history['val_loss'], label='val_loss', marker='.')
    plt.ylabel('Loss')
    plt.xlabel('Epochs')
    plt.legend()
    plt.grid()

    plt.subplot(1, 2, 2)
    plt.plot(history['acc'], label='acc', marker='.')
    plt.plot(history['val_acc'], label='val_acc', marker='.')
    plt.ylabel('ACC')
    plt.xlabel('Epochs')
    plt.legend()
    plt.grid()
    plt.show()
dl_history_plot(history)
import numpy as np
from sklearn.metrics import roc_auc_score, accuracy_score

# Convert the sigmoid outputs to hard 0/1 labels at a 0.5 threshold
pred = model.predict(X_valid)
pred = np.where(pred >= .5, 1, 0)
print(classification_report(y_valid, pred))
print('Acc Score :', accuracy_score(y_valid, pred))
print('AUC Score :', roc_auc_score(y_valid, pred))
4/4 - 0s 1ms/step
              precision    recall  f1-score   support

           0       0.93      0.86      0.89        64
           1       0.78      0.89      0.83        36

    accuracy                           0.87       100
   macro avg       0.86      0.87      0.86       100
weighted avg       0.88      0.87      0.87       100

Acc Score : 0.87
AUC Score : 0.8741319444444444
# Import the metric
from sklearn.metrics import precision_score

# Evaluate performance (y_pred is the KNN prediction from question 11)
print('Precision:', precision_score(y_valid, y_pred))
print('Precision:', precision_score(y_valid, y_pred, average='binary'))  # binary classification (default)
print('Precision:', precision_score(y_valid, y_pred, average=None))      # per class
print('Precision:', precision_score(y_valid, y_pred, average='macro'))
print('Precision:', precision_score(y_valid, y_pred, average='weighted'))
Precision: 0.8048780487804879
Precision: 0.8048780487804879
Precision: [0.94915254 0.80487805]
Precision: 0.8770152955766846
Precision: 0.8972137246796197
# Import the metric
from sklearn.metrics import recall_score

# Evaluate performance
print('Recall for classes 0 and 1:', recall_score(y_valid, y_pred, average=None))
Recall for classes 0 and 1: [0.875      0.91666667]
# Import the metric
from sklearn.metrics import f1_score

# Evaluate performance
print('F1 score for classes 0 and 1:', f1_score(y_valid, y_pred, average=None))
F1 score for classes 0 and 1: [0.91056911 0.85714286]
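confusion_matrix was imported alongside the other metrics but never used; a quick sketch showing where the per-class precision and recall figures above come from:

```python
# Rows are true classes, columns are predicted classes
from sklearn.metrics import confusion_matrix
print(confusion_matrix(y_valid, y_pred))
```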