2. Binary Classification Problem 1: Titanic
---
layout: single
title: "Converting a Jupyter Notebook!"
categories: coding
tag: [python, blog, jekyll]
toc: true
author_profile: false
---
Titanic Survival Binary Classification Problem
import warnings
warnings.filterwarnings(action='ignore')
1. scikit-learn is a leading Python package for machine learning education.
Write and run code that imports scikit-learn under the alias sk.
# Write your answer code here.
import sklearn as sk
2. Pandas is a Python library widely used for data analysis.
Import pandas so it can be used under the alias pd.
# Write your answer code here.
import pandas as pd
3. We need to read the data file that will be analyzed and processed for modeling.
Read the data file with a pandas function and assign it to a DataFrame variable named df.
path = 'https://raw.githubusercontent.com/khw11044/csv_dataset/master/titanic.0.csv'
df = pd.read_csv(path)
df.head()
|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
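As a quick optional check (not part of the graded answer), the shape and column dtypes of the loaded frame can be confirmed before moving on:

# Optional sanity check of the loaded data
df.shape   # (891, 12)
df.info()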
Run the code below before solving the next questions.
# Write your answer code here.
import seaborn as sns
import matplotlib.pyplot as plt
plt.rcParams['font.family'] = 'Malgun Gothic'
3. We want to look at the distribution of each column.
Plot the distribution of each of the following columns:
- Use Seaborn.
- Use the countplot function.
- First: Survived
- Second: Pclass
- Third: Sex
- Fourth: Embarked
plt.figure(figsize=(16,4))
plt.subplot(1,4,1)
sns.countplot(x='Survived', data=df)
plt.title('Survived')
plt.subplot(1,4,2)
sns.countplot(x='Pclass', data=df)
plt.title('Pclass')
plt.subplot(1,4,3)
sns.countplot(x='Sex', data=df)
plt.title('Sex')
plt.subplot(1,4,4)
sns.countplot(x='Embarked', data=df)
plt.title('Embarked')
plt.show()
4. We want to look at the number of survivors for each column.
Draw bar graphs of the counts of survivors and non-survivors for each of the following columns:
- Use Seaborn.
- Use the countplot function.
- First: Survived
- Second: Pclass
- Third: Sex
- Fourth: Embarked
plt.figure(figsize=(16,8))
plt.subplot(2,4,1)
sns.countplot(x='Survived', data=df[df['Survived']==1])
plt.title('Survived')
plt.subplot(2,4,2)
sns.countplot(x='Pclass', data=df[df['Survived']==1])
plt.title('Pclass')
plt.subplot(2,4,3)
sns.countplot(x='Sex', data=df[df['Survived']==1])
plt.title('Sex')
plt.subplot(2,4,4)
sns.countplot(x='Embarked', data=df[df['Survived']==1])
plt.title('Embarked')
plt.subplot(2,4,5)
sns.countplot(x='Survived', data=df[df['Survived']==0])
plt.title('Not Survived')
plt.subplot(2,4,6)
sns.countplot(x='Pclass', data=df[df['Survived']==0])
plt.title('Pclass')
plt.subplot(2,4,7)
sns.countplot(x='Sex', data=df[df['Survived']==0])
plt.title('Sex')
plt.subplot(2,4,8)
sns.countplot(x='Embarked', data=df[df['Survived']==0])
plt.title('Embarked')
plt.tight_layout()
plt.show()
# Third-class passengers, male passengers, and those who embarked at S died in greater numbers.
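The same survived-vs-not comparison can be drawn more compactly with countplot's hue parameter, which splits each bar by survival within a single axes. A minimal sketch (an alternative to, not part of, the answer above):

# Alternative: one plot per column, bars split by Survived via hue
plt.figure(figsize=(16,4))
for i, col in enumerate(['Pclass', 'Sex', 'Embarked'], start=1):
    plt.subplot(1, 3, i)
    sns.countplot(x=col, hue='Survived', data=df)
    plt.title(col)
plt.tight_layout()
plt.show()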
5. We want to know the age distribution by survival status.
Plot the age distribution with respect to survival:
- Use Seaborn.
- Use the histplot function.
- First: Age of all passengers
- Second: Age of the survivors
- Third: Age of the non-survivors
plt.figure(figsize=(16,4))
plt.subplot(1,3,1)
sns.histplot(x='Age', data = df, bins=30, kde = True)
plt.title('Age')
plt.subplot(1,3,2)
sns.histplot(x='Age', data = df[df['Survived']==1], bins=30, kde = True)
plt.title('Survived_Age')
plt.subplot(1,3,3)
sns.histplot(x='Age', data = df[df['Survived']==0], bins=30, kde = True)
plt.title('NonSurvived_Age')
plt.show()
# The younger the passenger, the more likely they were to survive.
6. We want to examine the correlations between the columns. Correlation can reveal columns that are not needed and can be removed.
Draw a correlation heatmap:
- Use numeric_only so only the numeric columns are shown.
- Set cmap to Blues.
- Hide the cbar.
- Show the values to three decimal places.
# Write your answer code here.
selected = ['PassengerId', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked', 'Survived']
df_selected = df[selected]
# Visualize the correlations
plt.figure(figsize=(6,6))
sns.heatmap(df_selected.corr(numeric_only=True),
            annot=True,
            cmap='Blues',
            cbar=False,        # hide the color bar
            square=True,
            fmt='.3f',         # three decimal places
            annot_kws={'size':9}
            )
plt.show()
# Drop the PassengerId, Age, SibSp, and Parch columns.
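To back that comment with numbers, the correlations against the target can be ranked directly (a small optional check, reusing the same numeric_only option):

# Correlation of each numeric column with Survived, strongest first
df.corr(numeric_only=True)['Survived'].sort_values(ascending=False)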
7. Unnecessary variables must be removed to get proper modeling performance.
Follow the guide below to remove the unneeded data:
- Target DataFrame: df
- Delete the three columns 'PassengerId', 'Age', 'SibSp'.
- Store the preprocessed result in a new DataFrame variable named df_del.
# Drop the columns
drop_cols = ['PassengerId', 'Age', 'SibSp']
df_del = df.drop(drop_cols, axis=1)
# Check
df_del.head()
|   | Survived | Pclass | Name | Sex | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | Braund, Mr. Owen Harris | male | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 1 | 3 | Heikkinen, Miss. Laina | female | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 0 | 3 | Allen, Mr. William Henry | male | 0 | 373450 | 8.0500 | NaN | S |
8. Handling missing values is essential for proper modeling performance.
Follow the guide below to handle the missing values:
- Target DataFrame: df_del
- Write code that checks for missing values.
- Delete the rows that contain missing values.
- Store the preprocessed result in a new DataFrame variable named df_na.
# Check for NaN values
df_del.isnull().sum()
Survived      0
Pclass        0
Name          0
Sex           0
Parch         0
Ticket        0
Fare          0
Cabin       687
Embarked      2
dtype: int64
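Cabin is missing in 687 rows, so dropping the entire column is more sensible than dropping its rows. A quick optional check of the missing-value ratio per column:

# Fraction of missing values per column; Cabin is about 77% missing
df_del.isnull().mean().round(3)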
# Drop the rows where Embarked is missing
df_na = df_del.dropna(subset=['Embarked'], axis=0)  # axis=0 drops rows
# Drop the Cabin column entirely (too many missing values)
df_na = df_na.drop('Cabin', axis=1)                 # axis=1 drops columns
df_na.isnull().sum()
Survived    0
Pclass      0
Name        0
Sex         0
Parch       0
Ticket      0
Fare        0
Embarked    0
dtype: int64
# Keep only the columns that will be used for modeling
selected = ['Pclass', 'Sex', 'Fare', 'Embarked', 'Survived']
df_sel = df_na[selected]
df_sel.head()
|   | Pclass | Sex | Fare | Embarked | Survived |
|---|---|---|---|---|---|
| 0 | 3 | male | 7.2500 | S | 0 |
| 1 | 1 | female | 71.2833 | C | 1 |
| 2 | 3 | female | 7.9250 | S | 1 |
| 3 | 1 | female | 53.1000 | S | 1 |
| 4 | 3 | male | 8.0500 | S | 0 |
9. One-hot encoding is a method used to convert categorical variables into binary vectors of 1s and 0s.
One-hot encode the column data that meets the conditions below (a quick toy illustration follows the list):
- Target DataFrame: df_sel
- One-hot encoding targets: 'Pclass', 'Sex', 'Embarked'
- Function to use: pandas get_dummies
- Store the preprocessed result in a DataFrame variable named df_preset.
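As a toy illustration of what get_dummies produces (hypothetical values, not part of the answer):

# Each category value becomes its own 0/1 column
pd.get_dummies(pd.Series(['S', 'C', 'Q', 'S'], name='Embarked'), dtype=int)
#    C  Q  S
# 0  0  0  1
# 1  1  0  0
# 2  0  1  0
# 3  0  0  1

The actual answer below additionally passes drop_first=True, which drops the first level of each encoded column since it is implied by the others.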
# Columns to one-hot encode: Pclass, Sex, Embarked
dumm_cols = ['Pclass', 'Sex', 'Embarked']
# One-hot encode; drop_first=True drops one redundant level per column
df_preset = pd.get_dummies(df_sel, columns=dumm_cols, drop_first=True, dtype=int)
# Check
df_preset.head()
|   | Fare | Survived | Pclass_2 | Pclass_3 | Sex_male | Embarked_Q | Embarked_S |
|---|---|---|---|---|---|---|---|
| 0 | 7.2500 | 0 | 0 | 1 | 1 | 0 | 1 |
| 1 | 71.2833 | 1 | 0 | 0 | 0 | 0 | 0 |
| 2 | 7.9250 | 1 | 0 | 1 | 0 | 0 | 1 |
| 3 | 53.1000 | 1 | 0 | 0 | 0 | 0 | 1 |
| 4 | 8.0500 | 0 | 0 | 1 | 1 | 0 | 1 |
10. We want to split the data into training and validation datasets.
Assign the Survived column as the label y and the remaining columns as the features X, then split them into a training set and a validation set.
- Target DataFrame: df_preset
- Split into training and validation datasets
- Training labels: y_train, training features: X_train
- Validation labels: y_valid, validation features: X_valid
- Training-to-validation ratio: 80:20
- random_state: 42
- Use scikit-learn's train_test_split function.
Perform scaling:
- Use MinMaxScaler from sklearn.preprocessing
- Scale the training features with MinMaxScaler's fit_transform and assign the result back to X_train
- Scale the validation features with MinMaxScaler's transform and assign the result back to X_valid
# Write your answer code here.
from sklearn.model_selection import train_test_split
target = 'Survived'
x = df_preset.drop(target, axis=1)
y = df_preset[target]
X_train, X_valid, y_train, y_valid = train_test_split(x,y, test_size=0.2, random_state=42)
print(X_train.shape, X_valid.shape, y_train.shape, y_valid.shape)
(711, 6) (178, 6) (711,) (178,)
from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_valid = scaler.transform(X_valid)
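A quick optional sanity check: after scaling, every training feature lies exactly in [0, 1], while validation features can fall slightly outside that range because the scaler was fit on the training data only:

print(X_train.min(axis=0), X_train.max(axis=0))  # exactly 0 and 1 per feature
print(X_valid.min(axis=0), X_valid.max(axis=0))  # may stray outside [0, 1]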
11. We want to build machine learning models that predict Survived.
Follow the guide below to build and train the models.
Store the name of the best-performing model in the variable 답안11.
- e.g. 답안11 = 'KNeighborsClassifier', 'DecisionTreeClassifier', 'LogisticRegression', 'RandomForestClassifier', etc.
# Step 1: import the model classes and metrics
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from sklearn.metrics import confusion_matrix, classification_report, roc_auc_score
models = {
    "KNeighborsClassifier": {"model": KNeighborsClassifier(n_neighbors=5)},
    "DecisionTreeClassifier": {"model": DecisionTreeClassifier()},
    "LogisticRegression": {"model": LogisticRegression()},
    "RandomForestClassifier": {"model": RandomForestClassifier()},
    "XGBClassifier": {"model": XGBClassifier()},
    "LGBMClassifier": {"model": LGBMClassifier(verbose=-1)}
}
from time import perf_counter
# Train each model, recording the training time and train/validation scores
for name, model in models.items():
    model = model['model']
    start = perf_counter()
    history = model.fit(X_train, y_train)
    # Save the training time
    duration = perf_counter() - start
    duration = round(duration, 2)
    models[name]['perf'] = duration
    y_train_pred = model.predict(X_train)
    y_val_pred = model.predict(X_valid)
    train_score = round(model.score(X_train, y_train), 4)
    val_score = round(model.score(X_valid, y_valid), 4)
    models[name]['train_score'] = train_score
    models[name]['val_score'] = val_score
    print(f"{name:20} trained in {duration} sec, train_score: {train_score}. val_score: {val_score}")
# Create a DataFrame with the results
models_result = []
for name, v in models.items():
    models_result.append([name, models[name]['val_score'], models[name]['perf']])
df_results = pd.DataFrame(models_result,
                          columns=['model', 'val_score', 'Training time (sec)'])
df_results.sort_values(by='val_score', ascending=False, inplace=True)
df_results.reset_index(inplace=True, drop=True)
df_results
KNeighborsClassifier   trained in 0.0 sec, train_score: 0.8467. val_score: 0.8034
DecisionTreeClassifier trained in 0.0 sec, train_score: 0.9156. val_score: 0.7921
LogisticRegression     trained in 0.0 sec, train_score: 0.7764. val_score: 0.7809
RandomForestClassifier trained in 0.17 sec, train_score: 0.9156. val_score: 0.8034
XGBClassifier          trained in 0.06 sec, train_score: 0.9086. val_score: 0.8034
LGBMClassifier         trained in 0.04 sec, train_score: 0.8833. val_score: 0.7584
|   | model | val_score | Training time (sec) |
|---|---|---|---|
| 0 | KNeighborsClassifier | 0.8034 | 0.00 |
| 1 | RandomForestClassifier | 0.8034 | 0.17 |
| 2 | XGBClassifier | 0.8034 | 0.06 |
| 3 | DecisionTreeClassifier | 0.7921 | 0.00 |
| 4 | LogisticRegression | 0.7809 | 0.00 |
| 5 | LGBMClassifier | 0.7584 | 0.04 |
def check_performance_for_model(df_results):
    plt.figure(figsize=(15,5))
    sns.barplot(x='model', y='val_score', data=df_results)
    plt.title('ACC (%) on the Test set', fontsize=15)
    plt.ylim(0, 1.2)
    plt.xticks(rotation=90)
    plt.show()

check_performance_for_model(df_results)
답안11='RandomForestClassifier'
from sklearn.metrics import roc_auc_score, accuracy_score
from sklearn.model_selection import cross_val_score
model = RandomForestClassifier()
# Step 3: train
model.fit(X_train, y_train)
# Validate performance with 10-fold cross-validation
cv_score = cross_val_score(model, X_train, y_train, cv=10)
print('cv_score :', cv_score)
print('mean cv_score :', cv_score.mean())
# Step 4: predict
y_pred = model.predict(X_valid)
# Step 5: evaluate
print(classification_report(y_valid, y_pred))
print('Acc Score :', accuracy_score(y_valid, y_pred))
print('AUC Score :', roc_auc_score(y_valid, y_pred))
cv_score : [0.76388889 0.76056338 0.78873239 0.87323944 0.78873239 0.83098592
 0.85915493 0.8028169  0.8028169  0.78873239]
mean cv_score : 0.8059663536776214

              precision    recall  f1-score   support

           0       0.84      0.84      0.84       109
           1       0.75      0.74      0.74        69

    accuracy                           0.80       178
   macro avg       0.79      0.79      0.79       178
weighted avg       0.80      0.80      0.80       178

Acc Score : 0.8033707865168539
AUC Score : 0.7915835660151576
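Two optional follow-ups: confusion_matrix (imported earlier but not yet used), and an AUC computed from predicted probabilities, which is the more usual input for roc_auc_score than hard 0/1 labels. A minimal sketch:

# Confusion matrix of the validation predictions
print(confusion_matrix(y_valid, y_pred))
# AUC from P(Survived=1) rather than thresholded predictions
y_prob = model.predict_proba(X_valid)[:, 1]
print('AUC Score (probabilities):', roc_auc_score(y_valid, y_prob))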
# Build a DataFrame of feature importances (named df_imp so the dataset df is not overwritten)
perf_dic = {'feature': list(x),
            'importance': model.feature_importances_}
df_imp = pd.DataFrame(perf_dic)
df_imp.sort_values(by='importance', ascending=True, inplace=True)
# Visualize
plt.figure(figsize=(5, 5))
plt.barh(df_imp['feature'], df_imp['importance'])
plt.show()
12. We want to build a deep learning model that predicts Survived.
Follow the guide below to build and train the model.
- Use the TensorFlow framework to build the deep learning model.
- Give the model at least two hidden layers.
- Use binary_crossentropy as the loss function.
- Set the hyperparameters epochs: 100 and batch_size: 16.
- Use X_valid and y_valid as the data for evaluating loss and metrics at each epoch.
- Store the training information in the variable history.
import tensorflow as tf
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense, Activation, Dropout, BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.utils import to_categorical
# Import the functions needed for regularization
from tensorflow.keras.regularizers import l1, l2
tf.random.set_seed(1)
nfeatures = X_train.shape[1]
nfeatures
# Build a Sequential model
model = Sequential()
model.add(Dense(32, activation='relu', input_shape=(nfeatures,), kernel_regularizer = l1(0.01)))
model.add(Dense(16, activation='relu', kernel_regularizer = l1(0.01)))
model.add(Dense(1, activation= 'sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy',metrics=['acc'])
# es = EarlyStopping(monitor='val_loss', patience=4, mode='min', verbose=1) # val_loss
history = model.fit(X_train, y_train,
                    batch_size=16,
                    epochs=100,
                    # callbacks=[es],
                    validation_data=(X_valid, y_valid),
                    verbose=1).history
Epoch 1/100
45/45 - 1s 4ms/step - acc: 0.6283 - loss: 1.8807 - val_acc: 0.6798 - val_loss: 1.6821
Epoch 2/100
45/45 - 0s 1ms/step - acc: 0.6472 - loss: 1.6094 - val_acc: 0.7472 - val_loss: 1.4374
Epoch 3/100
45/45 - 0s 1ms/step - acc: 0.6921 - loss: 1.3801 - val_acc: 0.7978 - val_loss: 1.2279
(epochs 4-98 omitted: loss decreases steadily while val_acc plateaus around 0.80)
Epoch 99/100
45/45 - 0s 1ms/step - acc: 0.7885 - loss: 0.5553 - val_acc: 0.8034 - val_loss: 0.5227
Epoch 100/100
45/45 - 0s 2ms/step - acc: 0.7885 - loss: 0.5551 - val_acc: 0.8034 - val_loss: 0.5226
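The EarlyStopping callback imported above is left commented out in the answer. A minimal sketch of enabling it, reusing the patience=4 setting from the commented line (running this would retrain the model):

es = EarlyStopping(monitor='val_loss', patience=4, mode='min', verbose=1)
history = model.fit(X_train, y_train,
                    batch_size=16, epochs=100,
                    callbacks=[es],                      # stop once val_loss stalls for 4 epochs
                    validation_data=(X_valid, y_valid),
                    verbose=0).history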
# Wrap the plotting code in a reusable function.
def dl_history_plot(history):
    plt.figure(figsize=(16,4))
    plt.subplot(1,2,1)
    plt.plot(history['loss'], label='loss', marker='.')
    plt.plot(history['val_loss'], label='val_loss', marker='.')
    plt.ylabel('Loss')
    plt.xlabel('Epochs')
    plt.legend()
    plt.grid()

    plt.subplot(1,2,2)
    plt.plot(history['acc'], label='acc', marker='.')
    plt.plot(history['val_acc'], label='val_acc', marker='.')
    plt.ylabel('ACC')
    plt.xlabel('Epochs')
    plt.legend()
    plt.grid()
    plt.show()
dl_history_plot(history)
import numpy as np
from sklearn.metrics import roc_auc_score, accuracy_score
pred = model.predict(X_valid)
pred = np.where(pred >= .5, 1, 0)  # threshold the sigmoid outputs at 0.5
print(classification_report(y_valid, pred))
print('Acc Score :', accuracy_score(y_valid, pred))
print('AUC Score :', roc_auc_score(y_valid, pred))
6/6 - 0s 5ms/step

              precision    recall  f1-score   support

           0       0.80      0.91      0.85       109
           1       0.81      0.64      0.72        69

    accuracy                           0.80       178
   macro avg       0.81      0.77      0.78       178
weighted avg       0.80      0.80      0.80       178

Acc Score : 0.8033707865168539
AUC Score : 0.7729690200771173
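Note that the AUC above was computed from the thresholded 0/1 predictions; feeding roc_auc_score the raw sigmoid outputs is the more common approach. A sketch:

# AUC from the sigmoid probabilities instead of hard labels
prob = model.predict(X_valid)
print('AUC Score (probabilities):', roc_auc_score(y_valid, prob))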
# Import the module
from sklearn.metrics import precision_score
# Evaluate performance
print('Precision:', precision_score(y_valid, y_pred))
print('Precision:', precision_score(y_valid, y_pred, average='binary'))  # binary classification (the default)
print('Precision:', precision_score(y_valid, y_pred, average=None))      # per class
print('Precision:', precision_score(y_valid, y_pred, average='macro'))
print('Precision:', precision_score(y_valid, y_pred, average='weighted'))
Precision: 0.7428571428571429
Precision: 0.7428571428571429
Precision: [0.84259259 0.74285714]
Precision: 0.7927248677248677
Precision: 0.8039310980322216
# Import the module
from sklearn.metrics import recall_score
# Evaluate performance
print('Recall for classes 0 and 1:', recall_score(y_valid, y_pred, average=None))
Recall for classes 0 and 1: [0.83486239 0.75362319]
# Import the module
from sklearn.metrics import f1_score
# Evaluate performance
print('F1 score for classes 0 and 1:', f1_score(y_valid, y_pred, average=None))
F1 score for classes 0 and 1: [0.83870968 0.74820144]
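The three metrics above can also be obtained in a single call with precision_recall_fscore_support (an optional sketch, returning per-class values):

from sklearn.metrics import precision_recall_fscore_support
# Per-class precision, recall, F1, and support in one call
prec, rec, f1, support = precision_recall_fscore_support(y_valid, y_pred, average=None)
print(prec, rec, f1, support)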