
Titanic Survival Binary Classification Problem


import warnings
warnings.filterwarnings(action='ignore')

1. scikit-learn is one of the best Python packages for learning machine learning.

Write and run code that imports scikit-learn under the alias sk.

# Write your answer code here.
import sklearn as sk 

2. Pandas is a Python library widely used for data analysis.

Import Pandas under the alias pd so it can be used.

# Write your answer code here.
import pandas as pd 

3. We want to read in the data file to analyze and preprocess for modeling.

Write code that reads the data file with a Pandas function and assigns it to a DataFrame variable named df.

path = 'https://raw.githubusercontent.com/khw11044/csv_dataset/master/titanic.0.csv'
df = pd.read_csv(path)
df.head()

|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |

Before solving the next questions, run the code below.

# Write your answer code here.
import seaborn as sns 
import matplotlib.pyplot as plt
plt.rcParams['font.family'] = 'Malgun Gothic'
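# Note: 'Malgun Gothic' is a Windows font; on macOS/Linux, use an installed
# Korean-capable font such as 'AppleGothic' or 'NanumGothic' instead.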

3. We want to examine the distribution of each column.

Plot the distribution of each of the following columns:

  • Use Seaborn.

  • Use the countplot function.

  • First: Survived

  • Second: Pclass

  • Third: Sex

  • Fourth: Embarked

plt.figure(figsize=(16,4))
plt.subplot(1,4,1)
sns.countplot(x='Survived', data=df)
plt.title('Survived')

plt.subplot(1,4,2)
sns.countplot(x='Pclass', data=df)
plt.title('Pclass')

plt.subplot(1,4,3)
sns.countplot(x='Sex', data=df)
plt.title('Sex')

plt.subplot(1,4,4)
sns.countplot(x='Embarked', data=df)
plt.title('Embarked')

plt.show()

4. We want to count the survivors for each column.

Draw bar graphs of the counts by survival status for the following columns:

  • Use Seaborn.

  • Use the countplot function.

  • First: Survived

  • Second: Pclass

  • Third: Sex

  • Fourth: Embarked

plt.figure(figsize=(16,8))
plt.subplot(2,4,1)
sns.countplot(x='Survived', data=df[df['Survived']==1])
plt.title('Survived')

plt.subplot(2,4,2)
sns.countplot(x='Pclass', data=df[df['Survived']==1])
plt.title('Pclass')

plt.subplot(2,4,3)
sns.countplot(x='Sex', data=df[df['Survived']==1])
plt.title('Sex')

plt.subplot(2,4,4)
sns.countplot(x='Embarked', data=df[df['Survived']==1])
plt.title('Embarked')

plt.subplot(2,4,5)
sns.countplot(x='Survived', data=df[df['Survived']==0])
plt.title('Not Survived')

plt.subplot(2,4,6)
sns.countplot(x='Pclass', data=df[df['Survived']==0])
plt.title('Pclass')

plt.subplot(2,4,7)
sns.countplot(x='Sex', data=df[df['Survived']==0])
plt.title('Sex')

plt.subplot(2,4,8)
sns.countplot(x='Embarked', data=df[df['Survived']==0])
plt.title('Embarked')

plt.tight_layout()
plt.show()
# Third-class passengers, male passengers, and passengers who embarked at S died at higher rates.
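
For question 4, seaborn's hue parameter gives a more compact alternative to filtering the DataFrame twice; a minimal sketch using the same df and imports as above:

# Sketch: overlay survivors and non-survivors per column via hue.
plt.figure(figsize=(16,4))
for i, col in enumerate(['Pclass', 'Sex', 'Embarked'], start=1):
    plt.subplot(1, 3, i)
    sns.countplot(x=col, hue='Survived', data=df)
    plt.title(col)
plt.tight_layout()
plt.show()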

5. We want to see the age distribution by survival status.

Plot the age distributions for survival:

  • Use Seaborn.

  • Use the histplot function.

  • First: Age for all passengers

  • Second: Age of those who survived

  • Third: Age of those who died

plt.figure(figsize=(16,4))
plt.subplot(1,3,1)
sns.histplot(x='Age', data = df, bins=30, kde = True)
plt.title('Age')

plt.subplot(1,3,2)
sns.histplot(x='Age', data = df[df['Survived']==1], bins=30, kde = True)
plt.title('Survived_Age')

plt.subplot(1,3,3)
sns.histplot(x='Age', data = df[df['Survived']==0], bins=30, kde = True)
plt.title('NonSurvived_Age')

plt.show()
# Younger passengers were more likely to survive.

6. We want to examine the correlations between columns. Correlation can reveal columns that are not needed and can be removed.

Draw a correlation heatmap.

  • Use numeric_only to include only numeric columns.

  • Set cmap to Blues.

  • Do not show the cbar.

  • Display values to three decimal places.

# Write your answer code here.
selected = ['PassengerId', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked', 'Survived']
df_selected = df[selected]
# Visualize the correlations
plt.figure(figsize=(6,6))
sns.heatmap(df_selected.corr(numeric_only=True),
            annot=True,
            cmap='Blues',
            cbar=False,  # hide the color bar
            square=True,
            fmt='.3f',   # three decimal places
            annot_kws={'size':9}
            )
plt.show()
plt.show()

# Remove the PassengerId, Age, SibSp, and Parch columns (weak correlation with Survived).

7. To get proper modeling performance, unnecessary variables must be removed.

Follow the guide below to remove the unnecessary data.

  • Target DataFrame: df

  • Delete the three columns 'PassengerId', 'Age', 'SibSp'.

  • Store the preprocessed result in a new DataFrame variable named df_del.

# Drop the variables
drop_cols = ['PassengerId', 'Age', 'SibSp']
df_del = df.drop(drop_cols, axis=1)

# Check
df_del.head()

|   | Survived | Pclass | Name | Sex | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | Braund, Mr. Owen Harris | male | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 1 | 3 | Heikkinen, Miss. Laina | female | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 0 | 3 | Allen, Mr. William Henry | male | 0 | 373450 | 8.0500 | NaN | S |

8. Handling missing values is essential for proper modeling performance.

Follow the guide below to handle the missing values.

  • Target DataFrame: df_del

  • Write code that checks for missing values.

  • Delete the rows that contain missing values.

  • Store the preprocessed result in a new DataFrame variable named df_na.

# Check NaN counts
df_del.isnull().sum()
Survived      0
Pclass        0
Name          0
Sex           0
Parch         0
Ticket        0
Fare          0
Cabin       687
Embarked      2
dtype: int64
# Drop rows with missing Embarked values
df_na = df_del.dropna(subset=['Embarked'], axis=0)  # rows: axis=0
# Drop the Cabin column entirely (too many missing values)
df_na = df_na.drop('Cabin', axis=1)  # columns: axis=1
df_na.isnull().sum()
Survived    0
Pclass      0
Name        0
Sex         0
Parch       0
Ticket      0
Fare        0
Embarked    0
dtype: int64
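# Keep only the columns that will be used for modeling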
selected = ['Pclass', 'Sex', 'Fare', 'Embarked', 'Survived']
df_sel = df_na[selected]

df_sel.head()

|   | Pclass | Sex | Fare | Embarked | Survived |
|---|---|---|---|---|---|
| 0 | 3 | male | 7.2500 | S | 0 |
| 1 | 1 | female | 71.2833 | C | 1 |
| 2 | 3 | female | 7.9250 | S | 1 |
| 3 | 1 | female | 53.1000 | S | 1 |
| 4 | 3 | male | 8.0500 | S | 0 |

9. One-hot encoding is a method for converting categorical variables into binary vectors of 1s and 0s.

Convert the column data matching the conditions below using one-hot encoding.

  • Target DataFrame: df_sel

  • One-hot encoding targets: 'Pclass', 'Sex', 'Embarked'

  • Function to use: pandas get_dummies

  • Store the preprocessed result in a DataFrame variable named df_preset.

# Dummy-encoding targets: Pclass, Sex, Embarked
dumm_cols = ['Pclass', 'Sex', 'Embarked']

# Dummy encoding
df_preset = pd.get_dummies(df_sel, columns=dumm_cols, drop_first=True, dtype=int)

# Check
df_preset.head()

|   | Fare | Survived | Pclass_2 | Pclass_3 | Sex_male | Embarked_Q | Embarked_S |
|---|---|---|---|---|---|---|---|
| 0 | 7.2500 | 0 | 0 | 1 | 1 | 0 | 1 |
| 1 | 71.2833 | 1 | 0 | 0 | 0 | 0 | 0 |
| 2 | 7.9250 | 1 | 0 | 1 | 0 | 0 | 1 |
| 3 | 53.1000 | 1 | 0 | 0 | 0 | 0 | 1 |
| 4 | 8.0500 | 0 | 0 | 1 | 1 | 0 | 1 |

10. We want to split the dataset into training and validation sets.

Assign the Survived column as the label y and the remaining columns as the features X, then split the data into training and validation sets.

  • Target DataFrame: df_preset

  • Train/validation split:

    • Training labels: y_train; training features: X_train

    • Validation labels: y_valid; validation features: X_valid

    • Training-to-validation ratio: 80:20

    • random_state: 42

    • Use Scikit-learn's train_test_split function.

  • Scaling:

    • Use MinMaxScaler from sklearn.preprocessing.

    • Scale the training features with MinMaxScaler's fit_transform and assign the result to X_train.

    • Scale the validation features with MinMaxScaler's transform and assign the result to X_valid.

# Write your answer code here.
from sklearn.model_selection import train_test_split

target = 'Survived'

x = df_preset.drop(target, axis=1)
y = df_preset[target]

X_train, X_valid, y_train, y_valid = train_test_split(x,y, test_size=0.2, random_state=42)
print(X_train.shape, X_valid.shape, y_train.shape, y_valid.shape)
(711, 6) (178, 6) (711,) (178,)
from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler

scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_valid = scaler.transform(X_valid)
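
An optional variation, not required by the question: passing stratify=y keeps the Survived class ratio identical in both splits, which can make validation scores less noisy. A sketch with hypothetical names (X_tr, X_va, ...) so the split above stays untouched:

# Optional sketch: a stratified split preserves the class ratio in both sets.
X_tr, X_va, y_tr, y_va = train_test_split(x, y, test_size=0.2,
                                           random_state=42, stratify=y)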

11. We want to build machine learning models that predict Survived.

Follow the guide below to build the models and train them.

  • Store the name of the best-performing model in the variable 답안11.

    • e.g. 답안11 = 'KNeighborsClassifier' or 'DecisionTreeClassifier' or 'LogisticRegression' or 'RandomForestClassifier', etc.
# Step 1: imports
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

from sklearn.metrics import confusion_matrix, classification_report, roc_auc_score
models = {
    "KNeighborsClassifier": {"model":KNeighborsClassifier(n_neighbors=5)},
    "DecisionTreeClassifier": {"model":DecisionTreeClassifier()},
    "LogisticRegression": {"model":LogisticRegression()},
    "RandomForestClassifier": {"model":RandomForestClassifier()},
    "XGBClassifier": {"model":XGBClassifier()},
    "LGBMClassifier": {"model":LGBMClassifier(verbose=-1)}
}
from time import perf_counter

# Train the models
for name, model in models.items():
    model = model['model']
    start = perf_counter()

    history = model.fit(X_train, y_train)

    # Save the training time and validation accuracy
    duration = perf_counter() - start
    duration = round(duration,2)
    models[name]['perf'] = duration

    y_train_pred = model.predict(X_train)
    y_val_pred = model.predict(X_valid)

    train_score = round(model.score(X_train, y_train),4)
    val_score = round(model.score(X_valid, y_valid),4)

    models[name]['train_score'] = train_score
    models[name]['val_score'] = val_score

    print(f"{name:20} trained in {duration} sec, train_score: {train_score}. val_score: {val_score}")

# Create a DataFrame with the results
models_result = []

for name, v in models.items():
    models_result.append([ name, models[name]['val_score'], 
                          models[name]['perf']])

df_results = pd.DataFrame(models_result, 
                          columns = ['model','val_score','Training time (sec)'])
df_results.sort_values(by='val_score', ascending=False, inplace=True)
df_results.reset_index(inplace=True,drop=True)
df_results
KNeighborsClassifier trained in 0.0 sec, train_score: 0.8467. val_score: 0.8034
DecisionTreeClassifier trained in 0.0 sec, train_score: 0.9156. val_score: 0.7921
LogisticRegression   trained in 0.0 sec, train_score: 0.7764. val_score: 0.7809
RandomForestClassifier trained in 0.17 sec, train_score: 0.9156. val_score: 0.8034
XGBClassifier        trained in 0.06 sec, train_score: 0.9086. val_score: 0.8034
LGBMClassifier       trained in 0.04 sec, train_score: 0.8833. val_score: 0.7584

|   | model | val_score | Training time (sec) |
|---|---|---|---|
| 0 | KNeighborsClassifier | 0.8034 | 0.00 |
| 1 | RandomForestClassifier | 0.8034 | 0.17 |
| 2 | XGBClassifier | 0.8034 | 0.06 |
| 3 | DecisionTreeClassifier | 0.7921 | 0.00 |
| 4 | LogisticRegression | 0.7809 | 0.00 |
| 5 | LGBMClassifier | 0.7584 | 0.04 |
def check_performance_for_model(df_results):
    plt.figure(figsize = (15,5))
    sns.barplot(x = 'model', y = 'val_score', data = df_results)
    plt.title('Accuracy on the validation set', fontsize = 15)
    plt.ylim(0,1.2)
    plt.xticks(rotation=90)
    plt.show()

check_performance_for_model(df_results)
답안11='RandomForestClassifier'
from sklearn.metrics import roc_auc_score, accuracy_score
from sklearn.model_selection import cross_val_score

# Step 2: declare the model
model = RandomForestClassifier()

# Step 3: train
model.fit(X_train, y_train)

# Validate performance with K-fold cross-validation
cv_score = cross_val_score(model, X_train, y_train, cv=10)
print('cv_score :', cv_score)
print('mean cv_score :', cv_score.mean())

# Step 4: predict
y_pred = model.predict(X_valid)

# Step 5: evaluate
print(classification_report(y_valid, y_pred))
print('Acc Score :', accuracy_score(y_valid, y_pred))
print('AUC Score :', roc_auc_score(y_valid, y_pred))
cv_score : [0.76388889 0.76056338 0.78873239 0.87323944 0.78873239 0.83098592
 0.85915493 0.8028169  0.8028169  0.78873239]
mean cv_score : 0.8059663536776214
              precision    recall  f1-score   support

           0       0.84      0.84      0.84       109
           1       0.75      0.74      0.74        69

    accuracy                           0.80       178
   macro avg       0.79      0.79      0.79       178
weighted avg       0.80      0.80      0.80       178

Acc Score : 0.8033707865168539
AUC Score : 0.7915835660151576
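
One caveat: roc_auc_score above was given the hard 0/1 predictions, which typically understates AUC; the usual input is the predicted probability of the positive class. A minimal sketch using the fitted random forest:

# Sketch: compute AUC from class-1 probabilities instead of hard labels.
y_proba = model.predict_proba(X_valid)[:, 1]
print('AUC Score (from probabilities):', roc_auc_score(y_valid, y_proba))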
# Build a DataFrame of feature importances
perf_dic = {'feature': list(x),  # column names of the feature DataFrame
            'importance': model.feature_importances_}
df_imp = pd.DataFrame(perf_dic)  # renamed so the raw df is not overwritten
df_imp.sort_values(by='importance', ascending=True, inplace=True)

# Visualize
plt.figure(figsize=(5, 5))
plt.barh(df_imp['feature'], df_imp['importance'])
plt.show()

12. We want to build a deep learning model that predicts Survived.

Follow the guide below to build the model and train it.

  • Use the TensorFlow framework to build the deep learning model.

  • Construct the model with at least two hidden layers.

  • Use binary_crossentropy as the loss function.

  • Set the hyperparameters epochs: 100 and batch_size: 16.

  • Use X_valid and y_valid as the data for evaluating loss and metrics at each epoch.

  • Store the training information in the history variable.

import tensorflow as tf
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense, Activation, Dropout, BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.utils import to_categorical
# Import the functions needed for regularization
from tensorflow.keras.regularizers import l1, l2

tf.random.set_seed(1)
nfeatures = X_train.shape[1]
nfeatures

# Build a Sequential model
model = Sequential()

model.add(Dense(32, activation='relu', input_shape=(nfeatures,), kernel_regularizer = l1(0.01)))
model.add(Dense(16, activation='relu', kernel_regularizer = l1(0.01)))
model.add(Dense(1, activation= 'sigmoid'))

model.compile(optimizer='adam', loss='binary_crossentropy',metrics=['acc'])
# es = EarlyStopping(monitor='val_loss', patience=4, mode='min', verbose=1)    # val_loss

history = model.fit(X_train, y_train, 
                    batch_size=16, 
                    epochs=100, 
                    # callbacks=[es],
                    validation_data=(X_valid, y_valid), 
                    verbose=1).history
Epoch 1/100
45/45 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - acc: 0.6283 - loss: 1.8807 - val_acc: 0.6798 - val_loss: 1.6821
Epoch 2/100
45/45 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - acc: 0.6472 - loss: 1.6094 - val_acc: 0.7472 - val_loss: 1.4374
Epoch 3/100
45/45 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - acc: 0.6921 - loss: 1.3801 - val_acc: 0.7978 - val_loss: 1.2279
...(intermediate epochs omitted; loss and val_loss decrease steadily)...
Epoch 99/100
45/45 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - acc: 0.7885 - loss: 0.5553 - val_acc: 0.8034 - val_loss: 0.5227
Epoch 100/100
45/45 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - acc: 0.7885 - loss: 0.5551 - val_acc: 0.8034 - val_loss: 0.5226
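
The EarlyStopping callback was left commented out in the fit above. A minimal sketch of wiring it in (an optional variation; re-running fit this way would of course produce a different history than the log shown):

# Sketch: stop training once val_loss fails to improve for 4 epochs.
es = EarlyStopping(monitor='val_loss', patience=4, mode='min',
                   restore_best_weights=True, verbose=1)
history = model.fit(X_train, y_train, batch_size=16, epochs=100,
                    callbacks=[es],
                    validation_data=(X_valid, y_valid), verbose=0).history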
# Wrap the plotting in a reusable function.
def dl_history_plot(history):
    plt.figure(figsize=(16,4))
    plt.subplot(1,2,1)
    plt.plot(history['loss'], label='loss', marker = '.')
    plt.plot(history['val_loss'], label='val_loss', marker = '.')
    plt.ylabel('Loss')
    plt.xlabel('Epochs')
    plt.legend()
    plt.grid()

    plt.subplot(1,2,2)
    plt.plot(history['acc'], label='acc', marker = '.')
    plt.plot(history['val_acc'], label='val_acc', marker = '.')
    plt.ylabel('ACC')
    plt.xlabel('Epochs')
    plt.legend()
    plt.grid()


    plt.show()

dl_history_plot(history)
import numpy as np 
from sklearn.metrics import roc_auc_score, accuracy_score

pred = model.predict(X_valid)


pred = np.where(pred >= .5, 1, 0)
print(classification_report(y_valid, pred))
print('Acc Score :', accuracy_score(y_valid, pred))
print('AUC Score :', roc_auc_score(y_valid, pred))
6/6 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step 
              precision    recall  f1-score   support

           0       0.80      0.91      0.85       109
           1       0.81      0.64      0.72        69

    accuracy                           0.80       178
   macro avg       0.81      0.77      0.78       178
weighted avg       0.80      0.80      0.80       178

Acc Score : 0.8033707865168539
AUC Score : 0.7729690200771173
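
The same caveat as before applies here: the sigmoid outputs are probabilities, so AUC can be computed on them directly, before thresholding at 0.5. A minimal sketch:

# Sketch: AUC from the raw sigmoid outputs, before the 0.5 threshold.
proba = model.predict(X_valid).ravel()
print('AUC Score (from probabilities):', roc_auc_score(y_valid, proba))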
# Import the metric
from sklearn.metrics import precision_score

# Evaluate
print('Precision:', precision_score(y_valid, y_pred))
print('Precision:', precision_score(y_valid, y_pred, average='binary')) # binary classification
print('Precision:', precision_score(y_valid, y_pred, average=None))
print('Precision:', precision_score(y_valid, y_pred, average='macro'))
print('Precision:', precision_score(y_valid, y_pred, average='weighted'))
Precision: 0.7428571428571429
Precision: 0.7428571428571429
Precision: [0.84259259 0.74285714]
Precision: 0.7927248677248677
Precision: 0.8039310980322216
# Import the metric
from sklearn.metrics import recall_score

# Evaluate
print('Recall for classes 0 and 1:', recall_score(y_valid, y_pred, average=None))
Recall for classes 0 and 1: [0.83486239 0.75362319]
# Import the metric
from sklearn.metrics import f1_score

# Evaluate
print('F1 score for classes 0 and 1:', f1_score(y_valid, y_pred, average=None))
F1 score for classes 0 and 1: [0.83870968 0.74820144]
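
These scores can be cross-checked against the confusion matrix (y_pred here is still the random forest's validation predictions). A short sketch:

# Sketch: derive class-1 precision and recall from the confusion matrix.
from sklearn.metrics import confusion_matrix
tn, fp, fn, tp = confusion_matrix(y_valid, y_pred).ravel()
print('precision(1):', tp / (tp + fp))  # matches ~0.7429 above
print('recall(1):', tp / (tp + fn))     # matches ~0.7536 above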
