Predicting Graduate Admission: A Binary Classification Problem


import warnings
warnings.filterwarnings(action='ignore')

1. scikit-learn is the leading Python package for machine learning education.

Write and run code that imports scikit-learn under the alias sk.

# Write your answer code here.
import sklearn as sk 

2. Pandas is a Python library widely used for data analysis.

Import Pandas under the alias pd so that it is ready to use.

# Write your answer code here.
import pandas as pd 

3. We want to read in the data file to analyze and preprocess for modeling.

Write code that reads the data file with a Pandas function and assigns it to a DataFrame variable named df.

path = 'https://raw.githubusercontent.com/khw11044/csv_dataset/master/admission_simple.csv'
df = pd.read_csv(path)
df.head()

GRE TOEFL RANK SOP LOR GPA RESEARCH ADMIT
0 337 118 4 4.5 4.5 9.65 1 1
1 324 107 4 4.0 4.5 8.87 1 1
2 316 104 3 3.0 3.5 8.00 1 0
3 322 110 3 3.5 2.5 8.67 1 1
4 314 103 2 2.0 3.0 8.21 0 0

Data description

  • GRE: GRE Scores (out of 340)

  • TOEFL: TOEFL Scores (out of 120)

  • RANK: University Rating (out of 5)

  • SOP: Statement of Purpose Strength (out of 5)

  • LOR: Letter of Recommendation Strength (out of 5)

  • GPA: Undergraduate GPA (out of 10)

  • RESEARCH: Research Experience (either 0 or 1)

  • ADMIT: Chance of Admit (either 0 or 1)
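A quick, optional way to sanity-check these ranges against the actual data (my addition, not one of the graded questions):

# Summary statistics for every numeric column; the min/max rows
# should line up with the ranges described above.
df.describe()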

Run the code below before working on the questions that follow.

# Write your answer code here.
import seaborn as sns 
import matplotlib.pyplot as plt
plt.rcParams['font.family'] = 'Malgun Gothic'

4. Let's look at the distribution of each column.

Draw distribution plots for the following columns (the answer below also plots RESEARCH, LOR, and SOP).

  • Use Seaborn.

  • Use the countplot function.

  • First: ADMIT

  • Second: RANK

plt.figure(figsize=(16,4))
plt.subplot(1,5,1)
sns.countplot(x='ADMIT', data=df)
plt.title('ADMIT')

plt.subplot(1,5,2)
sns.countplot(x='RESEARCH', data=df)
plt.title('RESEARCH')

plt.subplot(1,5,3)
sns.countplot(x='LOR', data=df)
plt.title('LOR')

plt.subplot(1,5,4)
sns.countplot(x='SOP', data=df)
plt.title('SOP')

plt.subplot(1,5,5)
sns.countplot(x='RANK', data=df)
plt.title('RANK')

plt.show()

5. We want to look at the number of ADMIT outcomes by RANK.

Draw bar graphs of the admitted and non-admitted counts by RANK (the answer below repeats the comparison for SOP, LOR, and RESEARCH).

  • Use Seaborn.

  • Use the countplot function.

  • First: ADMIT

plt.figure(figsize=(16,8))
plt.subplot(2,2,1)
sns.countplot(x='ADMIT', data=df[df['ADMIT']==1])
plt.title('ADMIT')

plt.subplot(2,2,2)
sns.countplot(x='RANK', data=df[df['ADMIT']==1])
plt.title('rank')

plt.subplot(2,2,3)
sns.countplot(x='ADMIT', data=df[df['ADMIT']==0])
plt.title('UNADMIT')

plt.subplot(2,2,4)
sns.countplot(x='RANK', data=df[df['ADMIT']==0])
plt.title('rank')

plt.tight_layout()
plt.show()
# Compare how RANK is distributed for admitted vs. non-admitted applicants.
plt.figure(figsize=(16,8))
plt.subplot(2,2,1)
sns.countplot(x='ADMIT', data=df[df['ADMIT']==1])
plt.title('ADMIT')

plt.subplot(2,2,2)
sns.countplot(x='SOP', data=df[df['ADMIT']==1])
plt.title('SOP')

plt.subplot(2,2,3)
sns.countplot(x='ADMIT', data=df[df['ADMIT']==0])
plt.title('UNADMIT')

plt.subplot(2,2,4)
sns.countplot(x='SOP', data=df[df['ADMIT']==0])
plt.title('SOP')

plt.tight_layout()
plt.show()
# Compare how SOP is distributed for admitted vs. non-admitted applicants.
plt.figure(figsize=(16,8))
plt.subplot(2,2,1)
sns.countplot(x='ADMIT', data=df[df['ADMIT']==1])
plt.title('ADMIT')

plt.subplot(2,2,2)
sns.countplot(x='LOR', data=df[df['ADMIT']==1])
plt.title('LOR')

plt.subplot(2,2,3)
sns.countplot(x='ADMIT', data=df[df['ADMIT']==0])
plt.title('UNADMIT')

plt.subplot(2,2,4)
sns.countplot(x='LOR', data=df[df['ADMIT']==0])
plt.title('LOR')

plt.tight_layout()
plt.show()
# Compare how LOR is distributed for admitted vs. non-admitted applicants.
plt.figure(figsize=(16,8))
plt.subplot(2,2,1)
sns.countplot(x='ADMIT', data=df[df['ADMIT']==1])
plt.title('ADMIT')

plt.subplot(2,2,2)
sns.countplot(x='RESEARCH', data=df[df['ADMIT']==1])
plt.title('RESEARCH')

plt.subplot(2,2,3)
sns.countplot(x='ADMIT', data=df[df['ADMIT']==0])
plt.title('UNADMIT')

plt.subplot(2,2,4)
sns.countplot(x='RESEARCH', data=df[df['ADMIT']==0])
plt.title('RESEARCH')

plt.tight_layout()
plt.show()
# Compare how RESEARCH is distributed for admitted vs. non-admitted applicants.

6. We want to see how GRE and GPA are distributed by ADMIT status.

Draw distribution plots of GRE and GPA with respect to ADMIT.

  • Use Seaborn.

  • Use the histplot function.

  • First: GRE over the full data

  • Second: GRE for admitted applicants

  • Third: GRE for non-admitted applicants

Do the same for GPA.

plt.figure(figsize=(16,4))
plt.subplot(1,3,1)
sns.histplot(x='GRE', data = df, bins=30, kde = True)
plt.title('GRE')

plt.subplot(1,3,2)
sns.histplot(x='GRE', data = df[df['ADMIT']==1], bins=30, kde = True)
plt.title('admit_GRE')

plt.subplot(1,3,3)
sns.histplot(x='GRE', data = df[df['ADMIT']==0], bins=30, kde = True)
plt.title('Unadmit_GRE')

plt.show()
# Compare the GRE distribution of admitted vs. non-admitted applicants.
plt.figure(figsize=(16,4))
plt.subplot(1,3,1)
sns.histplot(x='GPA', data = df, bins=30, kde = True)
plt.title('GPA')

plt.subplot(1,3,2)
sns.histplot(x='GPA', data = df[df['ADMIT']==1], bins=30, kde = True)
plt.title('admit_GPA')

plt.subplot(1,3,3)
sns.histplot(x='GPA', data = df[df['ADMIT']==0], bins=30, kde = True)
plt.title('Unadmit_GPA')

plt.show()
# Compare the GPA distribution of admitted vs. non-admitted applicants.

7. Let's examine the correlations between the columns. Correlation can help identify columns that are not needed and can be removed.

Draw a correlation heatmap.

  • Use numeric_only so that only numeric columns are included.

  • Set cmap to Blues.

  • Do not display the cbar.

  • Show values to three decimal places.

# Write your answer code here.

# Visualize the correlations
plt.figure(figsize=(6,6))
sns.heatmap(df.corr(numeric_only=True),
            annot=True,
            cmap='Blues',
            cbar=False,        # hide the color bar
            square=True,
            fmt='.3f',         # three decimal places
            annot_kws={'size':9})
plt.show()

# No columns are dropped here; every feature is kept for modeling.
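To see at a glance which features track the target most strongly, one optional addition (mine, not part of the original answer) is to sort each feature's correlation against ADMIT:

# Correlation of every feature with ADMIT, strongest first.
print(df.corr(numeric_only=True)['ADMIT'].sort_values(ascending=False))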

8. Handling missing values properly is essential for getting reliable modeling performance.

Handle the missing values following the guide below.

  • Target DataFrame: df

  • Write code that checks for missing values.

  • Delete any rows that contain missing values.

  • Store the preprocessed result in a new DataFrame variable named df_na.

# Check for NaN values
df.isnull().sum()
GRE         0
TOEFL       0
RANK        0
SOP         0
LOR         0
GPA         0
RESEARCH    0
ADMIT       0
dtype: int64
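The output shows there are no missing values, so dropping rows changes nothing; still, the question asks for a df_na DataFrame, so a minimal completion looks like this:

# No rows contain NaN, so dropna() is a no-op here, but it produces
# the df_na DataFrame the question asks for.
df_na = df.dropna(axis=0)
print(df_na.shape)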

9. One-hot encoding is a method used to convert a categorical variable into binary vectors of 1s and 0s.

One-hot encode the column data matching the conditions below.

  • Target DataFrame: df

  • One-hot encoding target: 'RANK'

  • Function to use: pandas get_dummies

  • Store the result with this preprocessing applied in a DataFrame variable named df_preset.

df['RANK'].value_counts()
RANK
3    162
2    126
4    105
5     73
1     34
Name: count, dtype: int64
# Column to one-hot encode: RANK
dumm_cols = ['RANK']

# One-hot encode
df_preset = pd.get_dummies(df, columns=dumm_cols, drop_first=True, dtype=int)

# Check the result
df_preset.head()

GRE TOEFL SOP LOR GPA RESEARCH ADMIT RANK_2 RANK_3 RANK_4 RANK_5
0 337 118 4.5 4.5 9.65 1 1 0 0 1 0
1 324 107 4.0 4.5 8.87 1 1 0 0 1 0
2 316 104 3.0 3.5 8.00 1 0 0 1 0 0
3 322 110 3.5 2.5 8.67 1 1 0 1 0 0
4 314 103 2.0 3.0 8.21 0 0 1 0 0 0

10. We want to split the data into separate training and validation sets.

Assign the ADMIT (graduate admission) column as the label y and the remaining columns as the features X, then split them into training and validation sets.

  • Target DataFrame: df_preset

  • Train/validation split

    • Training label: y_train, training features: X_train

    • Validation label: y_valid, validation features: X_valid

    • Training-to-validation ratio of 80:20

    • random_state: 42

    • Use Scikit-learn's train_test_split function.

  • Scaling

    • Use MinMaxScaler from sklearn.preprocessing

    • Scale the training features with MinMaxScaler's fit_transform and assign the result to X_train

    • Scale the validation features with MinMaxScaler's transform and assign the result to X_valid

# Write your answer code here.
from sklearn.model_selection import train_test_split

target = 'ADMIT'

x = df_preset.drop(target, axis=1)
y = df_preset[target]

X_train, X_valid, y_train, y_valid = train_test_split(x,y, test_size=0.2, random_state=42)
print(X_train.shape, X_valid.shape, y_train.shape, y_valid.shape)
(400, 10) (100, 10) (400,) (100,)
from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler

scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_valid = scaler.transform(X_valid)
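As an optional sanity check (my addition, not required by the question): features scaled with MinMaxScaler lie in [0, 1] on the training set, while validation features can fall slightly outside because the scaler was fit on the training data only.

# Training features are mapped exactly onto [0, 1];
# validation features may spill slightly outside that range.
print(X_train.min(), X_train.max())
print(X_valid.min(), X_valid.max())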

11. We want to build machine learning models that predict ADMIT (graduate admission).

Build the following models and train them according to the guide below.

  • Store the name of the best-performing model in the variable 답안11

    • e.g. 답안11 = 'KNeighborsClassifier' or 'DecisionTreeClassifier' or 'LogisticRegression' or 'RandomForestClassifier', etc.
# Step 1: imports
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

from sklearn.metrics import confusion_matrix, classification_report, roc_auc_score
models = {
    "KNeighborsClassifier": {"model":KNeighborsClassifier(n_neighbors=5)},
    "DecisionTreeClassifier": {"model":DecisionTreeClassifier()},
    "LogisticRegression": {"model":LogisticRegression()},
    "RandomForestClassifier": {"model":RandomForestClassifier()},
    "XGBClassifier": {"model":XGBClassifier()},
    "LGBMClassifier": {"model":LGBMClassifier(verbose=-1)}
}
from time import perf_counter

# Train each model
for name, model in models.items():
    model = model['model']
    start = perf_counter()

    history = model.fit(X_train, y_train)

    # Record the training time and validation accuracy
    duration = perf_counter() - start
    duration = round(duration,2)
    models[name]['perf'] = duration

    y_train_pred = model.predict(X_train)
    y_val_pred = model.predict(X_valid)

    train_score = round(model.score(X_train, y_train),4)
    val_score = round(model.score(X_valid, y_valid),4)

    models[name]['train_score'] = train_score
    models[name]['val_score'] = val_score

    print(f"{name:20} trained in {duration} sec, train_score: {train_score}. val_score: {val_score}")

# Create a DataFrame with the results
models_result = []

for name, v in models.items():
    models_result.append([ name, models[name]['val_score'], 
                          models[name]['perf']])

df_results = pd.DataFrame(models_result, 
                          columns = ['model','val_score','Training time (sec)'])
df_results.sort_values(by='val_score', ascending=False, inplace=True)
df_results.reset_index(inplace=True,drop=True)
df_results
KNeighborsClassifier trained in 0.0 sec, train_score: 0.9075. val_score: 0.89
DecisionTreeClassifier trained in 0.0 sec, train_score: 1.0. val_score: 0.81
LogisticRegression   trained in 0.0 sec, train_score: 0.8825. val_score: 0.84
RandomForestClassifier trained in 0.11 sec, train_score: 1.0. val_score: 0.9
XGBClassifier        trained in 0.04 sec, train_score: 1.0. val_score: 0.85
LGBMClassifier       trained in 0.02 sec, train_score: 1.0. val_score: 0.88

model val_score Training time (sec)
0 RandomForestClassifier 0.90 0.11
1 KNeighborsClassifier 0.89 0.00
2 LGBMClassifier 0.88 0.02
3 XGBClassifier 0.85 0.04
4 LogisticRegression 0.84 0.00
5 DecisionTreeClassifier 0.81 0.00
def check_performance_for_model(df_results):
    plt.figure(figsize = (15,5))
    sns.barplot(x = 'model', y = 'val_score', data = df_results)
    plt.title('Accuracy on the validation set', fontsize = 15)
    plt.ylim(0,1.2)
    plt.xticks(rotation=90)
    plt.show()

check_performance_for_model(df_results)
# RandomForestClassifier has the top val_score (0.90), but KNeighborsClassifier (0.89)
# is close behind without the perfect train_score that suggests overfitting.
답안11='KNeighborsClassifier'
from sklearn.metrics import roc_auc_score, accuracy_score
from sklearn.model_selection import cross_val_score

model = KNeighborsClassifier()

# Step 3: train
model.fit(X_train, y_train)

# Validate performance with K-fold cross-validation
cv_score = cross_val_score(model, X_train, y_train, cv=10)
print('cv_score :', cv_score)
print('mean cv_score :', cv_score.mean())

# Step 4: predict
y_pred = model.predict(X_valid)

# Step 5: evaluate
print(classification_report(y_valid, y_pred))
print('Acc Score :', accuracy_score(y_valid, y_pred))
print('AUC Score :', roc_auc_score(y_valid, y_pred))
              precision    recall  f1-score   support

           0       0.95      0.88      0.91        64
           1       0.80      0.92      0.86        36

    accuracy                           0.89       100
   macro avg       0.88      0.90      0.88       100
weighted avg       0.90      0.89      0.89       100

Acc Score : 0.89
AUC Score : 0.8958333333333333
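One caveat worth noting (my addition, not part of the original answer): the AUC above is computed from hard 0/1 predictions, whereas AUC is normally computed from predicted probabilities. A minimal sketch, assuming the fitted model from above:

# AUC from predicted class-1 probabilities rather than hard labels.
y_prob = model.predict_proba(X_valid)[:, 1]
print('AUC (probabilities):', roc_auc_score(y_valid, y_prob))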

12. We want to build a deep learning model that predicts ADMIT (graduate admission).

Build the model and train it according to the guide below.

  • Use the TensorFlow framework to build the deep learning model.

  • Construct the model with at least two hidden layers.

  • Use binary_crossentropy as the loss function.

  • Set the hyperparameters epochs: 100 and batch_size: 16.

  • Use X_valid and y_valid as the data for evaluating loss and metrics at each epoch.

  • Store the training history in the variable history.

import tensorflow as tf
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense, Activation, Dropout, BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.utils import to_categorical
# Import the regularizer functions
from tensorflow.keras.regularizers import l1, l2

tf.random.set_seed(1)
nfeatures = X_train.shape[1]
nfeatures

# Build a Sequential model
model = Sequential()

model.add(Dense(32, activation='relu', input_shape=(nfeatures,), kernel_regularizer = l1(0.01)))
model.add(Dense(16, activation='relu', kernel_regularizer = l1(0.01)))
model.add(Dense(1, activation= 'sigmoid'))

model.compile(optimizer='adam', loss='binary_crossentropy',metrics=['acc'])
# es = EarlyStopping(monitor='val_loss', patience=4, mode='min', verbose=1)    # val_loss

history = model.fit(X_train, y_train, 
                    batch_size=16, 
                    epochs=100, 
                    # callbacks=[es],
                    validation_data=(X_valid, y_valid), 
                    verbose=1).history
Epoch 1/100
25/25 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - acc: 0.5903 - loss: 2.1358 - val_acc: 0.6300 - val_loss: 2.0082
Epoch 2/100
25/25 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - acc: 0.7896 - loss: 1.9416 - val_acc: 0.8000 - val_loss: 1.8115
...
(epochs 3-98 omitted: training loss falls steadily from 1.76 to 0.40 while val_acc settles between 0.83 and 0.89)
...
Epoch 99/100
25/25 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - acc: 0.8855 - loss: 0.3947 - val_acc: 0.8700 - val_loss: 0.3991
Epoch 100/100
25/25 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - acc: 0.8855 - loss: 0.3936 - val_acc: 0.8700 - val_loss: 0.3985
# Wrap the history plotting in a reusable function.
def dl_history_plot(history):
    plt.figure(figsize=(16,4))
    plt.subplot(1,2,1)
    plt.plot(history['loss'], label='loss', marker = '.')
    plt.plot(history['val_loss'], label='val_loss', marker = '.')
    plt.ylabel('Loss')
    plt.xlabel('Epochs')
    plt.legend()
    plt.grid()

    plt.subplot(1,2,2)
    plt.plot(history['acc'], label='acc', marker = '.')
    plt.plot(history['val_acc'], label='val_acc', marker = '.')
    plt.ylabel('ACC')
    plt.xlabel('Epochs')
    plt.legend()
    plt.grid()


    plt.show()

dl_history_plot(history)
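The EarlyStopping callback imported earlier is left commented out in the answer. If you wanted training to stop once val_loss stops improving, enabling it would look roughly like this sketch (not part of the graded answer; it would continue training the already-fitted model):

# Stop once val_loss has not improved for 4 epochs, keeping the best weights.
es = EarlyStopping(monitor='val_loss', patience=4, mode='min',
                   restore_best_weights=True, verbose=1)
history = model.fit(X_train, y_train,
                    batch_size=16, epochs=100,
                    callbacks=[es],
                    validation_data=(X_valid, y_valid),
                    verbose=1).history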
import numpy as np 
from sklearn.metrics import roc_auc_score, accuracy_score

pred = model.predict(X_valid)


pred = np.where(pred >= .5, 1, 0)
print(classification_report(y_valid, pred))
print('Acc Score :', accuracy_score(y_valid, pred))
print('AUC Score :', roc_auc_score(y_valid, pred))
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step 
              precision    recall  f1-score   support

           0       0.93      0.86      0.89        64
           1       0.78      0.89      0.83        36

    accuracy                           0.87       100
   macro avg       0.86      0.87      0.86       100
weighted avg       0.88      0.87      0.87       100

Acc Score : 0.87
AUC Score : 0.8741319444444444
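The same caveat as in question 11 applies (my note, not the original's): pred was thresholded before the AUC call. Using the raw sigmoid outputs directly is the more standard choice:

# model.predict returns sigmoid probabilities; use them directly for AUC.
pred_prob = model.predict(X_valid).ravel()
print('AUC (probabilities):', roc_auc_score(y_valid, pred_prob))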
# Import the metric
from sklearn.metrics import precision_score

# Evaluate
print('Precision:', precision_score(y_valid, y_pred))
print('Precision:', precision_score(y_valid, y_pred, average='binary')) # binary classification (the default)
print('Precision:', precision_score(y_valid, y_pred, average=None))
print('Precision:', precision_score(y_valid, y_pred, average='macro'))
print('Precision:', precision_score(y_valid, y_pred, average='weighted'))
Precision: 0.8048780487804879
Precision: 0.8048780487804879
Precision: [0.94915254 0.80487805]
Precision: 0.8770152955766846
Precision: 0.8972137246796197
# Import the metric
from sklearn.metrics import recall_score

# Evaluate
print('Recall for classes 0 and 1:', recall_score(y_valid, y_pred, average=None))
Recall for classes 0 and 1: [0.875      0.91666667]
# Import the metric
from sklearn.metrics import f1_score

# Evaluate
print('F1 score for classes 0 and 1:', f1_score(y_valid, y_pred, average=None))
F1 score for classes 0 and 1: [0.91056911 0.85714286]
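confusion_matrix was imported back in question 11 but never used; printing it is a quick way to see where the 11 misclassified applicants fall. A small sketch, reusing y_valid and y_pred from above:

# Rows are true classes (0, 1); columns are predicted classes.
from sklearn.metrics import confusion_matrix
print(confusion_matrix(y_valid, y_pred))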
