Malicious Site Detection: A Binary Classification Prediction Problem


# Silence warnings to keep the notebook output readable.
import warnings
warnings.filterwarnings(action='ignore')

1. scikit-learn is one of the best Python packages for learning machine learning.

Write and run code that imports scikit-learn under the alias sk.

# Write your answer code here.
import sklearn as sk 

2. Pandas is a Python library widely used for data analysis.

Import pandas under the alias pd.

# Write your answer code here.
import pandas as pd 

3. We want to read in the data file to analyze and process for modeling.

Read the data file with a pandas function and assign it to a dataframe variable named df.

path = 'https://raw.githubusercontent.com/khw11044/csv_dataset/master/Malicious_Site_Detection.csv'
df = pd.read_csv(path)
df.head()
   url_len  url_num_hyphens_dom  url_path_len  ...  html_num_tags('a')  html_num_tags('applet')      label
0     23.0                  0.0           8.0  ...                 0.0                      0.0  malicious
1     75.0                  0.0          58.0  ...                21.0                      0.0     benign
2     20.0                  0.0           4.0  ...                70.0                      0.0     benign
3     27.0                  0.0          13.0  ...                55.0                      0.0     benign
4     39.0                  2.0          12.0  ...               321.0                      0.0     benign

5 rows × 24 columns

Data description

  • url_len : URL length
  • url_num_hyphens_dom : number of '-' (hyphen) characters in the URL domain
  • url_path_len : URL path length
  • url_domain_len : URL domain length
  • url_hostname_len : URL hostname length
  • url_num_dots : number of '.' (dot) characters in the URL
  • url_num_underscores : number of '_' (underscore) characters in the URL
  • url_query_len : URL query string length
  • url_num_query_para : number of parameters in the URL query string
  • url_ip_present : whether an IP address appears in the URL
  • url_entropy : URL entropy (complexity)
  • url_chinese_present : whether Chinese characters appear in the URL
  • url_port : whether a port is specified in the URL
  • html_num_tags('iframe') : number of 'iframe' tags in the HTML
  • html_num_tags('script') : number of 'script' tags in the HTML
  • html_num_tags('embed') : number of 'embed' tags in the HTML
  • html_num_tags('object') : number of 'object' tags in the HTML
  • html_num_tags('div') : number of 'div' tags in the HTML
  • html_num_tags('head') : number of 'head' tags in the HTML
  • html_num_tags('body') : number of 'body' tags in the HTML
  • html_num_tags('form') : number of 'form' tags in the HTML
  • html_num_tags('a') : number of 'a' tags in the HTML
  • html_num_tags('applet') : number of 'applet' tags in the HTML
  • label : whether the site is malicious ('malicious' = malicious site, 'benign' = normal site)

Before solving the next questions, run the code below.

# Write your answer code here.
import seaborn as sns 
import matplotlib.pyplot as plt
plt.rcParams['font.family'] = 'Malgun Gothic'  # Korean font, so Hangul plot labels render correctly

4. We want to see how each column is distributed.

Plot the distribution of every column.

  • Use the pandas hist function.
df.hist(bins=10, grid=True, figsize=(20,20))
plt.show()
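If you also want to compare a feature's distribution across the two classes, a per-label KDE plot is a simple companion view. A minimal sketch, assuming seaborn 0.11+; the choice of url_entropy is just an illustrative pick, any numeric column works:

# Hedged sketch: class-conditional distribution of one feature.
sns.kdeplot(data=df, x='url_entropy', hue='label', common_norm=False)
plt.title('url_entropy by label')
plt.show()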

Q2. Removing duplicate data

Most of the websites we visit are normal sites.

Moreover, a handful of sites (e.g. Google, Instagram, Facebook) account for most visits.

Skewed data hurts model training, so we address this by removing duplicate rows.

This step is not mandatory in every preprocessing pipeline; decide based on the nature of the project and the data.

[Problem 1] Inspect the data with df.info(), remove the duplicated rows, then run info() again and compare against the data before deletion.

a = df.duplicated()

a.value_counts()
False    3233
True      431
Name: count, dtype: int64

df = df.drop_duplicates()
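The prompt also asks to verify with info(); a minimal sketch of that check (the entry count should drop from 3664 to 3233, matching the duplicated() counts above, since 3233 + 431 = 3664):

# Run before and after drop_duplicates() and compare the entry counts.
df.info()   # after dropping: 3233 entries (was 3664)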
df.columns
Index(['url_len', 'url_num_hyphens_dom', 'url_path_len', 'url_domain_len',
       'url_hostname_len', 'url_num_dots', 'url_num_underscores',
       'url_query_len', 'url_num_query_para', 'url_ip_present', 'url_entropy',
       'url_chinese_present', 'url_port', 'html_num_tags('iframe')',
       'html_num_tags('script')', 'html_num_tags('embed')',
       'html_num_tags('object')', 'html_num_tags('div')',
       'html_num_tags('head')', 'html_num_tags('body')',
       'html_num_tags('form')', 'html_num_tags('a')',
       'html_num_tags('applet')', 'label'],
      dtype='object')
df['label'].value_counts()
label
benign       1618
malicious    1615
Name: count, dtype: int64
plt.figure(figsize=(6, 4))
df['label'].value_counts().plot(kind='bar')
plt.xlabel('Label')
plt.ylabel('Counts')
plt.show()

Q3. Handling text and categorical features

Convert text data to numeric data so that the machine can work with it.

  • Handle text and categorical features with the replace() function (or, as below, with pandas category codes)
df['label'].unique()
array(['malicious', 'benign'], dtype=object)
df['label'] = df['label'].astype('category')
df['label'] = df['label'].cat.codes   # codes follow alphabetical order: 'benign' -> 0, 'malicious' -> 1


# Or, with an explicit mapping:
# df['label_binary'] = df['label'].copy()
# df['label_binary'].replace({'benign':0,'malicious':1}, inplace=True)
# df.drop(['label'], axis=1, inplace=True)

df.head()
   url_len  url_num_hyphens_dom  url_path_len  ...  html_num_tags('a')  html_num_tags('applet')  label
0     23.0                  0.0           8.0  ...                 0.0                      0.0      1
1     75.0                  0.0          58.0  ...                21.0                      0.0      0
2     20.0                  0.0           4.0  ...                70.0                      0.0      0
3     27.0                  0.0          13.0  ...                55.0                      0.0      0
4     39.0                  2.0          12.0  ...               321.0                      0.0      0

5 rows × 24 columns

df['label'].unique()
array([1, 0], dtype=int8)
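cat.codes assigns integers in alphabetical category order, which is why 'benign' happens to become 0 and 'malicious' 1 here. If you prefer the mapping to be explicit rather than order-dependent, map() is an equivalent option (a sketch only; it would run on the raw string labels in place of the cat.codes lines above):

# Hypothetical alternative: explicit, order-independent encoding.
# df['label'] = df['label'].map({'benign': 0, 'malicious': 1})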

Q4. Handling missing values

  • Missing values often creep in through errors in the data collection process.
  • Before modeling, you need to check for missing values and clean them up.
df.isnull().sum()
url_len                    0
url_num_hyphens_dom        0
url_path_len               1
url_domain_len             1
url_hostname_len           0
url_num_dots               0
url_num_underscores        0
url_query_len              0
url_num_query_para         0
url_ip_present             0
url_entropy                0
url_chinese_present        0
url_port                   0
html_num_tags('iframe')    0
html_num_tags('script')    0
html_num_tags('embed')     0
html_num_tags('object')    0
html_num_tags('div')       0
html_num_tags('head')      0
html_num_tags('body')      0
html_num_tags('form')      0
html_num_tags('a')         0
html_num_tags('applet')    0
label                      0
dtype: int64
df = df.dropna(axis=0)
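Dropping rows is fine here because only two values are missing; with heavier missingness, imputation is a common alternative. A hedged sketch using column medians, not used in this post:

# Alternative to dropna: fill numeric NaNs with each column's median.
# df = df.fillna(df.median(numeric_only=True))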

5. We want to examine each column's correlation with the label. Correlation can help identify columns that can be removed.

Sort the columns by correlation strength in descending order, then remove the following columns.

Unneeded columns to drop:

"url_chinese_present", "html_num_tags('applet')"

(Their correlations come out as NaN below, most likely because both columns are constant in this data.)

abs(df.corr()['label']).sort_values(ascending=False)
label                      1.000000
url_hostname_len           0.384489
url_domain_len             0.380448
url_num_hyphens_dom        0.355480
html_num_tags('script')    0.202309
url_query_len              0.189689
url_num_query_para         0.184497
url_entropy                0.162198
url_num_underscores        0.133808
html_num_tags('form')      0.116354
html_num_tags('a')         0.113966
url_path_len               0.113835
html_num_tags('embed')     0.111295
html_num_tags('body')      0.110581
html_num_tags('object')    0.105710
url_ip_present             0.076236
html_num_tags('div')       0.061183
url_num_dots               0.047256
html_num_tags('iframe')    0.033966
html_num_tags('head')      0.012990
url_port                   0.006642
url_len                    0.006429
url_chinese_present             NaN
html_num_tags('applet')         NaN
Name: label, dtype: float64
df.drop(columns=["url_chinese_present","html_num_tags('applet')"],inplace=True)
abs(df.corr()['label']).sort_values(ascending=False)
label                      1.000000
url_hostname_len           0.384489
url_domain_len             0.380448
url_num_hyphens_dom        0.355480
html_num_tags('script')    0.202309
url_query_len              0.189689
url_num_query_para         0.184497
url_entropy                0.162198
url_num_underscores        0.133808
html_num_tags('form')      0.116354
html_num_tags('a')         0.113966
url_path_len               0.113835
html_num_tags('embed')     0.111295
html_num_tags('body')      0.110581
html_num_tags('object')    0.105710
url_ip_present             0.076236
html_num_tags('div')       0.061183
url_num_dots               0.047256
html_num_tags('iframe')    0.033966
html_num_tags('head')      0.012990
url_port                   0.006642
url_len                    0.006429
Name: label, dtype: float64
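To see all pairwise correlations at once, a heatmap is a handy companion view. A minimal sketch; the figure size and colormap are arbitrary choices:

plt.figure(figsize=(12, 10))
sns.heatmap(df.corr(), cmap='coolwarm', center=0)
plt.show()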

7. We want to split the data into training and validation datasets.

Assign the label column as the target y and the remaining columns as the features X, then split them into training and validation datasets.

  • Target dataframe: df
  • Train/validation split
    • Training labels: y_train, training features: X_train
    • Validation labels: y_valid, validation features: X_valid
    • Training-to-validation ratio of 80:20
    • random_state: 42
    • Use scikit-learn's train_test_split function.
  • Scaling
    • Use MinMaxScaler from sklearn.preprocessing
    • Scale the training features with MinMaxScaler's fit_transform and assign the result to X_train
    • Scale the validation features with the fitted scaler's transform and assign the result to X_valid
# Write your answer code here.
from sklearn.model_selection import train_test_split

target = 'label'

x = df.drop(target, axis=1)
y = df[target]

X_train, X_valid, y_train, y_valid = train_test_split(x,y, test_size=0.2, random_state=42)
print(X_train.shape, X_valid.shape, y_train.shape, y_valid.shape)
(2584, 21) (647, 21) (2584,) (647,)
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_valid = scaler.transform(X_valid)
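With classes this balanced the plain split works fine, but stratify keeps the benign/malicious ratio identical in both sets. A sketch of the same call with stratification (optional, not used above):

# Same split, but preserving the class ratio exactly in both sets.
# X_train, X_valid, y_train, y_valid = train_test_split(
#     x, y, test_size=0.2, random_state=42, stratify=y)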

8. We want to build machine learning models that predict whether a site is malicious.

Build and train the following models according to the guide below.

  • Save the name of the best-performing model in the variable 답안8
    • e.g. 답안8 = 'KNeighborsClassifier' or 'DecisionTreeClassifier' or 'LogisticRegression' or 'RandomForestClassifier', etc.
# Step 1: imports
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

from sklearn.metrics import confusion_matrix, classification_report, roc_auc_score
models = {
    "KNeighborsClassifier": {"model":KNeighborsClassifier(n_neighbors=5)},
    "DecisionTreeClassifier": {"model":DecisionTreeClassifier()},
    "LogisticRegression": {"model":LogisticRegression()},
    "RandomForestClassifier": {"model":RandomForestClassifier()},
    "XGBClassifier": {"model":XGBClassifier()},
    "LGBMClassifier": {"model":LGBMClassifier(verbose=-1)}
}
from time import perf_counter

# Train each model and record its scores
for name, entry in models.items():
    model = entry['model']
    start = perf_counter()

    model.fit(X_train, y_train)

    # Record the training time
    duration = round(perf_counter() - start, 2)
    models[name]['perf'] = duration

    train_score = round(model.score(X_train, y_train), 4)
    val_score = round(model.score(X_valid, y_valid), 4)

    models[name]['train_score'] = train_score
    models[name]['val_score'] = val_score

    print(f"{name:20} trained in {duration} sec, train_score: {train_score}. val_score: {val_score}")

# Create a DataFrame with the results
models_result = []

for name, v in models.items():
    models_result.append([ name, models[name]['val_score'], 
                          models[name]['perf']])

df_results = pd.DataFrame(models_result, 
                          columns = ['model','val_score','Training time (sec)'])
df_results.sort_values(by='val_score', ascending=False, inplace=True)
df_results.reset_index(inplace=True,drop=True)
df_results
KNeighborsClassifier trained in 0.0 sec, train_score: 0.8967. val_score: 0.8454
DecisionTreeClassifier trained in 0.02 sec, train_score: 1.0. val_score: 0.9119
LogisticRegression   trained in 0.01 sec, train_score: 0.7721. val_score: 0.7713
RandomForestClassifier trained in 0.31 sec, train_score: 1.0. val_score: 0.9629
XGBClassifier        trained in 0.07 sec, train_score: 1.0. val_score: 0.9567
LGBMClassifier       trained in 0.05 sec, train_score: 1.0. val_score: 0.9583
                    model  val_score  Training time (sec)
0  RandomForestClassifier     0.9629                 0.31
1          LGBMClassifier     0.9583                 0.05
2           XGBClassifier     0.9567                 0.07
3  DecisionTreeClassifier     0.9119                 0.02
4    KNeighborsClassifier     0.8454                 0.00
5      LogisticRegression     0.7713                 0.01
def check_performance_for_model(df_results):
    plt.figure(figsize = (15,5))
    sns.barplot(x = 'model', y = 'val_score', data = df_results)
    plt.title('Accuracy on the validation set', fontsize = 15)
    plt.ylim(0,1.2)
    plt.xticks(rotation=90)
    plt.show()

check_performance_for_model(df_results)
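A single 80:20 split can be noisy, so cross-validation gives a steadier read on the best model. A hedged sketch; 5 folds is an arbitrary choice:

from sklearn.model_selection import cross_val_score

# 5-fold cross-validated accuracy on the training set.
cv_scores = cross_val_score(RandomForestClassifier(), X_train, y_train, cv=5)
print('CV accuracy: %.4f (+/- %.4f)' % (cv_scores.mean(), cv_scores.std()))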

답안8='RandomForestClassifier'
from sklearn.metrics import roc_auc_score, accuracy_score

model = RandomForestClassifier()

# Step 3: train
model.fit(X_train, y_train)

print('train set :',model.score(X_train, y_train))
print('val set :',model.score(X_valid, y_valid))

# Step 4: predict
y_pred = model.predict(X_valid)

# Step 5: evaluate
print(classification_report(y_valid, y_pred))
print('Acc Score :', accuracy_score(y_valid, y_pred))
train set : 1.0
val set : 0.955177743431221
              precision    recall  f1-score   support

           0       0.95      0.96      0.96       321
           1       0.96      0.95      0.96       326

    accuracy                           0.96       647
   macro avg       0.96      0.96      0.96       647
weighted avg       0.96      0.96      0.96       647

Acc Score : 0.955177743431221
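roc_auc_score is imported above but never used; a minimal sketch of a threshold-free view of the same model:

# ROC AUC uses the predicted probability of the positive class (1 = malicious).
y_proba = model.predict_proba(X_valid)[:, 1]
print('ROC AUC :', roc_auc_score(y_valid, y_proba))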
# Build a feature-importance dataframe
perf_dic = {'feature': list(x),
            'importance': model.feature_importances_}
df_imp = pd.DataFrame(perf_dic)
df_imp.sort_values(by='importance', ascending=True, inplace=True)

# Visualize
plt.figure(figsize=(5, 5))
plt.barh(df_imp['feature'], df_imp['importance'])
plt.show()

9. We want to build a deep learning model that predicts whether a site is malicious.

Follow the guide below to build and train the model.

  • Use the TensorFlow framework for the deep learning model.
  • Use at least two hidden layers.
  • Use binary_crossentropy as the loss function (the label is a single 0/1 output).
  • Set the hyperparameters epochs: 100, batch_size: 16.
  • Use X_valid and y_valid as the data for evaluating loss and metrics at each epoch.
  • Store the training history in the history variable.
import tensorflow as tf
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense, Activation, Dropout, BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.optimizers import Adam
# Regularizers for weight penalties
from tensorflow.keras.regularizers import l1, l2

tf.random.set_seed(1)
nfeatures = X_train.shape[1]

# Build a Sequential model
model = Sequential()

model.add(Dense(32, activation='relu', input_shape=(nfeatures,), kernel_regularizer = l1(0.01)))
model.add(Dense(16, activation='relu', kernel_regularizer = l1(0.01)))
model.add(Dense(1, activation= 'sigmoid'))

model.compile(optimizer='adam', loss='binary_crossentropy',metrics=['acc'])
# es = EarlyStopping(monitor='val_loss', patience=4, mode='min', verbose=1)    # val_loss

history = model.fit(X_train, y_train, 
                    batch_size=16, 
                    epochs=100, 
                    # callbacks=[es],
                    validation_data=(X_valid, y_valid), 
                    verbose=1).history
Epoch 1/100
162/162 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - acc: 0.4866 - loss: 2.4341 - val_acc: 0.6491 - val_loss: 1.3854
Epoch 2/100
162/162 ━━━━━━━━━━━━━━━━━━━━ 0s 827us/step - acc: 0.5281 - loss: 1.1595 - val_acc: 0.4807 - val_loss: 0.7271
Epoch 3/100
162/162 ━━━━━━━━━━━━━━━━━━━━ 0s 833us/step - acc: 0.5044 - loss: 0.7081 - val_acc: 0.4961 - val_loss: 0.6946

... (epochs 4-99 omitted: loss stays flat at ~0.6946-0.6948, acc drifts between ~0.46 and ~0.48, val_acc stuck at 0.4961) ...

Epoch 100/100
162/162 ━━━━━━━━━━━━━━━━━━━━ 0s 915us/step - acc: 0.4845 - loss: 0.6946 - val_acc: 0.4961 - val_loss: 0.6945
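The loss flatlines at about 0.695 (≈ ln 2, the loss of always predicting probability 0.5) and val_acc sticks at the class ratio (321/647 ≈ 0.4961), so the network is effectively predicting a constant. A plausible culprit is the strong l1(0.01) penalty shrinking the weights toward zero; a hedged sketch of a retry without the regularizer (an assumption, not something verified in this post):

# Sketch: same architecture minus the L1 penalty (assumption: the
# plateau above is caused by the penalty, not by the data itself).
model2 = Sequential()
model2.add(Dense(32, activation='relu', input_shape=(nfeatures,)))
model2.add(Dense(16, activation='relu'))
model2.add(Dense(1, activation='sigmoid'))
model2.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
# history2 = model2.fit(X_train, y_train, batch_size=16, epochs=100,
#                       validation_data=(X_valid, y_valid), verbose=0).history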
# Wrap the history plotting in a reusable function.
def dl_history_plot(history):
    plt.figure(figsize=(16,4))
    plt.subplot(1,2,1)
    plt.plot(history['loss'], label='loss', marker = '.')
    plt.plot(history['val_loss'], label='val_loss', marker = '.')
    plt.ylabel('Loss')
    plt.xlabel('Epochs')
    plt.legend()
    plt.grid()

    plt.subplot(1,2,2)
    plt.plot(history['acc'], label='acc', marker = '.')
    plt.plot(history['val_acc'], label='val_acc', marker = '.')
    plt.ylabel('ACC')
    plt.xlabel('Epochs')
    plt.legend()
    plt.grid()


    plt.show()

dl_history_plot(history)

import numpy as np 
from sklearn.metrics import roc_auc_score, accuracy_score

# The network outputs sigmoid probabilities; threshold at 0.5 to get class labels.
pred = model.predict(X_valid)
y_pred_dl = (pred > 0.5).astype(int).ravel()

print(classification_report(y_valid, y_pred_dl))
print('Acc Score :', accuracy_score(y_valid, y_pred_dl))
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step 
              precision    recall  f1-score   support

           0       0.95      0.96      0.96       321
           1       0.96      0.95      0.96       326

    accuracy                           0.96       647
   macro avg       0.96      0.96      0.96       647
weighted avg       0.96      0.96      0.96       647

Acc Score : 0.955177743431221

Note: the report above came from the original run, which accidentally reused y_pred (the random forest's predictions) instead of the network's. With the thresholded network predictions (y_pred_dl), accuracy would land near the ~0.496 plateau visible in the training log.
# Import the metric
from sklearn.metrics import precision_score

# Evaluate precision under different averaging schemes
print('Precision (per class):', precision_score(y_valid, y_pred, average=None))
print('Precision (macro):', precision_score(y_valid, y_pred, average='macro'))
print('Precision (weighted):', precision_score(y_valid, y_pred, average='weighted'))
Precision (per class): [0.95061728 0.95975232]
Precision (macro): 0.9551848029660207
Precision (weighted): 0.9552201006400192
# Import the metric
from sklearn.metrics import recall_score

# Evaluate recall for classes 0 and 1
print('Recall (per class):', recall_score(y_valid, y_pred, average=None))
Recall (per class): [0.95950156 0.95092025]
# Import the metric
from sklearn.metrics import f1_score

# Evaluate the F1 score for classes 0 and 1
print('F1 score (per class):', f1_score(y_valid, y_pred, average=None))
F1 score (per class): [0.95503876 0.95531587]
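confusion_matrix was imported back in the modeling step but never shown; it exposes the raw counts behind all of the scores above (a minimal sketch, using the same y_pred):

# Rows are true classes (0 = benign, 1 = malicious); columns are predictions.
print(confusion_matrix(y_valid, y_pred))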
