Malicious Site Detection: A Binary Classification Problem
# Suppress warning messages for cleaner output
import warnings
warnings.filterwarnings(action='ignore')
1. The scikit-learn package is one of the best Python packages for learning machine learning.
Write and run code that imports scikit-learn under the alias sk.
# Write your answer code here.
import sklearn as sk
2. Pandas is a Python library widely used for data analysis.
Import Pandas under the alias pd.
# Write your answer code here.
import pandas as pd
3. We want to read in the data file to analyze and preprocess it for modeling.
Write code that reads the data file with a Pandas function and assigns it to a DataFrame variable named df.
path = 'https://raw.githubusercontent.com/khw11044/csv_dataset/master/Malicious_Site_Detection.csv'
df = pd.read_csv(path)
df.head()
url_len | url_num_hyphens_dom | url_path_len | url_domain_len | url_hostname_len | url_num_dots | url_num_underscores | url_query_len | url_num_query_para | url_ip_present | ... | html_num_tags('script') | html_num_tags('embed') | html_num_tags('object') | html_num_tags('div') | html_num_tags('head') | html_num_tags('body') | html_num_tags('form') | html_num_tags('a') | html_num_tags('applet') | label | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 23.0 | 0.0 | 8.0 | 15.0 | 15.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 7.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | malicious |
1 | 75.0 | 0.0 | 58.0 | 17.0 | 17.0 | 6.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 18.0 | 0.0 | 0.0 | 20.0 | 1.0 | 1.0 | 0.0 | 21.0 | 0.0 | benign |
2 | 20.0 | 0.0 | 4.0 | 16.0 | 16.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 33.0 | 0.0 | 0.0 | 101.0 | 1.0 | 1.0 | 3.0 | 70.0 | 0.0 | benign |
3 | 27.0 | 0.0 | 13.0 | 14.0 | 14.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 15.0 | 0.0 | 0.0 | 151.0 | 1.0 | 1.0 | 1.0 | 55.0 | 0.0 | benign |
4 | 39.0 | 2.0 | 12.0 | 27.0 | 27.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 10.0 | 0.0 | 0.0 | 332.0 | 1.0 | 1.0 | 0.0 | 321.0 | 0.0 | benign |
5 rows × 24 columns
Data description
- url_len : URL length
- url_num_hyphens_dom : number of '-' (hyphen) characters in the URL domain
- url_path_len : length of the URL path
- url_domain_len : length of the URL domain
- url_hostname_len : length of the URL hostname
- url_num_dots : number of '.' (dot) characters in the URL
- url_num_underscores : number of '_' (underscore) characters in the URL
- url_query_len : length of the URL query string
- url_num_query_para : number of parameters in the URL query string
- url_ip_present : whether an IP address appears in the URL
- url_entropy : URL complexity (entropy)
- url_chinese_present : whether Chinese characters appear in the URL
- url_port : whether a port is specified in the URL
- html_num_tags('iframe') : number of 'iframe' tags in the HTML
- html_num_tags('script') : number of 'script' tags in the HTML
- html_num_tags('embed') : number of 'embed' tags in the HTML
- html_num_tags('object') : number of 'object' tags in the HTML
- html_num_tags('div') : number of 'div' tags in the HTML
- html_num_tags('head') : number of 'head' tags in the HTML
- html_num_tags('body') : number of 'body' tags in the HTML
- html_num_tags('form') : number of 'form' tags in the HTML
- html_num_tags('a') : number of 'a' tags in the HTML
- html_num_tags('applet') : number of 'applet' tags in the HTML
- label : whether the site is malicious ('malicious' = malicious site, 'benign' = normal site)
Before solving the following questions, run the code below.
# Write your answer code here.
import seaborn as sns
import matplotlib.pyplot as plt
plt.rcParams['font.family'] = 'Malgun Gothic'  # Korean font for plot labels
4. We want to see how each column is distributed.
Plot the distribution of every column.
- Use the Pandas hist function.
df.hist(bins=10, grid=True, figsize=(20,20))
plt.show()
Q2. Removing duplicate data
Most websites we visit are benign.
Moreover, a handful of sites (e.g. google, instagram, facebook) account for a large share of visits.
Because skewed data hurts model training, we address this by removing duplicate rows.
This step is not mandatory during preprocessing; decide based on the nature of your project and data.
[Problem 1] Check the data with df.info(), delete the duplicated rows, then compare against the original with info() (a sketch of the comparison follows the code below).
a = df.duplicated()
a.value_counts()
False 3233
True 431
Name: count, dtype: int64
# Drop the duplicated rows, keeping the first occurrence
df = df.drop_duplicates()
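The problem also asks for a before/after comparison with info(). A minimal sketch, where df_raw is a hypothetical re-read of the original file introduced purely for the comparison:
# Compare row counts before and after deduplication
df_raw = pd.read_csv(path)
df_raw.info()   # 3664 entries before dropping duplicates (3233 + 431)
df.info()       # 3233 entries after drop_duplicates()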
df.columns
Index(['url_len', 'url_num_hyphens_dom', 'url_path_len', 'url_domain_len',
'url_hostname_len', 'url_num_dots', 'url_num_underscores',
'url_query_len', 'url_num_query_para', 'url_ip_present', 'url_entropy',
'url_chinese_present', 'url_port', 'html_num_tags('iframe')',
'html_num_tags('script')', 'html_num_tags('embed')',
'html_num_tags('object')', 'html_num_tags('div')',
'html_num_tags('head')', 'html_num_tags('body')',
'html_num_tags('form')', 'html_num_tags('a')',
'html_num_tags('applet')', 'label'],
dtype='object')
df['label'].value_counts()
label
benign 1618
malicious 1615
Name: count, dtype: int64
plt.figure(figsize=(6, 4))
df['label'].value_counts().plot(kind='bar')
plt.xlabel('Label')
plt.ylabel('Counts')
plt.show()
Q3. Handling text and categorical features
Convert text data to numeric data so that the machine can work with it.
- Handling text and categorical features with the replace() function
df['label'].unique()
array(['malicious', 'benign'], dtype=object)
df['label'] = df['label'].astype('category')
df['label'] = df['label'].cat.codes
# Alternatively, using replace():
# df.loc[:, ['label_binary']] = df['label'].copy()
# df['label_binary'] = df['label_binary'].replace({'benign': 0, 'malicious': 1})
# df.drop(['label'], axis=1, inplace=True)
df.head()
url_len | url_num_hyphens_dom | url_path_len | url_domain_len | url_hostname_len | url_num_dots | url_num_underscores | url_query_len | url_num_query_para | url_ip_present | ... | html_num_tags('script') | html_num_tags('embed') | html_num_tags('object') | html_num_tags('div') | html_num_tags('head') | html_num_tags('body') | html_num_tags('form') | html_num_tags('a') | html_num_tags('applet') | label | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 23.0 | 0.0 | 8.0 | 15.0 | 15.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 7.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1 |
1 | 75.0 | 0.0 | 58.0 | 17.0 | 17.0 | 6.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 18.0 | 0.0 | 0.0 | 20.0 | 1.0 | 1.0 | 0.0 | 21.0 | 0.0 | 0 |
2 | 20.0 | 0.0 | 4.0 | 16.0 | 16.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 33.0 | 0.0 | 0.0 | 101.0 | 1.0 | 1.0 | 3.0 | 70.0 | 0.0 | 0 |
3 | 27.0 | 0.0 | 13.0 | 14.0 | 14.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 15.0 | 0.0 | 0.0 | 151.0 | 1.0 | 1.0 | 1.0 | 55.0 | 0.0 | 0 |
4 | 39.0 | 2.0 | 12.0 | 27.0 | 27.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 10.0 | 0.0 | 0.0 | 332.0 | 1.0 | 1.0 | 0.0 | 321.0 | 0.0 | 0 |
5 rows × 24 columns
df['label'].unique()
array([1, 0], dtype=int8)
Q4. Handling missing values
- Data often contains missing values caused by errors in the collection process, among other things.
- Checking for missing values and cleaning them before modeling is a necessary step.
df.isnull().sum()
url_len 0
url_num_hyphens_dom 0
url_path_len 1
url_domain_len 1
url_hostname_len 0
url_num_dots 0
url_num_underscores 0
url_query_len 0
url_num_query_para 0
url_ip_present 0
url_entropy 0
url_chinese_present 0
url_port 0
html_num_tags('iframe') 0
html_num_tags('script') 0
html_num_tags('embed') 0
html_num_tags('object') 0
html_num_tags('div') 0
html_num_tags('head') 0
html_num_tags('body') 0
html_num_tags('form') 0
html_num_tags('a') 0
html_num_tags('applet') 0
label 0
dtype: int64
# Drop the rows that contain missing values
df = df.dropna(axis=0)
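Only two values are missing here, so dropping the rows is fine. As a hedged alternative to the dropna above, median imputation would keep those rows:
# Alternative to the dropna above: impute missing values with each column's median
# df['url_path_len'] = df['url_path_len'].fillna(df['url_path_len'].median())
# df['url_domain_len'] = df['url_domain_len'].fillna(df['url_domain_len'].median())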
5. We want to examine the correlation of each column with the target. Correlation can be used to drop unneeded columns.
Display the correlations sorted from highest to lowest, then drop the following columns.
Unneeded columns to drop:
"url_chinese_present", "html_num_tags('applet')"
abs(df.corr()['label']).sort_values(ascending=False)
label 1.000000
url_hostname_len 0.384489
url_domain_len 0.380448
url_num_hyphens_dom 0.355480
html_num_tags('script') 0.202309
url_query_len 0.189689
url_num_query_para 0.184497
url_entropy 0.162198
url_num_underscores 0.133808
html_num_tags('form') 0.116354
html_num_tags('a') 0.113966
url_path_len 0.113835
html_num_tags('embed') 0.111295
html_num_tags('body') 0.110581
html_num_tags('object') 0.105710
url_ip_present 0.076236
html_num_tags('div') 0.061183
url_num_dots 0.047256
html_num_tags('iframe') 0.033966
html_num_tags('head') 0.012990
url_port 0.006642
url_len 0.006429
url_chinese_present NaN
html_num_tags('applet') NaN
Name: label, dtype: float64
# Drop the two columns whose correlation with label is NaN (they are constant, so they carry no signal)
df.drop(columns=["url_chinese_present", "html_num_tags('applet')"], inplace=True)
abs(df.corr()['label']).sort_values(ascending=False)
label 1.000000
url_hostname_len 0.384489
url_domain_len 0.380448
url_num_hyphens_dom 0.355480
html_num_tags('script') 0.202309
url_query_len 0.189689
url_num_query_para 0.184497
url_entropy 0.162198
url_num_underscores 0.133808
html_num_tags('form') 0.116354
html_num_tags('a') 0.113966
url_path_len 0.113835
html_num_tags('embed') 0.111295
html_num_tags('body') 0.110581
html_num_tags('object') 0.105710
url_ip_present 0.076236
html_num_tags('div') 0.061183
url_num_dots 0.047256
html_num_tags('iframe') 0.033966
html_num_tags('head') 0.012990
url_port 0.006642
url_len 0.006429
Name: label, dtype: float64
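A correlation heatmap can make these relationships easier to scan at a glance; a minimal sketch using the seaborn import from earlier:
# Heatmap of pairwise correlations across the remaining columns
plt.figure(figsize=(12, 10))
sns.heatmap(df.corr(), cmap='coolwarm', center=0)
plt.show()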
7. We want to split the data into training and validation datasets.
Assign the label column to y and the remaining columns to the feature matrix X, then split into training and validation datasets.
- Target DataFrame: df
- Train/validation split
- Training labels: y_train, training features: X_train
- Validation labels: y_valid, validation features: X_valid
- Train-to-validation ratio of 80:20
- random_state: 42
- Use Scikit-learn's train_test_split function.
- Scaling
- Use MinMaxScaler from sklearn.preprocessing
- Scale the training features with MinMaxScaler's fit_transform and assign the result to X_train
- Scale the validation features with MinMaxScaler's transform and assign the result to X_valid
# Write your answer code here.
from sklearn.model_selection import train_test_split
target = 'label'
x = df.drop(target, axis=1)
y = df[target]
X_train, X_valid, y_train, y_valid = train_test_split(x,y, test_size=0.2, random_state=42)
print(X_train.shape, X_valid.shape, y_train.shape, y_valid.shape)
(2584, 21) (647, 21) (2584,) (647,)
from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_valid = scaler.transform(X_valid)
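The classes here are nearly balanced, so a plain split works. As an optional variant, stratify=y would preserve the exact benign/malicious ratio in both sets:
# Optional variant of the split above: stratify keeps the label ratio identical in both sets
# X_train, X_valid, y_train, y_valid = train_test_split(
#     x, y, test_size=0.2, random_state=42, stratify=y)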
8. We want to build machine learning models that predict the label (malicious vs. benign site).
Build and train the following models according to the guide below.
- Store the name of the best-performing model in the variable 답안8
- e.g. 답안8 = 'KNeighborsClassifier' or 'DecisionTreeClassifier' or 'LogisticRegression' or 'RandomForestClassifier', etc.
# Step 1: Imports
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from sklearn.metrics import confusion_matrix, classification_report, roc_auc_score
models = {
    "KNeighborsClassifier": {"model": KNeighborsClassifier(n_neighbors=5)},
    "DecisionTreeClassifier": {"model": DecisionTreeClassifier()},
    "LogisticRegression": {"model": LogisticRegression()},
    "RandomForestClassifier": {"model": RandomForestClassifier()},
    "XGBClassifier": {"model": XGBClassifier()},
    "LGBMClassifier": {"model": LGBMClassifier(verbose=-1)}
}
from time import perf_counter

# Train each model, recording its training time and train/validation accuracy
for name, entry in models.items():
    model = entry['model']
    start = perf_counter()
    model.fit(X_train, y_train)
    # Save the training time
    duration = round(perf_counter() - start, 2)
    models[name]['perf'] = duration
    train_score = round(model.score(X_train, y_train), 4)
    val_score = round(model.score(X_valid, y_valid), 4)
    models[name]['train_score'] = train_score
    models[name]['val_score'] = val_score
    print(f"{name:20} trained in {duration} sec, train_score: {train_score}. val_score: {val_score}")
# Create a DataFrame with the results
models_result = []
for name, v in models.items():
    models_result.append([name, v['val_score'], v['perf']])

df_results = pd.DataFrame(models_result,
                          columns=['model', 'val_score', 'Training time (sec)'])
df_results.sort_values(by='val_score', ascending=False, inplace=True)
df_results.reset_index(inplace=True, drop=True)
df_results
KNeighborsClassifier trained in 0.0 sec, train_score: 0.8967. val_score: 0.8454
DecisionTreeClassifier trained in 0.02 sec, train_score: 1.0. val_score: 0.9119
LogisticRegression trained in 0.01 sec, train_score: 0.7721. val_score: 0.7713
RandomForestClassifier trained in 0.31 sec, train_score: 1.0. val_score: 0.9629
XGBClassifier trained in 0.07 sec, train_score: 1.0. val_score: 0.9567
LGBMClassifier trained in 0.05 sec, train_score: 1.0. val_score: 0.9583
model | val_score | Training time (sec) | |
---|---|---|---|
0 | RandomForestClassifier | 0.9629 | 0.31 |
1 | LGBMClassifier | 0.9583 | 0.05 |
2 | XGBClassifier | 0.9567 | 0.07 |
3 | DecisionTreeClassifier | 0.9119 | 0.02 |
4 | KNeighborsClassifier | 0.8454 | 0.00 |
5 | LogisticRegression | 0.7713 | 0.01 |
def check_performance_for_model(df_results):
    plt.figure(figsize=(15, 5))
    sns.barplot(x='model', y='val_score', data=df_results)
    plt.title('ACC (%) on the Test set', fontsize=15)
    plt.ylim(0, 1.2)
    plt.xticks(rotation=90)
    plt.show()

check_performance_for_model(df_results)
답안8 = 'RandomForestClassifier'
from sklearn.metrics import roc_auc_score, accuracy_score

model = RandomForestClassifier()

# Step 3: Train
model.fit(X_train, y_train)
print('train set :', model.score(X_train, y_train))
print('val set :', model.score(X_valid, y_valid))

# Step 4: Predict
y_pred = model.predict(X_valid)

# Step 5: Evaluate
print(classification_report(y_valid, y_pred))
print('Acc Score :', accuracy_score(y_valid, y_pred))
train set : 1.0
val set : 0.955177743431221
              precision    recall  f1-score   support

           0       0.95      0.96      0.96       321
           1       0.96      0.95      0.96       326

    accuracy                           0.96       647
   macro avg       0.96      0.96      0.96       647
weighted avg       0.96      0.96      0.96       647

Acc Score : 0.955177743431221
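confusion_matrix was imported alongside classification_report earlier but never used; a minimal sketch for inspecting the error types behind these scores:
# Rows are true labels (0 = benign, 1 = malicious), columns are predicted labels
from sklearn.metrics import confusion_matrix
print(confusion_matrix(y_valid, y_pred))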
# Build a feature-importance DataFrame
perf_dic = {'feature': list(x),
            'importance': model.feature_importances_}
df_imp = pd.DataFrame(perf_dic)
df_imp.sort_values(by='importance', ascending=True, inplace=True)

# Visualize
plt.figure(figsize=(5, 5))
plt.barh(df_imp['feature'], df_imp['importance'])
plt.show()
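To read off the most influential features numerically rather than from the chart, a quick sketch:
# Print the five features with the highest importance
print(df_imp.sort_values(by='importance', ascending=False).head(5))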
9. We want to build a deep learning model that predicts the label (malicious vs. benign site).
Model and train it according to the guide below.
- Use the TensorFlow framework to build the deep learning model.
- Build the model with two or more hidden layers.
- Use the sparse_categorical_crossentropy loss function (the answer below uses the equivalent sigmoid + binary_crossentropy setup for this binary label; a variant matching the guide literally is sketched after the model-building cell).
- Set the hyperparameters epochs: 100 and batch_size: 16.
- Use X_valid, y_valid as the data for evaluating loss and metrics at each epoch.
- Store the training information in the history variable.
import tensorflow as tf
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense, Activation, Dropout, BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.optimizers import Adam
# Import the regularizers used below
from tensorflow.keras.regularizers import l1, l2
tf.random.set_seed(1)
nfeatures = X_train.shape[1]

# Build a Sequential model with two hidden layers and L1 regularization
model = Sequential()
model.add(Dense(32, activation='relu', input_shape=(nfeatures,), kernel_regularizer=l1(0.01)))
model.add(Dense(16, activation='relu', kernel_regularizer=l1(0.01)))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
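The guide asks for sparse_categorical_crossentropy. With integer labels 0/1 that corresponds to a 2-unit softmax output; a minimal sketch (model2 is illustrative only) equivalent to the sigmoid + binary_crossentropy setup above:
# Variant matching the guide literally: 2-unit softmax with sparse_categorical_crossentropy
model2 = Sequential()
model2.add(Dense(32, activation='relu', input_shape=(nfeatures,)))
model2.add(Dense(16, activation='relu'))
model2.add(Dense(2, activation='softmax'))
model2.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['acc'])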
# es = EarlyStopping(monitor='val_loss', patience=4, mode='min', verbose=1) # val_loss
history = model.fit(X_train, y_train,
                    batch_size=16,
                    epochs=100,
                    # callbacks=[es],
                    validation_data=(X_valid, y_valid),
                    verbose=1).history
Epoch 1/100
162/162 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - acc: 0.4866 - loss: 2.4341 - val_acc: 0.6491 - val_loss: 1.3854
Epoch 2/100
162/162 ━━━━━━━━━━━━━━━━━━━━ 0s 827us/step - acc: 0.5281 - loss: 1.1595 - val_acc: 0.4807 - val_loss: 0.7271
Epoch 3/100
162/162 ━━━━━━━━━━━━━━━━━━━━ 0s 833us/step - acc: 0.5044 - loss: 0.7081 - val_acc: 0.4961 - val_loss: 0.6946
... (epochs 4-99 omitted: the loss plateaus near 0.6947 and val_acc stays at 0.4961 throughout, so the model fails to learn under this configuration)
Epoch 100/100
162/162 ━━━━━━━━━━━━━━━━━━━━ 0s 915us/step - acc: 0.4845 - loss: 0.6946 - val_acc: 0.4961 - val_loss: 0.6945
# Wrap the history plotting in a reusable function
def dl_history_plot(history):
    plt.figure(figsize=(16, 4))

    plt.subplot(1, 2, 1)
    plt.plot(history['loss'], label='loss', marker='.')
    plt.plot(history['val_loss'], label='val_loss', marker='.')
    plt.ylabel('Loss')
    plt.xlabel('Epochs')
    plt.legend()
    plt.grid()

    plt.subplot(1, 2, 2)
    plt.plot(history['acc'], label='acc', marker='.')
    plt.plot(history['val_acc'], label='val_acc', marker='.')
    plt.ylabel('ACC')
    plt.xlabel('Epochs')
    plt.legend()
    plt.grid()

    plt.show()

dl_history_plot(history)
import numpy as np
from sklearn.metrics import roc_auc_score, accuracy_score

pred = model.predict(X_valid)
# Threshold the sigmoid probabilities at 0.5 to obtain class labels
# (without this step, the stale y_pred from the random forest above would be evaluated by mistake)
y_pred = (pred > 0.5).astype(int).reshape(-1)
print(classification_report(y_valid, y_pred))
print('Acc Score :', accuracy_score(y_valid, y_pred))
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step
              precision    recall  f1-score   support

           0       0.95      0.96      0.96       321
           1       0.96      0.95      0.96       326

    accuracy                           0.96       647
   macro avg       0.96      0.96      0.96       647
weighted avg       0.96      0.96      0.96       647

Acc Score : 0.955177743431221
# Load the metric
from sklearn.metrics import precision_score

# Evaluate precision per class and averaged
print('Precision:', precision_score(y_valid, y_pred, average=None))
print('Precision (macro):', precision_score(y_valid, y_pred, average='macro'))
print('Precision (weighted):', precision_score(y_valid, y_pred, average='weighted'))
Precision: [0.95061728 0.95975232]
Precision (macro): 0.9551848029660207
Precision (weighted): 0.9552201006400192
# Load the metric
from sklearn.metrics import recall_score

# Evaluate recall for class 0 and class 1
print('Recall for classes 0 and 1:', recall_score(y_valid, y_pred, average=None))
Recall for classes 0 and 1: [0.95950156 0.95092025]
# Load the metric
from sklearn.metrics import f1_score

# Evaluate the F1 score for class 0 and class 1
print('F1 score for classes 0 and 1:', f1_score(y_valid, y_pred, average=None))
F1 score for classes 0 and 1: [0.95503876 0.95531587]
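roc_auc_score was imported twice above but never called. A minimal sketch; for the sigmoid model, the raw probabilities are scored directly rather than the thresholded labels:
# ROC AUC is computed from predicted probabilities, not hard labels
print('ROC AUC :', roc_auc_score(y_valid, pred.ravel()))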