[PyTorch] Lab-05 Logistic Regression

Let's take a look at the Logistic Regression model, which classifies categorical variables.

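> Logistic regression is a model for binary classification: it predicts one of two labels, 0 or 1.
Plain linear regression is a poor fit for binary classification problems.

> Linear regression is mainly used to predict continuous values.

> To address this, the linear regression output z is passed through the sigmoid function g(z), which maps it into a range suited to predicting 0 or 1.

> However, the sigmoid is a smooth curve, so it does not output exactly 0 or 1; it returns a continuous value between 0 and 1.

> A decision boundary is therefore set so that the final prediction is exactly 0 or 1.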
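
To make the two steps above concrete, here is a minimal sketch that passes a few linear outputs through the sigmoid and then applies a decision boundary. The values in `z` and the 0.5 threshold are assumptions chosen for illustration only.

```python
import torch

# Hypothetical outputs of a linear model (made-up values for illustration)
z = torch.tensor([-4.0, -1.0, 0.0, 1.0, 4.0])

# The sigmoid squashes each value into the open interval (0, 1)
probs = torch.sigmoid(z)

# A decision boundary at 0.5 turns the probabilities into hard 0/1 labels
labels = (probs >= 0.5).float()

print(probs)
print(labels)
```

Note that z = 0 maps to exactly 0.5, so with `>=` it falls on the positive side of the boundary.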

Let's walk through the process in code.

IN[1]

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
torch.manual_seed(1)

Computing the hypothesis

IN[2]

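# x_train is assumed to be an (N, 2) float tensor defined beforehand
W = torch.zeros((2, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)
# Sigmoid written out by hand
hypothesis = 1 / (1 + torch.exp(-(x_train.matmul(W) + b)))
# Equivalent built-in version
hypothesis = torch.sigmoid(x_train.matmul(W) + b)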

> Writing out the logistic function by hand is cumbersome, so we will use the sigmoid function built into torch.
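
As a quick sanity check, the sketch below compares the hand-written formula with `torch.sigmoid`. The tensors `x_toy`, `W`, and `b` hold made-up values and are not part of the original lab.

```python
import torch

# Toy inputs with 2 features (hypothetical values, not from the post)
x_toy = torch.tensor([[1.0, 2.0], [2.0, 3.0], [3.0, 1.0]])
W = torch.tensor([[0.5], [-0.25]])
b = torch.tensor([0.1])

# Linear part: (3, 2) @ (2, 1) + (1,) -> (3, 1)
z = x_toy.matmul(W) + b

manual = 1 / (1 + torch.exp(-z))   # sigmoid written out by hand
builtin = torch.sigmoid(z)         # built-in version

# The two results should agree up to floating-point tolerance
print(torch.allclose(manual, builtin))
```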

Computing the Cost Function

IN[3]

losses = -(y_train * torch.log(hypothesis) + (1 - y_train) * torch.log(1 - hypothesis))
cost = losses.mean()

IN[4]

F.binary_cross_entropy(hypothesis, y_train)

> Likewise, implementing the cost by hand is cumbersome, so we will use the cost function provided by torch.
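
The sketch below checks that the hand-written binary cross-entropy and `F.binary_cross_entropy` give the same value. The probabilities and labels are made-up numbers used only for this comparison.

```python
import torch
import torch.nn.functional as F

# Hypothetical predicted probabilities and 0/1 labels
hypothesis = torch.tensor([[0.9], [0.2], [0.6]])
y = torch.tensor([[1.0], [0.0], [1.0]])

# Per-sample loss: -[y * log(h) + (1 - y) * log(1 - h)], then the mean
losses = -(y * torch.log(hypothesis) + (1 - y) * torch.log(1 - hypothesis))
manual_cost = losses.mean()

# Built-in version (averages over the samples by default)
builtin_cost = F.binary_cross_entropy(hypothesis, y)

print(manual_cost.item(), builtin_cost.item())
```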

Whole Training Procedure

IN[5]

import numpy as np
xy = np.loadtxt('data-03-diabetes.csv', delimiter = ',', dtype=np.float32)
x_data = xy[:,0:-1]
y_data = xy[:, [-1]]
x_train = torch.FloatTensor(x_data)
y_train = torch.FloatTensor(y_data)

> We load a real dataset and train on it.
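
The slicing in the cell above is worth a closer look: `xy[:, [-1]]` keeps the label as a column, whereas `xy[:, -1]` would flatten it. Here is a minimal sketch using a tiny in-memory CSV; the three rows are made up and stand in for the real file.

```python
import io
import numpy as np

# Hypothetical 3-row CSV: two feature columns followed by a 0/1 label
csv_text = "0.1,0.2,0\n0.3,0.4,1\n0.5,0.6,0\n"
xy = np.loadtxt(io.StringIO(csv_text), delimiter=',', dtype=np.float32)

x_data = xy[:, 0:-1]   # every column except the last -> shape (3, 2)
y_data = xy[:, [-1]]   # list index keeps the column axis -> shape (3, 1)
y_flat = xy[:, -1]     # plain integer index drops it -> shape (3,)

print(x_data.shape, y_data.shape, y_flat.shape)
```

Keeping `y_data` two-dimensional means it matches the (N, 1) shape of the hypothesis, which `F.binary_cross_entropy` expects.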

IN[6]

print(x_train[0:5])
print(y_train[0:5])

OUT[6]

tensor([[-0.2941,  0.4874,  0.1803, -0.2929,  0.0000,  0.0015, -0.5312, -0.0333],
        [-0.8824, -0.1457,  0.0820, -0.4141,  0.0000, -0.2072, -0.7669, -0.6667],
        [-0.0588,  0.8392,  0.0492,  0.0000,  0.0000, -0.3055, -0.4927, -0.6333],
        [-0.8824, -0.1055,  0.0820, -0.5354, -0.7778, -0.1624, -0.9240,  0.0000],
        [ 0.0000,  0.3769, -0.3443, -0.2929, -0.6028,  0.2846,  0.8873, -0.6000]])
tensor([[0.],
        [1.],
        [0.],
        [1.],
        [0.]])

> We check the shape of the loaded data.

> X has 8 variables and Y has 1.

IN[7]

# Initialize the model parameters
W = torch.zeros((8, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)
# Set up the optimizer
optimizer = optim.SGD([W, b], lr=1)
nb_epochs = 1000
for epoch in range(nb_epochs + 1):
    # Compute the cost
    hypothesis = torch.sigmoid(x_train.matmul(W) + b)
    cost = F.binary_cross_entropy(hypothesis, y_train)
    # Improve H(x) using the cost
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()
    # Print a log line every 100 epochs
    if epoch % 100 == 0:
        print('Epoch {:4d}/{} Cost: {:.6f}'.format(
            epoch, nb_epochs, cost.item()
        ))

> W and b are sized to match the number of x variables (8) and y variables (1).
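
To see why W has shape (8, 1), the sketch below runs the hypothesis on a random stand-in batch and inspects the output shape. The batch `x` is random data, not the diabetes dataset.

```python
import torch

# Random stand-in for a batch of 5 samples with 8 features (not real data)
x = torch.randn(5, 8)

W = torch.zeros((8, 1), requires_grad=True)  # one weight per input feature
b = torch.zeros(1, requires_grad=True)       # one bias for the single output

# (5, 8) @ (8, 1) -> (5, 1); b is broadcast across the 5 rows
hypothesis = torch.sigmoid(x.matmul(W) + b)

print(hypothesis.shape)
```

With W and b initialized to zero, every prediction starts at sigmoid(0) = 0.5 before any training step.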

Evaluation

IN[8]

hypothesis = torch.sigmoid(x_train.matmul(W) + b)
prediction = hypothesis >= torch.FloatTensor([0.5])
correct_prediction = prediction.float() == y_train

> Because Logistic Regression is a binary prediction problem over the labels 0 and 1, we pick a threshold to split the outputs into 0 and 1 and then evaluate how well those predictions match the labels.

IN[9]

accuracy = correct_prediction.sum().item() / len(correct_prediction)
print('The model has an accuracy of {:2.2f}% for the training set.'.format(accuracy * 100))

OUT[9]

The model has an accuracy of 76.94% for the training set.

> The model reaches an accuracy of 76.94% on the training set.
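
The evaluation steps can be traced by hand on a tiny example. The sketch below uses four made-up probabilities and labels, with the same 0.5 threshold as above.

```python
import torch

# Hypothetical model outputs and true labels for 4 samples
hypothesis = torch.tensor([[0.8], [0.3], [0.6], [0.1]])
y = torch.tensor([[1.0], [0.0], [0.0], [0.0]])

# Threshold at 0.5 -> predicted labels 1, 0, 1, 0
prediction = hypothesis >= 0.5

# Compare against the labels -> True, True, False, True
correct_prediction = prediction.float() == y

# Fraction of correct predictions: 3 out of 4
accuracy = correct_prediction.sum().item() / len(correct_prediction)
print(accuracy)  # 0.75
```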

Logistic Regression with nn.Module

> Using nn.Module, we can run Logistic Regression in a more convenient way.

IN[10]

class BinaryClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 1)
        self.sigmoid = nn.Sigmoid()
    def forward(self, x):
        return self.sigmoid(self.linear(x))

IN[11]

model = BinaryClassifier()
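
For a look at what the module holds, the sketch below rebuilds the same class and lists its parameters. The random input batch is an assumption used only to check the output shape.

```python
import torch
import torch.nn as nn

class BinaryClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        return self.sigmoid(self.linear(x))

model = BinaryClassifier()

# nn.Linear stores its weight as (out_features, in_features); Sigmoid has no parameters
shapes = {name: tuple(p.shape) for name, p in model.named_parameters()}
print(shapes)  # {'linear.weight': (1, 8), 'linear.bias': (1,)}

# Forward pass on a random stand-in batch of 4 samples
out = model(torch.randn(4, 8))
print(out.shape)
```

These parameters play the role of the hand-made W and b, and `model.parameters()` is what gets handed to the optimizer.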

IN[12]

# Set up the optimizer
optimizer = optim.SGD(model.parameters(), lr=1)
nb_epochs = 100
for epoch in range(nb_epochs + 1):
    # Compute H(x)
    hypothesis = model(x_train)
    # Compute the cost
    cost = F.binary_cross_entropy(hypothesis, y_train)
    # Improve H(x) using the cost
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()
    # Print a log line every 10 epochs
    if epoch % 10 == 0:
        prediction = hypothesis >= torch.FloatTensor([0.5])
        correct_prediction = prediction.float() == y_train
        accuracy = correct_prediction.sum().item() / len(correct_prediction)
        print('Epoch {:4d}/{} Cost: {:.6f} Accuracy {:2.2f}%'.format(
            epoch, nb_epochs, cost.item(), accuracy * 100,
        ))

OUT[12]

Epoch    0/100 Cost: 0.704829 Accuracy 45.72%
Epoch   10/100 Cost: 0.572391 Accuracy 67.59%
Epoch   20/100 Cost: 0.539563 Accuracy 73.25%
Epoch   30/100 Cost: 0.520042 Accuracy 75.89%
Epoch   40/100 Cost: 0.507561 Accuracy 76.15%
Epoch   50/100 Cost: 0.499125 Accuracy 76.42%
Epoch   60/100 Cost: 0.493177 Accuracy 77.21%
Epoch   70/100 Cost: 0.488846 Accuracy 76.81%
Epoch   80/100 Cost: 0.485612 Accuracy 76.28%
Epoch   90/100 Cost: 0.483146 Accuracy 76.55%
Epoch  100/100 Cost: 0.481234 Accuracy 76.81%

Reference links:

[PyTorch] Lab-05 Logistic Regression
[TensorFlow] Lab-05-1 Logistic Regression
