[PyTorch] Lab-04 Multivariable Linear regression, Loading Data

This post looks at how to predict a single value from several input variables (Multivariate Linear Regression).

Multivariate Linear Regression

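> The task is to predict the final exam score from three quiz scores (x1, x2, x3).

> There are three input variables (the quiz scores), so there are three weights as well.

> With 100 input variables there would be 100 weights, and the hypothesis function would keep getting longer.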
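Written out, the hypothesis for the three quiz scores looks like the first line below. The second line is the same idea for an arbitrary number of inputs; the symbol n is introduced here only for illustration and does not appear in the original notes.

```latex
H(x) = w_1 x_1 + w_2 x_2 + w_3 x_3 + b

H(x) = \sum_{i=1}^{n} w_i x_i + b
```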

IN[1]

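import torch
x1_train = torch.FloatTensor([[73],[93],[89],[96],[73]])
x2_train = torch.FloatTensor([[80],[88],[91],[98],[66]])
x3_train = torch.FloatTensor([[75],[93],[90],[100],[70]])
y_train = torch.FloatTensor([[152],[185],[180],[196],[142]])
# One weight per input variable, plus a bias (all initialized to zero)
w1, w2, w3, b = (torch.zeros(1, requires_grad=True) for _ in range(4))
hypothesis = x1_train * w1 + x2_train * w2 + x3_train * w3 + b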

IN[2]

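x_train = torch.FloatTensor([[73, 80, 75],
                             [93, 88, 93],
                             [89, 91, 90],
                             [96, 98, 100],
                             [73, 66, 70]])
y_train = torch.FloatTensor([[152],[185],[180],[196],[142]])
# A single (3, 1) weight matrix replaces w1, w2, w3
W = torch.zeros((3, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)
hypothesis = x_train.matmul(W) + b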

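> Put the input variables (the quiz scores) into a matrix and use matmul() to keep the hypothesis function short.

> With this form of the hypothesis, the code does not need to change when the number of input variables changes. It is also more concise and runs faster.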
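As a quick check of this claim, the sketch below computes the hypothesis both ways on the same quiz data and compares the results. The weight and bias values are arbitrary numbers chosen only for illustration, not trained values.

```python
import torch

x_train = torch.FloatTensor([[73, 80, 75],
                             [93, 88, 93],
                             [89, 91, 90],
                             [96, 98, 100],
                             [73, 66, 70]])

# Arbitrary illustrative parameters (an assumption, not trained values)
W = torch.FloatTensor([[0.5], [1.0], [0.25]])
b = torch.FloatTensor([2.0])

# Per-variable form: one term per column of x_train
x1, x2, x3 = x_train[:, 0:1], x_train[:, 1:2], x_train[:, 2:3]
h_scalar = x1 * W[0] + x2 * W[1] + x3 * W[2] + b

# Matrix form: (5, 3) matmul (3, 1) gives (5, 1)
h_matrix = x_train.matmul(W) + b

same = torch.allclose(h_scalar, h_matrix)
print(h_matrix.shape, same)
```

Adding a fourth quiz would mean another term in the per-variable form, while the matrix form would only need one more column in x_train and one more row in W.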

IN[3]

import torch.optim as optim
# Initialize the model parameters
W = torch.zeros((3, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)
# Set up the optimizer
optimizer = optim.SGD([W, b], lr=1e-5)
nb_epochs = 20
for epoch in range(nb_epochs + 1):
    # Compute H(x)
    hypothesis = x_train.matmul(W) + b
    # Compute the cost (mean squared error)
    cost = torch.mean((hypothesis - y_train) ** 2)
    # Improve H(x) using the cost
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()
    # Print a log line every epoch
    print('Epoch {:4d}/{} hypothesis: {} Cost: {:.6f}'.format(
        epoch, nb_epochs, hypothesis.squeeze().detach(), cost.item()
    ))

> Apart from how the data and W are defined, nothing differs from simple linear regression.

IN[4]

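import torch.nn as nn
class MultivariateLinearRegressionModel(nn.Module):
    def __init__(self):
        super().__init__()
        # 3 input features (quiz scores) -> 1 output (final exam score)
        self.linear = nn.Linear(3, 1)
    def forward(self, x):
        # Defines how the hypothesis is computed from the input
        return self.linear(x)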

> As a model grows, defining every W and b by hand becomes tedious. Building the model by subclassing nn.Module removes most of that work.

> nn.Linear(3, 1): specifies the input and output dimensions. forward( ) specifies how the hypothesis is computed.
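To see what nn.Linear(3, 1) actually holds, the following sketch inspects the parameter shapes and checks that the model output matches a manual matrix product. The single input row is the first sample from the quiz data; the parameter values are whatever PyTorch's default random initialization produces.

```python
import torch
import torch.nn as nn

class MultivariateLinearRegressionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(3, 1)  # 3 input features -> 1 output

    def forward(self, x):
        return self.linear(x)

model = MultivariateLinearRegressionModel()

# nn.Linear stores its weight as (out_features, in_features)
weight_shape = tuple(model.linear.weight.shape)
bias_shape = tuple(model.linear.bias.shape)
print(weight_shape, bias_shape)

# The forward pass is x @ weight^T + bias
x = torch.FloatTensor([[73, 80, 75]])
manual = x.matmul(model.linear.weight.t()) + model.linear.bias
matches = torch.allclose(model(x), manual, atol=1e-4)
print(matches)
```

Note that the stored weight is the transpose of the (3, 1) matrix W used in the hand-written version.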

IN[5]

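import torch.nn.functional as F
model = MultivariateLinearRegressionModel()
prediction = model(x_train)  # H(x) from the nn.Module defined above
cost = F.mse_loss(prediction, y_train)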

> Using the cost functions that PyTorch provides makes it easy to swap in a different loss later.
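The sketch below illustrates the swap. It first checks that F.mse_loss agrees with the hand-written mean squared error, then computes two alternative losses by changing only the function name. The prediction values are made up for illustration; the targets are the first three final exam scores.

```python
import torch
import torch.nn.functional as F

# Made-up predictions (an assumption for illustration only)
prediction = torch.FloatTensor([[150], [190], [180]])
target = torch.FloatTensor([[152], [185], [180]])

# Hand-written MSE, as in the training loop with W and b
mse_manual = torch.mean((prediction - target) ** 2)

# Built-in losses: only the function name changes
mse = F.mse_loss(prediction, target)
l1 = F.l1_loss(prediction, target)
smooth_l1 = F.smooth_l1_loss(prediction, target)

print(mse_manual.item(), mse.item(), l1.item(), smooth_l1.item())
```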

Minibatch Gradient Descent

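> In practice, complex machine learning models have to be trained on enormous amounts of data.

> Running gradient descent over all of that data at once is often infeasible.

> So the data is split into evenly sized minibatches, and the model is trained on one minibatch at a time.

> Because each update uses only part of the data, it costs less computation and updates happen more quickly.

> An update can occasionally move in the wrong direction, because a minibatch only approximates the full dataset. As a result, the cost does not decrease smoothly.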
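To make the "wrong direction" point concrete, here is a small sketch that compares the gradient computed on the full quiz dataset with the gradients computed on each minibatch. For simplicity it splits the data in order without shuffling; the helper name mse_grad is hypothetical and exists only for this example.

```python
import torch

x_train = torch.FloatTensor([[73, 80, 75],
                             [93, 88, 93],
                             [89, 91, 90],
                             [96, 98, 100],
                             [73, 66, 70]])
y_train = torch.FloatTensor([[152], [185], [180], [196], [142]])

def mse_grad(x, y, W, b):
    """Return d(cost)/dW for the MSE cost on the given samples."""
    W = W.clone().requires_grad_(True)
    cost = torch.mean((x.matmul(W) + b - y) ** 2)
    cost.backward()
    return W.grad

W = torch.zeros((3, 1))
b = torch.zeros(1)

full_grad = mse_grad(x_train, y_train, W, b)

# Split the 5 samples into minibatches of at most 2 (sizes 2, 2, 1)
batch_grads, batch_sizes = [], []
for x_b, y_b in zip(torch.split(x_train, 2), torch.split(y_train, 2)):
    batch_grads.append(mse_grad(x_b, y_b, W, b))
    batch_sizes.append(len(x_b))

# Each minibatch gradient is a different estimate of the full gradient
for g in batch_grads:
    print('minibatch grad:', g.squeeze())
print('full grad:     ', full_grad.squeeze())

# Their size-weighted average recovers the full gradient
weighted = sum(n * g for n, g in zip(batch_sizes, batch_grads)) / len(x_train)
print(torch.allclose(weighted, full_grad, rtol=1e-4))
```

Each minibatch step follows its own estimate, which is why the cost curve looks noisy even though the estimates average out to the full-batch direction.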

IN[5]

import torch
from torch.utils.data import Dataset
class CustomerDataset(Dataset):
    def __init__(self):
        # Quiz scores (inputs) and final exam scores (targets)
        self.x_data = [[73, 80, 75],
                       [93, 88, 93],
                       [89, 91, 90],
                       [96, 98, 100],
                       [73, 66, 70]]
        self.y_data = [[152],[185],[180],[196],[142]]
    def __len__(self):
        # Total number of samples in the dataset
        return len(self.x_data)
    def __getitem__(self, idx):
        # Return the (input, target) pair at index idx as tensors
        x = torch.FloatTensor(self.x_data[idx])
        y = torch.FloatTensor(self.y_data[idx])
        return x, y
dataset = CustomerDataset()

> __len__( ): returns the total number of samples in the dataset.

> __getitem__( ): given an index idx, returns the corresponding input and output data.
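These two methods are what make len() and indexing work on the dataset. A minimal sketch, repeating the class definition so that it runs on its own:

```python
import torch
from torch.utils.data import Dataset

class CustomerDataset(Dataset):
    def __init__(self):
        self.x_data = [[73, 80, 75],
                       [93, 88, 93],
                       [89, 91, 90],
                       [96, 98, 100],
                       [73, 66, 70]]
        self.y_data = [[152], [185], [180], [196], [142]]

    def __len__(self):
        return len(self.x_data)

    def __getitem__(self, idx):
        x = torch.FloatTensor(self.x_data[idx])
        y = torch.FloatTensor(self.y_data[idx])
        return x, y

dataset = CustomerDataset()

n = len(dataset)     # calls __len__()
x0, y0 = dataset[0]  # calls __getitem__(0)
print(n, x0, y0)
```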

IN[6]

from torch.utils.data import DataLoader
dataloader = DataLoader(dataset, batch_size=2, shuffle=True)

> batch_size: the size of each minibatch. It is conventionally set to a power of 2 (2, 4, 8, 16, 32, 64, ...).

> shuffle=True: reshuffles the dataset every epoch so that the model does not memorize the order of the data (which helps it perform well on unseen data).
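The sketch below shows what the DataLoader yields for the five quiz samples. To stay short it wraps the tensors in TensorDataset as a stand-in for the custom Dataset class; that substitution, and the drop_last variant at the end, are additions for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

x = torch.FloatTensor([[73, 80, 75],
                       [93, 88, 93],
                       [89, 91, 90],
                       [96, 98, 100],
                       [73, 66, 70]])
y = torch.FloatTensor([[152], [185], [180], [196], [142]])

# Stand-in for the custom Dataset (an assumption made for brevity)
dataset = TensorDataset(x, y)

dataloader = DataLoader(dataset, batch_size=2, shuffle=True)

# 5 samples with batch_size=2 give batches of 2, 2 and 1 samples
sizes = []
for x_batch, y_batch in dataloader:
    sizes.append(len(x_batch))
    print(x_batch.shape, y_batch.shape)

# drop_last=True discards the final incomplete batch
dropped = DataLoader(dataset, batch_size=2, shuffle=True, drop_last=True)
print(len(dataloader), len(dropped))
```

The order of the samples changes from epoch to epoch because of shuffle=True, but the batch sizes stay the same.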

IN[7]

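# The optimizer must be tied to the nn.Module's parameters, not to W and b
model = MultivariateLinearRegressionModel()
optimizer = optim.SGD(model.parameters(), lr=1e-5)
nb_epochs = 20
for epoch in range(nb_epochs+1):
    for batch_idx, samples in enumerate(dataloader):
        x_train, y_train = samples
        # Compute H(x)
        prediction = model(x_train)
        # Compute the cost
        cost = F.mse_loss(prediction, y_train)
        # Improve H(x) using the cost
        optimizer.zero_grad()
        cost.backward()
        optimizer.step()
    # Once per epoch, log the cost of the last minibatch
    print('Epoch {:4d}/{} Cost: {:.6f}'.format(
        epoch, nb_epochs, cost.item()
    ))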

> enumerate(dataloader): yields the index of each minibatch together with its data.
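Putting the pieces together, a self-contained version of the minibatch training loop might look like the sketch below. A few choices here are assumptions rather than part of the original notes: TensorDataset stands in for the custom Dataset, a bare nn.Linear stands in for the model class, the parameters are zero-initialized to mirror W and b in IN[3], and the log reports the sample-weighted average cost over the epoch rather than the cost of the last minibatch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(1)  # fixes the shuffle order for reproducibility

x = torch.FloatTensor([[73, 80, 75],
                       [93, 88, 93],
                       [89, 91, 90],
                       [96, 98, 100],
                       [73, 66, 70]])
y = torch.FloatTensor([[152], [185], [180], [196], [142]])
dataloader = DataLoader(TensorDataset(x, y), batch_size=2, shuffle=True)

model = nn.Linear(3, 1)
nn.init.zeros_(model.weight)  # start from zero, like W and b in IN[3]
nn.init.zeros_(model.bias)
optimizer = optim.SGD(model.parameters(), lr=1e-5)

nb_epochs = 20
epoch_costs = []
for epoch in range(nb_epochs + 1):
    total, count = 0.0, 0
    for batch_idx, (x_batch, y_batch) in enumerate(dataloader):
        cost = F.mse_loss(model(x_batch), y_batch)
        optimizer.zero_grad()
        cost.backward()
        optimizer.step()
        # Weight each batch cost by its size; the last batch is smaller
        total += cost.item() * len(x_batch)
        count += len(x_batch)
    epoch_costs.append(total / count)
    print('Epoch {:4d}/{} Avg cost: {:.6f}'.format(
        epoch, nb_epochs, epoch_costs[-1]
    ))
```

Averaging over the epoch gives a smoother picture of progress than the cost of a single minibatch, which depends heavily on which samples happened to land in it.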

Reference links:

[PyTorch] Lab-04-1 Multivariable Linear regression
[PyTorch] Lab-04-2 Loading Data
