Self-Supervised Learning: Contrastive Learning

Contrastive Learning differs from the earlier pretext tasks in one key way. Instead of learning representations by making the model solve a specific hand-designed task, it relies on invariance and contrast.

If we call samples that look alike or are semantically similar a positive pair, and samples that are not a negative pair, then invariance and contrast are defined as follows.

Invariance : Representations of related samples should be similar
Contrast : Representations of unrelated samples should be dissimilar

So how do we decide whether two samples form a positive or a negative pair?

How to construct positive / negative pairs in the unsupervised setting?

Similar data (e.g. clustering): run the data through the model first, then group similar samples together.
Same data with different augmentation: apply different augmentations to the same sample.
Same data with different modality: (video - audio), (image - caption), (RGB - depth), and so on.
Utilize sequential structure: frames that are close to each other in time.

Contrastive Learning therefore trains the model so that the following relation holds.
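The formula itself is not reproduced in this text. A common way to write the condition is given below as a sketch. The symbols are assumptions chosen to match the s(f(x), f(x-)) notation used later in this post: f is the encoder, s is a similarity score, x is the anchor, x^+ is a positive sample and x^- is a negative sample.

```latex
s\big(f(x),\, f(x^{+})\big) \;\gg\; s\big(f(x),\, f(x^{-})\big)
```

Read this as: the representation of the anchor should score much higher against its positive than against any negative.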

Invariance-based approach

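Clustering & pseudo-labeling
- Group the samples first, assign pseudo-labels to the groups, then train again on those labels.
Consistency regularization
- Pick out only the similar (positive) samples and use those.
Contrastive learning
- Use both the similar and the dissimilar samples.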

The loss most commonly used here is the InfoNCE loss.

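On the interval (0, 1], -log(x) reaches its minimum of 0 at x = 1 and grows as x approaches 0. The loss is therefore smallest when the ratio inside the log approaches 1. In this formula that happens when the negative terms built from s(f(x), f(x-)) become negligible compared with the positive term built from s(f(x), f(x+)); the loss then approaches 0.

In other words, when similar samples have high similarity and dissimilar samples have low similarity, L becomes small.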
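The behavior described above can be checked with a small sketch. It assumes the usual single-anchor form of InfoNCE, L = -log( exp(s+/t) / (exp(s+/t) + sum of exp(s-/t)) ), with cosine similarity as s and a temperature t; the cosine choice, the temperature value and the function names are illustrative assumptions, not taken from the lecture.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor, one positive and a list of negatives.

    Assumed form: -log(exp(s_pos) / (exp(s_pos) + sum_k exp(s_neg_k))),
    where every score is a cosine similarity divided by the temperature.
    """
    s_pos = cosine_similarity(anchor, positive) / temperature
    s_negs = [cosine_similarity(anchor, n) / temperature for n in negatives]
    scores = [s_pos] + s_negs
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    log_denominator = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_denominator - s_pos


anchor = [1.0, 0.0]
# Positive close to the anchor, negatives far away: loss near 0.
good = info_nce(anchor, positive=[0.9, 0.1], negatives=[[-1.0, 0.0], [0.0, -1.0]])
# Positive far from the anchor, negatives close: loss is large.
bad = info_nce(anchor, positive=[-1.0, 0.0], negatives=[[0.9, 0.1], [1.0, 0.2]])
print(good, bad)
```

The first value is close to 0 and the second is much larger, which matches the reading that a well-separated embedding gives a small L.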

Clustering & Pseudo-Labeling

DeepCluster, SwAV, DINO

Cluster data into K groups, and assume they are pseudo-labels
Distill pseudo-labels to the self-supervised classifier

Deep Clustering

Caron et al, “Deep Clustering for Unsupervised Learning of Visual Features”, ECCV 2018

Randomly initialize a CNN
Run many images through the CNN and get their final-layer features
Cluster the features with K-means; record the cluster for each feature
Use cluster assignments as pseudo-labels for each image; train the CNN to predict cluster assignments

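This method extracts features and clusters them. The resulting cluster assignments are then used as the ground truth for training. The labels do not say "this is a cat" or "this is a dog"; each group of nearby samples simply receives a representative value, namely its cluster index.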
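The clustering step can be sketched in a few lines. This is a toy illustration only: the feature values are made up to stand in for final-layer CNN features, the centroids are initialized from the first k features to keep the run deterministic, and the CNN forward pass and the subsequent training on the pseudo-labels are left out.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans_pseudo_labels(features, k, num_iters=10):
    """Run plain k-means and return one cluster index per feature vector."""
    # Assumption: the first k features serve as the initial centroids.
    centroids = [list(f) for f in features[:k]]
    labels = [0] * len(features)
    for _ in range(num_iters):
        # Assignment step: the nearest centroid becomes the pseudo-label.
        labels = [
            min(range(k), key=lambda c: squared_distance(f, centroids[c]))
            for f in features
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [f for f, label in zip(features, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Hypothetical stand-in for the final-layer features of six images.
features = [[0.1, 0.0], [5.0, 5.1], [0.0, 0.2], [5.2, 4.9], [0.2, 0.1], [4.9, 5.0]]
pseudo_labels = kmeans_pseudo_labels(features, k=2)
print(pseudo_labels)  # [0, 1, 0, 1, 0, 1]
```

In the full method these indices would be used as classification targets for the CNN, and the cluster-then-train cycle would be repeated as the features improve.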

Instance Discrimination (NPID)

Wu, Zhirong, et al. "Unsupervised feature learning via non-parametric instance discrimination." CVPR 2018

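Suppose we train an ordinary ImageNet classifier. The logits from the last layer are passed through softmax, which yields a probability distribution over the classes. The final prediction is then the class with the highest probability, and during training this distribution is compared against a one-hot label.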

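Looking at that final probability distribution, however, classes that look similar to the input receive high values, while semantically unrelated classes receive low values.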

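When we train a classifier we supply ground-truth labels, and it is worth asking how "forced" those labels are. If we define leopard = [1 0 0], bookcase = [0 1 0], and so on, then in the embedding space every class sits on its own axis, equally distant from every other class.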

But in reality, shouldn't leopard and jaguar be closer to each other than leopard and bookcase?
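A quick check makes the point about one-hot targets concrete. The leopard and bookcase vectors follow the definitions above; placing jaguar at [0 0 1] is an assumption made for this illustration.

```python
import math

# leopard and bookcase follow the text; jaguar's position is an assumption.
classes = ["leopard", "bookcase", "jaguar"]


def one_hot(index, num_classes):
    """Vector with 1.0 at `index` and 0.0 elsewhere."""
    return [1.0 if j == index else 0.0 for j in range(num_classes)]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


targets = {name: one_hot(i, len(classes)) for i, name in enumerate(classes)}

d_leopard_jaguar = euclidean(targets["leopard"], targets["jaguar"])
d_leopard_bookcase = euclidean(targets["leopard"], targets["bookcase"])
print(d_leopard_jaguar == d_leopard_bookcase)  # True
```

Under one-hot supervision the target for jaguar is exactly as far from leopard as the target for bookcase is, so the labels themselves carry no notion of semantic closeness.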

The most important contribution of this paper is its use of non-parametric discrimination.

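The conventional parametric approach classifies with learned weights, one weight vector per class. The non-parametric approach instead uses the features produced by the network itself as the reference for each instance.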
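The difference between the two approaches can be sketched as follows. The assumptions here are that features are L2-normalized, that similarity is a dot product divided by a temperature, and that each training image is its own class; all numeric values, the temperature and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def parametric_probs(feature, class_weights):
    """Parametric softmax: one learned weight vector w_j per class."""
    return softmax([dot(w, feature) for w in class_weights])


def non_parametric_probs(feature, stored_features, temperature=0.07):
    """Non-parametric softmax: stored instance features v_j take the place of w_j."""
    return softmax([dot(v, feature) / temperature for v in stored_features])


# Hypothetical L2-normalized features of three training instances.
stored_features = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
query = [1.0, 0.0]  # feature of instance 0 under the current network

probs = non_parametric_probs(query, stored_features)
print(probs.index(max(probs)))  # 0

# For comparison, a parametric head with made-up weights for two classes.
print(parametric_probs(query, class_weights=[[2.0, 0.0], [0.0, 2.0]]))
```

Because the scores come from feature similarity, instance 1, whose stored feature is closer to the query, receives more probability than instance 2. No separate weight matrix has to be learned for the instance-level classes.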

* Memory bank part

SimCLR (Same data with different augmentation)

Chen, Ting, et al. "A simple framework for contrastive learning of visual representations." International conference on machine learning. PMLR, 2020

SimCLR creates its positives and negatives through augmentation. Two views generated from the same image form a positive pair, and views that come from any other image are negatives.
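The pairing rule can be sketched on toy data. Everything here is illustrative: the "images" are short lists of pixel values, the augmentation (random flip plus brightness shift) is a stand-in for real image augmentations, and the index layout (view i and view i + N share a source image) is one common convention rather than something stated in the lecture.

```python
import random


def augment(image, rng):
    """Toy augmentation (assumption): random horizontal flip plus a small brightness shift."""
    view = list(reversed(image)) if rng.random() < 0.5 else list(image)
    shift = rng.uniform(-0.1, 0.1)
    return [pixel + shift for pixel in view]


def make_views(batch, seed=0):
    """Return 2N views; views[i] and views[i + N] come from the same image."""
    rng = random.Random(seed)
    first = [augment(image, rng) for image in batch]
    second = [augment(image, rng) for image in batch]
    return first + second


def positive_index(i, n):
    """Index of the view that shares a source image with view i."""
    return (i + n) % (2 * n)


batch = [[0.1, 0.2, 0.3], [0.9, 0.8, 0.7], [0.5, 0.5, 0.5]]  # three toy "images"
n = len(batch)
views = make_views(batch)

for i in range(2 * n):
    negatives = [j for j in range(2 * n) if j not in (i, positive_index(i, n))]
    print(i, "positive:", positive_index(i, n), "negatives:", negatives)
```

With a batch of N images, each view has exactly one positive and 2N - 2 negatives, all drawn from the same batch.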

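The above is a summary of the 2023 course <Robot Perception Using Deep Neural Networks> (심층신경망을 이용한 로봇 인지) taught by Professor Hyoseok Hwang (황효석) in the Department of Software Convergence at Kyung Hee University.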
