https://arxiv.org/abs/2305.16999
Three Towers: Flexible Contrastive Learning with Pretrained Image Models

"We introduce Three Towers (3T), a flexible method to improve the contrastive learning of vision-language models by incorporating pretrained image classifiers. While contrastive models are usually trained from scratch, LiT (Zhai et al., 2022) has recently s…"

1. In this paper, pretrained image classifi…
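As a rough illustration of the idea, the following is a minimal NumPy sketch of a 3T-style objective, assuming the standard symmetric InfoNCE loss between the image and text towers plus auxiliary contrastive terms aligning both trainable towers with a frozen pretrained third tower. The function names, the weighting factor `alpha`, and the exact combination of terms are assumptions for illustration, not the paper's actual loss.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Project embeddings onto the unit sphere."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE loss between two batches of embeddings.
    Matching rows of `a` and `b` are treated as positive pairs."""
    logits = (l2_normalize(a) @ l2_normalize(b).T) / temperature
    n = logits.shape[0]
    idx = np.arange(n)

    def xent(lg):
        # numerically stable log-softmax cross-entropy on the diagonal
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()

    # average the a->b and b->a directions
    return 0.5 * (xent(logits) + xent(logits.T))

def three_tower_loss(img_emb, txt_emb, frozen_emb, alpha=0.5):
    """Hypothetical 3T-style objective: the usual image-text contrastive
    loss plus auxiliary terms that align the trainable image and text
    towers with embeddings from a frozen pretrained classifier tower."""
    main = info_nce(img_emb, txt_emb)
    aux = info_nce(frozen_emb, txt_emb) + info_nce(frozen_emb, img_emb)
    return main + alpha * aux
```

In this sketch, `frozen_emb` would come from a pretrained image classifier whose weights are not updated; only the image and text towers receive gradients, so the auxiliary terms act as a soft distillation signal rather than a hard constraint.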