
Posted by: 柯炜 | 2022-03-07 | 52215

Leverage Your Local and Global Representations: A New Self-Supervised Learning Strategy

Tong Zhang, Congpei Qiu, Wei Ke*, Sabine Süsstrunk, Mathieu Salzmann


Abstract: Most self-supervised representation learning methods aim to learn robust, view-invariant representations by maximizing the similarity between the features extracted from different crops of the same image. While this seems intuitive in the presence of images that depict a single main object, it is ill-suited to handle complex scenes containing multiple objects from different categories; in this case, the representations of two crops that depict different objects should not be similar, and enforcing similarity would simply tend to discard valuable semantic information. In this work, we address this by introducing a new self-supervised learning strategy, LoGo, that explicitly reasons about Local and Global crops. To achieve view invariance, we encourage similarity between the global crops from the same image, as well as between a global and a local crop. However, we allow two local crops to have dissimilar representations, thus correctly encoding the fact that their content may differ entirely. Our extensive experiments on a variety of datasets and using different self-supervised learning frameworks validate the superiority of our approach. Interestingly, we achieve better results than supervised models on transfer learning when using only 1/10 of the data.
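
The pairing rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`attracted_pairs`, `invariance_loss`), the cosine-based loss, and the toy 2-D features are all assumptions made for illustration. The sketch only shows which crop pairs are pulled together (global-global and global-local) and that local-local pairs are left out of the similarity term.

```python
import math
from itertools import combinations, product


def cosine(u, v):
    """Cosine similarity between two feature vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def attracted_pairs(num_global, num_local):
    """Hypothetical helper: crop pairs whose similarity is encouraged.

    Includes every global-global and global-local pair of one image.
    Local-local pairs are deliberately excluded.
    """
    global_ids = [("g", i) for i in range(num_global)]
    local_ids = [("l", j) for j in range(num_local)]
    return list(combinations(global_ids, 2)) + list(product(global_ids, local_ids))


def invariance_loss(global_feats, local_feats):
    """Assumed loss: mean (1 - cosine similarity) over the attracted pairs."""
    feats = {("g", i): f for i, f in enumerate(global_feats)}
    feats.update({("l", j): f for j, f in enumerate(local_feats)})
    pairs = attracted_pairs(len(global_feats), len(local_feats))
    return sum(1.0 - cosine(feats[a], feats[b]) for a, b in pairs) / len(pairs)


# Toy example: two global crops agree; the two local crops depict different content.
global_feats = [[1.0, 0.0], [1.0, 0.0]]
local_feats = [[1.0, 0.0], [0.0, 1.0]]
loss = invariance_loss(global_feats, local_feats)
```

In this toy setup the two local crops are orthogonal to each other, yet no term compares them directly; only their agreement with the global crops enters the loss.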