an attempt at blogging by: @deeplearnerd
True understanding arises not from external instruction, but from the silent dialogues between curiosity and discovery.
This principle comes alive in Self-Supervised Learning (SSL), where models learn from data without human labels. Bootstrap Your Own Latent (BYOL) takes this further - it learns by comparing different views of the same data, essentially having a dialogue with itself.
I'm assuming there's some familiarity with Supervised and Unsupervised Learning. In short, supervised learning uses labeled data, while unsupervised learning finds patterns without labels. SSL bridges the gap, allowing models to create their own labels from data patterns.
In this post, I'll try to explain how machines, much like curious students, teach themselves.
Self-supervised learning (SSL) comes in two types: contrastive and non-contrastive. Contrastive methods like SimCLR and MoCo learn by comparing an image (say, an otter 🦦) with both similar views of that image and different images (like a pineapple or car). They need these negative examples and large batches to work well. Non-contrastive methods like BYOL and SimSiam learn just by comparing different views of the same image - like an otter photo cropped differently or in different colours.
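To make the "different views of the same image" idea concrete, here's a minimal sketch of how two views of one image are generated with torchvision-style augmentations. The specific crop sizes and jitter strengths below are illustrative assumptions, not BYOL's exact augmentation recipe, and the random array stands in for a real photo:

```python
import numpy as np
import torch
from PIL import Image
from torchvision import transforms

# Illustrative augmentation pipeline (parameters are assumptions, not BYOL's exact recipe)
augment = transforms.Compose([
    transforms.RandomResizedCrop(96, scale=(0.2, 1.0)),  # random crop, resized to 96x96
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),          # brightness/contrast/saturation/hue
    transforms.RandomGrayscale(p=0.2),
    transforms.ToTensor(),
])

# A stand-in image (in practice, a real photo, e.g. of an otter)
image = Image.fromarray(np.uint8(np.random.rand(128, 128, 3) * 255))

# Two random "views" of the same image: this is a positive pair.
view_1 = augment(image)  # e.g. a tight crop with colour jitter
view_2 = augment(image)  # a different crop/colouring of the same image

# Contrastive methods (SimCLR, MoCo) would additionally contrast each view against
# other images in the batch (negatives); non-contrastive methods like BYOL use
# only the positive pair.
print(view_1.shape, view_2.shape)  # torch.Size([3, 96, 96]) each
```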
This blog focuses on non-contrastive learning, specifically BYOL, because it's simpler to use (no negative samples needed), works with less data, and can run efficiently on smaller computers. Plus, its clever design offers valuable insights into how machines can truly learn on their own.
Contrastive methods like SimCLR and MoCo face two main challenges.
In self-supervised learning (SSL), collapse occurs when the model generates the same output for all inputs, losing the ability to learn meaningful differences. For example, instead of distinguishing between cats and dogs, the model outputs the same feature for both.
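To see why collapse is so tempting for a model trained only to make views agree, here's a toy sketch. The constant "encoder" below is a deliberately degenerate example (an assumption for illustration, not any real architecture): it achieves perfect agreement between any two inputs while learning nothing about them.

```python
import torch
import torch.nn.functional as F

class CollapsedEncoder(torch.nn.Module):
    """A degenerate encoder that ignores its input entirely."""

    def __init__(self, dim=8):
        super().__init__()
        self.constant = torch.nn.Parameter(torch.ones(dim))

    def forward(self, x):
        # Same output regardless of whether x is a cat, a dog, or pure noise.
        return self.constant.expand(x.shape[0], -1)

encoder = CollapsedEncoder()
cats = torch.randn(4, 3, 32, 32)  # pretend batch of cat images
dogs = torch.randn(4, 3, 32, 32)  # pretend batch of dog images

z_cats, z_dogs = encoder(cats), encoder(dogs)

# Cosine similarity between any two outputs is a perfect 1.0, so a
# "maximise agreement between views" objective is trivially minimised,
# yet the features carry no information about the input.
print(F.cosine_similarity(z_cats, z_dogs))  # tensor([1., 1., 1., 1.], ...)
```

Contrastive methods avoid this trap by pushing negatives apart; how BYOL avoids it without any negatives is the interesting part.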