by: https://x.com/deeplearnerd
In the previous two blogs, we laid the groundwork for computer vision - from basic image processing to detecting patterns in images. We explored how computers understand edges, identify key features, and recognise patterns using traditional techniques. But now it's time to dive into what's revolutionised computer vision in the past decade. We're moving from manually crafted features to letting machines learn their own representations. In this final blog, we'll explore the world of deep learning in computer vision, covering:
Probably one of the most common blog/tutorial topics you'll find on the internet. Convolutional Neural Networks are something people really love to write about and use, in my opinion because they are highly intuitive to understand and, at the same time, super effective at solving most computer vision tasks.
I'll just link to some amazing resources I came across while researching this blog (since our blog only brushes over the basics, you can check these out for the details):
At their core, CNNs are a type of artificial neural network specifically designed for processing grid-structured data like images. Unlike traditional fully connected networks, CNNs exploit the spatial structure of images, making them highly effective at recognising patterns such as edges, textures, and shapes.
To understand CNNs, it's essential to grasp their primary components and how they interact to process visual information:
The convolutional layer is the foundation of a CNN. It applies a set of learnable filters (or kernels) to the input image or the output from the previous layer. Each filter slides over the input spatially, performing element-wise multiplication and summation to produce a feature map. These feature maps highlight the presence of specific features (like edges or patterns) in different regions of the input.
We already covered the maths and code behind convolution in the previous blog.
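As a quick refresher (and to keep this post self-contained), here's a minimal NumPy sketch of the operation described above: a single hand-picked 3×3 kernel slid over a small image with stride 1 and no padding. In a real CNN the kernel values are learned; the vertical-edge kernel below is just an illustrative choice.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a 2D kernel over a 2D image (stride 1, no padding).

    Each output value is the element-wise product of the kernel with
    one image patch, summed up: one entry of a feature map.
    """
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + kh, x:x + kw]
            out[y, x] = np.sum(patch * kernel)  # multiply-and-sum
    return out

# Toy input with a vertical intensity edge down the middle
image = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# A hand-picked vertical-edge detector (a real CNN would learn this)
vertical_edge = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)

print(conv2d(image, vertical_edge))
# [[3. 3.]
#  [3. 3.]]
```

The output lights up wherever the kernel's pattern (a left-to-right intensity change) appears, which is exactly what a feature map records.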
Key Concepts:
After the convolution operation, an activation function like the Rectified Linear Unit (ReLU) is applied element-wise to introduce non-linearity into the model. ReLU turns every negative value into zero while keeping positive values unchanged, enabling the network to learn complex, non-linear patterns.
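As a minimal sketch (again in NumPy), ReLU is nothing more than an element-wise max with zero:

```python
import numpy as np

def relu(x):
    """Element-wise ReLU: negative values become 0, positives pass through."""
    return np.maximum(0.0, x)

feature_map = np.array([[-3.0, 1.5],
                        [ 0.0, -0.5]])
print(relu(feature_map))
# [[0.  1.5]
#  [0.  0. ]]
```

Applying this after every convolutional layer is what lets a stack of otherwise linear convolutions approximate non-linear functions.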