What is the mathematical formula of the convolution operation on a 2D image?

by EITCA Academy / Thursday, 23 May 2024 / Published in Artificial Intelligence, EITC/AI/ADL Advanced Deep Learning, Advanced computer vision, Convolutional neural networks for image recognition

The convolution operation is a fundamental process in the realm of convolutional neural networks (CNNs), particularly in the domain of image recognition. This operation is pivotal in extracting features from images, allowing deep learning models to understand and interpret visual data. The mathematical formulation of the convolution operation on a 2D image is essential for grasping how CNNs process and analyze images.

Mathematically, the convolution operation for a 2D image can be expressed as follows:

\[ (I * K)(x, y) = \sum_{i=-m}^{m} \sum_{j=-n}^{n} I(x+i,\, y+j) \cdot K(i, j) \]

Where:
– \( I \) represents the input image.
– \( K \) denotes the kernel or filter.
– \( (x, y) \) are the coordinates of the output pixel.
– \( m \) and \( n \) are the half-width and half-height of the kernel, respectively.

In this equation, the kernel \( K \) slides over the input image \( I \), performing element-wise multiplication and summing the results to produce a single output pixel value. This process is repeated for each pixel in the output feature map, resulting in a transformed image that highlights specific features based on the kernel's values. Strictly speaking, this formula (which does not flip the kernel) defines cross-correlation; a true mathematical convolution would use \( I(x-i,\, y-j) \). Deep learning frameworks implement this cross-correlation form and call it convolution, and the distinction is immaterial in practice because the kernel values are learned.
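The formula can be transcribed almost symbol-for-symbol into plain Python. This is only an illustrative sketch (the name conv_at and the list-of-lists image representation are assumptions, not from any particular framework), and it is valid only where the kernel window stays fully inside the image, since Python's negative indexing would otherwise wrap around:

```python
def conv_at(I, K, x, y):
    """(I * K)(x, y) per the formula above, for a kernel of odd size
    (2m+1) x (2n+1); K's indices are shifted by (m, n) for 0-based lists."""
    m, n = len(K) // 2, len(K[0]) // 2
    return sum(I[x + i][y + j] * K[m + i][n + j]
               for i in range(-m, m + 1)
               for j in range(-n, n + 1))
```

For example, convolving with the 3×3 identity kernel (a single 1 at the center) returns the pixel under the kernel's center unchanged.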

The convolution operation can be better understood through a step-by-step example. Consider a simple 3×3 kernel \( K \) and a 5×5 input image \( I \):

\[ K = \begin{bmatrix}
1 & 0 & -1 \\
1 & 0 & -1 \\
1 & 0 & -1
\end{bmatrix} \qquad I = \begin{bmatrix}
1 & 2 & 3 & 4 & 5 \\
6 & 7 & 8 & 9 & 10 \\
11 & 12 & 13 & 14 & 15 \\
16 & 17 & 18 & 19 & 20 \\
21 & 22 & 23 & 24 & 25
\end{bmatrix} \]

To compute the convolution, we slide the kernel over the input image and perform the following steps:

1. Position the kernel: Align the kernel with the top-left 3×3 region of the image (without padding, the kernel must remain fully inside the image).
2. Element-wise multiplication: Multiply each element of the kernel by the corresponding element of the image.
3. Summation: Sum the results of the element-wise multiplication.
4. Move the kernel: Shift the kernel to the next position and repeat steps 2-3.

For the first position (top-left corner), the calculation is as follows:

\[ \begin{aligned}
(I * K)(1, 1) &= (1 \cdot 1) + (2 \cdot 0) + (3 \cdot (-1)) \\
&\quad + (6 \cdot 1) + (7 \cdot 0) + (8 \cdot (-1)) \\
&\quad + (11 \cdot 1) + (12 \cdot 0) + (13 \cdot (-1)) \\
&= 1 + 0 - 3 + 6 + 0 - 8 + 11 + 0 - 13 \\
&= -6
\end{aligned} \]

This result, -6, is the value of the output feature map at position (1, 1). Repeating this process for each position of the kernel over the input image generates the entire output feature map.
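The full sliding procedure can be sketched in plain Python without any libraries. The function name conv2d_valid is illustrative; the code reproduces the worked example above in "valid" mode (no padding, stride 1):

```python
def conv2d_valid(image, kernel):
    """CNN-style 'convolution' (cross-correlation, no kernel flip) in
    valid mode: the kernel stays fully inside the image."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(iw - kw + 1)]
            for y in range(ih - kh + 1)]

K = [[1, 0, -1]] * 3                                   # vertical edge kernel
I = [[r * 5 + c + 1 for c in range(5)] for r in range(5)]  # the 5x5 ramp
out = conv2d_valid(I, K)
print(out[0][0])  # -6, matching the worked example above
```

For this particular image every entry of the 3×3 output map is -6: each row of the ramp increases by a constant 1, so the left-column-minus-right-column difference under the kernel is always the same.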

The convolution operation is typically accompanied by additional concepts such as padding and stride:

– Padding: Adding extra pixels around the border of the input image, often with zeros (zero-padding), to control the spatial dimensions of the output feature map. With an appropriate amount of padding (so-called "same" padding), the output feature map has the same dimensions as the input image, preserving spatial information.
– Stride: The step size by which the kernel moves across the input image. A stride of 1 means the kernel moves one pixel at a time, while a stride of 2 means the kernel moves two pixels at a time. Stride affects the spatial dimensions of the output feature map, with larger strides resulting in smaller output dimensions.
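Padding and stride can be added to the sliding computation as follows. This is again an illustrative plain-Python sketch (conv2d is an assumed name, not a library function):

```python
def conv2d(image, kernel, stride=1, padding=0):
    """CNN-style convolution with zero-padding and stride."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    p = padding
    # Zero-pad the image on all four sides.
    blank = [0] * (iw + 2 * p)
    padded = ([list(blank) for _ in range(p)]
              + [[0] * p + list(row) + [0] * p for row in image]
              + [list(blank) for _ in range(p)])
    # Output dimensions follow the standard floor formula.
    oh = (ih - kh + 2 * p) // stride + 1
    ow = (iw - kw + 2 * p) // stride + 1
    return [[sum(padded[y * stride + i][x * stride + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(ow)]
            for y in range(oh)]

I = [[r * 5 + c + 1 for c in range(5)] for r in range(5)]
box = [[1, 1, 1]] * 3
same = conv2d(I, box, stride=1, padding=1)   # 5x5 output ("same" size)
strided = conv2d(I, box, stride=2)           # 2x2 output
print(len(same), len(same[0]), len(strided), len(strided[0]))  # 5 5 2 2
```

With padding 1 and stride 1, a 3×3 kernel preserves the 5×5 input size; with stride 2 and no padding, the output shrinks to 2×2.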

The convolution operation's output dimensions can be calculated using the following formula:

\[ \text{Output Width} = \left\lfloor \frac{\text{Input Width} - \text{Kernel Width} + 2 \cdot \text{Padding}}{\text{Stride}} \right\rfloor + 1 \]
\[ \text{Output Height} = \left\lfloor \frac{\text{Input Height} - \text{Kernel Height} + 2 \cdot \text{Padding}}{\text{Stride}} \right\rfloor + 1 \]

These formulas ensure that the spatial dimensions of the output feature map are correctly determined based on the input image dimensions, kernel size, padding, and stride.
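The formula translates directly into a one-line helper (output_size is an illustrative name):

```python
def output_size(input_size, kernel_size, padding, stride):
    """Output spatial size; // implements the floor in the formula."""
    return (input_size - kernel_size + 2 * padding) // stride + 1

print(output_size(32, 5, 0, 1))  # 28
print(output_size(5, 3, 1, 1))   # 5 ("same" padding for a 3x3 kernel)
```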

In the context of convolutional neural networks, multiple convolutional layers are stacked together, each with its own set of learnable kernels. These layers progressively extract higher-level features from the input image, enabling the network to recognize complex patterns and objects. The kernels in each layer are learned during the training process through backpropagation, optimizing the network's performance on the given task.

Convolutional layers are often followed by activation functions, such as ReLU (Rectified Linear Unit), which introduce non-linearity into the model. This non-linearity allows the network to learn more complex representations. Additionally, pooling layers, such as max pooling or average pooling, are used to reduce the spatial dimensions of the feature maps, making the model more computationally efficient and less prone to overfitting.
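Both operations have precise, simple definitions; a minimal plain-Python sketch (function names are illustrative):

```python
def relu(fmap):
    """Rectified Linear Unit applied element-wise: max(0, x)."""
    return [[max(0, v) for v in row] for row in fmap]

def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2: keep the largest value per window."""
    return [[max(fmap[y][x], fmap[y][x + 1],
                 fmap[y + 1][x], fmap[y + 1][x + 1])
             for x in range(0, len(fmap[0]) - 1, 2)]
            for y in range(0, len(fmap) - 1, 2)]

fm = [[-6, 2, 3, -4],
      [5, -1, 7, 8],
      [9, 10, -11, 12],
      [13, -14, 15, 16]]
print(max_pool_2x2(relu(fm)))  # [[5, 8], [13, 16]]
```

Note that ReLU preserves the spatial dimensions, while 2×2 max pooling halves both the width and the height of the feature map.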

A practical example of a convolutional neural network for image recognition is the famous LeNet-5 architecture, designed for handwritten digit recognition. LeNet-5 consists of multiple convolutional and pooling layers, followed by fully connected layers. The convolutional layers extract features from the input images, while the fully connected layers perform the final classification.

To illustrate the convolution operation in the context of LeNet-5, consider the first convolutional layer, which takes a 32×32 input image and applies six 5×5 kernels with a stride of 1 and no padding. The output feature maps have dimensions of 28×28, calculated as follows:

\[ \text{Output Width} = \left\lfloor \frac{32 - 5 + 2 \cdot 0}{1} \right\rfloor + 1 = 28 \]
\[ \text{Output Height} = \left\lfloor \frac{32 - 5 + 2 \cdot 0}{1} \right\rfloor + 1 = 28 \]

Each of the six kernels produces a separate 28×28 feature map, capturing different aspects of the input image. These feature maps are then passed through a ReLU activation function and a 2×2 max pooling layer with a stride of 2, resulting in 14×14 feature maps.

The subsequent layers in LeNet-5 continue to apply convolution and pooling operations, progressively reducing the spatial dimensions while increasing the depth of the feature maps. The final fully connected layers perform the classification based on the extracted features, outputting the predicted digit class.
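Using the output-size formula, the spatial dimensions through LeNet-5's feature extractor can be traced as follows (a sketch based on the classic layer sizes; published descriptions of LeNet-5 vary in some details):

```python
def conv_out(n, k, p, s):
    """Output spatial size for input n, kernel k, padding p, stride s."""
    return (n - k + 2 * p) // s + 1

# (kernel, stride) per stage: C1 conv, S2 pool, C3 conv, S4 pool.
stages = [(5, 1), (2, 2), (5, 1), (2, 2)]
sizes = [32]
for k, s in stages:
    sizes.append(conv_out(sizes[-1], k, 0, s))
print(sizes)  # [32, 28, 14, 10, 5]
```

The 5×5 feature maps after S4 are what the fully connected layers consume for the final digit classification.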

The convolution operation is a cornerstone of convolutional neural networks, enabling the extraction of meaningful features from images. The mathematical formulation of the convolution operation involves sliding a kernel over the input image, performing element-wise multiplication, and summing the results. Additional concepts such as padding and stride play important roles in controlling the spatial dimensions of the output feature map. Convolutional layers, combined with activation functions and pooling layers, form the building blocks of powerful image recognition models like LeNet-5, capable of recognizing complex patterns and objects in visual data.

