
Standard convolution and lite versions
AI
You have probably met standard convolution in a lecture or course. It is effective for image processing, but it needs a large number of parameters and heavy computational resources. Several more recent architectures introduce lighter convolution layers that aim to reduce these costs.
In this blog, we’ll explore some of these variants and compare the number of parameters required for each type.
1. Standard Convolution
The traditional convolution operation connects every input channel to every output channel.
Formula: Parameters = (C_in × K × K + 1) × C_out
Where:
C_in : Input channels
K : Kernel size
C_out : Output channels
+1 : Bias term
Example: 3×3 convolution, 64 => 128 channels
Parameters = (64 × 3 × 3 + 1) × 128 = 73,856
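The formula above can be turned into a small helper to check the arithmetic. This is a minimal sketch in plain Python; the function name standard_conv_params and the bias flag are illustrative assumptions, not part of any library.

```python
def standard_conv_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    """Parameter count of a K x K convolution mapping c_in to c_out channels."""
    # Each output filter sees every input channel through a K x K window.
    weights_per_filter = c_in * k * k
    # One bias value per output filter, matching the "+1" in the formula.
    bias_per_filter = 1 if bias else 0
    return (weights_per_filter + bias_per_filter) * c_out


print(standard_conv_params(64, 128, 3))  # 73856
```

Setting bias=False drops the "+1" term, which is a common choice when a normalization layer follows the convolution.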
2. Depthwise Convolution
Depthwise convolution applies a single filter per input channel, with no cross-channel computation.
Formula: Parameters = (K × K + 1) × C_in
Example: 3×3 Depthwise, 64 channels
Parameters = (3 × 3 + 1) × 64 = 640
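To make "no cross-channel computation" concrete, here is a minimal sketch of a depthwise convolution on nested Python lists. It assumes stride 1 and no padding, and the names depthwise_conv2d and depthwise_params are hypothetical helpers written for this post, not a library API.

```python
def depthwise_conv2d(x, kernels, biases):
    """x: [C][H][W], kernels: [C][K][K], biases: [C]. Stride 1, no padding."""
    out = []
    # Each channel is paired with its own kernel; channels never mix.
    for channel, kernel, bias in zip(x, kernels, biases):
        k = len(kernel)
        h_out = len(channel) - k + 1
        w_out = len(channel[0]) - k + 1
        plane = [
            [
                bias + sum(
                    channel[i + di][j + dj] * kernel[di][dj]
                    for di in range(k)
                    for dj in range(k)
                )
                for j in range(w_out)
            ]
            for i in range(h_out)
        ]
        out.append(plane)
    return out


def depthwise_params(channels, k, bias=True):
    """One K x K kernel (plus optional bias) per channel."""
    return (k * k + (1 if bias else 0)) * channels


x = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],  # channel 0
    [[9, 9, 9], [9, 9, 9], [9, 9, 9]],  # channel 1
]
kernels = [
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],  # sums channel 0
    [[0, 0, 0], [0, 0, 0], [0, 0, 0]],  # ignores channel 1 entirely
]
print(depthwise_conv2d(x, kernels, biases=[0, 5]))  # [[[45]], [[5]]]
print(depthwise_params(64, 3))  # 640
```

The second output plane is just its bias, which shows that values in channel 0 have no influence on channel 1.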
3. Pointwise Convolution (1×1 Conv)
Pointwise convolution uses 1×1 kernels to combine information across channels.
Formula: Parameters = (C_in + 1) × C_out
Example: 1×1 Conv, 64 => 128 channels
Parameters = (64 + 1) × 128 = 8,320
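A 1×1 convolution is a per-pixel linear map across channels. The sketch below illustrates that on nested lists; pointwise_conv2d and pointwise_params are hypothetical names used only for this example.

```python
def pointwise_conv2d(x, weights, biases):
    """x: [C_in][H][W], weights: [C_out][C_in], biases: [C_out]."""
    h, w = len(x[0]), len(x[0][0])
    return [
        [
            [
                # Weighted sum over input channels at one pixel (i, j).
                bias + sum(wt * x[c][i][j] for c, wt in enumerate(row))
                for j in range(w)
            ]
            for i in range(h)
        ]
        for row, bias in zip(weights, biases)
    ]


def pointwise_params(c_in, c_out, bias=True):
    """C_in weights (plus optional bias) per output channel."""
    return (c_in + (1 if bias else 0)) * c_out


x = [[[1, 2]], [[10, 20]]]          # 2 input channels, 1 x 2 image
weights = [[1, 1], [2, 0], [0, -1]]  # 3 output channels
print(pointwise_conv2d(x, weights, biases=[0, 1, 0]))
# [[[11, 22]], [[3, 5]], [[-10, -20]]]
print(pointwise_params(64, 128))  # 8320
```

The spatial size is unchanged; only the channel dimension goes from 2 to 3.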
4. MobileNet: Depthwise Separable Convolution
MobileNet combines depthwise and pointwise convolutions sequentially:
Input → 3×3 Depthwise → 1×1 Pointwise
Formula: Parameters = (K × K + 1) × C_in + (C_in + 1) × C_out
Example: 64 => 128 channels
Depthwise: (3 × 3 + 1) × 64 = 640
Pointwise: (64 + 1) × 128 = 8,320
Total: 8,960 parameters
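The breakdown above can be reproduced with a short helper. This is a sketch that follows this post's counting convention (a bias on both layers); depthwise_separable_params is an illustrative name, not a MobileNet API.

```python
def depthwise_separable_params(c_in, c_out, k=3, bias=True):
    """Parameter breakdown for: K x K depthwise on c_in, then 1x1 to c_out."""
    b = 1 if bias else 0
    depthwise = (k * k + b) * c_in   # spatial filtering, one kernel per input channel
    pointwise = (c_in + b) * c_out   # channel mixing with 1x1 kernels
    return {
        "depthwise": depthwise,
        "pointwise": pointwise,
        "total": depthwise + pointwise,
    }


print(depthwise_separable_params(64, 128))
# {'depthwise': 640, 'pointwise': 8320, 'total': 8960}
```

Most of the parameters sit in the pointwise step, so the total is driven mainly by C_in × C_out rather than by the kernel size.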
5. OSNet: Lite Convolution
OSNet's lite convolution uses a 1×1 convolution followed by a depthwise 3×3 convolution:
Input → 1×1 Conv → 3×3 Depthwise
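Because the depthwise step runs after the 1×1 convolution, it operates on C_out channels rather than C_in, so its parameter count scales with the number of output channels.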
Formula: Parameters = (C_in + 1) × C_out + (K × K + 1) × C_out
Example: 64 => 128 channels
Conv: (64 + 1) × 128 = 8,320
Depthwise: (3 × 3 + 1) × 128 = 1,280
Total: 9,600 parameters
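The same kind of helper works for the 1×1 then depthwise ordering. This sketch keeps the bias term on both layers to match the counting used in this post, which is an assumption; a real implementation may disable bias when normalization follows. The name lite_conv_params is hypothetical.

```python
def lite_conv_params(c_in, c_out, k=3, bias=True):
    """Parameter breakdown for: 1x1 conv to c_out, then K x K depthwise on c_out."""
    b = 1 if bias else 0
    pointwise = (c_in + b) * c_out   # channel mixing happens first
    depthwise = (k * k + b) * c_out  # depthwise now runs on c_out channels
    return {
        "pointwise": pointwise,
        "depthwise": depthwise,
        "total": pointwise + depthwise,
    }


print(lite_conv_params(64, 128))
# {'pointwise': 8320, 'depthwise': 1280, 'total': 9600}
```

The only change from the depthwise-then-pointwise ordering is which channel count the depthwise kernels are multiplied by.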
6. Parameter Comparison
For 64 => 128 Channel Transformation:
Standard Conv 3×3: 73,856 parameters
MobileNet (Depthwise => Pointwise): 8,960 parameters
OSNet (1×1 => Depthwise): 9,600 parameters
For Same Channel Count (64 => 64):
Standard Conv 3×3: 36,928 parameters
MobileNet (Depthwise => Pointwise): 4,800 parameters
OSNet (1×1 => Depthwise): 4,800 parameters
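To recompute the comparison for any channel pair, the three formulas from the earlier sections can be put side by side. This is a minimal sketch with bias included on every layer; the function names and the compare helper are assumptions made for illustration.

```python
def standard_params(c_in, c_out, k):
    return (c_in * k * k + 1) * c_out


def mobilenet_params(c_in, c_out, k):
    # K x K depthwise on c_in channels, then 1x1 pointwise.
    return (k * k + 1) * c_in + (c_in + 1) * c_out


def osnet_params(c_in, c_out, k):
    # 1x1 conv first, then K x K depthwise on c_out channels.
    return (c_in + 1) * c_out + (k * k + 1) * c_out


def compare(c_in, c_out, k=3):
    return {
        "standard": standard_params(c_in, c_out, k),
        "mobilenet": mobilenet_params(c_in, c_out, k),
        "osnet": osnet_params(c_in, c_out, k),
    }


for c_in, c_out in [(64, 128), (64, 64)]:
    counts = compare(c_in, c_out)
    for name, n in counts.items():
        ratio = counts["standard"] / n
        print(f"{c_in} => {c_out}  {name:<10} {n:>7,}  (standard / this = {ratio:.1f})")
```

Subtracting the two lite formulas gives (K × K + 1) × (C_out − C_in), so the two orderings cost the same when C_in equals C_out, and the 1×1-first ordering costs more whenever the layer widens the channel count.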
7. Conclusion
Choosing between MobileNet and OSNet depends on your task and resource constraints. MobileNet is ideal when you need the smallest model size and fastest speed, such as for real-time applications or mobile devices, but may lose more accuracy as tasks become complex. OSNet, on the other hand, preserves more representational power thanks to its architecture, often achieving higher accuracy than MobileNet on challenging tasks like person re-identification.
That said, both MobileNet and OSNet architectures typically trail standard convolutions in accuracy by a modest margin. In exchange, they cut parameters and compute substantially (roughly 8× fewer parameters in the 64 => 128 example above), which makes them the better choice for efficiency-critical scenarios.
Ref:
- https://www.paepper.com/blog/posts/depthwise-separable-convolutions-in-pytorch/
- https://arxiv.org/pdf/1905.00953
- https://arxiv.org/pdf/1704.04861
Published at
2025-06-04 17:04:54 +0700