
    Standard convolution and lite versions

    You may have studied standard convolution in your lectures. It works well for image processing, but it requires a large number of parameters and heavy computation. In recent years, several research advances have introduced lighter variants of the convolution layer that aim to reduce these costs.
    In this blog, we'll explore some of these variants and compare the number of parameters each one requires.

    1. Standard Convolution

    The traditional convolution operation connects every input channel to every output channel.

    Formula: Parameters = (C_in × K × K + 1) × C_out

    Where:
    • C_in: Input channels
    • K: Kernel size
    • C_out: Output channels
    • +1: Bias term

    Example:
    3×3 convolution, 64 => 128 channels
    Parameters = (64 × 3 × 3 + 1) × 128 = 73,856
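
    As a quick sanity check, here is a minimal PyTorch sketch (assuming torch is available) that builds this exact layer and counts its parameters:

    import torch.nn as nn

    # Standard 3x3 convolution: every input channel connects to every output channel.
    conv = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1, bias=True)

    # (64 * 3 * 3 + 1) * 128 = 73,856
    print(sum(p.numel() for p in conv.parameters()))  # 73856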

    2. Depthwise Convolution

    Depthwise convolution applies a single filter per input channel, with no cross-channel computation.

    Formula:
    Parameters = (K × K + 1) × C_in

    Example: 3×3 Depthwise, 64 channels
    Parameters = (3 × 3 + 1) × 64 = 640
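
    The same check works for the depthwise case; in this PyTorch sketch the key detail is setting groups equal to the number of input channels:

    import torch.nn as nn

    # Depthwise 3x3 convolution: groups=64 gives one 3x3 filter per channel,
    # so no information is mixed across channels.
    depthwise = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3,
                          padding=1, groups=64, bias=True)

    # (3 * 3 + 1) * 64 = 640
    print(sum(p.numel() for p in depthwise.parameters()))  # 640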

    3. Pointwise Convolution (1×1 Conv)

    Pointwise convolution uses 1×1 kernels to combine information across channels.

    Formula: Parameters = (C_in + 1) × C_out

    Example: 1×1 Conv, 64 => 128 channels
    Parameters = (64 + 1) × 128 = 8,320
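
    A pointwise layer is just an ordinary convolution with kernel_size=1, so the same counting trick applies:

    import torch.nn as nn

    # Pointwise (1x1) convolution: mixes channels at each spatial position,
    # but never looks at neighbouring pixels.
    pointwise = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=1, bias=True)

    # (64 + 1) * 128 = 8,320
    print(sum(p.numel() for p in pointwise.parameters()))  # 8320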

    4. MobileNet: Depthwise Separable Convolution

    MobileNet combines depthwise and pointwise convolutions sequentially:
    Input → 3×3 Depthwise → 1×1 Pointwise

    Formula:
    Parameters = (K × K + 1) × C_in + (C_in + 1) × C_out

    Example: 64 => 128 channels
    Depthwise: (3 × 3 + 1) × 64 = 640
    Pointwise: (64 + 1) × 128 = 8,320
    Total: 8,960 parameters
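
    Here is a minimal sketch of this block, following the bias-included formula above (the real MobileNet also inserts BatchNorm and ReLU and drops the biases, which this example deliberately ignores):

    import torch.nn as nn

    # Depthwise separable block: 3x3 depthwise first, then 1x1 pointwise.
    block = nn.Sequential(
        nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64, bias=True),  # 640 params
        nn.Conv2d(64, 128, kernel_size=1, bias=True),                       # 8,320 params
    )

    print(sum(p.numel() for p in block.parameters()))  # 8960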

    5. OSNet: Lite Convolution

    OSNet's lite convolution uses a 1×1 convolution followed by a depthwise 3×3 convolution:
    Input → 1×1 Conv → 3×3 Depthwise

    The parameter count depends on the 1×1 conv output channels.

    Formula: Parameters = (C_in + 1) × C_out + (K × K + 1) × C_out

    Example: 64 => 128 channels
    Conv: (64 + 1) × 128 = 8,320
    Depthwise: (3 × 3 + 1) × 128 = 1,280
    Total: 9,600 parameters
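
    The corresponding sketch simply swaps the order and widens the depthwise stage to the output channel count (again ignoring the normalization and activation layers a real OSNet block would include):

    import torch.nn as nn

    # OSNet-style lite 3x3: 1x1 conv up to 128 channels, then a 3x3 depthwise conv.
    lite = nn.Sequential(
        nn.Conv2d(64, 128, kernel_size=1, bias=True),                          # 8,320 params
        nn.Conv2d(128, 128, kernel_size=3, padding=1, groups=128, bias=True),  # 1,280 params
    )

    print(sum(p.numel() for p in lite.parameters()))  # 9600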

    6. Parameter Comparison

    For 64 => 128 Channel Transformation:
    Standard Conv 3×3: 73,856 parameters
    MobileNet (Depthwise => Pointwise): 8,960 parameters
    OSNet (1×1 => Depthwise): 9,600 parameters

    For Same Channel Count (64 => 64):
    Standard Conv 3×3: 36,928 parameters
    MobileNet (Depthwise => Pointwise): 4,800 parameters
    OSNet (1×1 => Depthwise): 4,800 parameters
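
    These numbers are easy to reproduce with a few lines of plain Python (the helper names below are just for illustration):

    def standard(c_in, c_out, k=3):
        return (c_in * k * k + 1) * c_out

    def mobilenet(c_in, c_out, k=3):
        return (k * k + 1) * c_in + (c_in + 1) * c_out

    def osnet_lite(c_in, c_out, k=3):
        return (c_in + 1) * c_out + (k * k + 1) * c_out

    for c_in, c_out in [(64, 128), (64, 64)]:
        print(c_in, c_out, standard(c_in, c_out), mobilenet(c_in, c_out), osnet_lite(c_in, c_out))
    # 64 128 73856 8960 9600
    # 64 64 36928 4800 4800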

    7. Conclusion

    Choosing between MobileNet and OSNet depends on your task and resource constraints. MobileNet is ideal when you need the smallest model size and fastest speed, such as for real-time applications or mobile devices, but may lose more accuracy as tasks become complex. OSNet, on the other hand, preserves more representational power thanks to its architecture, often achieving higher accuracy than MobileNet on challenging tasks like person re-identification.

    Both MobileNet- and OSNet-style blocks typically trail standard convolutions in accuracy by a modest margin, but the massive reduction in parameters and compute makes them the better choice in efficiency-critical scenarios.
