
    Standard convolution and lite versions

    You may have studied standard convolution in your lectures. It works well for image processing, but it requires a large number of parameters and heavy computation. In recent years, several research advances have introduced lighter variants of the convolution layer that aim to reduce these costs.
    In this blog, we'll explore some of these variants and compare the number of parameters each one requires.

    1. Standard Convolution

    The traditional convolution operation connects every input channel to every output channel.

    Formula: Parameters = (C_in × K × K + 1) × C_out

    Where:
    • C_in: Input channels
    • K: Kernel size
    • C_out: Output channels
    • +1: Bias term

    Example:
    3×3 convolution, 64 => 128 channels
    Parameters = (64 × 3 × 3 + 1) × 128 = 73,856
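
    As a quick sanity check, here is a minimal PyTorch sketch (assuming torch is available) that builds this exact layer and counts its parameters:

    import torch.nn as nn

    # Standard 3x3 convolution: every input channel connects to every output channel.
    conv = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1, bias=True)

    # (64 * 3 * 3 + 1) * 128 = 73,856
    print(sum(p.numel() for p in conv.parameters()))  # 73856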

    2. Depthwise Convolution

    Depthwise convolution applies a single filter per input channel, with no cross-channel computation.

    Formula:
    Parameters = (K × K + 1) × C_in

    Example: 3×3 Depthwise, 64 channels
    Parameters = (3 × 3 + 1) × 64 = 640
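
    The same check works for the depthwise case; in this PyTorch sketch the key detail is setting groups equal to the number of input channels:

    import torch.nn as nn

    # Depthwise 3x3 convolution: groups=64 gives one 3x3 filter per channel,
    # so no information is mixed across channels.
    depthwise = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3,
                          padding=1, groups=64, bias=True)

    # (3 * 3 + 1) * 64 = 640
    print(sum(p.numel() for p in depthwise.parameters()))  # 640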

    3. Pointwise Convolution (1×1 Conv)

    Pointwise convolution uses 1×1 kernels to combine information across channels.

    Formula: Parameters = (C_in + 1) × C_out

    Example: 1×1 Conv, 64 => 128 channels
    Parameters = (64 + 1) × 128 = 8,320
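
    A pointwise layer is just an ordinary convolution with kernel_size=1, so the same counting trick applies:

    import torch.nn as nn

    # Pointwise (1x1) convolution: mixes channels at each spatial position,
    # but never looks at neighbouring pixels.
    pointwise = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=1, bias=True)

    # (64 + 1) * 128 = 8,320
    print(sum(p.numel() for p in pointwise.parameters()))  # 8320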

    4. MobileNet: Depthwise Separable Convolution

    MobileNet combines depthwise and pointwise convolutions sequentially:
    Input → 3×3 Depthwise → 1×1 Pointwise

    Formula:
    Parameters = (K × K + 1) × C_in + (C_in + 1) × C_out

    Example: 64 => 128 channels
    Depthwise: (3 × 3 + 1) × 64 = 640
    Pointwise: (64 + 1) × 128 = 8,320
    Total: 8,960 parameters
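
    Here is a minimal sketch of this block, following the bias-included formula above (the real MobileNet also inserts BatchNorm and ReLU and drops the biases, which this example deliberately ignores):

    import torch.nn as nn

    # Depthwise separable block: 3x3 depthwise first, then 1x1 pointwise.
    block = nn.Sequential(
        nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64, bias=True),  # 640 params
        nn.Conv2d(64, 128, kernel_size=1, bias=True),                       # 8,320 params
    )

    print(sum(p.numel() for p in block.parameters()))  # 8960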

    5. OSNet: Lite Convolution

    OSNet's lite convolution uses a 1×1 convolution followed by a depthwise 3×3 convolution:
    Input → 1×1 Conv → 3×3 Depthwise

    The parameter count depends on the 1×1 conv output channels.

    Formula: Parameters = (C_in + 1) × C_out + (K × K + 1) × C_out

    Example: 64 => 128 channels
    Conv: (64 + 1) × 128 = 8,320
    Depthwise: (3 × 3 + 1) × 128 = 1,280
    Total: 9,600 parameters
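
    The corresponding sketch simply swaps the order and widens the depthwise stage to the output channel count (again ignoring the normalization and activation layers a real OSNet block would include):

    import torch.nn as nn

    # OSNet-style lite 3x3: 1x1 conv up to 128 channels, then a 3x3 depthwise conv.
    lite = nn.Sequential(
        nn.Conv2d(64, 128, kernel_size=1, bias=True),                          # 8,320 params
        nn.Conv2d(128, 128, kernel_size=3, padding=1, groups=128, bias=True),  # 1,280 params
    )

    print(sum(p.numel() for p in lite.parameters()))  # 9600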

    6. Parameter Comparison

    For 64 => 128 Channel Transformation:
    Standard Conv 3×3: 73,856 parameters
    MobileNet (Depthwise => Pointwise): 8,960 parameters
    OSNet (1×1 => Depthwise): 9,600 parameters

    For Same Channel Count (64 => 64):
    Standard Conv 3×3: 36,928 parameters
    MobileNet (Depthwise => Pointwise): 4,800 parameters
    OSNet (1×1 => Depthwise): 4,800 parameters
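
    These numbers are easy to reproduce with a few lines of plain Python (the helper names below are just for illustration):

    def standard(c_in, c_out, k=3):
        return (c_in * k * k + 1) * c_out

    def mobilenet(c_in, c_out, k=3):
        return (k * k + 1) * c_in + (c_in + 1) * c_out

    def osnet_lite(c_in, c_out, k=3):
        return (c_in + 1) * c_out + (k * k + 1) * c_out

    for c_in, c_out in [(64, 128), (64, 64)]:
        print(c_in, c_out, standard(c_in, c_out), mobilenet(c_in, c_out), osnet_lite(c_in, c_out))
    # 64 128 73856 8960 9600
    # 64 64 36928 4800 4800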

    7. Conclusion

    Choosing between MobileNet and OSNet depends on your task and resource constraints. MobileNet is ideal when you need the smallest model size and fastest speed, such as for real-time applications or mobile devices, but may lose more accuracy as tasks become complex. OSNet, on the other hand, preserves more representational power thanks to its architecture, often achieving higher accuracy than MobileNet on challenging tasks like person re-identification.

    Both MobileNet- and OSNet-style blocks typically trail standard convolutions in accuracy by a modest margin, but the massive reduction in parameters and compute makes them the better choice in efficiency-critical scenarios.
