Deep Learning Notes (20): Network Parameter Count (params) and Floating-Point Operations (FLOPs)

References:

1. How are the computational cost (FLOPs) and the number of parameters of a CNN model calculated?

2. Counting the floating-point operations and parameters of a TensorFlow model

3. How fast is my model?

Computation Formulas

The theoretical formulas are as follows:

\begin{equation}
\label{FLOPs}
\begin{split}
& param_{conv} = (k_w * k_h * c_{in}) * c_{out} + c_{out} \\
& macc_{conv} = (k_w * k_h * c_{in}) * c_{out} * H * W \\
& FLOPs_{conv} = [2 * (k_w * k_h * c_{in}) * c_{out} + c_{out}] * H * W \\
& param_{fc} = (n_{in} * n_{out}) + n_{out} \\
& macc_{fc} = n_{in} * n_{out} \\
& FLOPs_{fc} = (2 * n_{in} * n_{out}) + n_{out} \\
\end{split}
\end{equation}

Note: the formulas above assume a standard convolution / fully connected layer with a bias term!

The parameter count of a convolutional layer is determined by the kernel size and the numbers of input and output channels; the parameter count of a fully connected layer depends only on the input and output dimensions.

MACC stands for multiply-accumulate operation, i.e. one step of a dot product; 1 MACC ≈ 2 FLOPs.

FLOPs stands for floating point operations, the number of floating-point operations, and is used to measure a model's computational complexity. Counting FLOPs amounts to counting the multiply and add operations in the model. A convolutional layer's FLOPs depend not only on the kernel size and the channel counts but also on the size H × W of the output feature map; for a fully connected layer, the MACC count coincides with the parameter count up to the bias term, so FLOPs ≈ 2 × parameters.
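These formulas translate directly into code. Below is a minimal sketch (plain Python; the helper names conv_stats and fc_stats are mine, and the groups argument anticipates the group-convolution case below) that we will reuse for the manual LeNet calculation:

def conv_stats(k_w, k_h, c_in, c_out, H, W, groups=1):
    # H, W are the spatial dims of the *output* feature map
    params = k_w * k_h * (c_in // groups) * c_out + c_out
    macc = k_w * k_h * (c_in // groups) * c_out * H * W
    flops = 2 * macc + c_out * H * W  # one MACC = 1 mul + 1 add; the bias adds c_out*H*W
    return params, macc, flops


def fc_stats(n_in, n_out):
    params = n_in * n_out + n_out
    macc = n_in * n_out
    flops = 2 * macc + n_out
    return params, macc, flops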

In particular, for group convolution (Group Conv):

\begin{equation}
\label{GC_FLOPs}
\begin{split}
& param_{GC} = (k_w * k_h * \frac{c_{in}}{G}) * c_{out} + c_{out} \\
& macc_{GC} = (k_w * k_h * \frac{c_{in}}{G}) * c_{out} * H * W \\
& FLOPs_{GC} = [2 * (k_w * k_h * \frac{c_{in}}{G}) * c_{out} + c_{out}] * H * W \\
\end{split}
\end{equation}
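Plugging numbers into the groups argument of the sketch above makes the saving concrete, e.g. turning LeNet's conv2 into a (hypothetical) 4-group convolution:

print(conv_stats(5, 5, 20, 50, 8, 8))            # (25050, 1600000, 3203200)
print(conv_stats(5, 5, 20, 50, 8, 8, groups=4))  # (6300, 400000, 803200)

Both the weights and the MACCs shrink by a factor of G; only the bias term is unaffected.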

Manual Calculation

For simplicity, we take LeNet as the example:

Let us first work the numbers out by hand:

\begin{equation}
\label{Example}
\begin{split}
& param_{conv1} = (5^2 * 1) * 20 + 20 = 520 \\
& macc_{conv1} = (5^2 * 1) * 20 * 24 * 24 = 288k \\
& FLOPs_{conv1} = 2 * macc_{conv1} + 20 * 24 * 24 = 587.52k \\
& \\
& FLOPs_{pool1} = 20 * 24 * 24 = 11.52k \\
& \\
& param_{conv2} = (5^2 * 20) * 50 + 50 = 25.05k \\
& macc_{conv2} = (5^2 * 20) * 50 * 8 * 8 = 1.6M \\
& FLOPs_{conv2} = 2 * macc_{conv2} + 50 * 8 * 8 = 3203.2k \\
& \\
& FLOPs_{pool2} = 50 * 8 * 8 = 3.2k \\
& \\
& param_{ip1} = (50 * 4 * 4) * 500 + 500 = 400.5k \\
& macc_{ip1} = (50 * 4 * 4) * 500 = 400k \\
& FLOPs_{ip1} = 2 * macc_{ip1} + 500 = 800.5k \\
& \\
& param_{ip2} = 500 * 10 + 10 = 5.01k \\
& macc_{ip2} = 500 * 10 = 5k \\
& FLOPs_{ip2} = 2 * macc_{ip2} + 10 = 10.01k \\
\end{split}
\end{equation}
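Summing the per-layer numbers with the helpers from earlier (and counting each pooling layer as one FLOP per input element, as in the manual numbers above) reproduces the totals the tools below will report:

layers = [conv_stats(5, 5, 1, 20, 24, 24),  # conv1
          conv_stats(5, 5, 20, 50, 8, 8),   # conv2
          fc_stats(50 * 4 * 4, 500),        # ip1
          fc_stats(500, 10)]                # ip2
pool_flops = 20 * 24 * 24 + 50 * 8 * 8      # pool1 + pool2
print(sum(p for p, _, _ in layers))               # 431080 params
print(sum(f for _, _, f in layers) + pool_flops)  # 4615950 FLOPs

Keep these two totals in mind: 431,080 parameters and 4,615,950 FLOPs.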

Caffe

name: "LeNet"

input: "data"
input_shape {
  dim: 1
  dim: 1
  dim: 28
  dim: 28
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}


We can open the network definition in the Netscope tool and read off the result directly:

TensorFlow

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# ================================================================ #
#                        Train a Sample Model                      #
# ================================================================ #
# 1. create data
mnist = input_data.read_data_sets('../MNIST', one_hot=True)
with tf.variable_scope('Input'):
    tf_x = tf.placeholder(dtype=tf.float32, shape=[None, 28 * 28], name='x')
    image = tf.reshape(tf_x, [-1, 28, 28, 1], name='image')
    tf_y = tf.placeholder(dtype=tf.float32, shape=[None, 10], name='y')
    is_training = tf.placeholder(dtype=tf.bool, shape=None)

# 2. define Network
with tf.variable_scope('Net'):
    """
    "SAME" padding:
    out_height = ceil(in_height / strides[1])  # ceil = round up
    out_width = ceil(in_width / strides[2])

    "VALID" padding:
    out_height = ceil((in_height - filter_height + 1) / strides[1])
    out_width = ceil((in_width - filter_width + 1) / strides[2])
    """
    conv1 = tf.layers.conv2d(inputs=image, filters=20, kernel_size=5,
                             strides=1, padding='valid', activation=None, name='conv1')  # 20x24x24
    pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=2, strides=2, name='pool1')  # 20x12x12
    conv2 = tf.layers.conv2d(pool1, 50, 5, 1, 'valid', activation=None, name='conv2')  # 50x8x8
    pool2 = tf.layers.max_pooling2d(conv2, 2, 2, name='pool2')  # 50x4x4
    pool2_flat = tf.reshape(pool2, [-1, 4 * 4 * 50])
    fc1 = tf.layers.dense(pool2_flat, 500, tf.nn.relu, name='ip1')  # 500
    predict = tf.layers.dense(fc1, 10, name='ip2')  # 10

# 3. define loss & accuracy
with tf.name_scope('loss'):
    loss = tf.losses.softmax_cross_entropy(onehot_labels=tf_y, logits=predict, label_smoothing=0.01)
with tf.name_scope('accuracy'):
    # tf.metrics.accuracy() keeps a running average; index [1] is the update op,
    # which returns the running accuracy including this batch
    accuracy = tf.metrics.accuracy(labels=tf.argmax(tf_y, axis=1), predictions=tf.argmax(predict, axis=1))[1]

# 4. define optimizer
with tf.name_scope('train'):
    optimizer_op = tf.train.AdamOptimizer(1e-4).minimize(loss)

# 5. initialize
init_op = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer())

# 6. train
saver = tf.train.Saver()
save_path = './leNet_mnist.ckpt'

with tf.Session() as sess:
    sess.run(init_op)
    for step in range(11000):
        """
        mnist.train.num_examples = 55000
        11000 * 100 / mnist.train.num_examples = 20 epochs
        """
        batch_x, batch_y = mnist.train.next_batch(100)
        _, ls = sess.run([optimizer_op, loss],
                         feed_dict={tf_x: batch_x, tf_y: batch_y, is_training: True})
        if step % 100 == 0:
            acc_test = sess.run(accuracy,
                                feed_dict={tf_x: mnist.test.images, tf_y: mnist.test.labels,
                                           is_training: False})
            print('Step: ', step, ' | train loss: {:.4f} | test accuracy: {:.3f}'.format(ls, acc_test))
            sess.run(tf.local_variables_initializer())  # reset the metric, otherwise accuracy is a running average over all steps
    saver.save(sess, save_path)

# 7. test
with tf.Session() as sess:
    sess.run(init_op)
    saver.restore(sess, save_path)
    acc_test = sess.run(accuracy, feed_dict={tf_x: mnist.test.images,
                                             tf_y: mnist.test.labels,
                                             is_training: False})
    print('test accuracy: {}'.format(acc_test))  # test accuracy: 0.991100013256073


Training produces the sample model leNet_mnist.ckpt. Next, to identify the output node (Net/ip2/BiasAdd), we take a look in TensorBoard:

from tensorflow.summary import FileWriter

sess = tf.Session()
tf.train.import_meta_graph("leNet_mnist.ckpt.meta")
FileWriter("__tb", sess.graph)


Once we know the output node, we can convert the model to a pb file and compute the FLOPs:

# ================================================================ #
#                Convert ckpt to pb & Compute FLOPs                #
# ================================================================ #
from tensorflow.python.framework import graph_util


def stats_graph(graph):
    flops = tf.profiler.profile(graph, options=tf.profiler.ProfileOptionBuilder.float_operation())
    params = tf.profiler.profile(graph, options=tf.profiler.ProfileOptionBuilder.trainable_variables_parameter())
    print('FLOPs: {};    Trainable params: {}'.format(flops.total_float_ops, params.total_parameters))


with tf.Graph().as_default() as graph:
    # 1. Create Graph
    image = tf.Variable(initial_value=tf.random_normal([1, 28, 28, 1]))
    conv1 = tf.layers.conv2d(inputs=image, filters=20, kernel_size=5,
                             strides=1, padding='valid', activation=None, name='conv1')  # 20x24x24
    pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=2, strides=2, name='pool1')  # 20x12x12
    conv2 = tf.layers.conv2d(pool1, 50, 5, 1, 'valid', activation=None, name='conv2')  # 50x8x8
    pool2 = tf.layers.max_pooling2d(conv2, 2, 2, name='pool2')  # 50x4x4
    pool2_flat = tf.reshape(pool2, [-1, 4 * 4 * 50])
    fc1 = tf.layers.dense(pool2_flat, 500, tf.nn.relu, name='ip1')  # 500
    predict = tf.layers.dense(fc1, 10, name='ip2')  # 10
    print('stats before freezing')
    stats_graph(graph)

    # 2. Freeze Graph
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        output_graph = graph_util.convert_variables_to_constants(sess, graph.as_graph_def(), ['ip2/BiasAdd'])
        with tf.gfile.GFile('LeNet_mnist.pb', "wb") as f:
            f.write(output_graph.SerializeToString())


def load_pb(pb):
    with tf.gfile.GFile(pb, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def, name='')
        return graph


# 3. Load Frozen Graph
graph = load_pb('LeNet_mnist.pb')
print('stats after freezing')
stats_graph(graph)


stats before freezing
FLOPs: 5478522;    Trainable params: 431864
stats after freezing
FLOPs: 4615950;    Trainable params: 0

In detail (the pre-freezing count also includes ops that are not part of the forward pass, such as the weight initializers; after freezing, the 4,615,950 FLOPs match our manual total exactly):

node name | # parameters
_TFProfRoot (--/431.86k params)
  Variable (1x28x28x1, 784/784 params)
  conv1 (--/520 params)
    conv1/bias (20, 20/20 params)
    conv1/kernel (5x5x1x20, 500/500 params)
  conv2 (--/25.05k params)
    conv2/bias (50, 50/50 params)
    conv2/kernel (5x5x20x50, 25.00k/25.00k params)
  ip1 (--/400.50k params)
    ip1/bias (500, 500/500 params)
    ip1/kernel (800x500, 400.00k/400.00k params)
  ip2 (--/5.01k params)
    ip2/bias (10, 10/10 params)
    ip2/kernel (500x10, 5.00k/5.00k params)

node name | # float_ops
_TFProfRoot (--/4.62m flops)
  conv2/Conv2D (3.20m/3.20m flops)
  ip1/MatMul (800.00k/800.00k flops)
  conv1/Conv2D (576.00k/576.00k flops)
  conv1/BiasAdd (11.52k/11.52k flops)
  pool1/MaxPool (11.52k/11.52k flops)
  ip2/MatMul (10.00k/10.00k flops)
  conv2/BiasAdd (3.20k/3.20k flops)
  pool2/MaxPool (3.20k/3.20k flops)
  ip1/BiasAdd (500/500 flops)
  ip2/BiasAdd (10/10 flops)
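These per-node numbers agree with the manual calculation: e.g. conv1/Conv2D (576.00k) plus conv1/BiasAdd (11.52k) gives the 587.52k we computed for conv1, and ip1/MatMul (800.00k) plus ip1/BiasAdd (500) gives the 800.5k for ip1.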

PyTorch

import torch
import torchvision
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as transforms
from torchsummary import summary

# Device configuration
device = torch.device('cpu')  # torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print(device, torch.__version__)

# Hyper parameters
num_epochs = 5
num_classes = 10
batch_size = 100
learning_rate = 0.01

# MNIST DATASET
train_dataset = torchvision.datasets.MNIST(root='H:/Other_Datasets/',
                                           train=True,
                                           transform=transforms.ToTensor(),
                                           download=True)

test_dataset = torchvision.datasets.MNIST(root='H:/Other_Datasets/',
                                          train=False,
                                          transform=transforms.ToTensor())

# Data loader
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)


class LeNet(nn.Module):
    def __init__(self, in_channels, num_classes):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, 20, kernel_size=5, stride=1)  # 20x24x24
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)  # 20x12x12
        self.conv2 = nn.Conv2d(20, 50, kernel_size=5, stride=1)  # 50x8x8
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)  # 50x4x4
        self.fc1 = nn.Linear(50 * 4 * 4, 500)  # 500
        self.fc2 = nn.Linear(500, num_classes)  # 10

    def forward(self, input):
        out = self.conv1(input)
        out = self.pool1(out)
        out = self.conv2(out)
        out = self.pool2(out)
        out = out.reshape(out.size(0), -1)  # PyTorch follows the NCHW convention
        out = F.relu(self.fc1(out))
        out = self.fc2(out)
        return out


model = LeNet(1, num_classes).to(device)

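Before reaching for any library, a one-line sanity check of the parameter count (numel() returns the element count of each parameter tensor):

print(sum(p.numel() for p in model.parameters()))  # 431080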

The torchsummary module gives the parameter count directly:

summary(model, (1, 28, 28), device=device.type)
"""
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 20, 24, 24]             520
         MaxPool2d-2           [-1, 20, 12, 12]               0
            Conv2d-3             [-1, 50, 8, 8]          25,050
         MaxPool2d-4             [-1, 50, 4, 4]               0
            Linear-5                  [-1, 500]         400,500
            Linear-6                   [-1, 10]           5,010
================================================================
Total params: 431,080
Trainable params: 431,080
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.14
Params size (MB): 1.64
Estimated Total Size (MB): 1.79
----------------------------------------------------------------
"""

A variant, torchscan, can also be used:

from torchscan import summary

summary(model, (1, 28, 28))
"""
__________________________________________________________________________________________
Layer                        Type                  Output Shape              Param #
==========================================================================================
lenet                        LeNet                 (-1, 10)                  0
├─conv1                      Conv2d                (-1, 20, 24, 24)          520
├─pool1                      MaxPool2d             (-1, 20, 12, 12)          0
├─conv2                      Conv2d                (-1, 50, 8, 8)            25,050
├─pool2                      MaxPool2d             (-1, 50, 4, 4)            0
├─fc1                        Linear                (-1, 500)                 400,500
├─fc2                        Linear                (-1, 10)                  5,010
==========================================================================================
Trainable params: 431,080
Non-trainable params: 0
Total params: 431,080
------------------------------------------------------------------------------------------
Model size (params + buffers): 1.64 Mb
Framework & CUDA overhead: 0.00 Mb
Total RAM usage: 1.64 Mb
------------------------------------------------------------------------------------------
Floating Point Operations on forward: 4.59 MFLOPs
Multiply-Accumulations on forward: 2.30 MMACs
Direct memory accesses on forward: 2.35 MDMAs
__________________________________________________________________________________________
"""

1. Using the open-source tool pytorch-OpCounter (recommended)

from thop import profile

input = torch.randn(1, 1, 28, 28)
macs, params = profile(model, inputs=(input,))
print('Total macc: {}, Total params: {}'.format(macs, params))
"""
Total macc: 2307720.0, Total params: 431080.0
"""

2. Using the open-source tool torchstat

from torchstat import stat

stat(model, (1, 28, 28))
"""
      module name  input shape output shape    params memory(MB)         MAdd        Flops  MemRead(B) MemWrite(B) duration[%]  MemR+W(B)
0           conv1    1  28  28   20  24  24     520.0       0.04    576,000.0    299,520.0      5216.0      46080.0      99.99%    51296.0
1           pool1   20  24  24   20  12  12       0.0       0.01      8,640.0     11,520.0     46080.0      11520.0       0.00%    57600.0
2           conv2   20  12  12   50   8   8   25050.0       0.01  3,200,000.0  1,603,200.0    111720.0      12800.0       0.00%   124520.0
3           pool2   50   8   8   50   4   4       0.0       0.00      2,400.0      3,200.0     12800.0       3200.0       0.00%    16000.0
4             fc1          800          500  400500.0       0.00    799,500.0    400,000.0   1605200.0       2000.0       0.00%  1607200.0
5             fc2          500           10    5010.0       0.00      9,990.0      5,000.0     22040.0         40.0       0.00%    22080.0
total                                        431080.0       0.07  4,596,530.0  2,322,440.0     22040.0         40.0      99.99%  1878696.0
==========================================================================================================================================
Total params: 431,080
------------------------------------------------------------------------------------------------------------------------------------------
Total memory: 0.07MB
Total MAdd: 4.6MMAdd
Total Flops: 2.32MFlops
Total MemR+W: 1.79MB
"""

Torchstat's counting, however, seems to differ from the convention used in papers: it looks like conv Flops = MAdd/2 + bias ops, fc Flops = MACCs, and the pool Flops match Caffe's comp count.

P.S.: some comments online say that the MAdd and Flops columns should be swapped!
