A short story about fine-tuning

or: does freezing part of the network speed up training, or not?

A month ago I participated in Kaggle’s Bengali competition (and was lucky to finish in the top 15%!). As a starting point I took a pretrained resnet18 (BTW, I used PyTorch), removed its head (the pretrained resnet18 is built for a 1000-class classification problem), and added a head matching this competition’s problem.
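In PyTorch the head swap looks roughly like this (a minimal sketch, not my exact code; the competition actually asks for three targets, which I collapse into a single linear layer here to keep things short, and the output size is just illustrative):

```python
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained resnet18; its final fc layer maps to 1000 classes.
model = models.resnet18(pretrained=True)

# Swap the 1000-class head for one sized for this competition.
# NUM_OUTPUTS is a placeholder: a real solution would predict the grapheme
# root and the two diacritics separately, likely with a multi-output head.
NUM_OUTPUTS = 186
model.fc = nn.Linear(model.fc.in_features, NUM_OUTPUTS)
```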

So I had a pretrained body (>>10 layers, cut down a little) plus a somehow-initialized, untrained head (<10 layers). A common practice is to train the head first, so that backprop doesn’t disturb the body’s weights. Yes, the body’s weights should be updated for this particular task too, but let’s do that later, once the head is somewhat trained.
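Freezing the body in PyTorch amounts to turning off requires_grad for everything except the head, roughly like this (a sketch continuing the one above; the "fc" name filter assumes the stock resnet18 attribute naming):

```python
import torch

# Freeze the pretrained body: frozen parameters get no gradients computed
# for them, and we keep them out of the optimizer entirely.
for name, param in model.named_parameters():
    if not name.startswith("fc"):  # everything except the new head
        param.requires_grad = False

# Hand the optimizer only the parameters that are still trainable.
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)
```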

And one more benefit I expected to get from this was a faster training process. However, I got no speedup at all.

Why?