首页 > 常用平台 > 正文

fastai

标签:fast.ai, fastai


目录

参考称霸Kaggle的十大深度学习技巧

原文:https://blog.floydhub.com/ten-techniques-from-fast-ai/

安装直接(目前这个版本要求torch<0.4,而autokeras要求>=0.4.0,会有点小diff咯)

pip install fastai

使用多个而不是单一学习率

差分学习率(Differential Learning rates)意味着在训练时变换网络层比提高网络深度更重要。

参考https://github.com/fastai/fastai/blob/master/courses/dl1/lesson1-vgg.ipynb

例如:

from fastai.imports import *

from fastai.transforms import *
from fastai.conv_learner import *
from fastai.model import *
from fastai.dataset import *
from fastai.sgdr import *
from fastai.plots import *

# import library for creating learning object for convolutional #networks

sz=224
arch=vgg16

# assign model to resnet, vgg, or even your own custom model
PATH = './imgs' ##文件夹要是imgs/train/1/xx.jpg, images/valid/1/xx.jpg
data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))

# create fast ai data object, in this method we use from_paths where 
# inside PATH each image class is separated into different folders

learn = ConvLearner.pretrained(arch, data, precompute=True)

# create a learn object to quickly utilise state of the art
# techniques from the fast ai library

然后,冻结前面网络层并微调后面网络层:

learn.freeze()

# freeze layers up to the last one, so weights will not be updated.

learning_rate = 0.1
learn.fit(learning_rate, epochs=3)

# train only the last layer for a few epochs

当后面的网络效果比较好的时候,可以用差分学习率来改变前面的网络层,实践中,一般将学习率的缩小倍数设置为10倍:

learn.unfreeze()

# set requires_grads to be True for all layers, so they can be updated

learning_rate = [0.001, 0.01, 0.1]
# learning rate is set so that deepest third of layers have a rate of 0.001, # middle layers have a rate of 0.01, and final layers 0.1.

learn.fit(learning_rate, epochs=3)
# train model for three epoch with using differential learning rates

如何找到合适的学习率

一篇周期性学习率的paper:Cyclical Learning Rates for Training Neural Networks

用较低的学习率来训练,但在每个batch中以指数形式增加:

learn.lr_find()
# run on learn object where learning rate is increased  exponentially

learn.sched.plot_lr()
# plot graph of learning rate against iterationslr


然后可以看看学习率和loss的关系

learn.sched.plot()
# plots the loss against the learning rate


通过找出学习率最高Loss值仍在下降的值来确定最佳学习率。在上述情况中,该值将为0.01。

cosine annealing

当逐渐接近loss最小值时,学习率应该变得更小来使得模型不会超调且尽可能接近这一点。余弦退火(Cosine annealing)利用余弦函数来降低学习率。



从上图可以看出,随着x的增加,余弦值首先缓慢下降,然后加速下降,再次缓慢下降

learn.fit(0.1, 1)
# Calling learn fit automatically takes advantage of cosine annealing

Fast.ai库中的learn.fit()函数,来快速实现这个算法,在整个周期中不断降低学习率,如下图所示:



带重启的SGD算法

梯度下降算法可能陷入局部最小值,而不是全局最小值。



可以通过突然提高学习率,来跳出局部最小值并找到通向全局最小值的路径。这种方式称为带重启的随机梯度下降方法(stochastic gradient descent with restarts, SGDR),这个方法在Loshchilov和Hutter的ICLR论文中展示出了很好的效果。SGDR: Stochastic Gradient Descent with Warm Restarts

当调用learn.fit(learning_rate, epochs)函数时,学习率在每个周期开始时重置为参数输入时的初始值,然后像上面余弦退火部分描述的那样,逐渐减小。



每当学习率下降到最小点,在上图中为每100次迭代,我们称为一个循环。

cycle_len = 1
# decide how many epochs it takes for the learning rate to fall to
# its minimum point. In this case, 1 epoch

cycle_mult=2
# at the end of each cycle, multiply the cycle_len value by 2

learn.fit(0.1, 3, cycle_len=2, cycle_mult=2)
# in this case there will be three restarts. The first time with
# cycle_len of 1, so it will take 1 epoch to complete the cycle.
# cycle_mult=2 so the next cycle with have a length of two epochs, 
# and the next four.

如果我们把cycle_mult设成2:



关于这两个cycle_mult和cycle_len的函数,可以参考http://forums.fast.ai/t/understanding-cycle-len-and-cycle-mult/9413/8



上篇: 深入浅出强化学习-chap13 值迭代网络
下篇: perl

comment here..