
Machine Learning | Boosting Methods (Part 2)

Last week we explained what the AdaBoost algorithm is and described its mathematical model; this week we continue with the rest of the AdaBoost material.

Picking up from last week, let's start with an example:

Given the training data shown in the table below. Assume the weak classifiers are decision stumps of the form \(x < v\) or \(x > v\), where the threshold \(v\) is chosen so that the classifier has the lowest classification error rate on the training set. Use the AdaBoost algorithm to learn a strong classifier.

| No. | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| \(x\) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| \(y\) | 1 | 1 | 1 | -1 | -1 | -1 | 1 | 1 | 1 | -1 |
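Before working through the iterations by hand, here is a minimal sketch for checking the numbers with code. It is not part of the original solution: the arrays `x` and `y` encode the table above, `best_stump` is a hypothetical helper name, and the search simply tries every threshold between adjacent points in both directions.

```python
import numpy as np

# Training data from the table above
x = np.arange(10)                                  # x = 0, 1, ..., 9
y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])

def best_stump(x, y, w):
    """Try every threshold v between adjacent points, in both directions,
    and return the stump with the lowest weighted classification error."""
    best = None
    for v in np.arange(0.5, 10.0, 1.0):            # v = 0.5, 1.5, ..., 9.5
        for s in (1, -1):                          # s = 1: predict +1 when x < v; s = -1: the reverse
            pred = np.where(x < v, s, -s)
            err = w[pred != y].sum()
            if best is None or err < best[0]:
                best = (err, v, s)
    return best

w1 = np.full(10, 0.1)                              # D_1: uniform initial weights
print(best_stump(x, y, w1))                        # lowest weighted error 0.3 at v = 2.5 (this is G_1)
```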

Solution: Initialize the weight distribution over the training data:

\[
D_1 = (w_{11}, w_{12}, \ldots, w_{1,10}), \qquad w_{1i} = 0.1, \quad i = 1, 2, \ldots, 10
\]

For m = 1:

(a) On the training data weighted by \(D_1\), the classification error rate is minimized when the threshold \(v\) is 2.5, so the basic classifier is
\[
G_1(x) = \begin{cases} 1, & x < 2.5 \\ -1, & x > 2.5 \end{cases}
\]
(b) The error rate of \(G_1(x)\) on the training set is \(e_1 = P(G_1(x_i) \ne y_i) = 0.3\): the three points \(x = 6, 7, 8\) are misclassified, each carrying weight 0.1.

(c) Compute the coefficient of \(G_1(x)\): \(\alpha_1 = \frac{1}{2}\ln\frac{1-e_1}{e_1} = 0.4236\)
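Substituting \(e_1 = 0.3\) (the logarithm here is the natural logarithm):

\[
\alpha_1 = \frac{1}{2}\ln\frac{1 - 0.3}{0.3} = \frac{1}{2}\ln\frac{7}{3} \approx \frac{1}{2}\times 0.8473 \approx 0.4236
\]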

(d) Update the weight distribution over the training data:

\[
D_2 = (w_{21}, \ldots, w_{2i}, \ldots, w_{2,10}), \qquad
w_{2i} = \frac{w_{1i}}{Z_1}\exp(-\alpha_1 y_i G_1(x_i)), \quad i = 1, 2, \ldots, 10
\]

\[ D_2=(0.07143,0.07143,0.07143,0.07143,0.07143,0.07143,0.16667,0.16667,0.16667,0.07143) \]
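To see where these numbers come from, evaluate the normalization factor and the two possible weight values:

\[
Z_1 = \sum_{i=1}^{10} w_{1i}\exp(-\alpha_1 y_i G_1(x_i)) = 0.7\,e^{-0.4236} + 0.3\,e^{0.4236} \approx 0.9165
\]

\[
w_{2i} \approx \frac{0.1\times 0.6547}{0.9165} \approx 0.07143 \ (\text{correctly classified}), \qquad
w_{2i} \approx \frac{0.1\times 1.5275}{0.9165} \approx 0.16667 \ (\text{misclassified})
\]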

\[ f_1(x)=0.4236G_1(x) \]

The classifier \(sign[f_1(x)]\) has 3 misclassified points on the training set.
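A single boosting round can also be written as a few lines of code. This sketch reuses the data and the `best_stump` helper from the sketch above; `boost_round` is a hypothetical name, and it simply carries out steps (a) through (d) for one iteration.

```python
def boost_round(x, y, w):
    """One AdaBoost round: fit the best stump under weights w, compute its
    coefficient alpha, and return the renormalized weight distribution."""
    err, v, s = best_stump(x, y, w)
    alpha = 0.5 * np.log((1 - err) / err)          # natural log, as in the formula above
    pred = np.where(x < v, s, -s)
    w_new = w * np.exp(-alpha * y * pred)
    w_new /= w_new.sum()                           # dividing by the sum is exactly dividing by Z_m
    return alpha, v, s, w_new

alpha1, v1, s1, w2 = boost_round(x, y, w1)
print(round(alpha1, 4))                            # ~0.4236
print(np.round(w2, 5))                             # ~D_2 as listed above
```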

For m = 2:

(a) On the training data weighted by \(D_2\), the classification error rate is minimized when the threshold \(v\) is 8.5, so the basic classifier is
\[
G_2(x) = \begin{cases} 1, & x < 8.5 \\ -1, & x > 8.5 \end{cases}
\]
(b) The error rate of \(G_2(x)\) on the training set is \(e_2 = 0.2143\) (the misclassified points are \(x = 3, 4, 5\), each with weight 0.07143).

(c) Compute \(\alpha_2 = 0.6496\)

(d) Update the weight distribution over the training data:

\[
D_3 = (0.0455, 0.0455, 0.0455, 0.1667, 0.1667, 0.1667, 0.1060, 0.1060, 0.1060, 0.0455)
\]

\[
f_2(x) = 0.4236\,G_1(x) + 0.6496\,G_2(x)
\]

The classifier \(sign[f_2(x)]\) still has 3 misclassified points on the training set.
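Continuing the sketch from above, a second call to the hypothetical `boost_round` reproduces these numbers (and a third call gives the m = 3 round shown next):

```python
alpha2, v2, s2, w3 = boost_round(x, y, w2)
print(round(alpha2, 4), v2)        # ~0.6496 and 8.5
print(np.round(w3, 4))             # ~D_3 as listed above
```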

For m = 3:

(a) On the training data weighted by \(D_3\), the classification error rate is minimized when the threshold \(v\) is 5.5, so the basic classifier is
\[
G_3(x) = \begin{cases} 1, & x > 5.5 \\ -1, & x < 5.5 \end{cases}
\]
(b) The error rate of \(G_3(x)\) on the training set is \(e_3 = 0.1820\)

(c) Compute \(\alpha_3 = 0.7514\)

(d) Update the weight distribution over the training data:

\[
D_4 = (0.125, 0.125, 0.125, 0.102, 0.102, 0.102, 0.065, 0.065, 0.065, 0.125)
\]

\[
f_3(x) = 0.4236\,G_1(x) + 0.6496\,G_2(x) + 0.7514\,G_3(x)
\]

The classifier \(sign[f_3(x)]\) has 0 misclassified points on the training set.

The final classifier is therefore:
\[
G(x) = sign[f_3(x)] = sign[0.4236\,G_1(x) + 0.6496\,G_2(x) + 0.7514\,G_3(x)]
\]
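As a sanity check, combining the three stumps with these coefficients and taking the sign does classify all ten points correctly. This again reuses the `x` and `y` arrays from the sketch above:

```python
# The three stumps found above, each with its coefficient alpha.
stumps = [(0.4236, 2.5, +1),    # G_1: +1 if x < 2.5, else -1
          (0.6496, 8.5, +1),    # G_2: +1 if x < 8.5, else -1
          (0.7514, 5.5, -1)]    # G_3: -1 if x < 5.5, else +1

f3 = sum(a * np.where(x < v, s, -s) for a, v, s in stumps)
G = np.sign(f3)
print(int((G != y).sum()))      # 0 misclassified points
```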

Training Error Analysis of AdaBoost

The most basic property of AdaBoost is that it keeps reducing the training error, i.e., the classification error rate on the training set, as learning proceeds. The following theorem addresses this:

Theorem (training error bound of AdaBoost). The training error of the final classifier produced by AdaBoost satisfies
\[
\frac{1}{N}\sum^N_{i=1}I(G(x_i)\ne y_i)\le\frac{1}{N}\sum_i \exp(-y_if(x_i)) = \prod_m Z_m
\]
where \(G(x)\), \(f(x)\) and \(Z_m\) were given in last week's article.
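A brief sketch of why this holds: whenever \(G(x_i) \ne y_i\) we have \(y_i f(x_i) \le 0\), so \(\exp(-y_i f(x_i)) \ge 1 = I(G(x_i) \ne y_i)\), while for correctly classified points the indicator is 0 and the exponential is still positive; this gives the inequality. The equality with \(\prod_m Z_m\) follows by unrolling the weight-update rule \(w_{m+1,i} = w_{mi}\exp(-\alpha_m y_i G_m(x_i))/Z_m\) and using \(w_{1i} = 1/N\) together with \(\sum_i w_{M+1,i} = 1\):

\[
\frac{1}{N}\sum_i \exp(-y_i f(x_i))
= \sum_i w_{1i}\prod_{m=1}^{M}\exp(-\alpha_m y_i G_m(x_i))
= Z_1\sum_i w_{2i}\prod_{m=2}^{M}\exp(-\alpha_m y_i G_m(x_i))
= \cdots = \prod_{m=1}^{M}Z_m
\]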

Theorem (training error bound of AdaBoost for binary classification).
\[
\prod^M_{m=1}Z_m=\prod^M_{m=1}\left[2\sqrt{e_m(1-e_m)}\right] = \prod^M_{m=1}\sqrt{1-4\gamma_m^2}\le \exp\left(-2\sum^M_{m=1}\gamma_m^2\right)
\]
where \(\gamma_m = \frac{1}{2}-e_m\).
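The middle equality comes from evaluating \(Z_m\) at the chosen \(\alpha_m = \frac{1}{2}\ln\frac{1-e_m}{e_m}\), so that \(e^{\alpha_m} = \sqrt{(1-e_m)/e_m}\):

\[
Z_m = \sum_{i=1}^{N} w_{mi}\exp(-\alpha_m y_i G_m(x_i))
= (1-e_m)e^{-\alpha_m} + e_m e^{\alpha_m}
= 2\sqrt{e_m(1-e_m)} = \sqrt{1-4\gamma_m^2}
\]

The final inequality then follows from \(1 - 4\gamma_m^2 \le e^{-4\gamma_m^2}\) by taking square roots and multiplying over m.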

Corollary: if there exists \(\gamma>0\) such that \(\gamma_m\ge \gamma\) for all m, then
\[
\frac{1}{N}\sum^N_{i=1}I(G(x_i)\ne y_i)\le \exp(-2M\gamma^2)
\]
This shows that, under this condition, the training error of AdaBoost decreases at an exponential rate.

That's all for today. Next week we will move on to boosting trees. Thanks for reading!