Deriving the Gradient Descent Update for Logistic Regression

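Chapter 5 of Machine Learning in Action introduces logistic regression, but the book gives neither the objective function nor a derivation of the gradient descent update. Where it explains the code, the text says: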

A little math is needed to derive the equations used here, and I’ll leave you to look into that further if desired.

The purpose of this post is to work through that "little math".

Logistic regression

sigmoid function

$$
\sigma(z)=\frac{1}{1+e^{-z}}
$$

Combining the sigmoid with a linear classifier, the classification function is written as:

\begin{equation}
h_{\theta}(x)=\frac{1}{1+e^{-\theta^Tx}}
\end{equation}

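Here $\theta$ and $x$ are both vectors. If $h_{\theta}(x)>0.5$ then $y=1$, otherwise $y=0$. The value of $y$ thus assigns each sample $x$ to one of two classes.
The remaining question is how to find the optimal $\theta$.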
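The decision rule above can be sketched in a few lines of plain Python. This is only an illustration, not the book's code: the names `sigmoid`, `hypothesis`, and `classify` are my own, and vectors are assumed to be plain lists of floats of equal length.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), split by sign so math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # z < 0, so exp(z) lies in [0, 1)
    return ez / (1.0 + ez)

def hypothesis(theta, x):
    """h_theta(x) = sigmoid(theta^T x) for equal-length lists theta and x."""
    return sigmoid(sum(t * v for t, v in zip(theta, x)))

def classify(theta, x):
    """Return 1 if h_theta(x) > 0.5, otherwise 0."""
    return 1 if hypothesis(theta, x) > 0.5 else 0

# Hypothetical parameters and samples, chosen only to show both outcomes.
theta = [1.0, -2.0]
print(classify(theta, [3.0, 1.0]))  # theta^T x = 1 > 0, so h > 0.5
print(classify(theta, [1.0, 1.0]))  # theta^T x = -1 < 0, so h < 0.5
```

Since $\sigma(0)=0.5$, the test $h_{\theta}(x)>0.5$ is equivalent to $\theta^Tx>0$, which is why the classifier is still linear in $x$.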

Objective function

In general, an objective function is written as a sum of error terms over all training samples:
$$\begin{equation} J(\theta)=\frac{1}{N}\sum_{i=1}^N \frac{1}{2}[y^{(i)}-h_{\theta}(x^{(i)})]^2 \end{equation}$$

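For logistic regression, however, this objective is a poor choice, because it is a non-convex function of $\theta$.
A different objective is therefore used, defined for each sample as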

$$\begin{equation} J^{(i)}(\theta)= \begin{cases} -\log(1-h_{\theta}(x^{(i)})) & \text{if } y^{(i)}=0\\ -\log(h_{\theta}(x^{(i)})) &\text{if } y^{(i)}=1 \end{cases} \end{equation}$$
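The non-convexity of the squared error can be checked numerically. The sketch below is a spot check under my own assumptions, not a proof: it uses a single hypothetical sample with scalar $x=1$ and $y=1$, and tests whether the loss at the midpoint of two values of $\theta$ lies above the chord, which a convex function never allows.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def squared_error(theta, x=1.0, y=1.0):
    """Squared-error loss 0.5 * (y - h)^2 for one scalar sample."""
    return 0.5 * (y - sigmoid(theta * x)) ** 2

def log_loss(theta, x=1.0, y=1.0):
    """Per-sample loss from the case definition: -log(h) if y == 1, else -log(1 - h)."""
    h = sigmoid(theta * x)
    return -math.log(h) if y == 1.0 else -math.log(1.0 - h)

def violates_midpoint_convexity(f, a, b):
    """True if f at the midpoint of a and b lies strictly above the chord."""
    return f(0.5 * (a + b)) > 0.5 * (f(a) + f(b))

# Hypothetical test points, far on the wrong side of the decision boundary.
print(violates_midpoint_convexity(squared_error, -10.0, -2.0))
print(violates_midpoint_convexity(log_loss, -10.0, -2.0))
```

For these points the squared error fails the midpoint test while the log loss passes it: the squared error flattens out when the prediction is badly wrong, whereas the log loss keeps growing roughly linearly.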

Because $y$ only takes the values 0 and 1, the two cases can be merged into a single function:
$$\begin{equation} \text{Cost}(h_{\theta}(x),y)= -y\log(h_{\theta}(x))-(1-y)\log(1-h_{\theta}(x)) \end{equation}$$
The objective function is then written as the average over all $N$ training samples:
$$\begin{equation} J(\theta)= \frac{1}{N}\sum_{i=1}^N\text{Cost}(h_{\theta}(x^{(i)}),y^{(i)})= -\frac{1}{N}\sum_{i=1}^N[y^{(i)}\log(h_{\theta}(x^{(i)}))+(1-y^{(i)})\log(1-h_{\theta}(x^{(i)}))] \end{equation}$$
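As a rough sketch, the averaged cost can be computed as follows. The function name `cost`, the toy data, and the clipping constant `eps` are my own assumptions; the clipping is not part of the formula and is only there so that `math.log` never receives 0.

```python
import math

def sigmoid(z):
    """Logistic function, split by sign so math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def cost(theta, X, y, eps=1e-12):
    """Average of -y*log(h) - (1-y)*log(1-h) over all samples."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, x_i)))
        h = min(max(h, eps), 1.0 - eps)  # assumption: clip so log() never sees 0
        total += -y_i * math.log(h) - (1 - y_i) * math.log(1.0 - h)
    return total / len(X)

# Hypothetical data: the first feature is a constant 1.0 acting as the bias term.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]]
y = [0, 0, 1, 1]
print(cost([0.0, 0.0], X, y))   # theta = 0 gives h = 0.5 everywhere, so J is about log(2)
print(cost([-4.0, 2.0], X, y))  # a theta that separates the two classes gives a lower J
```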

Computing $\frac{\partial}{\partial\theta_j}J(\theta)$ for gradient descent

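Derivative of the first term inside the brackets [], where $x_j^{(i)}$ denotes the $j$-th component of sample $x^{(i)}$:

$$\begin{align} \frac{\partial\, y^{(i)}\log(h_{\theta}(x^{(i)}))}{\partial \theta_j} =&y^{(i)}(1+e^{-\theta^T x^{(i)}})\frac{e^{-\theta^T x^{(i)}}x_j^{(i)}}{(1+e^{-\theta^T x^{(i)}})^2}\\ =&y^{(i)}\frac{e^{-\theta^T x^{(i)}}x_j^{(i)}}{1+e^{-\theta^T x^{(i)}}} \end{align}$$

Derivative of the second term inside the brackets []:

$$\begin{align} \frac{\partial\, (1-y^{(i)})\log(1-h_{\theta}(x^{(i)}))}{\partial \theta_j} =&(1-y^{(i)}) \frac{1+e^{-\theta^T x^{(i)}}}{e^{-\theta^T x^{(i)}}} \times \frac{-e^{-\theta^T x^{(i)}}}{(1+e^{-\theta^T x^{(i)}})^2}x_j^{(i)}\\ =&(y^{(i)}-1)\frac{x_j^{(i)}}{1+e^{-\theta^T x^{(i)}}} \end{align}$$

Adding the two expressions and using $h_{\theta}(x^{(i)})=\frac{1}{1+e^{-\theta^T x^{(i)}}}$ gives
$$\begin{equation} \frac{y^{(i)}(1+e^{-\theta^T x^{(i)}})-1}{1+e^{-\theta^T x^{(i)}}}x_j^{(i)}=(y^{(i)}-h_{\theta}(x^{(i)}))x_j^{(i)} \end{equation}$$
Restoring the factor $-\frac{1}{N}$ and the sum over samples from $J(\theta)$, we obtain
$$\begin{equation} \frac{\partial}{\partial\theta_j}J(\theta)=\frac{1}{N}\sum_{i=1}^N(h_{\theta}(x^{(i)})-y^{(i)})x_j^{(i)} \end{equation}$$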
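One way to gain confidence in this result is to compare it against a finite-difference approximation of $J(\theta)$ and then use it in a plain gradient descent loop. The sketch below is my own illustration, not the book's implementation: the function names, the toy data, the learning rate `alpha`, and the step count are all assumptions, and the update rule used is the standard $\theta_j \leftarrow \theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta)$.

```python
import math

def sigmoid(z):
    """Logistic function, split by sign so math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def h(theta, x):
    return sigmoid(sum(t * v for t, v in zip(theta, x)))

def cost(theta, X, y):
    """J(theta): average cross-entropy over all samples."""
    return sum(-y_i * math.log(h(theta, x_i)) - (1 - y_i) * math.log(1.0 - h(theta, x_i))
               for x_i, y_i in zip(X, y)) / len(X)

def gradient(theta, X, y):
    """Analytic gradient: (1/N) * sum_i (h(x_i) - y_i) * x_ij for each j."""
    grad = [0.0] * len(theta)
    for x_i, y_i in zip(X, y):
        err = h(theta, x_i) - y_i
        for j, x_ij in enumerate(x_i):
            grad[j] += err * x_ij / len(X)
    return grad

def numerical_gradient(theta, X, y, eps=1e-6):
    """Central-difference approximation of dJ/dtheta_j, used only as a check."""
    grad = []
    for j in range(len(theta)):
        up = theta[:j] + [theta[j] + eps] + theta[j + 1:]
        down = theta[:j] + [theta[j] - eps] + theta[j + 1:]
        grad.append((cost(up, X, y) - cost(down, X, y)) / (2.0 * eps))
    return grad

def gradient_descent(theta, X, y, alpha=0.1, steps=1000):
    """Repeat theta_j <- theta_j - alpha * dJ/dtheta_j a fixed number of times."""
    for _ in range(steps):
        g = gradient(theta, X, y)
        theta = [t - alpha * g_j for t, g_j in zip(theta, g)]
    return theta

# Hypothetical, deliberately non-separable data; first feature is the bias term.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]]
y = [0, 1, 0, 1]

print(gradient([0.3, -0.2], X, y))
print(numerical_gradient([0.3, -0.2], X, y))  # should agree closely with the line above

theta = gradient_descent([0.0, 0.0], X, y)
print(cost([0.0, 0.0], X, y), cost(theta, X, y))  # the cost should go down
```

The loop uses the whole training set for each update (batch gradient descent) and a fixed step count instead of a convergence test, purely to keep the example short.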