HOME/Articles/

西瓜书

Article Outline

<script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script>

<script type="text/x-mathjax-config"> MathJax.Hub.Config({ showProcessingMessages: false, tex2jax: { skipTags: ['script', 'noscript', 'style', 'textarea', 'pre'], inlineMath: [['$','$']] } }); </script>

image

<center>Photo by <a style="background-color:black;color:white;text-decoration:none;padding:4px 6px;font-family:-apple-system, BlinkMacSystemFont, &quot;San Francisco&quot;, &quot;Helvetica Neue&quot;, Helvetica, Ubuntu, Roboto, Noto, &quot;Segoe UI&quot;, Arial, sans-serif;font-size:12px;font-weight:bold;line-height:1.2;display:inline-block;border-radius:3px" href="https://unsplash.com/photos/aFUHu9WNO3Q" target="_blank" rel="noopener noreferrer" title="Download free do whatever you want high-resolution photos from Floh Maier "><span style="display:inline-block;padding:2px 3px"><svg xmlns="http://www.w3.org/2000/svg" style="height:12px;width:auto;position:relative;vertical-align:middle;top:-2px;fill:white" viewBox="0 0 32 32"><title>unsplash-logo</title><path d="M10 9V0h12v9H10zm12 5h10v18H0V14h10v9h12v-9z"></path></svg></span><span style="display:inline-block;padding:2px 3px">Floh Maier</span></a></center>

logistic回归

  1. logistic回归解决的是什么问题?
  2. 它的表达式是什么?
  3. 求导公式是什么?

logistic回归解决的是分类问题

表达式:$y=\sigma(f(\boldsymbol{x}))=\sigma\left(\boldsymbol{w}^{T} \boldsymbol{x}\right)=\frac{1}{1+e^{-\boldsymbol{w}^{T} \boldsymbol{x}}}$

推导公式:

$$\begin{aligned} \frac{\partial}{\partial x} \sigma(f(x)) &=\frac{\partial}{\partial x}\left(\frac{1}{1+e^{-f(x)}}\right) \ &=\frac{-e^{-f(x)}}{\left[1+e^{-f(x)}\right]^{2}} \cdot\left(-\frac{\partial}{\partial x} f(x)\right) \ &=\frac{1}{1+e^{-f(x)}} \cdot \frac{e^{-f(x)}}{1+e^{-f(x)}} \cdot \frac{\partial}{\partial x} f(x) \ &=\sigma(f(x)) \cdot\left(1-\sigma(f(x))\right) \cdot \frac{\partial}{\partial x} f(x) \end{aligned}$$

决策树

  • 常见决策树算法有哪些?
  • 它们的划分准则分别是什么?
  • 是否有缺陷?
  • 决策树为什么要剪枝?
  • 剪枝有几种方法?优缺点?

常见:*ID3、C4.5、CART***

划分准则:

  • *信息增益 ---* $\operatorname{Gain}(D, a)=\operatorname{Ent}(D)-\sum_{v=1}^{V} \frac{\left|D^{v}\right|}{|D|} \operatorname{Ent}\left(D^{v}\right)$
  • *信息增益率 ---* $\operatorname{Gain}_{\text {ratio}}(D, a)=\frac{\operatorname{Gain}(D, a)}{\operatorname{IV}(a)}$
  • *基尼指数 ---* $\operatorname{Gini}$_ $\operatorname{index}(D,a) = \sum_{v=1}^{V}\frac{|D^v|}{|D|}\operatorname{Gini}(D^v)$

缺陷:

  • 信息增益:对可取值较多的属性有偏好。
  • 信息增益率:对可取值较少的属性有偏好。

  • 信息熵 ~ $\operatorname{Ent}(D)=-\sum_{k=1}^{|\mathcal{Y}|}p_k\log_{2}{p_k}$
  • 增益率 ~ $\mathrm{IV}(a)=-\sum_{v=1}^{V} \frac{\left|D^{v}\right|}{|D|} \log _{2} \frac{\left|D^{v}\right|}{|D|}$
  • 基尼值 ~ $\operatorname{Gini}(D) =\sum_{k=1}^{|\mathcal{Y}|} \sum_{k^{\prime} \neq k} p_{k} p_{k^{\prime}}=1-\sum_{k=1}^{|\mathcal{Y}|} p_{k}^{2}$

<a href="https://datawhalechina.github.io/pumpkin-book/#/chapter4/chapter4" class="button">Details</a>