# Useful Inequalities

## 1 $$e^x$$ and $$\ln (1+x)$$ #

For all $$x \in \RR$$:

$e^x \geq x+1$

For all $$x>-1$$:

$\ln (1+x)\leq x$
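A quick numerical sanity check of both bounds (a minimal sketch in Python; the grid is an arbitrary choice):

```python
import math

# e^x >= x + 1 holds for all real x; ln(1+x) <= x holds for x > -1.
xs = [i / 10 for i in range(-9, 50)]  # grid in [-0.9, 4.9]
exp_ok = all(math.exp(x) >= x + 1 for x in xs)
log_ok = all(math.log(1 + x) <= x for x in xs)
```

Both inequalities are tight exactly at $$x=0$$.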

## 2 Triangle Inequality #

### General statement #

In a normed vector space $$V$$:

$\|x+y\|\leq \|x\|+\|y\|$

This is actually a defining property of the norm.

Generalization: Minkowski Inequality.

### Absolute value as norm #

$|x+y|\leq |x| + |y|$

$|x-y|\leq |x| + |y|$

### Reverse triangle inequality #

$\bigg| |x|-|y| \bigg|\le|x-y|$
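Both forms can be spot-checked numerically (a sketch with random 3-dimensional vectors under the Euclidean norm; dimensions and ranges are arbitrary):

```python
import math
import random

random.seed(0)

def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(c * c for c in v))

triangle_ok = reverse_ok = True
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(3)]
    y = [random.uniform(-10, 10) for _ in range(3)]
    s = norm([a + b for a, b in zip(x, y)])   # ||x + y||
    d = norm([a - b for a, b in zip(x, y)])   # ||x - y||
    triangle_ok &= s <= norm(x) + norm(y) + 1e-9          # triangle inequality
    reverse_ok &= abs(norm(x) - norm(y)) <= d + 1e-9      # reverse triangle inequality
```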

## 3 Cauchy–Schwarz Inequality #

### General statement #

For all vectors $$\vec{u}$$ and $$\vec{v}$$ of an inner product space:

$\left|\langle \vec{u},\vec{v}\rangle \right|^2 \leq \langle \vec{u},\vec{u}\rangle \cdot \langle \vec{v},\vec{v}\rangle$

or written in a more familiar form:

$\left|\langle \vec{u},\vec{v}\rangle \right| \leq \| \vec{u}\| \| \vec{v}\|$

The two sides are equal if and only if $$\vec{u}$$ and $$\vec{v}$$ are linearly dependent.

Generalization: Hölder’s Inequality

### In $$\RR^2$$ #

If $$\vec{v}=(v_1, v_2)$$ and $$\vec{u}=(u_1,u_2)$$:

$(u_1v_1+u_2v_2)^2 \leq (u_1^2+u_2^2)(v_1^2+v_2^2)$

### In $$\RR ^n$$ #

$\left( \sum_{i=1}^n u_i v_i \right)^2 \leq \left( \sum_{i=1}^n u_i^2 \right)\left( \sum_{i=1}^n v_i^2 \right)$

A weighted version also holds for non-negative constants $$c_i\geq 0$$:

$\left( \sum_{i=1}^n c_i u_i v_i \right)^2 \leq \left( \sum_{i=1}^n c_iu_i^2 \right)\left( \sum_{i=1}^n c_iv_i^2 \right)$

### Expectation version #

$\E[|XY|]\leq \sqrt{\E \left[ X^2 \right] \cdot \E \left[ Y^2 \right]}$
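A numerical spot-check of the $$\RR^n$$ form (a sketch; the vector length and coordinate ranges are arbitrary):

```python
import random

random.seed(1)

def dot(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

cs_ok = True
for _ in range(1000):
    u = [random.uniform(-5, 5) for _ in range(4)]
    v = [random.uniform(-5, 5) for _ in range(4)]
    # (sum u_i v_i)^2 <= (sum u_i^2)(sum v_i^2)
    cs_ok &= dot(u, v) ** 2 <= dot(u, u) * dot(v, v) + 1e-9
```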

## 4 Markov Inequality #

For a random variable $$X$$, the more information we have about it, the better the bounds we can achieve for $$\Pr[X\geq t]$$. Suppose we only know $$\E[X]$$:

If $$X\geq 0$$ and $$t>0$$, then:

$\Pr[X\geq t]\leq \frac{\E[X]}{t}$

The intuition is simple: if $$X$$ is smaller than $$t$$, pretend it is 0; if $$X$$ is at least $$t$$, pretend it is exactly $$t$$. This gives the proof sketch:

$\E[X] = \Pr[X<t]\cdot\E[X|X<t]+\Pr[X\geq t]\cdot\E[X|X\geq t]$

And $$\E[X|X<t]\geq 0$$ because $$X\geq 0$$, and $$\E[X|X\geq t]\geq t$$ because this conditional expectation only takes into account values of at least $$t$$. Hence,

$\E[X]\geq \Pr[X\geq t]\cdot\E[X|X\geq t]\geq t\cdot \Pr[X\geq t]$

Proof

$$t \1_{X\geq t} \leq X$$ holds with probability 1: if $$X<t$$ the left side is 0, and otherwise it equals $$t\leq X$$. Taking expectations on both sides gives $$t\Pr[X\geq t]\leq \E[X]$$.
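The bound can be illustrated by simulation (a sketch; the Exponential(1) distribution and thresholds are arbitrary choices — exponential samples are non-negative, as Markov requires):

```python
import random

random.seed(2)

# X ~ Exponential(1): X >= 0 and E[X] = 1, so Markov gives Pr[X >= t] <= E[X]/t.
n = 100_000
samples = [random.expovariate(1.0) for _ in range(n)]
mean = sum(samples) / n
markov_ok = True
for t in (1.0, 2.0, 4.0):
    tail = sum(x >= t for x in samples) / n   # empirical Pr[X >= t]
    markov_ok &= tail <= mean / t
```

Note that the true tail $$e^{-t}$$ is far below $$1/t$$: Markov is loose when only the mean is known.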

## 5 Chebyshev Inequality #

Suppose we now know both the expectation $$\E[X]$$ and the variance $$\Var[X]$$:

Let $$\E[X]=\mu$$ and $$\Var[X]=\sigma^2\neq 0$$, then for all $$t>0$$:

$\Pr[|X-\mu|\geq t]\leq \frac{\sigma^2}{t^2}$

The intuition is similar to Markov Inequality: if $$|X-\mu|$$ is smaller than $$t$$, pretend $$(X-\mu)^2$$ is 0; if $$|X-\mu|$$ is at least $$t$$, pretend $$(X-\mu)^2$$ is exactly $$t^2$$. The proof sketch:

\begin{aligned}\sigma^2 &= \E[(X-\mu)^2]\\ &= \E \left[ (X-\mu)^2 \mid |X-\mu|\geq t \right] \Pr \left[ |X-\mu|\geq t \right] + \E \left[ (X-\mu)^2 \mid |X-\mu| < t \right] \Pr \left[ |X-\mu| < t \right]\\ &\geq t^2\Pr \left[ |X-\mu|\geq t \right] + 0 \cdot \Pr \left[ |X-\mu| < t \right] \\ &= t^2\Pr \left[ |X-\mu|\geq t \right] \end{aligned}

Chebyshev inequality follows by dividing by $$t^2$$. Equivalently, Chebyshev is just Markov Inequality applied to the non-negative random variable $$(X-\mu)^2$$ with threshold $$t^2$$.
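A matching simulation for Chebyshev (a sketch; Uniform(0, 1) is an arbitrary choice with known $$\mu=1/2$$ and $$\sigma^2=1/12$$):

```python
import random

random.seed(3)

# X ~ Uniform(0, 1): mu = 1/2, sigma^2 = 1/12.
n = 100_000
samples = [random.random() for _ in range(n)]
mu, var = 0.5, 1 / 12
cheb_ok = True
for t in (0.3, 0.4, 0.45):
    tail = sum(abs(x - mu) >= t for x in samples) / n   # empirical Pr[|X - mu| >= t]
    cheb_ok &= tail <= var / t ** 2
```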

## 6 Chernoff Inequality #

Even though Markov Inequality and Chebyshev Inequality only use information about the expectation and the variance of the random variable under consideration, they are essentially tight for a general random variable: one can construct non-trivial (non-constant) random variables for which these two inequalities hold with equality.

To get a better bound, we need more information, so let’s consider the sum of a number of independent random variables.

Suppose $$X=X_1+X_2+\dots +X_n$$ where the $$X_i$$ are independent, and let $$\mu = \E[X] = \sum_i \E[X_i]$$.

### General Bound #

Applying Markov Inequality to $$e^{tX}$$ for every $$t>0$$:

$\Pr[X\geq a]=\Pr \left[ e^{tX}\geq e^{ta} \right] \leq \frac{\E \left[ e^{tX} \right]}{e^{ta}} = e^{-ta}\E \left[ \prod_{i=1}^n e^{tX_i} \right]$

Since the $$X_i$$ are independent, the expectation factors into a product; optimizing over $$t$$ then gives:

$\Pr[X\geq a]\leq \min_{t>0}e^{-ta}\prod_{i=1}^n \E \left[ e^{tX_i} \right]$

Similarly, we have:

$\Pr[X\leq a]\leq \min_{t>0}e^{ta}\prod_{i=1}^n \E \left[ e^{-tX_i} \right]$

Specific Chernoff bounds are obtained by calculating $$\E \left[ e^{tX_i} \right]$$ (or $$\E \left[ e^{-tX_i} \right]$$) for specific distributions of $$X_i$$.

### Bernoulli version of Chernoff Bound #

Let $$X_i=1$$ with probability $$p_i$$ and $$X_i=0$$ with probability $$1-p_i$$, so that $$\mu = \E[X] = \sum_{i=1}^n p_i$$. Then

• Upper Tail: for all $$\delta>0$$, $\Pr[X\geq(1+\delta)\mu] \leq e^{-\frac{\delta^2}{2+\delta}\mu}$

• Lower Tail: for all $$0<\delta<1$$, $\Pr[X\leq(1-\delta)\mu] \leq e^{-\frac{\delta^2}{2}\mu}$

For $$0<\delta<1$$, we can also combine the lower and upper tails:

$\Pr \left[ |X-\mu|\geq \delta\mu \right] \leq 2e^{-\frac{\delta^2}{3}\mu}$
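The Bernoulli tails can be checked by simulation (a sketch; the values of $$n$$, $$p$$, $$\delta$$, and the trial count are arbitrary choices):

```python
import math
import random

random.seed(4)

# X = sum of n independent Bernoulli(p): mu = n * p.
n, p, trials = 100, 0.5, 20_000
mu = n * p
delta = 0.3
upper_tail = lower_tail = 0
for _ in range(trials):
    x = sum(random.random() < p for _ in range(n))
    upper_tail += x >= (1 + delta) * mu
    lower_tail += x <= (1 - delta) * mu
chernoff_ok = (
    upper_tail / trials <= math.exp(-delta ** 2 / (2 + delta) * mu)   # upper tail bound
    and lower_tail / trials <= math.exp(-delta ** 2 / 2 * mu)         # lower tail bound
)
```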

### Moment Generating Function Version #

Let $$X$$ be a random variable with $$M(t)=\E[e^{tX}]$$, then for any $$a>0$$,

$\Pr[X\geq a]\leq e^{-at} M(t), t >0$

$\Pr[X\leq -a]\leq e^{at} M(t), t <0$

Proof

For $$t>0$$,

$\Pr[X\geq a] = \Pr[e^{tX} \geq e^{ta}] \leq \frac{\E[e^{tX}]}{e^{ta}} = e^{-at} M(t)$

## 7 Hoeffding Inequality #

Hoeffding’s inequality is a generalization of the Chernoff Bound, sometimes called the Chernoff–Hoeffding bound.

Suppose $$X_1, \dots, X_n$$ are independent with $$a_i\leq X_i\leq b_i$$. Let $$X=X_1+\dots+X_n$$ and $$\mu=\E[X]$$. Then for all $$t>0$$,

$\Pr[X\geq \mu +t]\leq \exp \left( -\frac{2t^2}{\sum_{i=1}^n (b_i-a_i)^2} \right)$

Usually we consider the following special case. Let $$X_1, \dots, X_n$$ be independent random variables taking values in $$[0, 1]$$ with $$\E[X_i]=\mu$$. Let $$S_n = X_1 + \dots + X_n$$. Then for all $$t\geq0$$,

$\Pr\left[S_n\geq n \mu+t\right] \leq e^{-\frac{2t^2}{n}}$

and

$\Pr\left[S_n\leq n \mu-t\right] \leq e^{-\frac{2t^2}{n}}$

So

$\Pr\left[|S_n-n\mu|<t\right]\geq 1-2 e^{-\frac{2t^2}{n}}$

Note

Equivalently, we can write:

$\Pr\left[\overline{X}_n\geq \mu+ \varepsilon\right] \leq e^{-2n\varepsilon^2}$

and

$\Pr\left[\overline{X}_n \leq \mu- \varepsilon\right] \leq e^{-2n\varepsilon^2}$

So

$\Pr\left[|\overline{X}_n-\mu|<\varepsilon\right]\geq 1-2 e^{-2n\varepsilon^2}$

This is generalized by Azuma’s Inequality.
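A simulation of the sample-mean form (a sketch; Uniform(0, 1) variables, with $$n$$ and $$\varepsilon$$ chosen arbitrarily):

```python
import math
import random

random.seed(5)

# X_i ~ Uniform(0, 1) i.i.d.: values in [0, 1], mu = 1/2.
# Hoeffding: Pr[mean_n - mu >= eps] <= exp(-2 * n * eps^2).
n, trials, eps = 50, 20_000, 0.15
mu = 0.5
bound = math.exp(-2 * n * eps ** 2)
hits = 0
for _ in range(trials):
    sample_mean = sum(random.random() for _ in range(n)) / n
    hits += sample_mean - mu >= eps
hoeffding_ok = hits / trials <= bound
```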

## 8 Hölder’s Inequality #

Hölder’s inequality is a generalization of the Cauchy–Schwarz Inequality.

Let $$\frac{1}{p}+\frac{1}{q}=1$$ with $$p,q>1$$ (not necessarily integers); note that when $$p=q=2$$, this reduces to the Cauchy–Schwarz Inequality.

### General version #

$\|fg\|_1 \leq \|f\|_p \|g\|_q$

where

$\|x\|_p = \left( \sum_{k=1}^n |x_k|^p \right)^{\frac{1}{p}}$

### Integral version #

$\int_a^b |f(x)g(x)| dx \leq \left[ \int_a^b |f(x)|^p dx \right]^{\frac{1}{p}} \left[ \int_a^b |g(x)|^q dx \right]^{\frac{1}{q}}$

with equality if and only if $$|g(x)| = c|f(x)|^{p-1}$$ for some constant $$c$$.

### Sum version #

$\sum_{k=1}^n |x_k y_k| \leq \left( \sum_{k=1}^n |x_k|^p \right)^{\frac{1}{p}} \left( \sum_{k=1}^n |y_k|^q \right)^{\frac{1}{q}}$

with equality if and only if $$|y_k| = c|x_k|^{p-1}$$ for some constant $$c$$.
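The sum version can be spot-checked numerically (a sketch; $$p=3$$, $$q=3/2$$ is one conjugate pair, and the vectors are random):

```python
import random

random.seed(6)

p, q = 3.0, 1.5  # conjugate exponents: 1/p + 1/q = 1
holder_ok = True
for _ in range(1000):
    x = [random.uniform(-2, 2) for _ in range(5)]
    y = [random.uniform(-2, 2) for _ in range(5)]
    lhs = sum(abs(a * b) for a, b in zip(x, y))
    rhs = (sum(abs(a) ** p for a in x) ** (1 / p)
           * sum(abs(b) ** q for b in y) ** (1 / q))
    holder_ok &= lhs <= rhs + 1e-9
```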

### Expectation version #

$\E[|XY|]\leq \left(\E[|X|^p] \right)^{\frac{1}{p}} \cdot \left(\E[|Y|^q] \right)^{\frac{1}{q}}$

Proof

Denote $$\|X\|_p = \left( \E[|X|^p] \right)^{\frac{1}{p}}$$, and $$\|Y\|_q = \left( \E[|Y|^q] \right)^{\frac{1}{q}}$$.

Assume $$0< \|X\|_p, \|Y\|_q < \infty$$. Without loss of generality, it suffices to consider $$\|X\|_p = \|Y\|_q=1$$; otherwise, replace $$X,Y$$ by $$\frac{X}{\|X\|_p}$$, $$\frac{Y}{\|Y\|_q}$$.

Claim (Young’s inequality):

$xy\leq \frac{x^p}{p} + \frac{y^q}{q}, \forall x,y \geq 0$

So we have

$\E[|XY|] \leq \frac{\E \left[ |X|^p \right]}{p}+\frac{\E \left[ |Y|^q \right]}{q} = \frac{1}{p} + \frac{1}{q} = 1$

Proof of claim:

$\begin{split} &xy\leq \frac{x^p}{p} + \frac{y^q}{q}\\ \Longleftrightarrow& \left( x^p \right)^{\frac{1}{p}} \cdot \left( y^q \right)^{\frac{1}{q}} \leq \frac{x^p}{p} + \frac{y^q}{q}\\ \Longleftrightarrow& \frac{1}{p} \log(x^p) + \frac{1}{q} \log(y^q) \leq \log \left( \frac{x^p}{p} + \frac{y^q}{q} \right) \end{split}$ This follows from Jensen’s Inequality, since $$\log$$ is concave.

(We can replace numbers with random variables in the claimed inequality because it holds pointwise, for every $$\omega \in \Omega$$.)

## 9 Jensen’s Inequality #

Jensen’s inequality relates the value of a convex function of an integral to the integral of the convex function.

### Finite version #

Let $$f$$ be a convex function on interval $$I$$, then

$f(\lambda_1x_1+\dots + \lambda_n x_n)\leq \lambda_1f(x_1)+\dots +\lambda_n f(x_n)$

where $$x_1,x_2, \dots ,x_n \in I$$, $$\lambda_1, \dots ,\lambda_n \geq 0$$, and $$\lambda_1 +\dots +\lambda_n = 1$$.
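The finite version can be spot-checked with a convex function such as $$e^x$$ (a sketch; the points and convex weights are random):

```python
import math
import random

random.seed(7)

# f(x) = e^x is convex; lambda_i are random convex weights.
jensen_ok = True
for _ in range(1000):
    xs = [random.uniform(-2, 2) for _ in range(4)]
    w = [random.random() for _ in range(4)]
    lam = [wi / sum(w) for wi in w]  # lambda_i >= 0 and sum to 1
    lhs = math.exp(sum(l * x for l, x in zip(lam, xs)))   # f(sum lambda_i x_i)
    rhs = sum(l * math.exp(x) for l, x in zip(lam, xs))   # sum lambda_i f(x_i)
    jensen_ok &= lhs <= rhs + 1e-9
```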

### Expectation version #

Let $$f$$ be a convex function such that $$\E[f(X)]$$ and $$f(\E[X])$$ are finite. Then

$\E[f(X)]\geq f(\E[X])$

If $$f$$ is concave, then $$\E[f(X)]\leq f(\E[X])$$.

Example

$\E\left[|X|\right]\geq |\E[X]|$ $\E[X^2] \geq \left( \E[X] \right)^2$ $\E[\log X] \leq \log(\E[X])$ $\E[e^X]\geq e^{\E[X]}$

Proof

Let $$c=\E[X]$$. Find $$a,b$$ such that $$a+bc=f(c)$$ and $$a+bx\leq f(x)$$ for all $$x$$.

This is always possible, because a convex function $$f$$ always lies above some line (a supporting line) that passes through the point $$(c,f(c))$$. Then we have:

$$\E[f(X)] \geq \E[a+bX]=a+bc=f(\E[X])$$

Alternative proof

Assume $$f$$ is differentiable and denote $$\mu=\E[X]$$. By convexity, $$f(X) \geq f(\mu) + f'(\mu)(X-\mu)$$. Taking expectations, the linear term vanishes:

$\E[f(X)]\geq f(\mu)+0$

## 10 Minkowski Inequality #

Minkowski inequality is a generalization of Triangle Inequality.

### General version #

For $$1\leq p<\infty$$, we have:

$\|f+g\|_p \leq \|f\|_p + \|g\|_p$

### Sum version #

$\left( \sum_{k=1}^n |x_k+y_k|^p \right)^{\frac{1}{p}} \leq \left( \sum_{k=1}^n |x_k|^p \right)^{\frac{1}{p}} + \left( \sum_{k=1}^n |y_k|^p \right)^{\frac{1}{p}}$
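The sum version can be spot-checked numerically (a sketch; $$p=3$$ and the vector ranges are arbitrary choices):

```python
import random

random.seed(8)

p = 3.0

def lp_norm(v, p):
    """The l^p norm of a vector given as a list of floats."""
    return sum(abs(c) ** p for c in v) ** (1 / p)

minkowski_ok = True
for _ in range(1000):
    x = [random.uniform(-2, 2) for _ in range(5)]
    y = [random.uniform(-2, 2) for _ in range(5)]
    s = [a + b for a, b in zip(x, y)]
    minkowski_ok &= lp_norm(s, p) <= lp_norm(x, p) + lp_norm(y, p) + 1e-9
```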

### Expectation version #

$\left( \E \left[ |X+Y|^p \right] \right)^{\frac{1}{p}} \leq \left( \E \left[ |X|^p \right] \right)^{\frac{1}{p}} + \left( \E \left[ |Y|^p \right] \right)^{\frac{1}{p}}$

Proof

Let $$q$$ be such that $$\frac{1}{p}+\frac{1}{q}=1$$. We have

$\begin{split} \left(\E \left[ |X+Y|^p \right]\right)^{\frac{1}{p}} &= \left( \E[|X+Y| |X+Y|^{p-1}] \right)^{\frac{1}{p}}\\ &\leq \left( \E[|X| |X+Y|^{p-1}] + \E[|Y| |X+Y|^{p-1}] \right)^{\frac{1}{p}}\\ &\stackrel{\text{Hölder}}{\leq} \left[ (\E[|X|^p])^{\frac{1}{p}} (\E[|X+Y|^{(p-1)q}])^{\frac{1}{q}} + (\E[|Y|^p])^{\frac{1}{p}} (\E[|X+Y|^{(p-1)q}])^{\frac{1}{q}}\right]^{\frac{1}{p}}\\ &= \left( \|X\|_p + \|Y\|_p \right)^{\frac{1}{p}} \left( \E[|X+Y|^p] \right)^{\frac{1}{pq}} \end{split}$

The last equality uses $$(p-1)q=p$$. Assuming $$0<\E[|X+Y|^p]<\infty$$, divide both sides by $$\left( \E[|X+Y|^p] \right)^{\frac{1}{pq}}$$; since $$\frac{1}{p}-\frac{1}{pq}=\frac{1}{p^2}$$, this leaves $$\left( \E[|X+Y|^p] \right)^{\frac{1}{p^2}} \leq \left( \|X\|_p + \|Y\|_p \right)^{\frac{1}{p}}$$, and raising both sides to the power $$p$$ gives the result.

## 11 Union Bound #

For a countable set of events $$A_1, A_2, A_3, \dots$$, we have:

$\Pr \left[ \bigcup_i A_i \right] \leq \sum_i \Pr[A_i]$
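A simulation with three dice illustrates the bound (a sketch; the choice of events is arbitrary):

```python
import random

random.seed(9)

# Roll three fair dice; A_i = "die i shows a six".
# Union bound: Pr[at least one six] <= 3 * (1/6).
trials = 100_000
hits = sum(
    any(random.randint(1, 6) == 6 for _ in range(3))
    for _ in range(trials)
)
union_ok = hits / trials <= 3 * (1 / 6)
```

The exact probability is $$1-(5/6)^3 \approx 0.42$$, below the bound $$1/2$$: the union bound simply ignores overlaps between the events.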

## 12 Bernoulli’s Inequality #

Bounds powers of $$1+x$$ from below:

For every integer $$n \geq 0$$ and real number $$x\geq -1$$:

$(1+x)^n \geq 1+nx$

If $$n$$ is a non-negative even integer, the inequality holds for all $$x \in \RR$$.
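A grid check of the inequality (a sketch; the grid ranges are arbitrary):

```python
# Check (1+x)^n >= 1 + n*x for integer n >= 0 and x >= -1 on a grid.
bernoulli_ok = all(
    (1 + x) ** n >= 1 + n * x - 1e-9
    for n in range(0, 8)
    for x in [i / 10 for i in range(-10, 31)]  # x in [-1, 3]
)
```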

## 13 Inequality of arithmetic and geometric means #

### General version #

For $$x_1,x_2, \dots ,x_n > 0$$: harmonic mean ≤ geometric mean ≤ arithmetic mean ≤ quadratic mean (root mean square)

$H_n\leq G_n\leq A_n\leq Q_n$

with equality if and only if $$x_1=x_2=\dots =x_n$$,

where:

$H_n=\frac{n}{\sum_{i=1}^n \frac{1}{x_i}} = \frac{n}{\frac{1}{x_1}+\frac{1}{x_2}+\dots +\frac{1}{x_n}}$

$G_n = \sqrt[n]{\prod_{i=1}^n x_i} = \sqrt[n]{x_1x_2 \dots x_n}$

$A_n = \frac{\sum_{i=1}^n x_i}{n} = \frac{x_1+x_2+\dots +x_n}{n}$

$Q_n = \sqrt{\frac{\sum_{i=1}^n x_i^2}{n}} = \sqrt{\frac{x_1^2+x_2^2+\dots +x_n^2}{n}}$
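The chain $$H_n\leq G_n\leq A_n\leq Q_n$$ can be spot-checked numerically (a sketch; requires Python 3.8+ for `math.prod`, and the sample ranges are arbitrary):

```python
import math
import random

random.seed(10)

means_ok = True
for _ in range(1000):
    xs = [random.uniform(0.1, 10) for _ in range(5)]  # positive inputs
    n = len(xs)
    H = n / sum(1 / x for x in xs)                    # harmonic mean
    G = math.prod(xs) ** (1 / n)                      # geometric mean
    A = sum(xs) / n                                   # arithmetic mean
    Q = math.sqrt(sum(x * x for x in xs) / n)         # quadratic mean (RMS)
    means_ok &= H <= G + 1e-9 and G <= A + 1e-9 and A <= Q + 1e-9
```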

### When $$n=2$$ #

$\frac{2}{\frac{1}{a}+\frac{1}{b}} \leq \sqrt{ab}\leq \frac{a+b}{2}\leq \sqrt{\frac{a^2+b^2}{2}}$

### Usage #

1. $$a+b \Leftrightarrow ab$$

$a+b\geq 2\sqrt{ab}$

$ab\leq \left( \frac{a+b}{2} \right)^2$

2. $$a^2+b^2 \Leftrightarrow ab$$

$a^2+b^2\geq 2ab$

$ab\leq \frac{a^2+b^2}{2}$

3. $$a^2+b^2 \Leftrightarrow a+b$$

$a^2+b^2\geq \frac{(a+b)^2}{2}$

$a+b\leq \sqrt{2(a^2+b^2)}$