Useful Inequalities
- 1 Useful Inequality Scaling
- 2 Triangle Inequality
- 3 Cauchy–Schwarz Inequality
- 4 Markov Inequality
- 5 Chebyshev Inequality
- 6 Chernoff Inequality
- 7 Hoeffding Inequality
- 8 Hölder’s Inequality
- 9 Jensen’s Inequality
- 10 Minkowski Inequality
- 11 Union Bound
- 12 Bernoulli’s Inequality
- 13 Inequality of arithmetic and geometric means
1 Useful Inequality Scaling #
Some of these bounds are trivial, but nevertheless useful…
Exponential to polynomial #
\[e^x \geq x+1\]
\[\ln (1+x)\leq x, \quad \forall x>-1\]
\[e^x\leq 1+x+x^2, \quad \forall x\leq1\]
Min and max #
\(\max\{x_1,x_2, \dots\} \leq \sum_{i} x_i, \quad \forall x_i\geq0\)
\(\min\{x_1,x_2, \dots, x_n\} \leq \frac{x_1+x_2+ \dots +x_n}{n} \leq \max\{x_1,x_2, \dots, x_n\}\)
For \(x_i\geq0\), \(\min\{x_1, \dots, x_n\}^{n} \leq \prod_{i=1}^{n} x_i \leq \max\{x_1, \dots, x_n\}^{n}\)
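A numerical sanity check of the bounds above (my addition; the ranges and seed are arbitrary, and small tolerances guard against floating-point ties at the equality points):

```python
import numpy as np

rng = np.random.default_rng(0)
tol = 1e-12

# e^x >= x + 1 everywhere; e^x <= 1 + x + x^2 for x <= 1
x = rng.uniform(-5, 1, size=10_000)
assert np.all(np.exp(x) >= x + 1 - tol)
assert np.all(np.exp(x) <= 1 + x + x**2 + tol)

# ln(1 + x) <= x for x > -1
x = rng.uniform(-0.99, 5, size=10_000)
assert np.all(np.log1p(x) <= x + tol)

# min/max bounds for nonnegative inputs
xs = rng.uniform(0, 10, size=(10_000, 5))
assert np.all(xs.max(axis=1) <= xs.sum(axis=1))            # max <= sum
assert np.all(xs.min(axis=1) <= xs.mean(axis=1))           # min <= average
assert np.all(xs.mean(axis=1) <= xs.max(axis=1))           # average <= max
assert np.all(xs.min(axis=1)**5 <= xs.prod(axis=1) + tol)  # min^n <= product
assert np.all(xs.prod(axis=1) <= xs.max(axis=1)**5 + tol)  # product <= max^n
```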
2 Triangle Inequality #
General statement #
In a normed vector space \(V\):
\[\|x+y\|\leq \|x\|+\|y\|\]
This is actually a defining property of the norm.
Generalization: Minkowski Inequality.
Absolute value as norm #
\[|x+y|\leq |x| + |y|\]
\[|x-y|\leq |x| + |y|\]
Reverse triangle inequality #
\[\bigg| |x|-|y| \bigg|\le|x-y|\]
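A quick check of the triangle and reverse triangle inequalities on random data (my addition; vectors in \(\RR^3\) under the Euclidean norm):

```python
import numpy as np

rng = np.random.default_rng(1)
x, y = rng.normal(size=(2, 1000, 3))  # 1000 random pairs of vectors in R^3

norms = np.linalg.norm
assert np.all(norms(x + y, axis=1) <= norms(x, axis=1) + norms(y, axis=1) + 1e-12)

# Reverse triangle inequality, scalar (absolute value) case
a, b = rng.normal(size=(2, 1000))
assert np.all(np.abs(np.abs(a) - np.abs(b)) <= np.abs(a - b) + 1e-12)
```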
3 Cauchy–Schwarz Inequality #
General statement #
For all vectors \(\vec{u}\) and \(\vec{v}\) of an inner product space:
\[\left|\langle \vec{u},\vec{v}\rangle \right|^2 \leq \langle \vec{u},\vec{u}\rangle \cdot \langle \vec{v},\vec{v}\rangle \]
or written in a more familiar form:
\[\left|\langle \vec{u},\vec{v}\rangle \right| \leq \| \vec{u}\| \| \vec{v}\| \]
The two sides are equal if and only if \(\vec{u}\) and \(\vec{v}\) are linearly dependent.
Generalization: Hölder’s Inequality
In \(\RR^2\) #
If \(\vec{v}=(v_1, v_2)\) and \(\vec{u}=(u_1,u_2)\):
\[(u_1v_1+u_2v_2)^2 \leq (u_1^2+u_2^2)(v_1^2+v_2^2)\]
In \(\RR ^n\) #
\[\left( \sum_{i=1}^n u_i v_i \right)^2 \leq \left( \sum_{i=1}^n u_i^2 \right)\left( \sum_{i=1}^n v_i^2 \right)\]
This also holds with non-negative weights \(c_i\geq 0\):
\[\left( \sum_{i=1}^n c_i u_i v_i \right)^2 \leq \left( \sum_{i=1}^n c_iu_i^2 \right)\left( \sum_{i=1}^n c_iv_i^2 \right)\]
Expectation version #
\[\E[|XY|]\leq \sqrt{\E \left[ X^2 \right] \cdot \E \left[ Y^2 \right]}\]
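Both forms are easy to check numerically (my addition). Note that replacing expectations with empirical means turns the expectation version into the \(\RR^n\) sum version, so the second check holds exactly, not just in the large-sample limit:

```python
import numpy as np

rng = np.random.default_rng(2)

# Vector form: |<u, v>| <= ||u|| ||v||
u, v = rng.normal(size=(2, 1000))
assert abs(u @ v) <= np.linalg.norm(u) * np.linalg.norm(v) + 1e-9

# Expectation form, with E[.] replaced by empirical means
X, Y = rng.normal(size=(2, 100_000))
lhs = np.mean(np.abs(X * Y))
rhs = np.sqrt(np.mean(X**2) * np.mean(Y**2))
assert lhs <= rhs
print(f"{lhs:.4f} <= {rhs:.4f}")
```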
4 Markov Inequality #
For a random variable \(X\), the more we know about it, the better the bounds we can achieve for \(\Pr[X\geq t]\). Suppose we only know \(\E[X]\):
If \(X\geq 0\) and \(t>0\), then:
\[\Pr[X\geq t]\leq \frac{\E[X]}{t}\]
The intuition is simple: if \(X\) is smaller than \(t\), pretend it is 0; if \(X\) is at least \(t\), pretend it is exactly \(t\). This gives the proof sketch:
\[\E[X] = \Pr[X<t]\cdot\E[X|X<t]+\Pr[X\geq t]\cdot\E[X|X\geq t]\]
And \(\E[X|X<t]\geq 0\) because \(X\geq 0\), and \(\E[X|X\geq t]\geq t\) because the conditional expectation only involves values at least \(t\). Hence,
\[\E[X]\geq \Pr[X\geq t]\cdot\E[X|X\geq t]\geq t\cdot \Pr[X\geq t]\]
Proof
\(t \1_{X\geq t} \leq X\) holds with probability 1. Take expectations of both sides.
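A quick simulation (my addition; the exponential distribution and seed are arbitrary choices) shows both that the bound holds and how loose it can be:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.exponential(scale=2.0, size=1_000_000)  # X >= 0 with E[X] = 2

for t in [1, 2, 4, 8]:
    empirical = np.mean(X >= t)     # Pr[X >= t], estimated
    bound = X.mean() / t            # Markov: Pr[X >= t] <= E[X]/t
    print(f"t={t}: Pr[X>=t] ~ {empirical:.4f} <= {bound:.4f}")
```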
5 Chebyshev Inequality #
Suppose we now know both the expectation \(\E[X]\) and the variance \(\Var[X]\):
Let \(\E[X]=\mu\) and \(\Var[X]=\sigma^2\neq 0\), then for all \(t>0\):
\[\Pr[|X-\mu|\geq t]\leq \frac{\sigma^2}{t^2}\]
The intuition is similar to Markov Inequality: if \(|X-\mu|\) is smaller than \(t\sigma\), just assume it is 0; if \(|X-\mu|\) is at least \(t\sigma\), just assume it is \(t\sigma\). The proof sketch:
\[\begin{aligned}\sigma^2 &= \E[(X-\mu)^2]\\ &= \E \left[ (X-\mu)^2 \mid |X-\mu|\geq t\sigma \right] \Pr \left[ |X-\mu|\geq t\sigma \right] + \E \left[ (X-\mu)^2 \mid |X-\mu| < t\sigma \right] \Pr \left[ |X-\mu| < t\sigma \right]\\ &\geq (t\sigma)^2\Pr \left[ |X-\mu|\geq t\sigma \right] + 0 \cdot \Pr \left[ |X-\mu| < t\sigma \right] \\ &= t^2\sigma^2\Pr \left[ |X-\mu|\geq t\sigma \right] \end{aligned} \]
Dividing by \(t^2\sigma^2\) gives \(\Pr[|X-\mu|\geq t\sigma]\leq \frac{1}{t^2}\); replacing \(t\sigma\) by \(t\) recovers the statement above.
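The same kind of simulation for Chebyshev (my addition; again the distribution is an arbitrary choice, here with \(\mu=2\) and \(\sigma^2=4\)):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.exponential(scale=2.0, size=1_000_000)  # mu = 2, sigma^2 = 4

mu, var = X.mean(), X.var()
for t in [2, 4, 8]:
    empirical = np.mean(np.abs(X - mu) >= t)  # Pr[|X - mu| >= t], estimated
    bound = var / t**2                        # Chebyshev bound
    print(f"t={t}: Pr[|X-mu|>=t] ~ {empirical:.4f} <= {bound:.4f}")
```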
6 Chernoff Inequality #
Even though Markov Inequality and Chebyshev Inequality only use information about the expectation and the variance of the random variable under consideration, they are essentially tight for a general random variable: one can construct non-trivial (non-constant) random variables for which these two inequalities hold with equality.
To get a better bound, we need more information, so let’s consider the sum of a number of independent random variables.
Suppose \(X=X_1+X_2+\dots +X_n\), where the \(X_i\) are independent, and let \(\mu = \E[X] = \sum_i \E[X_i]\).
General Bound #
Applying Markov Inequality to \(e^{tX}\) for every \(t>0\):
\[\Pr[X\geq a]=\Pr \left[ e^{tX}\geq e^{ta} \right] \leq \frac{\E \left[ e^{tX} \right]}{e^{ta}} = e^{-ta}\E \left[ \prod_{i=1}^n e^{tX_i} \right]\]
Since the \(X_i\) are independent, the expectation factors into \(\prod_{i=1}^n \E \left[ e^{tX_i} \right]\); optimizing over \(t\), we have:
\[\Pr[X\geq a]\leq \min_{t>0}e^{-ta}\prod_{i=1}^n \E \left[ e^{tX_i} \right]\]
Similarly, we have:
\[\Pr[X\leq a]\leq \min_{t>0}e^{ta}\prod_{i=1}^n \E \left[ e^{-tX_i} \right]\]
Specific Chernoff bounds are obtained by calculating \(\E \left[ e^{tX_i} \right]\) (or \(\E \left[ e^{-tX_i} \right]\)) for specific distributions of \(X_i\).
Bernoulli version of Chernoff Bound #
Let \(X_i=1\) with probability \(p_i\) and \(X_i=0\) with probability \(1-p_i\). \(\mu = \E[X] = \sum_{i=1}^n p_i\), then
- Upper Tail: for all \(\delta>0\), \[\Pr[X\geq(1+\delta)\mu] \leq e^{-\frac{\delta^2}{2+\delta}\mu}\]
- Lower Tail: for all \(0<\delta<1\), \[\Pr[X\leq(1-\delta)\mu] \leq e^{-\frac{\delta^2}{2}\mu}\]
For \(0<\delta<1\), we can also combine the lower and upper tails:
\[\Pr \left[ |X-\mu|\geq \delta\mu \right] \leq 2e^{-\frac{\delta^2}{3}\mu}\]
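To see how tight the upper tail is, we can compare it against the exact binomial tail (my addition; this sketch assumes SciPy is available, and the parameters are arbitrary):

```python
import numpy as np
from scipy.stats import binom

n, p = 1000, 0.1
mu = n * p  # = 100

for delta in [0.2, 0.5, 1.0]:
    a = np.ceil((1 + delta) * mu)
    exact = binom.sf(a - 1, n, p)  # Pr[X >= (1+delta)*mu], exact
    bound = np.exp(-delta**2 / (2 + delta) * mu)
    print(f"delta={delta}: exact {exact:.2e} <= Chernoff {bound:.2e}")
```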
Moment Generating Function Version #
Let \(X\) be a random variable with \(M(t)=\E[e^{tX}]\), then for any \(a>0\),
\[\Pr[X\geq a]\leq e^{-at} M(t), t >0\]
\[\Pr[X\leq -a]\leq e^{at} M(t), t <0\]
Proof
For \(t>0\) (the \(t<0\) case is analogous),
\[\Pr[X\geq a] = \Pr[e^{tX} \geq e^{ta}] \leq \frac{\E[e^{tX}]}{e^{ta}} = e^{-at} M(t)\]
7 Hoeffding Inequality #
Hoeffding’s inequality is a generalization of the Chernoff Bound, sometimes called the Chernoff–Hoeffding bound.
Suppose \(X_1, \dots, X_n\) are independent with \(a_i\leq X_i\leq b_i\). Let \(X = X_1+\dots+X_n\) and \(\mu=\E[X]\). Then for all \(t>0\),
\[\Pr[X\geq \mu +t]\leq \exp \left( -\frac{2t^2}{\sum_{i=1}^n (b_i-a_i)^2} \right)\]
Usually we consider the following special case. Let \(X_1, \dots, X_n\) be independent random variables taking values in \([0, 1]\) with \(\E[X_i]=\mu\). Let \(S_n = X_1 + \dots + X_n\). Then for all \(t\geq0\),
\[\Pr\left[S_n\geq n \mu+t\right] \leq e^{-\frac{2t^2}{n}}\]
and
\[\Pr\left[S_n\leq n \mu-t\right] \leq e^{-\frac{2t^2}{n}}\]
So
\[\Pr\left[|S_n-n\mu|<t\right]\geq 1-2 e^{-\frac{2t^2}{n}}\]
Note
Equivalently, in terms of the sample mean \(\overline{X}_n = \frac{S_n}{n}\) and \(\varepsilon = \frac{t}{n}\), we can write:
\[\Pr\left[\overline{X}_n\geq \mu+ \varepsilon\right] \leq e^{-2n\varepsilon^2}\]
and
\[\Pr\left[\overline{X}_n \leq \mu- \varepsilon\right] \leq e^{-2n\varepsilon^2}\]
So
\[\Pr\left[|\overline{X}_n-\mu|<\varepsilon\right]\geq 1-2 e^{-2n\varepsilon^2}\]
This is generalized by Azuma’s Inequality.
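A small simulation of the sample-mean form (my addition; uniform \([0,1]\) variables, so \(\mu=0.5\), with arbitrary sizes and seed):

```python
import numpy as np

rng = np.random.default_rng(5)
n, trials = 100, 100_000
X = rng.uniform(0, 1, size=(trials, n))  # X_i in [0, 1], mu = 0.5
Xbar = X.mean(axis=1)

for eps in [0.05, 0.1, 0.15]:
    empirical = np.mean(Xbar >= 0.5 + eps)
    bound = np.exp(-2 * n * eps**2)      # Hoeffding upper tail
    print(f"eps={eps}: Pr[Xbar>=mu+eps] ~ {empirical:.5f} <= {bound:.5f}")
```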
8 Hölder’s Inequality #
Hölder’s inequality is a generalization of the Cauchy–Schwarz Inequality.
Let \(\frac{1}{p}+\frac{1}{q}=1\) with \(p,q>1\) (not necessarily integers). Note that when \(p=q=2\), this recovers the Cauchy–Schwarz Inequality.
General version #
\[\|fg\|_1 \leq \|f\|_p \|g\|_q\]
where
\[\|x\|_p = \left( \sum_{k=1}^n |x_k|^p \right)^{\frac{1}{p}}\]
Integral version #
\[\int_a^b |f(x)g(x)| dx \leq \left[ \int_a^b |f(x)|^p dx \right]^{\frac{1}{p}} \left[ \int_a^b |g(x)|^q dx \right]^{\frac{1}{q}}\]
with equality when \(|g(x)| = c|f(x)|^{p-1}\).
Sum version #
\[\sum_{k=1}^n |x_k y_k| \leq \left( \sum_{k=1}^n |x_k|^p \right)^{\frac{1}{p}} \left( \sum_{k=1}^n |y_k|^q \right)^{\frac{1}{q}}\]
with equality when \(|y_k| = c|x_k|^{p-1}\).
Expectation version #
\[\E[|XY|]\leq \left(\E[|X|^p] \right)^{\frac{1}{p}} \cdot \left(\E[|Y|^q] \right)^{\frac{1}{q}}\]
Proof
Denote \(\|X\|_p = \left( \E[|X|^p] \right)^{\frac{1}{p}}\), and \(\|Y\|_q = \left( \E[|Y|^q] \right)^{\frac{1}{q}}\).
Assume \(0< \|X\|_p, \|Y\|_q < \infty\). Without loss of generality, we may assume \(\|X\|_p = \|Y\|_q=1\); otherwise, replace \(X,Y\) by \(\frac{X}{\|X\|_p}\), \(\frac{Y}{\|Y\|_q}\).
We claim (Young’s inequality) that
\[xy\leq \frac{x^p}{p} + \frac{y^q}{q}, \forall x,y \geq 0\]
So we have
\[\E[|XY|] \leq \frac{\E \left[ |X|^p \right]}{p}+\frac{\E \left[ |Y|^q \right]}{q} = \frac{1}{p} + \frac{1}{q} = 1\]
Proof of claim:
\[\begin{split} &xy\leq \frac{x^p}{p} + \frac{y^q}{q}\\ \Longleftrightarrow& \left( x^p \right)^{\frac{1}{p}} \cdot \left( y^q \right)^{\frac{1}{q}} \leq \frac{x^p}{p} + \frac{y^q}{q}\\ \Longleftrightarrow& \frac{1}{p} \log(x^p) + \frac{1}{q} \log(y^q) \leq \log \left( \frac{x^p}{p} + \frac{y^q}{q} \right) \end{split} \] This follows from Jensen’s Inequality, since \(\log\) is concave.
(We can replace the numbers by random variables in the claimed inequality because it holds pointwise, i.e., for every \(\omega \in \Omega\).)
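A numerical check of the sum version over a few conjugate pairs \((p,q)\) (my addition):

```python
import numpy as np

rng = np.random.default_rng(6)
x, y = rng.normal(size=(2, 1000))

for p in [1.5, 2, 3, 10]:
    q = p / (p - 1)  # conjugate exponent: 1/p + 1/q = 1
    lhs = np.sum(np.abs(x * y))
    rhs = np.sum(np.abs(x)**p)**(1/p) * np.sum(np.abs(y)**q)**(1/q)
    assert lhs <= rhs + 1e-9
    print(f"p={p}: {lhs:.2f} <= {rhs:.2f}")
```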
9 Jensen’s Inequality #
Jensen’s inequality relates the value of a convex function of an integral to the integral of the convex function.
Finite version #
Let \(f\) be a convex function on interval \(I\), then
\[f(\lambda_1x_1+\dots + \lambda_n x_n)\leq \lambda_1f(x_1)+\dots +\lambda_n f(x_n)\]
where \(x_1,x_2, \dots ,x_n \in I\), \(\lambda_1, \dots ,\lambda_n \geq 0\), and \(\lambda_1 +\dots +\lambda_n = 1\).
Expectation version #
Let \(f\) be a convex function such that \(\E[f(X)]\) and \(f(\E[X])\) are finite. Then
\[\E[f(X)]\geq f(\E[X])\]
If \(f\) is concave, then \(\E[f(X)]\leq f(\E[X])\).
Example
\[\E\left[|X|\right]\geq |\E[X]|\] \[\E[X^2] \geq \left( \E[X] \right)^2\] \[\E[\log X] \leq \log(\E[X])\] \[\E[e^X]\geq e^{\E[X]}\]
Proof
Let \(c=\E[X]\). Find \(a,b\) such that \(a+bc=f(c)\) and \(a+bx\leq f(x)\) for all \(x\).
This is always possible, because a convex function \(f\) always lies above some supporting line through the point \((c,f(c))\). Then we have:
\(\E[f(X)] \geq \E[a+bX]=a+bc=f(\E[X])\)
Proof
Assume \(f\) is differentiable and let \(\mu=\E[X]\). Convexity gives \(f(X) \geq f(\mu) + f'(\mu)(X-\mu)\); taking expectations,
\[\E[f(X)]\geq f(\mu)+f'(\mu)\,\E[X-\mu]=f(\mu)\]
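The examples above can also be checked numerically (my addition). Since an empirical mean is itself an expectation (over the empirical distribution), these hold exactly for any sample, not just asymptotically:

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.uniform(0.1, 2.0, size=1_000_000)  # positive, so log X is defined

assert np.mean(np.abs(X - 1)) >= abs(np.mean(X - 1))   # f(x) = |x - 1|, convex
assert np.mean(X**2) >= np.mean(X)**2                  # f(x) = x^2, convex
assert np.mean(np.exp(X)) >= np.exp(np.mean(X))        # f(x) = e^x, convex
assert np.mean(np.log(X)) <= np.log(np.mean(X))        # log is concave
```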
10 Minkowski Inequality #
Minkowski inequality is a generalization of the Triangle Inequality.
General version #
For \(1\leq p<\infty\), we have:
\[\|f+g\|_p \leq \|f\|_p + \|g\|_p\]
Sum version #
\[\left( \sum_{k=1}^n |x_k+y_k|^p \right)^{\frac{1}{p}} \leq \left( \sum_{k=1}^n |x_k|^p \right)^{\frac{1}{p}} + \left( \sum_{k=1}^n |y_k|^p \right)^{\frac{1}{p}} \]
Expectation version #
\[\left( \E \left[ |X+Y|^p \right] \right)^{\frac{1}{p}} \leq \left( \E \left[ |X|^p \right] \right)^{\frac{1}{p}} + \left( \E \left[ |Y|^p \right] \right)^{\frac{1}{p}}\]
Proof
Let \(q\) be such that \(\frac{1}{p}+\frac{1}{q}=1\). We have
\[\begin{split} \left(\E \left[ |X+Y|^p \right]\right)^{\frac{1}{p}} &= \left( \E[|X+Y| |X+Y|^{p-1}] \right)^{\frac{1}{p}}\\ &\leq \left( \E[|X| |X+Y|^{p-1}] + \E[|Y| |X+Y|^{p-1}] \right)^{\frac{1}{p}}\\ &\stackrel{\text{Hölder}}{\leq} \left[ (\E[|X|^p])^{\frac{1}{p}} (\E[|X+Y|^{(p-1)q}])^{\frac{1}{q}} + (\E[|Y|^p])^{\frac{1}{p}} (\E[|X+Y|^{(p-1)q}])^{\frac{1}{q}}\right]^{\frac{1}{p}}\\ &= \left( \|X\|_p + \|Y\|_p \right)^{\frac{1}{p}} \left( \E[|X+Y|^p] \right)^{\frac{1}{pq}} \end{split} \]
Writing \(A=\left(\E \left[ |X+Y|^p \right]\right)^{\frac{1}{p}}\), this reads \(A\leq \left( \|X\|_p + \|Y\|_p \right)^{\frac{1}{p}} A^{\frac{1}{q}}\). Assuming \(0<A<\infty\), divide by \(A^{\frac{1}{q}}\) and use \(1-\frac{1}{q}=\frac{1}{p}\) to get \(A^{\frac{1}{p}} \leq \left( \|X\|_p + \|Y\|_p \right)^{\frac{1}{p}}\), i.e., \(A \leq \|X\|_p + \|Y\|_p\).
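A numerical check of the sum version for several values of \(p\) (my addition):

```python
import numpy as np

rng = np.random.default_rng(8)
x, y = rng.normal(size=(2, 1000))

for p in [1, 1.5, 2, 4]:
    lp = lambda v: np.sum(np.abs(v)**p)**(1/p)  # the l^p norm
    assert lp(x + y) <= lp(x) + lp(y) + 1e-9
    print(f"p={p}: {lp(x + y):.2f} <= {lp(x) + lp(y):.2f}")
```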
11 Union Bound #
For a countable set of events \(A_1, A_2, A_3, \dots \), we have:
\[\Pr \left[ \bigcup_i A_i \right] \leq \sum_i \Pr[A_i]\]
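A toy example (my addition): roll three dice and let \(A_i\) be the event that die \(i\) shows a six. The events overlap, so the union bound is not tight, but it is close when the events are nearly disjoint:

```python
import numpy as np

rng = np.random.default_rng(9)
rolls = rng.integers(1, 7, size=(1_000_000, 3))  # three dice per trial

p_union = np.mean(np.any(rolls == 6, axis=1))             # Pr[union of A_i]
p_sum = sum(np.mean(rolls[:, i] == 6) for i in range(3))  # sum of Pr[A_i]
print(f"{p_union:.4f} <= {p_sum:.4f}")  # ~0.4213 <= ~0.5
```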
12 Bernoulli’s Inequality #
A lower bound on powers of \(1+x\):
For every integer \(n \geq 0\) and real number \(x\geq -1\):
\[(1+x)^n \geq 1+nx\]
For even \(n\geq 0\), the inequality holds for all \(x \in \RR \).
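A numerical check of both cases (my addition; a small tolerance guards against floating-point ties near \(x=0\)):

```python
import numpy as np

rng = np.random.default_rng(10)

# x >= -1: holds for every integer n >= 0
x = rng.uniform(-1, 3, size=100_000)
for n in [0, 1, 2, 5, 10]:
    assert np.all((1 + x)**n >= 1 + n * x - 1e-9)

# Even n: the restriction x >= -1 can be dropped
x = rng.uniform(-10, 10, size=100_000)
for n in [0, 2, 4, 10]:
    assert np.all((1 + x)**n >= 1 + n * x - 1e-9)
```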
13 Inequality of arithmetic and geometric means #
General version #
For \(x_1,x_2, \dots ,x_n > 0\): harmonic mean ≤ geometric mean ≤ arithmetic mean ≤ quadratic mean (root mean square)
\[H_n\leq G_n\leq A_n\leq Q_n\]
with equality if and only if \(x_1=x_2=\dots =x_n\), where:
\[H_n=\frac{n}{\sum_{i=1}^n \frac{1}{x_i}} = \frac{n}{\frac{1}{x_1}+\frac{1}{x_2}+\dots +\frac{1}{x_n}}\]
\[G_n = \sqrt[n]{\prod_{i=1}^n x_i} = \sqrt[n]{x_1x_2 \dots x_n}\]
\[A_n = \frac{\sum_{i=1}^n x_i}{n} = \frac{x_1+x_2+\dots +x_n}{n}\]
\[Q_n = \sqrt{\frac{\sum_{i=1}^n x_i^2}{n}} = \sqrt{\frac{x_1^2+x_2^2+\dots +x_n^2}{n}}\]
When \(n=2\) #
\[\frac{2}{\frac{1}{a}+\frac{1}{b}} \leq \sqrt{ab}\leq \frac{a+b}{2}\leq \sqrt{\frac{a^2+b^2}{2}}\]
Usage #
- \(a+b \Leftrightarrow ab\) (for \(a,b\geq 0\)):
\[a+b\geq 2\sqrt{ab}\]
\[ab\leq \left( \frac{a+b}{2} \right)^2\]
- \(a^2+b^2 \Leftrightarrow ab\):
\[a^2+b^2\geq 2ab\]
\[ab\leq \frac{a^2+b^2}{2}\]
- \(a^2+b^2 \Leftrightarrow a+b\):
\[a^2+b^2\geq \frac{(a+b)^2}{2}\]
\[a+b\leq \sqrt{2(a^2+b^2)}\]
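A numerical check of the full chain (my addition; the geometric mean is computed via logs for numerical stability):

```python
import numpy as np

rng = np.random.default_rng(11)
x = rng.uniform(0.1, 10, size=(10_000, 6))  # strictly positive inputs

H = x.shape[1] / np.sum(1 / x, axis=1)      # harmonic mean
G = np.exp(np.mean(np.log(x), axis=1))      # geometric mean
A = np.mean(x, axis=1)                      # arithmetic mean
Q = np.sqrt(np.mean(x**2, axis=1))          # quadratic mean (RMS)

tol = 1e-12
assert np.all(H <= G + tol) and np.all(G <= A + tol) and np.all(A <= Q + tol)
print(f"{H[0]:.3f} <= {G[0]:.3f} <= {A[0]:.3f} <= {Q[0]:.3f}")
```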