Useful Inequalities

\( \renewcommand{\vec}[1]{\boldsymbol{#1}} \DeclareMathOperator*{\E}{\mathbb{E}} \DeclareMathOperator*{\Var}{\mathrm{Var}} \DeclareMathOperator*{\Cov}{\mathrm{Cov}} \DeclareMathOperator*{\argmin}{\mathrm{arg\,min\;}} \DeclareMathOperator*{\argmax}{\mathrm{arg\,max\;}} \def\ZZ{{\mathbb Z}} \def\NN{{\mathbb N}} \def\RR{{\mathbb R}} \def\CC{{\mathbb C}} \def\QQ{{\mathbb Q}} \def\FF{{\mathbb FF}} \def\EE{{\mathbb E}} \newcommand{\tr}{{\rm tr}} \newcommand{\sign}{{\rm sign}} \newcommand{\1}{𝟙} \newcommand{\inprod}[2]{\left\langle #1, #2 \right\rangle} \newcommand{\set}[1]{\left\{#1\right\}} \require{physics} \)

1 Useful Inequality Scaling #

Some of these bounds are trivial, but nevertheless useful…

Exponential to polynomial #

\[e^x \geq x+1\]

\[\ln (1+x)\leq x\]

\(e^x\leq 1+x+x^2, \forall x\leq1\)

Min and max #

\(\max\{x_1,x_2, \dots\} \leq \sum_{i} x_i ,\forall x_i\geq0\)

\(\min\{x_1,x_2, \dots, x_n\} \leq \frac{x_1+x_2+ \dots +x_n}{n} \leq \max\{x_1,x_2, \dots, x_n\}\)

For \(x_i\geq0\), \(\min\{x_1, \dots, x_n\}^{n} \leq \prod_{i=1}^{n} x_i \leq \max\{x_1, \dots, x_n\}^{n}\)

2 Triangle Inequality #

General statement #

In a normed vector space \(V\):

\[\|x+y\|\leq \|x\|+\|y\|\]

This is actually a defining property of the norm.

Generalization: Minkowski Inequality.

Absolute value as norm #

\[|x+y|\leq |x| + |y|\]

\[|x-y|\leq |x| + |y|\]

Reverse triangle inequality #

\[\bigg| |x|-|y| \bigg|\le|x-y|\]

3 Cauchy–Schwarz Inequality #

General statement #

For all vectors \(\vec{u}\) and \(\vec{v}\) of an inner product space:

\[\left|\langle \vec{u},\vec{v}\rangle \right|^2 \leq \langle \vec{u},\vec{u}\rangle \cdot \langle \vec{v},\vec{v}\rangle \]

or written in a more familiar form:

\[\left|\langle \vec{u},\vec{v}\rangle \right| \leq \| \vec{u}\| \| \vec{v}\| \]

The two sides are equal if and only if \(\vec{u}\) and \(\vec{v}\) are linearly dependent.

Generalization: Hölder’s Inequality

In \(\RR^2\) #

If \(\vec{v}=(v_1, v_2)\) and \(\vec{u}=(u_1,u_2)\):

\[(u_1v_1+u_2v_2)^2 \leq (u_1^2+u_2^2)(v_1^2+v_2^2)\]

In \(\RR ^n\) #

\[\left( \sum_{i=1}^n u_i v_i \right)^2 \leq \left( \sum_{i=1}^n u_i^2 \right)\left( \sum_{i=1}^n v_i^2 \right)\]

Sometimes, there is a non-negative constant \(c_i\geq 0\):

\[\left( \sum_{i=1}^n c_i u_i v_i \right)^2 \leq \left( \sum_{i=1}^n c_iu_i^2 \right)\left( \sum_{i=1}^n c_iv_i^2 \right)\]

Expectation version #

\[\E[|XY|]\leq \sqrt{\E \left[ X^2 \right] \cdot \E \left[ Y^2 \right]}\]

4 Markov Inequality #

For a random variable \(X\), the more information we know about it, the better bounds we can achieve for \(\Pr[X\geq t]\). Suppose we only know \(\E[X]\):

If \(X\geq 0\) and \(t>0\), then:

\[\Pr[X\geq t]\leq \frac{\E[X]}{t}\]

The intuition is easy: if \(X\) is smaller than \(t\), just assume it is 0. If \(X\) is bigger than \(t\), just assume it is \(t\), so we can give the proof sketch:

\[\E[X] = \Pr[X<t]\cdot\E[X|X<t]+\Pr[X\geq t]\cdot\E[X|X\geq t]\]

And \(\E[X|X<t]>0\) because \(X\geq 0\), \(\E[X|X\geq t]>t\) because the conditional expectation only takes into account of values larger than \(t\). Hence,

\[\E[X]\geq \Pr[X\geq t]\cdot\E[X|X\geq t]\geq t\cdot \Pr[X\geq t]\]


\(a \1_{X\geq a} \leq X\) holds with probability 1. Take expectation on both sides.

5 Chebyshev Inequality #

Suppose we now know both the expectation \(\E[X]\) and the variance \(\Var[X]\):

Let \(\E[X]=\mu\) and \(\Var[X]=\sigma^2\neq 0\), then for all \(t>0\):

\[\Pr[|X-\mu|\geq t]\leq \frac{\sigma^2}{t^2}\]

The intuition is similar with Markov Inequality: if \(|X-\mu|\) is smaller than \(t\sigma\), just assume it is 0. If \(|X-\mu|\) is bigger than \(t\sigma\), just assume it is \(t\sigma\). The proof sketch:

\[\begin{aligned}\sigma^2 &= \E[(X-\mu)^2]\\ &= \E \left[ (X-\mu)^2 \mid |X-\mu|\geq t\sigma \right] \Pr \left[ |X-\mu|\geq t\sigma \right] + \E \left[ (X-\mu)^2 \mid |X-\mu| < t\sigma \right] \Pr \left[ |X-\mu| < t\sigma \right]\\ &\geq (t\sigma)^2\Pr \left[ |X-\mu|\geq t\sigma \right] + 0 \cdot \Pr \left[ |X-\mu| < t\sigma \right] \\ &= t^2\sigma^2\Pr \left[ |X-\mu| < t\sigma \right] \end{aligned} \]

Chebyshev inequality follows by dividing by \(t^2\sigma^2\).

6 Chernoff Inequality #

Even though Markov Inequality and Chebyshev Inequality only use information about the expectation and the variance of the random variable under consideration, they are essentially tight for a general random variable. i.e., can construct non-trivial (non-constant) random variables for which these two inequalities are tight (hold with equality).

To get a better bound, we need more information, so let’s consider the sum of a number of independent random variables.

Suppose \(X=X_1+X_2+\dots +X_n\) and \(X_i\) are independent, \(\mu = \E[X] = \sum_i \E[X_i]\).

General Bound #

Applying Markov Inequality to \(e^{tX}\) for every \(t>0\):

\[\Pr[X\geq a]=\Pr \left[ e^{tX}\geq e^{ta} \right] \leq \frac{\E \left[ e^{tX} \right]}{e^{ta}} = e^{-ta}\E \left[ \prod_{i=1}^n e^{tX_i} \right]\]

Optimizing over \(t\) and since \(X_i\) are independent, we have:

\[\Pr[X\geq a]\leq \min_{t>0}e^{-ta}\prod_{i=1}^n \E \left[ e^{tX_i} \right]\]

Similarly, we have:

\[\Pr[X\leq a]\leq \min_{t>0}e^{ta}\prod_{i=1}^n \E \left[ e^{-tX_i} \right]\]

Specific Chernoff bounds are attained by calculating \(\E \left[ e^{-tX_i} \right]\) for specific instances of \(X_i\).

Bernoulli version of Chernoff Bound #

Let \(X_i=1\) with probability \(p_i\) and \(X_i=0\) with probability \(1-p_i\). \(\mu = \E[X] = \sum_{i=1}^n p_i\), then

For \(0<\delta<1\), we can also combine the lower and upper tails:

\[\Pr \left[ |X-\mu|\geq \delta\mu \right] \leq 2e^{-\frac{\delta^2}{3}\mu}\]

Moment Generation Version #

Let \(X\) be a random variable with \(M(t)=\E[e^{tX}]\), then for any \(a>0\),

\[\Pr[X\geq a]\leq e^{-at} M(t), t >0\]

\[\Pr[X\leq -a]\leq e^{-at} M(t), t <0\]


For \(t>0\),

\[\Pr[X\geq a] = \Pr[e^{tX} \geq e^{ta}] \leq \frac{\E[e^{tX}]}{e^{ta}} = e^{-at} M(t)\]

7 Hoeffding Inequality #

Hoeffding’s inequality is a generalization of the Chernoff Bound, sometimes called Chernoff-Hoeffding bound.

Suppose \(X_1, \dots, X_n\) are independent and \(a_i\leq X_i\leq b_i\), \(\mu=\E[X]\), then for all \(t>0\),

\[\Pr[X\geq \mu +t]\leq \exp \left( -\frac{2t^2}{\sum_{i=1}^n (b_i-a_i)^2} \right)\]

Usually we consider the following special case. Let \(X_1, \dots, X_n\) be random variables with common support \([0, 1]\) and \(\E[X_i]=\mu\). Let \(S_n = X_1 + \dots + X_n\). Then for all \(t\geq0\),

\[\Pr\left[S_n\geq n \mu+t\right] \leq e^{-\frac{2t^2}{n}}\]


\[\Pr\left[S_n\leq n \mu-t\right] \leq e^{-\frac{2t^2}{n}}\]


\[\Pr\left[|S_n-n\mu|<t\right]\geq 1-2 e^{-\frac{2t^2}{n}}\]


Equivalently, we can write:

\[\Pr\left[\overline{X}_n\geq \mu+ \varepsilon\right] \leq e^{-2n\varepsilon^2}\]


\[\Pr\left[\overline{X}_n \leq \mu- \varepsilon\right] \leq e^{-2n\varepsilon^2}\]


\[\Pr\left[|\overline{X}_n-\mu|<\epsilon\right]\geq 1-2 e^{-2n\varepsilon^2}\]

This is generalized by Azuma’s Inequality.

8 Hölder’s Inequality #

Hölder’s inequality is a generalization of the Cauchy–Schwarz Inequality.

Let \(\frac{1}{p}+\frac{1}{q}=1\) with \(p,q>1\) (not necessarily integers), note that when \(p=q=2\), this is equivalent to Cauchy–Schwarz Inequality.

General version #

\[\|fg\|_1 \leq \|f\|_p \|g\|_q\]


\[\|x\|_p = \left( \sum_{k=1}^n |x_k|^p \right)^{\frac{1}{p}}\]

Integral version #

\[\int_a^b |f(x)g(x)| dx \leq \left[ \int_a^b |f(x)|^p dx \right]^{\frac{1}{p}} \left[ \int_a^b |g(x)|^q dx \right]^{\frac{1}{q}}\]

with equality when \(|g(x)| = c|f(x)|^{p-1}\).

Sum version #

\[\sum_{k=1}^n |x_k y_k| \leq \left( \sum_{k=1}^n |x_k|^p \right)^{\frac{1}{p}} \left( \sum_{k=1}^n |y_k|^q \right)^{\frac{1}{q}}\]

with equality when \(|y_k| = c|x_k|^{p-1}\).

Expectation version #

\[\E[|XY|]\leq \left(\E[|X|^p] \right)^{\frac{1}{p}} \cdot \left(\E[|Y|^q] \right)^{\frac{1}{q}}\]


Denote \(\|X\|_p = \left( \E[|X|^p] \right)^{\frac{1}{p}}\), and \(\|Y\|_q = \left( \E[|Y|^q] \right)^{\frac{1}{q}}\).

Assume \(0< \|X\|_p, \|Y\|_q < \infty\), without loss of generality, we can only consider \(\|X\|_p = \|Y\|_q=1\), otherwise, scale \(X,Y\) to \(\frac{X}{\|X\|_p}\), \(\frac{Y}{\|Y\|_q}\).

Claim that

\[xy\leq \frac{x^p}{p} + \frac{y^q}{q}, \forall x,y \geq 0\]

So we have

\[\E[|XY|] \leq \frac{\E \left[ |X|^p \right]}{p}+\frac{\E \left[ |Y|^q \right]}{q} = \frac{1}{p} + \frac{1}{q} = 1\]

Proof of claim:

\[\begin{split} &xy\leq \frac{x^p}{p} + \frac{y^q}{q}\\ \Longleftrightarrow& \left( x^p \right)^{\frac{1}{p}} \cdot \left( y^q \right)^{\frac{1}{q}} \leq \frac{x^p}{p} + \frac{y^q}{q}\\ \Longleftrightarrow& \frac{1}{p} \log(x^p) + \frac{1}{q} \log(y^q) \leq \log \left( \frac{x^p}{p} + \frac{y^q}{q} \right) \end{split} \] This follows from Jensen’s Inequality, since \(\log\) is concave.

(We can replace numbers with R.V.s in the claimed inequality, because it holds for all \(\omega \in \Omega\).)

9 Jensen’s Inequality #

Jensen’s inequality relates the value of a convex function of an integral to the integral of the convex function.

Finite version #

Let \(f\) be a convex function on interval \(I\), then

\[f(\lambda_1x_1+\dots + \lambda_n x_n)\leq \lambda_1f(x_1)+\dots +\lambda_n f(x_n)\]

where \(x_1,x_2, \dots ,x_n \in I\), \(\lambda_1, \dots ,\lambda_n \geq 0\), and \(\lambda_1 +\dots +\lambda_n = 1\).

Expectation version #

Let \(f\) be a convex function, and \(\E[f(X)]\) and \(f(\E[X])\) are finite, then

\[\E[f(X)]\geq f(\E[X])\]

If \(f\) is concave, then \(\E[f(X)]\leq f(\E[X])\).


\[\E\left[|X|\right]\geq |\E[X]|\] \[\E[X^2] \geq \left( E[X] \right)^2\] \[\E[\log X] \leq \log(E[X])\] \[\E[e^X]\geq e^{\E[X]}\]


Let \(c=\E[X]\). Find \(a,b\) such that \(a+bc=f( c)\) and \(a+bx\leq f(x)\).

This is always doable, because convex function \(f\) always lies above a line that passes through point \((x,f(x))\). Then we have:

\(\E[f(X)] \geq \E[a+bX]=a+bc=f(\E[X])\)


Assume \(f\) is differentiable, denote \(\mu=\E[X]\), \(f(X) \geq f(\mu) + f’(\mu)(X-\mu)\).

\[\E[f(X)]\geq f(\mu)+0\]

10 Minkowski Inequality #

Minkowski inequality is a generalization of Triangle Inequality.

General version #

For \(1\leq p<\infty\), we have:

\[\|f+g\|_p \leq \|f\|_p + \|g\|_p\]

Sum version #

\[\left( \sum_{k=1}^n |x_k+y_k|^p \right)^{\frac{1}{p}} \leq \left( \sum_{k=1}^n |x_k|^p \right)^{\frac{1}{p}} + \left( \sum_{k=1}^n |y_k|^p \right)^{\frac{1}{p}} \]

Expectation version #

\[\left( \E \left[ |X+Y|^p \right] \right)^{\frac{1}{p}} \leq \left( \E \left[ |X|^p \right] \right)^{\frac{1}{p}} + \left( \E \left[ |Y|^p \right] \right)^{\frac{1}{p}}\]


Let \(q\) be such that \(\frac{1}{p}+\frac{1}{q}=1\). We have

\[\begin{split} \left(\E \left[ |X+Y|^p \right]\right)^{\frac{1}{p}} &= \left( \E[|X+Y| |X+Y|^{p-1}] \right)^{\frac{1}{p}}\\ &\leq \left( \E[|X| |X+Y|^{p-1}] + \E[|Y| |X+Y|^{p-1}] \right)^{\frac{1}{p}}\\ &\stackrel{\text{Hölder}}{\leq} \left[ (\E[|X|^p])^{\frac{1}{p}} (\E[|X+Y|^{(p-1)q}])^{\frac{1}{q}} + (\E[|Y|^p])^{\frac{1}{p}} (\E[|X+Y|^{(p-1)q}])^{\frac{1}{q}}\right]^{\frac{1}{p}}\\ &= \left( \|X\|_p + \|Y\|_p \right)^{\frac{1}{p}} \left( \E[|X+Y|^p] \right)^{\frac{1}{pq}} \end{split} \]

Then solve this recursive inequality and get the result.

11 Union Bound #

For a countable set of events \(A_1, A_2, A_3, \dots \), we have:

\[\Pr \left[ \bigcup_i A_i \right] \leq \sum_i \Pr[A_i]\]

12 Bernoulli’s Inequality #

Approximate exponentiations of \(1+x\):

For every integer \(n \geq 0\) and real number \(x\geq -1\):

\[(1+x)^n \geq 1+nx\]

If \(n\geq 0\) and is even, then this inequality holds for all \(x \in \RR \).

13 Inequality of arithmetic and geometric means #

General version #

For \(x_1,x_2, \dots ,x_n \in \NN\), “调几算方”:调和平均数 ≤ 几何平均数 ≤ 算术平均数 ≤ 平方平均数(方均根)

\[H_n\leq G_n\leq A_n\leq Q_n\]

with equality holds if and only if \(x_1=x_2=\dots =x_n\).


\[H_n=\frac{n}{\sum_{i=1}^n \frac{1}{x_i}} = \frac{n}{\frac{1}{x_1}+\frac{1}{x_2}+\dots +\frac{1}{x_n}}\]

\[G_n = \sqrt[n]{\prod_{i=1}^n x_i} = \sqrt[n]{x_1x_2 \dots x_3}\]

\[A_n = \frac{\sum_{i=1}^n x_i}{n} = \frac{x_1+x_2+\dots +x_n}{n}\]

\[Q_n = \sqrt{\frac{\sum_{i=1}^n x_i^2}{n}} = \sqrt{\frac{x_1^2+x_2^2+\dots +x_n^2}{n}}\]

When \(n=2\) #

\[\frac{2}{\frac{1}{a}+\frac{1}{b}} \leq \sqrt{ab}\leq \frac{a+b}{2}\leq \sqrt{\frac{a^2+b^2}{2}}\]

Usage #

  1. \(a+b \Leftrightarrow ab\)

    \[a+b\geq 2\sqrt{ab}\]

    \[ab\leq \left( \frac{a+b}{2} \right)^2\]

  2. \(a^2+b^2 \Leftrightarrow ab\)

    \[a^2+b^2\geq 2ab\]

    \[ab\leq \frac{a^2+b^2}{2}\]

  3. \(a^2+b^2 \Leftrightarrow a+b\)

    \[a^2+b^2\geq \frac{(a+b)^2}{2}\]

    \[a+b\leq \sqrt{2(a^2+b^2)}\]