This article is the final report for CSCI5440: Theory of Cryptography. It summarizes the mathematical details of factoring algorithms and RSA.


Preliminaries

Notations

We use $\mathbb{Z}_N$ to denote the group $\{0,1,\dots,N-1\}$ with respect to addition modulo $N$. Then we define $\mathbb{Z}_N^*\triangleq\{b\in\{1,\dots,N-1\} \mid \gcd(b,N)=1\}$. We can prove that $\mathbb{Z}_N^*$ is a group under multiplication modulo $N$:

$$ab\triangleq[ab\mod N].$$

Euler’s Phi Function

Euler’s phi function is defined as $\phi(N)\triangleq|\mathbb{Z}_N^*|$, the order of the group $\mathbb{Z}_N^*$. The value of $\phi(N)$ is not difficult to compute once the factorization of $N$ is known. First consider the case when $N=p$ is prime. Then all elements in $\{1,\dots,p-1\}$ are relatively prime to $p$, so $\phi(N)=|\mathbb{Z}_N^*|=p-1$. Next consider the case when $N=pq$, where $p,q$ are distinct primes. If an integer $a\in\{1,\dots,N-1\}$ is not relatively prime to $N$, then either $p\mid a$ or $q\mid a$. The elements in $\{1,\dots,N-1\}$ divisible by $p$ are exactly the $q-1$ elements $p,2p,3p,\dots,(q-1)p$, and similarly the elements divisible by $q$ are exactly the $p-1$ elements $q,2q,\dots,(p-1)q$. The number of elements remaining is therefore $(N-1)-(q-1)-(p-1)=(p-1)(q-1)$.
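As a quick sanity check of this formula, here is a small Python sketch (the function name `phi_bruteforce` is mine) that counts the elements of $\mathbb{Z}_N^*$ directly and compares the result with $(p-1)(q-1)$ for a toy modulus:

```python
from math import gcd

def phi_bruteforce(N):
    """Count the elements of {1, ..., N-1} that are relatively prime to N."""
    return sum(1 for b in range(1, N) if gcd(b, N) == 1)

p, q = 11, 13
N = p * q                                    # 143
print(phi_bruteforce(N), (p - 1) * (q - 1))  # both print 120
```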

Euler’s Theorem

Take an arbitrary integer $N>1$ and $a\in\mathbb{Z}_N^*$. Then

$$a^{\phi(N)}\equiv1\mod{N}.$$

The proof is based on the fact that for any element $g$ in a finite group $\mathbb{G}$ with group order $|\mathbb{G}|=m$, we have $g^m=1$.

Integer Factorization

Integer factorization, or just factoring, is one of the oldest problems in mathematics. It has been studied for hundreds of years without an efficient algorithm being found. Based on the hardness of this problem, the famous RSA cryptosystem was constructed, and it remains one of the most widely used public-key algorithms today.

Definition

Given a composite integer $N$, the factoring problem is to find integers $p,q>1$ such that $pq=N$. So far, no polynomial-time algorithm for factoring has been demonstrated despite hundreds of years of effort.

The factoring problem is believed to be hard, but not for every input. For example, it is easy to factor $N$ if it is even. Even if $N$ is odd, it might still have small prime factors that can easily be found. This suggests that the “hardest” numbers to factor are those having only large prime factors.

The Factoring Assumption

Let GenModulus be a polynomial-time algorithm that, on input $n$, outputs $(N,p,q)$ where $N=pq$, and $p$ and $q$ are $n$-bit primes except with probability negligible in $n$. Then consider the following experiment for a given algorithm $\mathcal{A}$ and parameter $n$:

The factoring experiment $Factor_{\mathcal{A},GenModulus}(n)$:

  1. Run GenModulus($n$) to obtain $(N,p,q)$.
  2. $\mathcal{A}$ is given $N$, and outputs $p',q'>1$.
  3. The output of the experiment is defined to be $1$ if $p'\cdot q'=N$, and $0$ otherwise.

DEFINITION Factoring is hard relative to GenModulus if for all probabilistic polynomial-time algorithms $\mathcal{A}$ there exists a negligible function $negl$ such that

$$Pr\left[Factor_{\mathcal{A},GenModulus}(n)=1\right]\leq negl(n).$$

Here, we say a function $f:\mathbb{N}\rightarrow\mathbb{R}^*$ is negligible if for every positive polynomial $p$ there is an $N$ such that for all integers $n>N$ it holds that $f(n)<\frac{1}{p(n)}$.

The factoring assumption is the assumption that there exists a GenModulus relative to which factoring is hard. Informally, it says that no polynomial-time algorithm can factor such a modulus except with negligible probability.

Generating Random Primes

Now let’s talk about how GenModulus($n$) works. This function needs to generate two $n$-bit prime numbers $p$ and $q$. An intuitive way to generate a random $n$-bit prime is to repeatedly choose random $n$-bit integers until we find a prime. There are two issues we need to consider:

  1. The probability that a uniform $n$-bit integer is prime.
  2. How to efficiently test whether an integer is prime.

The first issue is basically whether there is a reasonable chance that a random number is prime. Luckily, mathematicians have proved that this is the case:

THEOREM (Bertrand’s postulate) For any $n>1$, the fraction of $n$-bit integers that are prime is at least $1/3n$.

This means that if we iterate $t=3n^2$ times, the probability that no prime is chosen in any of the iterations is negligible in $n$:

$$\left(1-\frac{1}{3n}\right)^t=\left(\left(1-\frac{1}{3n}\right)^{3n}\right)^n\leq\left(e^{-1}\right)^n=e^{-n}.$$

For example, if we want a $512$-bit prime, we can iterate $3\cdot512^2=786432$ times. This is not a difficult task for a computer.

As for the second issue, the first idea could be to factor the number, but that does not work since $p$ and $q$ are too large. The practical algorithms that people use are probabilistic. That is, if the input $p$ is prime, the algorithm always outputs “p is prime”; if $p$ is composite, the algorithm outputs “p is not prime” with high probability. Examples of such algorithms are the Fermat primality test, the Miller-Rabin test, etc. Although in theory there is a small chance of error, no numbers are actually known that pass advanced probabilistic tests (such as Miller-Rabin) yet are actually composite.
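The following Python sketch shows the generate-and-test loop described above, using a textbook Miller-Rabin implementation; the function names and the number of test rounds are my own choices, not part of the original report.

```python
import random

def miller_rabin(p, rounds=40):
    """Return True if p is (probably) prime.
    Primes always pass; composites fail except with probability <= 4**(-rounds)."""
    if p < 2:
        return False
    if p in (2, 3):
        return True
    if p % 2 == 0:
        return False
    # Write p - 1 = 2^r * d with d odd.
    r, d = 0, p - 1
    while d % 2 == 0:
        r += 1
        d //= 2
    for _ in range(rounds):
        a = random.randrange(2, p - 1)
        x = pow(a, d, p)
        if x in (1, p - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, p)
            if x == p - 1:
                break
        else:
            return False          # a witnesses that p is composite
    return True

def generate_prime(n):
    """Sample uniform n-bit odd integers until one passes the primality test."""
    for _ in range(3 * n ** 2):                                  # enough iterations by Bertrand's postulate
        candidate = random.getrandbits(n) | (1 << (n - 1)) | 1   # force the top bit and oddness
        if miller_rabin(candidate):
            return candidate
    raise RuntimeError("no prime found (happens only with negligible probability)")

print(generate_prime(512))
```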

RSA

The RSA Problem

Fix a modulus $N$ and an integer $e>2$ relatively prime to $\phi(N)$. If $m^e\equiv c\mod{N}$ (with $m, c\in\mathbb{Z}_N^*$), we define $m\triangleq c^{1/e}\mod{N}$. The RSA problem is to compute $\left[c^{1/e}\mod{N}\right]$ for a modulus $N$ of unknown factorization.

The RSA problem is believed to be hard for any $e$ that is relatively prime to $\phi(N)$, and this hardness yields a cryptosystem.

RSA Encryption

First we need a GenRSA algorithm that, on input $n$, outputs a modulus $N$ that is the product of two $n$-bit primes, along with integers $e$, $d$ satisfying $ed\equiv 1\mod \phi(N)$. Here $N$, $p$, and $q$ can be generated by GenModulus, and $\phi(N)=(p-1)(q-1)$. The exponent $e$ can be chosen arbitrarily as long as $e>1$ and $\gcd(e, \phi(N))=1$; then $d$ is computed as $\left[e^{-1}\mod \phi(N)\right]$.
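A minimal Python sketch of such a GenRSA, reusing the `generate_prime` helper from the previous section; the fixed choice $e=65537$ is a common convention, not a requirement of the definition:

```python
from math import gcd

def gen_rsa(n):
    """Sketch of GenRSA: return (N, e, d) with N = pq and ed ≡ 1 (mod φ(N))."""
    while True:
        p, q = generate_prime(n), generate_prime(n)   # from the sketch above
        if p != q:
            break
    N = p * q
    phi = (p - 1) * (q - 1)
    e = 65537                      # any e > 1 with gcd(e, φ(N)) = 1 works
    while gcd(e, phi) != 1:
        e += 2
    d = pow(e, -1, phi)            # modular inverse (Python 3.8+)
    return N, e, d
```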

We define a public-key encryption scheme as follows:

  • Gen: on input $n$, run GenRSA to obtain $N$, $e$, and $d$. The public key is $(N,e)$ and the private key is $(N,d)$.
  • Enc: on input a public key $PK=(N,e)$ and a message $m\in\mathbb{Z}_N^*$, compute the ciphertext

$$c=m^e\mod N.$$

  • Dec: on input a private key $SK=(N,d)$ and a ciphertext $c\in\mathbb{Z}_N^*$, compute the message

$$m=c^d\mod N.$$

The RSA encryption relies on the fact that someone who knows $d$ can recover $m$ from $c$ by computing $c^d\mod N$. This works because:

$$\begin{aligned} &\left[c^d\mod N\right]\\ =&\left[m^{ed}\mod N\right]\\ =&\left[m^{ed\mod{\phi(N)}}\mod N\right]\\ =&\left[m^1\mod N\right]\\ =&\left[m\mod N\right]. \end{aligned}$$

The second equation is based on the fact that for any $g$ in a finite group $\mathbb{G}$ with group order $|\mathbb{G}|=m>1$ and any integer $x$, we have $g^x=g^{[x\mod m]}$.
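The following toy example (with deliberately tiny, insecure parameters) illustrates this round trip in Python; real RSA uses large random primes and randomized padding such as OAEP.

```python
p, q = 61, 53
N = p * q                       # 3233
phi = (p - 1) * (q - 1)         # 3120
e = 17                          # gcd(17, 3120) = 1
d = pow(e, -1, phi)             # 2753, the inverse of e modulo φ(N)

m = 65
c = pow(m, e, N)                # Enc: c = m^e mod N
assert pow(c, d, N) == m        # Dec: c^d mod N recovers m
print(c, pow(c, d, N))          # 2790 65
```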

The following figure shows the encryption and decryption process. Everyone can encrypt a message $m$ by raising it to the $e$th power. Then the person who knows the private key $d$ can extract the $e$th root of the ciphertext $c$, by simply raising $c$ to the power of $d$.

RSA Encryption

Since $d$ is the inverse of $e$, which can only be calculated by knowing the group order $|\mathbb{Z}_N^*|=\phi(N)$, the core idea of RSA is that everyone can compute in the group $\mathbb{Z}_N^*$, but the group order is kept secret, and only the person who knows the private key can extract $e$th roots.

RSA Signature

Another use of RSA is digital signatures. A signature is just the other side of encryption. In RSA encryption, we raise the message to the $e$th power to encrypt and extract the $e$th root to decrypt. In an RSA signature, we extract the $e$th root to sign and raise to the $e$th power to verify. In this way, only the person who knows the private key can sign, but everyone can verify. The whole process is shown in the following figure.

RSA Signature
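A textbook-RSA sketch of this sign/verify pair is shown below (illustration only: in practice one signs a hash of the message and uses proper padding; the toy key pair is the one from the encryption example above).

```python
def sign(m, N, d):
    """The signer uses the private exponent d to extract an e-th root of m."""
    return pow(m, d, N)

def verify(m, sigma, N, e):
    """Anyone can raise the signature to the e-th power and compare it with m."""
    return pow(sigma, e, N) == m % N

N, e, d = 3233, 17, 2753
sigma = sign(65, N, d)
print(verify(65, sigma, N, e))      # True
print(verify(66, sigma, N, e))      # False: the signature does not match
```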

Relating RSA and Factoring

If $N$ can be factored, then we know $p$ and $q$, so we can compute $\phi(N)=(p-1)(q-1)$ and use it to compute $d=\left[e^{-1}\mod \phi(N)\right]$ for any given $e$. In other words, if we can solve the factoring problem, then we can also solve the RSA problem; so if RSA is hard, then factoring must be hard. As for the other direction, it is still not known. The best we can prove is that computing $d$ from $N$ and $e$ is as hard as factoring. But we still cannot say that if factoring is hard then RSA is hard, because computing $d$ from $N$ and $e$ might not be the only way to break RSA, although we do not know any other way. Still, the best known attack on RSA involves factoring $N$.

Attacks on RSA

Trial Division (Brute-Force)

Trivially, by exhaustively checking whether $2,3,\dots,\lfloor\sqrt{N}\rfloor$ divides $N$, this problem can be solved in time $\mathcal{O}\left(\sqrt{N}\right)$, which is exponential in $\|N\|=\lfloor\log{N}\rfloor+1$, the length of the binary representation of $N$.
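A straightforward Python sketch of trial division (the function name and the optional bound parameter are my own):

```python
from math import isqrt

def trial_division(N, bound=None):
    """Return the smallest nontrivial factor of N not exceeding the bound
    (default: floor(sqrt(N))), or None if no such factor is found."""
    limit = bound if bound is not None else isqrt(N)
    d = 2
    while d <= limit:
        if N % d == 0:
            return d
        d += 1
    return None

print(trial_division(3233))       # 53, since 3233 = 53 * 61
```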

An interesting fact is that this naive algorithm is efficient for most numbers. For example, simply dividing $N$ by $2$ already handles half of all integers. Formally, we define $p(m)$ as the proportion of integers that are divisible by one (or more) of the first $m$ prime numbers. It is easy to derive this formula:

$$p(m)=1-\prod_{k=1}^{m}\left(1-\frac{1}{p_k}\right).$$

Using this formula, we get some surprising facts: about 88% of all integers have a factor less than 100, and 92% have a factor less than 1000. Therefore, a general-purpose factoring algorithm can first try trial division and fall back to more advanced algorithms only if that fails.
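These figures can be checked numerically; here is a short sketch that evaluates $p(m)$ over all primes below a given limit (the helper names and the small sieve are mine):

```python
def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = [False] * len(sieve[i * i::i])
    return [i for i, is_p in enumerate(sieve) if is_p]

def proportion_with_small_factor(limit):
    """p(m) = 1 - prod(1 - 1/p_k) over all primes p_k <= limit."""
    prod = 1.0
    for p in primes_up_to(limit):
        prod *= 1 - 1 / p
    return 1 - prod

print(proportion_with_small_factor(100))    # roughly 0.88
print(proportion_with_small_factor(1000))   # roughly 0.92
```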

Pollard’s $p-1$ Algorithm

Suppose $p$, $q$ are prime and $N=pq$. Fermat’s little theorem:

$$a^{p-1} \equiv 1\mod{p}$$

for all $a$ relatively prime to $p$.
Now suppose $L=(p-1)k$; then

$$\begin{aligned} &a^L=a^{(p-1)k} \equiv 1\mod{p}\\ \Rightarrow\ &a^L-1=pr \text{ for some integer } r\\ \Rightarrow\ &\gcd(a^L-1, N)=\gcd(pr, pq)=p \text{ (unless } q\mid r\text{)}. \end{aligned}$$

This means that by simply calculating $\gcd(a^L-1, N)$, we get the factor $p$. There is only one question left: how do we find such an $L$?

By the fundamental theorem of arithmetic, $L$ can be written as a product of prime powers:

$$L=(p-1)k=p_1^{e_1}\cdot p_2^{e_2}\cdot p_3^{e_3}\cdots$$

In this factorization, some of the prime powers come from $p-1$ and the others come from $k$. Now we try to guess $L$ by multiplying together as many small primes as we can:

$$L'=p_1^{e_1'}\cdot p_2^{e_2'}\cdot p_3^{e_3'}\cdots$$

Here we also make the exponents sufficiently large. If $p-1$ has only “small” prime factors, then $L'$ includes every prime power that divides $p-1$. So $L'$ is a multiple of $p-1$, and we have found the kind of $L$ that we need:

$$L'=(p-1)k'.$$

Pollard’s $p-1$ algorithm is thwarted if $p$ is a safe prime. (Recall that $p$ is a safe prime if $(p-1)/2$ is also prime, which means $p-1$ has a very large prime factor.)
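A compact Python sketch of Pollard’s $p-1$ (the smoothness bound $B$ is an assumption of mine; the exponent built up over the loop is $B!$, which is a multiple of every prime power up to $B$ and hence of $p-1$ whenever $p-1$ is $B$-smooth):

```python
from math import gcd

def pollard_p_minus_1(N, B=10000):
    """Try to split N using Pollard's p-1 with smoothness bound B."""
    a = 2
    for j in range(2, B + 1):
        a = pow(a, j, N)          # now a = 2^(j!) mod N
        d = gcd(a - 1, N)
        if d == N:
            return None           # unlucky; retry with a different base or a smaller B
        if d > 1:
            return d              # nontrivial factor found
    return None                   # p-1 is probably not B-smooth

print(pollard_p_minus_1(61 * 53))  # 61, since 61 - 1 = 60 = 2^2 * 3 * 5 is very smooth
```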

Pollard’s Rho Algorithm

Pollard’s rho algorithm is based on the birthday problem, which asks for the probability that two students in a group share the same birthday. Formally, we randomly choose $k$ numbers $x_1,\dots,x_k$ in the range $[1,365]$; what is the probability that there exist distinct $i,j$ with $x_i=x_j$? The conclusion is that this probability is already high once $k\approx\sqrt{365}$.

Pollard’s rho algorithm works in almost the same way. We randomly choose $k$ numbers $x_1,\dots,x_k$ in $\mathbb{Z}_N^*$; what is the probability that there exist distinct $x, x'$ with $x\equiv x'\mod p$? (Here $p,q$ are $n$-bit primes and $N=pq$.) The conclusion is similar: the probability is high once $k=2^{n/2}=\mathcal{O}(\sqrt{p})$. Finally, if $x\equiv x'\mod p$ and $x\neq x'$, then $x-x'=pr$ for some $r$, so $\gcd(x-x',N)=p$. The complexity is still exponential in the input length: $\mathcal{O}(N^{1/4})$.
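In practice, Pollard’s rho does not store $\mathcal{O}(\sqrt{p})$ samples; it walks the pseudorandom sequence $x_{i+1}=x_i^2+c \mod N$ and detects a collision modulo $p$ with Floyd’s cycle-finding trick. A minimal Python sketch (the constant $c$ and the starting point are arbitrary choices):

```python
from math import gcd

def pollard_rho(N, c=1):
    """Try to split N via Pollard's rho with the map x -> x^2 + c mod N."""
    if N % 2 == 0:
        return 2
    x = y = 2
    d = 1
    while d == 1:
        x = (x * x + c) % N        # tortoise: one step
        y = (y * y + c) % N        # hare: two steps
        y = (y * y + c) % N
        d = gcd(abs(x - y), N)
    return d if d != N else None   # d == N means failure; retry with another c

print(pollard_rho(8051))           # 97, since 8051 = 83 * 97
```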

Other Algorithms and the Complexity of Factoring Problem

We have seen three algorithms, but all of them are still exponential. Can we do better? In fact, the exact complexity of the factoring problem is still not known. Heuristic running times are usually expressed using the $L$-notation:

$$L_n[u,v]=e^{v(\log n)^u(\log\log n)^{1-u}}.$$

If $u=0$, then $L_n[0,v]=(\log n)^v$, which is polynomial. If $u=1$, then $L_n[1,v]=(e^{\log n})^v=n^v$, which is exponential in the length of $n$. If $0<u<1$, then it is sub-exponential. Several sub-exponential algorithms have been proposed: the elliptic curve method and the quadratic sieve achieve $L_n[1/2,1+o(1)]$. The fastest known general-purpose factoring algorithm is the general number field sieve, whose heuristic expected running time is $L_n[1/3, \sqrt[3]{64/9}]$.
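To get a feeling for these magnitudes, the heuristic formulas can simply be evaluated; the sketch below compares the quadratic-sieve and number-field-sieve estimates with Pollard’s rho for a 1024-bit modulus (the printed values are rough order-of-magnitude operation counts, nothing more):

```python
from math import exp, log

def L(n_bits, u, v):
    """Evaluate L_n[u, v] = exp(v * (log n)^u * (log log n)^(1 - u)) for n ~ 2^n_bits."""
    ln_n = n_bits * log(2)
    return exp(v * ln_n ** u * log(ln_n) ** (1 - u))

bits = 1024
print(f"quadratic sieve   : {L(bits, 1/2, 1):.2e}")
print(f"number field sieve: {L(bits, 1/3, (64/9) ** (1/3)):.2e}")
print(f"Pollard's rho     : {2 ** (bits / 4):.2e}")
```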