- Integer Factorization
- Attacks on RSA
This article is my final report for CSCI5440: Theory of Cryptography. It summarizes the mathematical details of factoring algorithms and RSA.
We use $\mathbb{Z}_N$ to denote the group $\{0, 1, \dots, N-1\}$ with respect to addition modulo $N$. Then we define $\mathbb{Z}_N^* = \{x \in \{1, \dots, N-1\} : \gcd(x, N) = 1\}$. We can prove that $\mathbb{Z}_N^*$ is a group under multiplication modulo $N$.
Euler's phi function is defined as $\phi(N) = |\mathbb{Z}_N^*|$, the order of the group $\mathbb{Z}_N^*$. The value of $\phi(N)$ is not difficult to compute. First consider the case when $N = p$ is prime. Then all elements in $\{1, \dots, p-1\}$ are relatively prime to $p$, so $\phi(p) = p - 1$. Next consider the case when $N = pq$, where $p, q$ are distinct primes. If an integer $x \in \{1, \dots, N-1\}$ is not relatively prime to $N$, then either $p \mid x$ or $q \mid x$. The elements in $\{1, \dots, N-1\}$ divisible by $p$ are exactly the $q - 1$ elements $p, 2p, \dots, (q-1)p$, and similarly there are $p - 1$ elements divisible by $q$. The number of elements remaining is therefore given by $$\phi(N) = (N - 1) - (q - 1) - (p - 1) = pq - p - q + 1 = (p-1)(q-1).$$
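As a quick sanity check, the closed form $\phi(pq) = (p-1)(q-1)$ can be compared against a direct count of the units. A minimal Python sketch (`phi_naive` is just an illustrative helper name):

```python
from math import gcd

def phi_naive(N):
    """Count the elements of Z_N^* directly (fine for small N)."""
    return sum(1 for x in range(1, N) if gcd(x, N) == 1)

p, q = 5, 7
N = p * q
print(phi_naive(N))        # direct count of units in Z_35
print((p - 1) * (q - 1))   # closed form for N = p * q; the two agree
```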
Take an arbitrary integer $N > 1$ and $a \in \mathbb{Z}_N^*$. Then

$$a^{\phi(N)} \equiv 1 \pmod{N}$$

The proof is based on the fact that for any element $g$ in a finite group $G$ with group order $|G|$, we have $g^{|G|} = 1$.
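Euler's theorem is easy to verify numerically for a small modulus. A small illustrative check:

```python
from math import gcd

N = 35        # N = 5 * 7, so phi(N) = (5 - 1) * (7 - 1) = 24
phi_N = 24
for a in range(1, N):
    if gcd(a, N) == 1:                 # a is a unit, i.e. a is in Z_N^*
        assert pow(a, phi_N, N) == 1   # a^phi(N) = 1 (mod N)
print("a^phi(N) = 1 mod N holds for every a in Z_35^*")
```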
Integer factorization, or just factoring, is one of the oldest problems in mathematics. It has been studied for hundreds of years without an efficient algorithm being found. Based on the hardness of this problem, the famous RSA cryptosystem was constructed, which is one of the most important algorithms in use today.
Given a composite integer $N$, the factoring problem is to find integers $p, q > 1$ such that $N = pq$. So far, no polynomial-time algorithm for factoring has been demonstrated despite hundreds of years of effort.
The factoring problem is believed to be hard, but not for all inputs. For example, it is easy to factor $N$ if it is even. Even if $N$ is odd, it might still have small prime factors that can be easily found. This suggests that the “hardest” numbers to factor are those having only large prime factors.
Let GenModulus be a polynomial-time algorithm that, on input $1^n$, outputs $(N, p, q)$ where $N = pq$, and $p$ and $q$ are $n$-bit primes except with probability negligible in $n$. Then consider the following experiment for a given algorithm $\mathcal{A}$ and parameter $n$:
The factoring experiment $\mathsf{Factor}_{\mathcal{A},\mathsf{GenModulus}}(n)$:
- Run GenModulus$(1^n)$ to obtain $(N, p, q)$.
- $\mathcal{A}$ is given $N$, and outputs $p', q' > 1$.
- The output of the experiment is defined to be $1$ if $p' \cdot q' = N$, and $0$ otherwise.
DEFINITION Factoring is hard relative to GenModulus if for all probabilistic polynomial-time algorithms $\mathcal{A}$ there exists a negligible function $\mathsf{negl}$ such that $$\Pr\left[\mathsf{Factor}_{\mathcal{A},\mathsf{GenModulus}}(n) = 1\right] \le \mathsf{negl}(n).$$
Here, we say a function $\mu$ is negligible if for every positive polynomial $p$ there is an $N_0$ such that for all integers $n > N_0$ it holds that $\mu(n) < \frac{1}{p(n)}$.
The factoring assumption is the assumption that there exists a GenModulus relative to which factoring is hard. Informally, it says that no polynomial-time algorithm can factor such a modulus with non-negligible probability.
Now let's talk about how GenModulus$(1^n)$ works. This function needs to generate two $n$-bit prime numbers $p$ and $q$. An intuitive way to generate a random $n$-bit prime is to repeatedly choose random $n$-bit integers until we find a prime. There are two issues we need to consider:
- The probability that a uniform $n$-bit integer is prime.
- How to efficiently test whether an integer is prime.
The first issue is basically whether there is a reasonable chance that a random number is prime. Luckily, mathematicians have proved that this is the case:
THEOREM (Bertrand’s postulate) For any $n > 1$, the fraction of $n$-bit integers that are prime is at least $\frac{1}{3n}$.
This means if we iterate $3n^2$ times, then the probability that a prime is not chosen in any of the iterations is negligible in $n$: $$\left(1 - \frac{1}{3n}\right)^{3n^2} = \left(\left(1 - \frac{1}{3n}\right)^{3n}\right)^{n} \le e^{-n}.$$
For example, if we want a $512$-bit prime, we can iterate $3 \cdot 512^2 \approx 7.9 \times 10^5$ times. This is not a difficult task for computers.
As for the second issue, the first idea could be to factor the number, but this does not work since $p$ and $q$ are too large. The practical algorithms that people use are probabilistic. That is, if the input $p$ is prime, the algorithm will always output “$p$ is prime”; if $p$ is composite, then the algorithm outputs “$p$ is not prime” with high probability. Examples of such algorithms are the Fermat primality test, the Miller-Rabin test, etc. Although in theory there is a small chance of error, no numbers are actually known that pass advanced probabilistic tests (such as Miller-Rabin) yet are actually composite.
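The two ideas above — sample random $n$-bit integers and test each one with Miller-Rabin — fit in a few lines of Python. This is an illustrative sketch, not a hardened implementation (function names are mine):

```python
import random

def miller_rabin(n, rounds=40):
    """Probabilistic primality test: never rejects a prime; a composite
    survives each round with probability at most 1/4."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    # write n - 1 = 2^r * d with d odd
    r, d = 0, n - 1
    while d % 2 == 0:
        r += 1
        d //= 2
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False          # a witnesses that n is composite
    return True

def random_prime(bits):
    """Sample random odd `bits`-bit integers (bits >= 2) until one passes."""
    while True:
        cand = random.getrandbits(bits) | (1 << (bits - 1)) | 1
        if miller_rabin(cand):
            return cand

p = random_prime(64)
print(p, p.bit_length())
```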
Given a modulus $N$ and an integer $e$ relatively prime to $\phi(N)$, raising to the $e$th power is a bijection on $\mathbb{Z}_N^*$. If $c \equiv m^e \pmod{N}$, we define $c^{1/e} \bmod N := m$. The RSA problem is to compute $c^{1/e} \bmod N$ for a modulus $N$ of unknown factorization.
The RSA problem is believed to be hard for any $e$ that is relatively prime to $\phi(N)$, and it yields a cryptosystem.
First we need a GenRSA algorithm that, on input $1^n$, outputs a modulus $N$ that is the product of two $n$-bit primes, along with integers $e$, $d$ satisfying $ed \equiv 1 \pmod{\phi(N)}$. $N$, $p$, $q$ can be generated by GenModulus. $\phi(N) = (p-1)(q-1)$. $e$ can be arbitrarily chosen as long as $1 < e < \phi(N)$ and $\gcd(e, \phi(N)) = 1$. $d$ is computed as $d = e^{-1} \bmod \phi(N)$.
We define a public-key encryption scheme as follows:
- Gen: on input $1^n$, run GenRSA to obtain $N$, $e$, and $d$. The public key is $\langle N, e \rangle$ and the private key is $\langle N, d \rangle$.
- Enc: on input a public key $\langle N, e \rangle$ and a message $m \in \mathbb{Z}_N^*$, compute the ciphertext $c = m^e \bmod N$.
- Dec: on input a private key $\langle N, d \rangle$ and a ciphertext $c$, compute the message $m = c^d \bmod N$.
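The scheme can be sketched directly in Python. The primes below are far too small to be secure, and real implementations add padding (e.g. OAEP); this toy version only mirrors the three algorithms above:

```python
from math import gcd

def gen_rsa(p, q):
    """Toy GenRSA given two primes: returns ((N, e), (N, d))."""
    N = p * q
    phi = (p - 1) * (q - 1)
    e = 65537                 # common choice; must satisfy gcd(e, phi) = 1
    assert gcd(e, phi) == 1
    d = pow(e, -1, phi)       # d = e^{-1} mod phi(N)  (Python 3.8+)
    return (N, e), (N, d)

def enc(pub, m):
    N, e = pub
    return pow(m, e, N)       # c = m^e mod N

def dec(priv, c):
    N, d = priv
    return pow(c, d, N)       # m = c^d mod N

pub, priv = gen_rsa(104729, 104723)   # two known primes, ~17 bits each
c = enc(pub, 123456789)
print(dec(priv, c))           # recovers 123456789
```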
RSA encryption relies on the fact that someone who knows $d$ can recover $m$ from $c = m^e \bmod N$ by computing $c^d \bmod N$. This works because: $$c^d = (m^e)^d = m^{ed} = m^{ed \bmod \phi(N)} = m^1 = m \pmod{N}.$$ The step $m^{ed} = m^{ed \bmod \phi(N)}$ is based on the fact that for any $g$ in a finite group $G$ with group order $|G|$ and any integer $x$, we have $g^x = g^{x \bmod |G|}$.
The following figure shows the encryption and decryption process. Everyone can encrypt a message $m$ by raising it to the $e$th power. Then the person who knows the private key $d$ can extract the $e$th root of the ciphertext $c$, by simply raising $c$ to the power of $d$.
Since $d$ is the inverse of $e$ modulo the group order $\phi(N)$, it can only be calculated by someone who knows $\phi(N)$. So the core idea of RSA is that everyone can compute $e$th powers in the group $\mathbb{Z}_N^*$, but the group order is kept secret, and only the person who knows the private key can extract $e$th roots.
Another use of RSA is digital signatures. Signing is just the other side of encryption. In RSA encryption, we raise to the $e$th power to encrypt, and extract the $e$th root to decrypt. In an RSA signature, we extract the $e$th root to sign, and raise to the $e$th power to verify. In this way, only the person who knows the private key can sign, but everyone can verify. The whole process is shown in the following figure.
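This sign/verify flow can be sketched with toy parameters in Python (names and primes are mine). Note that real signature schemes such as RSA-PSS sign a padded hash of the message, never the raw message as done here:

```python
from math import gcd

p, q = 104729, 104723        # toy primes, far too small for real security
N, phi = p * q, (p - 1) * (q - 1)
e = 65537
assert gcd(e, phi) == 1
d = pow(e, -1, phi)          # private exponent (Python 3.8+)

def sign(m):
    return pow(m, d, N)      # sigma = m^{1/e} = m^d mod N: only d-holder can do this

def verify(m, sigma):
    return pow(sigma, e, N) == m % N   # everyone can check sigma^e = m mod N

sigma = sign(31337)
print(verify(31337, sigma))  # a valid signature verifies
print(verify(31338, sigma))  # a forged message does not
```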
If $N$ can be factored, then we know $\phi(N)$ and so we can compute $d = e^{-1} \bmod \phi(N)$ and use this to compute $c^{1/e} \bmod N$ for any given $c$. In other words, if we can solve the factoring problem, then we can also solve the RSA problem. So if RSA is hard, then factoring must be hard. As for the other direction, it is still not known. The best we can prove is that computing $d$ from $N$ and $e$ is as hard as factoring $N$. But we still cannot say that if factoring is hard then RSA is hard, because computing $d$ from $N$ and $e$ might not be the only way to break RSA, even though no other way is known. So the best known attack on RSA still involves factoring $N$.
Trivially, by exhaustively checking whether each of $2, 3, \dots, \lfloor \sqrt{N} \rfloor$ divides $N$, this problem can be solved in time $O(\sqrt{N} \cdot \operatorname{polylog}(N))$, which is exponential in $\|N\| = O(\log N)$, the length of the binary representation of $N$.
An interesting fact is that this naive algorithm is efficient for most numbers. For example, by simply dividing by $2$ we can solve half of all numbers. Formally, we define $P(k)$ as the proportion of integers that are divisible by one (or more) of the first $k$ prime numbers $p_1, \dots, p_k$. It is easy to derive this formula: $$P(k) = 1 - \prod_{i=1}^{k} \left(1 - \frac{1}{p_i}\right).$$
Using this formula, we get some surprising facts: about 88% of all integers have a factor less than 100, and 92% have a factor less than 1000. Therefore, if we want to construct a general-purpose factoring algorithm, we can first try trial division; if we are not lucky, then we try other, more advanced algorithms.
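Both claims are easy to check numerically. The sketch below evaluates the product formula for $P(k)$ and runs trial division by small primes (helper names are mine):

```python
def small_primes(bound):
    """All primes below `bound` (naive trial check, fine for small bounds)."""
    return [p for p in range(2, bound)
            if all(p % d for d in range(2, int(p ** 0.5) + 1))]

def divisible_fraction(bound):
    """P(k): fraction of integers divisible by some prime below `bound`."""
    frac = 1.0
    for p in small_primes(bound):
        frac *= 1 - 1 / p
    return 1 - frac

print(round(divisible_fraction(100), 2))    # about 0.88
print(round(divisible_fraction(1000), 2))   # about 0.92

def trial_division(N, bound=1000):
    """Strip all prime factors of N below `bound`; return (factors, cofactor)."""
    factors = []
    for p in small_primes(bound):
        while N % p == 0:
            factors.append(p)
            N //= p
    return factors, N

print(trial_division(2 ** 3 * 3 * 104729))  # small factors split off; 104729 remains
```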
Suppose $p$, $q$ are prime and $N = pq$. Fermat’s little theorem:

$$a^{p-1} \equiv 1 \pmod{p}$$

For all $a$ relatively prime to $p$. Suppose $M = k(p-1)$ is a multiple of $p-1$; then

$$a^M = \left(a^{p-1}\right)^k \equiv 1 \pmod{p},$$

so $p$ divides $a^M - 1$. This means, by simply calculating the gcd of $(a^M - 1) \bmod N$ and $N$, we get the factor $p$ (as long as this gcd is not $N$ itself). There is only one question left: how to find such an $M$?
From the fundamental theorem of arithmetic, $\phi(N) = (p-1)(q-1)$ can be written as a product of prime numbers:

$$\phi(N) = \prod_i q_i^{e_i}$$

In this product, some prime factors correspond to $p-1$, the others correspond to $q-1$. Now we try to build $M$ by multiplying as many small prime numbers as we can:

$$M = \prod_{q_i \le B} q_i^{\lfloor \log_{q_i} N \rfloor},$$

where $B$ is a smoothness bound. Here we also make the exponents sufficiently large. If $p-1$ has only ``small'' prime factors (all below $B$), then $M$ includes all the prime powers that divide $p-1$. So $M$ is a multiple of $p-1$, and we have found the $M$ that we need: $\gcd\left((a^M - 1) \bmod N,\, N\right)$ reveals $p$.
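A compact Python sketch of Pollard's $p-1$ method under these assumptions (the smoothness bound $B$ and helper names are mine):

```python
from math import gcd, log

def small_primes(bound):
    return [p for p in range(2, bound)
            if all(p % d for d in range(2, int(p ** 0.5) + 1))]

def pollard_p_minus_1(N, B=100):
    """Succeeds when some prime factor p of N has p - 1 composed only of
    primes below the smoothness bound B."""
    a = 2
    for q in small_primes(B):
        a = pow(a, q ** int(log(N, q)), N)   # fold q^floor(log_q N) into a^M
    d = gcd(a - 1, N)
    return d if 1 < d < N else None

# p = 421: p - 1 = 420 = 2^2 * 3 * 5 * 7 is 10-smooth, so B = 10 suffices
print(pollard_p_minus_1(421 * 104729, B=10))   # finds the factor 421
```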
Pollard’s algorithm is thwarted if $p$ is a safe prime. (Recall that $p$ is a safe prime if $(p-1)/2$ is also prime, which means $p-1$ has a very large prime factor.)
Pollard’s Rho algorithm is based on the birthday attack, which concerns the probability that two students in a class share a birthday. Formally, we randomly choose $k$ numbers $x_1, \dots, x_k$ in the range $\{1, \dots, n\}$; what is the probability that there exist distinct $i, j$ with $x_i = x_j$? The conclusion of the birthday attack is that this probability is high if $k = \Theta(\sqrt{n})$.
Pollard’s Rho algorithm is almost the same. We randomly choose $k$ numbers $x_1, \dots, x_k$ in $\mathbb{Z}_N^*$; what is the probability that there exist distinct $i, j$ with $x_i \equiv x_j \pmod{p}$? (Here $p, q$ are $n$-bit primes and $N = pq$.) The conclusion is also similar: the probability is high if $k = \Theta(\sqrt{p}) = \Theta(2^{n/2})$. Finally, if $x_i \equiv x_j \pmod{p}$ but $x_i \neq x_j$, then $\gcd(x_i - x_j, N) = p$. The complexity is also exponential: $O(2^{n/2}) = O(N^{1/4})$ up to polynomial factors.
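In practice one does not store $k$ random numbers: Floyd's cycle-finding on the pseudorandom walk $x \mapsto x^2 + c \bmod N$ achieves the same birthday bound with constant memory. A sketch (the retry loop over the constant $c$ is my addition, to restart the rare failed walks):

```python
from math import gcd

def pollard_rho(N):
    """Floyd cycle-finding on x -> x^2 + c mod N; expected ~N^(1/4) steps."""
    if N % 2 == 0:
        return 2
    for c in range(1, 20):               # retry with a new constant if the walk fails
        x = y = 2
        d = 1
        while d == 1:
            x = (x * x + c) % N          # tortoise: one step
            y = (y * y + c) % N          # hare: two steps
            y = (y * y + c) % N
            d = gcd(abs(x - y), N)       # collision mod p shows up in the gcd
        if d != N:
            return d
    return None

print(pollard_rho(8051))                 # 8051 = 83 * 97
N = 104729 * 104723
p = pollard_rho(N)
print(p, N // p)
```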
We have talked about three algorithms, but all of them are still exponential. Can we do better? In fact, the complexity of the factoring problem is still not known. People usually express the heuristic running times of factoring algorithms using the $L$-notation: $$L_N[\alpha, c] = \exp\left(c \, (\ln N)^{\alpha} \, (\ln \ln N)^{1 - \alpha}\right).$$
If $\alpha = 0$, then $L_N[0, c] = (\ln N)^c$, which is polynomial. If $\alpha = 1$, then $L_N[1, c] = N^c$, which is exponential in the length of $N$. If $0 < \alpha < 1$, then it is sub-exponential. People have proposed some algorithms that are sub-exponential: the elliptic curve method and the quadratic sieve algorithm can achieve $L_N[1/2, c]$ for a small constant $c$ (the quadratic sieve heuristically runs in $L_N[1/2, 1]$). The fastest known general-purpose factoring algorithm is the general number field sieve, whose heuristic expected running time is $L_N[1/3, \sqrt[3]{64/9}\,] \approx L_N[1/3, 1.923]$.
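To get a feel for these numbers, the sketch below evaluates the exponent $c (\ln N)^{\alpha} (\ln \ln N)^{1-\alpha}$ — the natural log of the heuristic running time — for a 1024-bit modulus (function name is mine):

```python
from math import log

def L_exponent(bits, alpha, c):
    """c * (ln N)^alpha * (ln ln N)^(1 - alpha) for an N of `bits` bits,
    i.e. the natural log of the heuristic running time L_N[alpha, c]."""
    ln_N = bits * log(2)
    return c * ln_N ** alpha * log(ln_N) ** (1 - alpha)

qs = L_exponent(1024, 1 / 2, 1.0)                    # quadratic sieve: L_N[1/2, 1]
gnfs = L_exponent(1024, 1 / 3, (64 / 9) ** (1 / 3))  # GNFS: L_N[1/3, (64/9)^(1/3)]
print(round(qs), round(gnfs))   # the sieve's exponent is noticeably larger
```

Since these are exponents of $e$, even the modest-looking gap between the two values means the number field sieve is faster by many orders of magnitude at this size.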