| Lecture 34: April 14, 2008 |
Today we will start acquiring the methods needed to understand and effect encryption and decryption using RSA cryptography.
Since ancient times, it has been important for military and governmental purposes to be able to communicate privately over communication channels which are themselves public, for example the radio, the telegraph or the telephone, none of which are immune to listeners. For this reason, it became useful to develop ciphers, codes known only to the sender and the receiver, which enabled them to communicate without any other person understanding the message even if they could access it.
As long as there have been codes, there have been code breakers. The early, simple "substitution" codes of one letter for another, or a number for each letter, turned out to be too easy to break, using methods studying the frequency of occurrence of each letter. More and more complicated codes were developed over the century, often based on machines. As we saw in the Turing film, the Germans developed the Enigma code which Turing and his colleagues succeeded in breaking using advanced computer techniques. It seemed nearly impossible to find a code that was really immune to break-in. The code most impossible to break was organized by two people each possessing identical copies of a particular, rare book. Then, one person could transmit a message to the other which would begin with a number indicating the page of the book, then other numbers indicating which letter on that page to consult, and the letters together would spell a message.
However, this system had the defect that it was only really good to use between a very small number of people. The more people possessed the key text, the greater the risk of one of them being captured by the enemy, who would then take the book and use it to receive and transmit messages as though from the captive. Furthermore, this type of coded message did not permit the user to receive messages from anyone except those in the small circle of owners.
Twentieth century business and military practices brought a need for a
code that could be used by a large number of people to send an encrypted message
to the receiver, so that the messages could be seen by anybody, but
read and understood by nobody except himself. (For example, a business
may receive thousands of encrypted credit card numbers from clients every day.)
It seemed very difficult to come up with such an "ideal" code, but that
is exactly what happened one day in 1977, when Ron Rivest, Adi Shamir
and Leonard Adleman were chatting about the problem around a Passover
Seder table in Schenectady, NY (read pp. 227-234 of the book extract).
The result, known as
RSA cryptography,
is what we are about to study. Today, we will learn the basic algorithm
for setting up a public and private key, and using it to encrypt and decrypt.
In the following lectures, we will experiment with some letter-to-number
codes and some techniques for encrypting and decrypting more quickly even
when the numbers are very large, in order to encrypt and decrypt real
messages. We will also cover the mathematical proof of how and why RSA works.
| Lecture 35: April 16, 2008 |
Today we will do some examples of RSA encryption, laying stress on the practical aspects, namely how to best raise a large number mod n to a large power mod n when n itself is a large number. We will introduce the base-2 method for actually doing this.
Suppose T, e and n are large numbers and we want to compute Te mod n. The number Te is gigantic even compared to n. To see this, think that roughly speaking, if T is m digits long, then Te will be about em digits long. Thus, if T, e and n are all about 100 digits long, Te will be 10000 digits long and we only want its remainder mod n, which will be less than 100 digits.
The base-2 method for computing large powers mod n.
There is a way to calculate Te mod n that never uses
numbers longer than about twice the number of digits of T, that works
as follows.
(1) Write e in base 2. For this, follow the following procedure:
Compute the greatest power of 2 less than e and subtract it from e.
Call the result e1.
Compute the greatest power of 2 less than e1 and subtract it
from e1. Call the result e2.
Continue until you reach zero.
Example. Write e=443 in base 2: 28=256 is the
greatest power less than e, so e1=187.
Then, 27=128 is the greatest power less than 187, so
e2 = 59. Then, 25=32 is the greatest, so
e3 = 27. Subtracting off 24=16 leaves
e4 = 11, then subtracting off 23 leaves
e5 = 3, then subtracting off 2 leaves e6=1,
and finally we subtract off 20=1 leaving 0.
So we have proved that
(2) The next step to compute Te mod n is to compute a table of successively squared powers of T mod n:
T mod n
T2 = (T2)2 = T22 = T4 mod n
(T4)2 mod n = T23 mod n = T8 mod n,
and so on until we reach the power T2m. Thus, the numbers we compute have at most less than twice as many digits as n, because they are all squares of numbers with less digits than n.
(3) Now we consider the binary expansion of e, and multiply together the corresponding powers of T:
Note: In this procedure, we implicitly made use of a mathematical property of remainders which we state and prove clearly as fact (1) below.
Why the RSA method works. We will start on the proof now, by proving the first of the two mathematical assertions I made when giving the algorithm in the previous lecture. Namely, remember that we select a number e for our public key which has the two properties that 1<e<(p-1)(q-1) and gcd(e,(p-1)(q-1))=1. The first assertion was that there exists some number d with the two properties that 1<d<(p-1)(q-1) and the remainder of the product of the two numbers d and e, after dividing by (p-1)(q-1), is equal to 1:
Proof that such a d exists.
We need two basic facts about remainders after division by n:
(1) If a and b are two integers, then ab mod n = (a mod n)(b mod n) mod n.
In other words, the remainder after division by n of the product ab
is equal to the remainder after division by n of the product of the
remainder of a and the remainder of b.
(2) We also need Bézout's identity, which states that if a and b are
two integers, then there exist two other integers x and y such that
Proof of fact (1). Write a = kn+r, b = ln+s,
where r = a mod n is the remainder of a after division by n, and
similarly s = b mod n. Thus in particular, r and s are strictly
less than n.
We have
and two of the three terms on the right-hand side are already divisible by n, so the remainder after dividing by n should be rs. The only problem is that even though we know that r and s are both strictly less than n, the product rs might be greater than n. So the remainder is actually equal to the remainder of rs after division by n, which is just rs mod n. Putting back r = a mod n and s = b mod n, we find that
Proof of fact (2). Suppose a and b are positive
integers, and let S be the set of all strictly positive integers of the
form am+bn, where m and n can be any integers, even negative ones, as long as
the expression am+bn is strictly positive (strictly greater than 0).
Then the set S contains a smallest element. Let this element be called d,
so d=ax+by for some particular integers x,y.
Then we can write a = qd+r where r is the remainder of dividing a by d,
so in particular r is non-negative and strictly less than d.
(We are used to seeing this expression when a is greater than d. But
if a is equal to d, take q=1, r=0, and if a is less than d, take q=0, r=a.)
Then
so in fact, r is of the right form to belong to S. But r is strictly less than d, and we said that d is the smallest element of S! This seems to yield a contradiction, but in fact it doesn't, since it is legitimate for r to be equal to 0, and 0 is not in S. So we must conclude that r=0. Then a=qd, so d divides a. (In particular, we notice that d is less than or equal to a.)
Repeating the same argument for b instead of a, starting with writing
b = qd+r, we find that d also divides b. Therefore, d is a common divisor
of a and b. So far, we have proved that for any positive a, b, there
exist integers x, y such that ax+by=d, knowing that d is a common divisor
of a and b. The only thing needed to finish our proof of Bézout
is that d is the greatest common divisor. This is simple, because
if we set c = gcd(a,b), then c divides both a and b, so it divides
ax+by=d, so c divides d, so it is less than or equal to d. But since c is
the greatest common divisor and d is a common divisor, they must be equal,
so d is itself the greatest common divisor. This concludes the proof of
fact (2).
| Lecture 36: April 18, 2008 |
This lecture will be devoted to examples. In order to encode some alphabetical messages, me make use of one of the two following simple alphabet-string-to-number conversion codes:
We will work with the RSA encryption scheme (n,e)=(100097,35081).
Task 1: Encode the word "ABLE" and send it to the owner of the encryption scheme.
Answer: Using the first code, we write ABLE=12305. Call this plaintext message T. We need to compute Te mod n with e=35081, n=100097.
Step 1: We first need to express 35081 in binary. The highest power of 2 less than 35081 is 215, so the length of the binary expression will be 15+1=16. We have 35081-215=2313. The highest power of 2 less than 2313 is 11 and 2313-211=265. The highest power of 2 less than 265 is 8 and 265-28=9. The highest power of 2 less than 9 is 3 and 9-23=1=20. Thus the binary expression for e=35081 is 1000100100001001 (with 1's in the places corresponding to the 15th, 11th, 8th, 3rd and 0th powers of 2).
Step 2: We make a table of length 16, of successive squares of T mod n: (12305,66361,14806,5206,76246,18950,54561,17941,67626,44140,51592,55137,42782,25879,73711,46361).
Step 3: We multiply the 15th, 11th, 8th, 3rd and 0th elements of the table together mod n to get the final encrypted word:
Task 2: Break into the encryption scheme by finding the private key.
Step 1: Factor n into two primes. To do this, we try dividing n evenly by all primes up to the square root of n. The square root of n=100097 is about 316, so one of the prime factors of n is going to be less than or equal to 313. It is wise to move down the list of primes from the top (try first 313, then 311, then 307,...) because in general, the value of n used in encryption schemes will be products of two large primes, so it is not useful to waste time with the many small primes unless necessary. In the case n=100097, we find n=199*503. Note that in reality, there are much more efficient ways of looking for prime factors of a large number than this, however even these take too long to be practical for extremely large values of n, in the several hundred digit range.
Step 2: Compute (p-1)(q-1) (with p=199 and q=503, this value is 198*502=99396) and search for d such that de mod 99396 = 1, where e=35081 comes from the encryption scheme. To search for d, we use the following method:
Set x = 1 and test whether (p-1)(q-1)x+1 is evenly divisible by e. If not, set x = 2 and test again. Continue to increase x until you have found a value for which (p-1)(q-1)x+1 is evenly divisible by e. Then set d = ( (p-1)(q-1)x + 1 )/e mod (p-1)(q-1).
In the present case we find that when x=6, (p-1)(q-1)x+1=596377=17*35081, so d=17. We have found the private key.
Task 3: Decode the encrypted message.
This task would be the same if we were the owner of the RSA encryption scheme, already in possession of the private key, or if we have just completed task 2 by breaking in and computing the private key. Now that we have it, we decrypt the message 4685. To do this, we compute 4685^d mod n. We write 17 in binary form as 10001 and make a length 5 table of successive squares of 4685 mod n: (4685,27982,33590,94813,93690). Then we multiply the first and last elements of this table mod n:
| Lecture 37: April 21, 2008 |
In this lecture we complete the proof that RSA encryption words by proving that if we set C=Te mod n to be the cipher message, then after having found d with the property de mod (p-1)(q-1) = 1 (which exists by the proof given two lectures ago), we do indeed have Cd mod n = T for this value of d.
This proof is a case of a beautiful little result known as Fermat's Little Theorem. Fermat himself stated his little theorem for a prime number, and the statement is as follows.
Fermat's Little Theorem. Let p be a prime number, and
let a be any number between 1 and p-1. Then ap-1 mod p = 1.
Proof. Consider the set of numbers A={1,2,3,...,p-1}. Because p is
prime, the gcd's of each number in A with p is equal to 1.
Now, multiply each number in A by a, obtaining the list of numbers
Since we now have a list of p-1 distinct numbers all between 1 and p-1, this means that this list is the same as the list 1,2,...,p-1 (though in a different order). In particular, the product of these p-1 numbers will be equal to the product 1*2*3*...*(p-1) mod p. But the product of these p-1 numbers is
Another proof, using no remainders.
General statement of Fermat's little theorem:
For any positive integer n, if φ(n) denotes the number of integers k
between 1 and n-1 such that gcd(k,n)=1, and a is any number between 1 and n-1
such that gcd(a,n)=1, then aφ(n) mod n = 1.
Proof. The proof is just the same as for the case n=p above.
We take the set of numbers k with gcd(k,n)=1 and k between 1 and n-1,
and then we take the set of multiples ka mod n for each of these numbers k,
getting another list of numbers between 1 and n-1. We show that there
are really n-1 distinct numbers in the second list by contradiction:
if we assume that ka mod n = ma mod n, with k and m both between 1 and
n-1 and different from eachother, then (k-m)a mod n = 0 so
n divides (k-m)a. Since gcd(a,n)=1, n must divide (k-m), so k=ln+m,
contradicting the assumption that both k and m are between 1 and n-1
and distinct. Thus we know that the list of multiples ka mod n contains
(n-1) distinct numbers between 1 and n-1, all having gcd with n = 1,
and therefore it is the same list as the original list of k's. Taking
the product of all these numbers and comparing it to the product
of the original k's shows that a*a*...*a mod n =1, where a is multiplied by
itself as many times as there are values of k in the list. But this
number is exactly φ(n), so we end up with aφ(n) mod n = 1,
proving the general theorem.
To prove the result we need for RSA, let n=pq and count the number of integers k between 1 and n-1 such that gcd(k,n)=1. We have to take all of the (n-1) numbers between 1 and n-1, and subtract off the (q-1) numbers which are the multiples of p and the (p-1) numbers which are the multiples of q. All these numbers are distinct since the first number which is a multiple of both p and q is pq, and this is equal to n, so not in the set of integers between 1 and n-1 that we are considering. So, the total number of k between 1 and n-1 with gcd(k,n)=1 is equal to
Thus, the (p-1)(q-1) that we are considering is really the φ(n) of the generalized version of Fermat's little theorem.
Now, if T is a plaintext message, we assume that gcd(T,n)=1, i.e. that the secret large primes p and q do not divide T. (The result also works if p or q divides T, but it is such a rare case that we don't bother with the proof -- exercise?!?) Then we can use Fermat's little theorem: we have C=Te mod n, and de mod (p-1)(q-1) = 1, and so
This proves what we want, namely that decrypting C by raising it to the d-th power gives back the original message T.