Math 170: Ideas in Mathematics

Encryption and coding


 Lecture 34: April 14, 2008

Today we will start acquiring the methods needed to understand and effect encryption and decryption using RSA cryptography.

Since ancient times, it has been important for military and governmental purposes to be able to communicate privately over communication channels which are themselves public, for example the radio, the telegraph or the telephone, none of which are immune to listeners. For this reason, it became useful to develop ciphers, codes known only to the sender and the receiver, which enabled them to communicate without any other person understanding the message even if they could access it.

As long as there have been codes, there have been code breakers. The early, simple "substitution" codes of one letter for another, or a number for each letter, turned out to be too easy to break, using methods studying the frequency of occurrence of each letter. More and more complicated codes were developed over the century, often based on machines. As we saw in the Turing film, the Germans developed the Enigma code which Turing and his colleagues succeeded in breaking using advanced computer techniques. It seemed nearly impossible to find a code that was really immune to break-in. The code most impossible to break was organized by two people each possessing identical copies of a particular, rare book. Then, one person could transmit a message to the other which would begin with a number indicating the page of the book, then other numbers indicating which letter on that page to consult, and the letters together would spell a message.

However, this system had the defect that it was only really good to use between a very small number of people. The more people possessed the key text, the greater the risk of one of them being captured by the enemy, who would then take the book and use it to receive and transmit messages as though from the captive. Furthermore, this type of coded message did not permit the user to receive messages from anyone except those in the small circle of owners.

Twentieth century business and military practices brought a need for a code that could be used by a large number of people to send an encrypted message to the receiver, so that the messages could be seen by anybody, but read and understood by nobody except himself. (For example, a business may receive thousands of encrypted credit card numbers from clients every day.) It seemed very difficult to come up with such an "ideal" code, but that is exactly what happened one day in 1977, when Ron Rivest, Adi Shamir and Leonard Adleman were chatting about the problem around a Passover Seder table in Schenectady, NY (read pp. 227-234 of the book extract). The result, known as RSA cryptography, is what we are about to study. Today, we will learn the basic algorithm for setting up a public and private key, and using it to encrypt and decrypt. In the following lectures, we will experiment with some letter-to-number codes and some techniques for encrypting and decrypting more quickly even when the numbers are very large, in order to encrypt and decrypt real messages. We will also cover the mathematical proof of how and why RSA works.

 Lecture 35: April 16, 2008

Today we will do some examples of RSA encryption, laying stress on the practical aspects, namely how to best raise a large number mod n to a large power mod n when n itself is a large number. We will introduce the base-2 method for actually doing this.

Suppose T, e and n are large numbers and we want to compute Te mod n. The number Te is gigantic even compared to n. To see this, think that roughly speaking, if T is m digits long, then Te will be about em digits long. Thus, if T, e and n are all about 100 digits long, Te will be 10000 digits long and we only want its remainder mod n, which will be less than 100 digits.

The base-2 method for computing large powers mod n.

There is a way to calculate Te mod n that never uses numbers longer than about twice the number of digits of T, that works as follows.
(1) Write e in base 2. For this, follow the following procedure:
Compute the greatest power of 2 less than e and subtract it from e. Call the result e1.
Compute the greatest power of 2 less than e1 and subtract it from e1. Call the result e2.
Continue until you reach zero.
Example. Write e=443 in base 2: 28=256 is the greatest power less than e, so e1=187. Then, 27=128 is the greatest power less than 187, so e2 = 59. Then, 25=32 is the greatest, so e3 = 27. Subtracting off 24=16 leaves e4 = 11, then subtracting off 23 leaves e5 = 3, then subtracting off 2 leaves e6=1, and finally we subtract off 20=1 leaving 0.

So we have proved that

433 = 28+27+25+24 +23+21+20.
We write this as a binary number (only 1's and 0's) of length m+1, where m is the highest power of 2 less than e, i.e. the first power in the list. Here, m=8 so the binary number will be of length 9. We put a 1 if the power of 2 appears and a 0 if it doesn't appear, so 443 = 110111011, where the powers go from left to right, descending from m down to 0.

(2) The next step to compute Te mod n is to compute a table of successively squared powers of T mod n:

T mod n

T2 = (T2)2 = T22 = T4 mod n

(T4)2 mod n = T23 mod n = T8 mod n,

and so on until we reach the power T2m. Thus, the numbers we compute have at most less than twice as many digits as n, because they are all squares of numbers with less digits than n.

(3) Now we consider the binary expansion of e, and multiply together the corresponding powers of T:

Te = T2m+...+21+20,
namely only those powers of T which appear in the binary expansion of e. Each time we multiply a new one onto our product, we take the result mod n and in this way we again never deal with numbers that get too large.

Note: In this procedure, we implicitly made use of a mathematical property of remainders which we state and prove clearly as fact (1) below.

Why the RSA method works. We will start on the proof now, by proving the first of the two mathematical assertions I made when giving the algorithm in the previous lecture. Namely, remember that we select a number e for our public key which has the two properties that 1<e<(p-1)(q-1) and gcd(e,(p-1)(q-1))=1. The first assertion was that there exists some number d with the two properties that 1<d<(p-1)(q-1) and the remainder of the product of the two numbers d and e, after dividing by (p-1)(q-1), is equal to 1:

de mod (p-1)(q-1) = 1.

Proof that such a d exists.

We need two basic facts about remainders after division by n:
(1) If a and b are two integers, then ab mod n = (a mod n)(b mod n) mod n. In other words, the remainder after division by n of the product ab is equal to the remainder after division by n of the product of the remainder of a and the remainder of b.
(2) We also need Bézout's identity, which states that if a and b are two integers, then there exist two other integers x and y such that

ax + by = gcd(a,b).
Accepting (1) and (2) for the moment, in our situation, we plug in e for a, (p-1)(q-1) for b, and we know by the definition of e that the gcd of the two numbers is 1, so Bézout's identity (2) above tells us that there exist integers x and y such that ex+(p-1)(q-1)y = 1. These two number are equal, so the two remainders after dividing by n are also equal. The right-hand side is just 1, so the remainder is also 1. The (p-1)(q-1)y term disappears from the left-hand side because it is divisible by (p-1)(q-1), so the left-hand side becomes equal to ex mod (p-1)(q-1) and it is equal to 1. Let d = x mod (p-1)(q-1). Then by fact (1) above, we have
ex mod (p-1)(q-1) = (e mod (p-1)(q-1)) (x mod (p-1)(q-1)) mod (p-1)(q-1) = ed mod (p-1)(q-1) = 1.

Proof of fact (1). Write a = kn+r, b = ln+s, where r = a mod n is the remainder of a after division by n, and similarly s = b mod n. Thus in particular, r and s are strictly less than n.

We have

ab mod n = (kn+r)(ln+s) mod n = kln2 + (ks+lr)n + rs mod n,

and two of the three terms on the right-hand side are already divisible by n, so the remainder after dividing by n should be rs. The only problem is that even though we know that r and s are both strictly less than n, the product rs might be greater than n. So the remainder is actually equal to the remainder of rs after division by n, which is just rs mod n. Putting back r = a mod n and s = b mod n, we find that

ab mod n = (a mod n)(b mod n) mod n
as claimed.

Proof of fact (2). Suppose a and b are positive integers, and let S be the set of all strictly positive integers of the form am+bn, where m and n can be any integers, even negative ones, as long as the expression am+bn is strictly positive (strictly greater than 0). Then the set S contains a smallest element. Let this element be called d, so d=ax+by for some particular integers x,y.

Then we can write a = qd+r where r is the remainder of dividing a by d, so in particular r is non-negative and strictly less than d. (We are used to seeing this expression when a is greater than d. But if a is equal to d, take q=1, r=0, and if a is less than d, take q=0, r=a.) Then

r = a - qd = a - q(ax+by) = (1-qx)a + (-qy)b,

so in fact, r is of the right form to belong to S. But r is strictly less than d, and we said that d is the smallest element of S! This seems to yield a contradiction, but in fact it doesn't, since it is legitimate for r to be equal to 0, and 0 is not in S. So we must conclude that r=0. Then a=qd, so d divides a. (In particular, we notice that d is less than or equal to a.)

Repeating the same argument for b instead of a, starting with writing b = qd+r, we find that d also divides b. Therefore, d is a common divisor of a and b. So far, we have proved that for any positive a, b, there exist integers x, y such that ax+by=d, knowing that d is a common divisor of a and b. The only thing needed to finish our proof of Bézout is that d is the greatest common divisor. This is simple, because if we set c = gcd(a,b), then c divides both a and b, so it divides ax+by=d, so c divides d, so it is less than or equal to d. But since c is the greatest common divisor and d is a common divisor, they must be equal, so d is itself the greatest common divisor. This concludes the proof of fact (2).

 Lecture 36: April 18, 2008

This lecture will be devoted to examples. In order to encode some alphabetical messages, me make use of one of the two following simple alphabet-string-to-number conversion codes:

A,B,...,I=1,2,...,9, J,K,...,R=10,20,...,90, S,T,...,Z=100,200,...,800
or
A,B,...,I=1,2,...,9, J,K,...,R=11,12,...,19, S,T,...,Z=21,22,...,28 (and set a 0 between each converted letter).
For instance, in the first of these codes, the word ABLE becomes 12305, and in the second, it becomes 10201305.

We will work with the RSA encryption scheme (n,e)=(100097,35081).

Task 1: Encode the word "ABLE" and send it to the owner of the encryption scheme.

Answer: Using the first code, we write ABLE=12305. Call this plaintext message T. We need to compute Te mod n with e=35081, n=100097.

Step 1: We first need to express 35081 in binary. The highest power of 2 less than 35081 is 215, so the length of the binary expression will be 15+1=16. We have 35081-215=2313. The highest power of 2 less than 2313 is 11 and 2313-211=265. The highest power of 2 less than 265 is 8 and 265-28=9. The highest power of 2 less than 9 is 3 and 9-23=1=20. Thus the binary expression for e=35081 is 1000100100001001 (with 1's in the places corresponding to the 15th, 11th, 8th, 3rd and 0th powers of 2).

Step 2: We make a table of length 16, of successive squares of T mod n: (12305,66361,14806,5206,76246,18950,54561,17941,67626,44140,51592,55137,42782,25879,73711,46361).

Step 3: We multiply the 15th, 11th, 8th, 3rd and 0th elements of the table together mod n to get the final encrypted word:

12305*5206*67626*55137*46361 mod n = 4685.
The number 4685 is the encrypted message.

Task 2: Break into the encryption scheme by finding the private key.

Step 1: Factor n into two primes. To do this, we try dividing n evenly by all primes up to the square root of n. The square root of n=100097 is about 316, so one of the prime factors of n is going to be less than or equal to 313. It is wise to move down the list of primes from the top (try first 313, then 311, then 307,...) because in general, the value of n used in encryption schemes will be products of two large primes, so it is not useful to waste time with the many small primes unless necessary. In the case n=100097, we find n=199*503. Note that in reality, there are much more efficient ways of looking for prime factors of a large number than this, however even these take too long to be practical for extremely large values of n, in the several hundred digit range.

Step 2: Compute (p-1)(q-1) (with p=199 and q=503, this value is 198*502=99396) and search for d such that de mod 99396 = 1, where e=35081 comes from the encryption scheme. To search for d, we use the following method:

Set x = 1 and test whether (p-1)(q-1)x+1 is evenly divisible by e. If not, set x = 2 and test again. Continue to increase x until you have found a value for which (p-1)(q-1)x+1 is evenly divisible by e. Then set d = ( (p-1)(q-1)x + 1 )/e mod (p-1)(q-1).

In the present case we find that when x=6, (p-1)(q-1)x+1=596377=17*35081, so d=17. We have found the private key.

Task 3: Decode the encrypted message.

This task would be the same if we were the owner of the RSA encryption scheme, already in possession of the private key, or if we have just completed task 2 by breaking in and computing the private key. Now that we have it, we decrypt the message 4685. To do this, we compute 4685^d mod n. We write 17 in binary form as 10001 and make a length 5 table of successive squares of 4685 mod n: (4685,27982,33590,94813,93690). Then we multiply the first and last elements of this table mod n:

4683*93690 mod 100097 = 12305.
This is the decrypted numerical message, which we put back into the alphabetical form ABLE using the alphabet-to-number code above.

 Lecture 37: April 21, 2008

In this lecture we complete the proof that RSA encryption words by proving that if we set C=Te mod n to be the cipher message, then after having found d with the property de mod (p-1)(q-1) = 1 (which exists by the proof given two lectures ago), we do indeed have Cd mod n = T for this value of d.

This proof is a case of a beautiful little result known as Fermat's Little Theorem. Fermat himself stated his little theorem for a prime number, and the statement is as follows.

Fermat's Little Theorem. Let p be a prime number, and let a be any number between 1 and p-1. Then ap-1 mod p = 1.

Proof. Consider the set of numbers A={1,2,3,...,p-1}. Because p is prime, the gcd's of each number in A with p is equal to 1. Now, multiply each number in A by a, obtaining the list of numbers

a,2a,3a,...,(p-1)a.
Take the remainder of each of these numbers after division by p, obtaining the list of numbers
a mod p, 2a mod p, 3a mod p,...,(p-1)a mod p.
Because each of these numbers is a remainder after division by p, they all lie between 0 and p-1. But in fact, none of these remainders can be equal to 0, because gcd(a,p)=1, so p cannot divide a multiple ma if m is less than p, since then also gcd(m,p)=1. Thus, we have a list of p-1 numbers
a mod p, 2a mod p, 3a mod p,...,(p-1)a mod p
which are all between 1 and p-1. Now let us prove by contradiction that no two numbers in this list can be equal. Suppose the contrary, that in fact we have ia mod p = ja mod p with both i and j less than p, but i different from j. Then we would have ia-ja mod p = (i-j)a mod p = 0. This means that the remainder of (i-j)a after division by p is zero, so in fact (i-j)a must be divisible by p. Now, we know that gcd(a,p)=1, so p does not divide a, so p must divide (i-j), say i-j=kp. But this means i = kp+j, so i = j mod p, so in fact i=j since they are both between 1 and p-1 to begin with. This is contrary to assumption, so we have a contradiction, and in fact the p-1 numbers in the list above are distinct.

Since we now have a list of p-1 distinct numbers all between 1 and p-1, this means that this list is the same as the list 1,2,...,p-1 (though in a different order). In particular, the product of these p-1 numbers will be equal to the product 1*2*3*...*(p-1) mod p. But the product of these p-1 numbers is

(a)*(2a)*(3a)*...*((p-1)a) = (1*2*3*...*(p-1))*ap-1,
so since this is equal to 1*2*3*...*(p-1), we conclude that ap-1 mod p = 1. The theorem is proved.

Another proof, using no remainders.

General statement of Fermat's little theorem: For any positive integer n, if φ(n) denotes the number of integers k between 1 and n-1 such that gcd(k,n)=1, and a is any number between 1 and n-1 such that gcd(a,n)=1, then aφ(n) mod n = 1.

Proof. The proof is just the same as for the case n=p above. We take the set of numbers k with gcd(k,n)=1 and k between 1 and n-1, and then we take the set of multiples ka mod n for each of these numbers k, getting another list of numbers between 1 and n-1. We show that there are really n-1 distinct numbers in the second list by contradiction: if we assume that ka mod n = ma mod n, with k and m both between 1 and n-1 and different from eachother, then (k-m)a mod n = 0 so n divides (k-m)a. Since gcd(a,n)=1, n must divide (k-m), so k=ln+m, contradicting the assumption that both k and m are between 1 and n-1 and distinct. Thus we know that the list of multiples ka mod n contains (n-1) distinct numbers between 1 and n-1, all having gcd with n = 1, and therefore it is the same list as the original list of k's. Taking the product of all these numbers and comparing it to the product of the original k's shows that a*a*...*a mod n =1, where a is multiplied by itself as many times as there are values of k in the list. But this number is exactly φ(n), so we end up with aφ(n) mod n = 1, proving the general theorem.

To prove the result we need for RSA, let n=pq and count the number of integers k between 1 and n-1 such that gcd(k,n)=1. We have to take all of the (n-1) numbers between 1 and n-1, and subtract off the (q-1) numbers which are the multiples of p and the (p-1) numbers which are the multiples of q. All these numbers are distinct since the first number which is a multiple of both p and q is pq, and this is equal to n, so not in the set of integers between 1 and n-1 that we are considering. So, the total number of k between 1 and n-1 with gcd(k,n)=1 is equal to

φ(n)=(n-1)-(p-1)-(q-1) = pq-p-q+1=(p-1)(q-1).

Thus, the (p-1)(q-1) that we are considering is really the φ(n) of the generalized version of Fermat's little theorem.

Now, if T is a plaintext message, we assume that gcd(T,n)=1, i.e. that the secret large primes p and q do not divide T. (The result also works if p or q divides T, but it is such a rare case that we don't bother with the proof -- exercise?!?) Then we can use Fermat's little theorem: we have C=Te mod n, and de mod (p-1)(q-1) = 1, and so

Cd mod n = Tde mod n = T1+x(p-1)(q-1) mod n = T*Tφ(n)x mod n = T*(Tφ(n))x mod n = T mod n.

This proves what we want, namely that decrypting C by raising it to the d-th power gives back the original message T.