Systems that are accessed by the public at any point in time are routinely subject to ingenious attack methods. You may have heard of Pollard's rho as one attack method used against security that depends cyclic integer subgroups such as DH, DSA, and El-Gamal.
While the attack is not the fastest for integers, it is a general purpose algorithm that is applicable to any cyclic group given it is abelian. These terms will be briefly explained, and then we will move onto attacking cryptosystems. Lastly we use Pollard's rho as an example to demonstrate an attack on data encryption. The objective of this post is to provide an overview of security given correct implementation and protocol and demonstrate that some given assumptions hold. This may be perhaps a 'strong' assumption as most often, encryption is crudely implemented and certain attacks are much more efficient and easily accomplished. However, this does not conflict with the goal which is to provide an example of forming attacks based on fixed assumptions.
Section 1: Cyclic Groups
Because this attack only works on certain groups, here are some of the basics about group theory. This is in no way an in depth description and only aims to provide background if the reader had not already known these. It is recommneded to go through the more detailed proofs from the references.
An (abelian) group is a set $\mathbb{G}$ along with some binary operator , or an operator that takes two inputs, that show the following characteristics [1]:
- Closure: For all $g,h \in \mathbb{G}, g \circ h \in \mathbb{G}$
- Identity: There exists $\textbf{O} \in \mathbb{G}$, such that $\forall g \in G, \textbf{O} \circ g = g$
- Inverse: $\forall g \in \mathbb{G}$, there exists an element $h \in \mathbb{G}$ such that $g \circ h = \textbf{O}$, so $h$ is the inverse of $g$.
- Associativity: $\forall g_1, g_2, g_3 \in \mathbb{G}, (g_1 \circ g_2) \circ g_3 = g_1 \circ (g_2 \circ g_3)$
- Commutativity: $\forall g,h \in \mathbb{G}, g \circ h = h \circ g$
For example, the set of integers $\mathbb{Z}$ as a group closed under addition, because any integers $a,b \in \mathbb{Z}$ results in $a + b = c$ such that $c \in \mathbb{Z}$. Naturally, the identity is 0 and the inverse is the flipped sign of any integer $a \in \mathbb{Z}$ because $a + -a = 0$. Likewise, associativity and commutivity can easily be demonstrated. Note that the set of positive integers is not an abelian group because it has no identity nor an inverse.
For cryptographic purposes however, we must have a group that exhibits cyclicity. The way to do this is from obtaining a subgroup from an existing group by way of a generator and constraining its domain by a field. This generator is an element $\boldsymbol{g} \in \mathbb{G}$ such that the group operation with some arbiturary number $\textrm{x} $ will yield every member of the subgroup $\mathbb{Z}^*$. Formally, this is $\{\forall \gamma \in \mathbb{Z}^*: \exists \textrm{x} \; | \; (\boldsymbol{g} \circ \textrm{x} = \gamma)\}$.
About generators:
- Generator $\boldsymbol{g}$ and $\gamma$ must be an element from the group $\mathbb{Z}^*$ (and therefore also in $\mathbb{Z}$) but $\textrm{x}$ need only be a valid input to the group operation.
- For security purposes, the distribution of generating $\gamma$ with $\boldsymbol{g}$ must be uniform. This can be easily met if every $\boldsymbol{g} \circ \textrm{x}$ mapped to a unique member of $\mathbb{Z}^*$.
- The number of elements in $\mathbb{Z}^*$ is the order $\#(\textrm{S}^*_p)$.
About cyclic groups:
- The field (and as a result, the elemenets of the group) is obtained by constraining all solutions of the group operation by a modulo.
- Prime modulos in particular yield useful cyclic groups.
About subgroup $\textrm{S}^*_p$ (the subgroup over modulo field $p$):
- $\mathbb{Z}_p$ : Integer group over the field of primes p >= 3.
- $\textrm{EC}^*_p$ : Elliptic curve subgroup of over the field of primes p >= 3.
- $\textrm{EC}^*_2$ : Binary Elliptic curve subgroup over the field prime p = 2.
- For integer groups over a prime field, the order is p - 1. For EC groups, they must be found by using the Lagrange theorem.
- The
Groups are important as they are fundamental to the Discrete Lograithm Problem, which is the assumption that the certain computation is intractable, $$ \boldsymbol{g} \circ \textrm{x} = y $$ Then solving for $\textrm{x}$ given generator $\boldsymbol{g}$ and output $y$.
As said before, we inherently assume that each element inside a subgroup used for cryptography has an equal probability of being generated. This means that an attacker cannot simply look at a statistical distribution after a million iterations to derive information. We apply this knowledge in the next discussion, elliptic curves before moving on to the attack.
Elliptic Curve Cryptography
We begin our discussion of Elliptical Curve Cryptography(ECC) in terms of its security and usage. To summarize, ECC is a public key cryptography that uses elliptic curve groups which consist of points on elliptical curve within a finite prime field to serve as a component in secure communication [1]. Strictly speaking, elliptic curves are solely used to form a group over finite field $\mathbb{Z}_p*$ in discrete logarithm-based protocols with prime p. Such protocols that use elliptic curve groups are Elliptical Curve Diffie-Hellman (ECDH) and Elliptical Curve Digital Signing Algorithm (ECDSA) and these have been standardized by NIST(7). CCA-secure encryption schemes have standardized under the Elliptic Curve Integrated Encryption Scheme ECIES(8).
While standards exists however, they do not prove the security of elliptic curve groups and the hardness assumption of the DLP. Our focus of this section is to gain intuition on the reasons for why ECC is considered secure, as this would give insight on possible attack vectors against elliptic curve groups.
For now, this blog assumes knowledge of basic group theory and prior knowledge of cyclic elliptic curve groups constrained by a finite prime field.
Section 2: The Elliptic Curve Discrete Logarithm Problem
It is required that we have a one-way function that depends on the Elliptic Curve Discrete Logarithm Problem (ECDLP) to have secure public-key communication. The discrete logarithm problem for exponentiation of a cyclic group of integers is “not easy” and has gone through public scrutiny for decades. Therefore, showing the ECDLP from our definition below as “hard” would be easiest by relating the two problems.
Let $A$ and $B$ denote two independent parties. $A$ chooses an elliptic curve group over a finite field $\mathbf{E} (F_p)$ discussed in the previous section, then chooses a generator base point $P = (P_x, P_y)$ that produces a subgroup $\mathbf{E} ^*(F_p)$ order $n$. Then, a point $Q = (Q_x, Q_y)$ is calculated and integer key $k \in$ $\{1,.., n-1\}$ is chosen such that $P,Q$ are elements of the same subgroup $\mathbf{E} ^*(F_p)$. Then following, $\mathbf{E} ^*(F_p)$ is also an abelian group.
Next, we will describe the situation where a tuple $(P,Q,p)$ is generated and $k$ pseudorandomly chosen by $A$, then $(P,Q,p,n)$ is sent to $B$ and $B$ must find $k$. Because point multiplication $kP$ will uniformly result in any element in the chosen subgroup $\mathbf{E}^{*}(F_p$ ), when $k$ is randomly selected, then $Q$ is also random.
We formally define this as:
Definition 1: The ECDLP is to solve $k$ given $P$ and $Q$ for $kP \equiv Q \mod p$, where $k,P,Q$ are as defined before. If $B$ cannot derive information from $(P,Q,p,n)$ and $\mathbf{E}^{*}(F_p)$ such that it increases the probability of $B$ to solve $k$ in a feasible amount of time, then the ECDLP is hard.
Given that $k$ is chosen uniformly from the set of integers less than a sufficiently large $n$, then [Definition 1] is assumed to be a hard problem.
The restriction on $k$ is if $k = n$ then we have the $k$ multiplicative identity of the group and $Q$ is not random, as well if $k$ is larger than $n$ then $Q$ collides with points generated by $k < n$. The restriction on $n$ to be large is intuitive, as we would like a large key space to defend against a brute-force attack. The security of ECC depends on this hardness assumption because if the ECC implementation is secure, then $B$ can only use an iterative algorithm to test each $k \in \{1,..,n-1\}$.
Of course, more efficient algorithms than pure iterative ones have been found for the DLP for integers [1] and ECDLP as well, which is discussed in Section 4, but we now have the intuition such that the security of ECC depends on efficient solutions for $k$ based on the ECDLP.
The weakness of our hardness assumption is there already exists types of curve where $kP$ generates a point non-uniformly or curves that have an efficient easy solution to the DLP are insecure. The few that are relevant to our Weierstrass Normal Form elliptic curves are supersingular curves which are prone to the MOV reduction [9] (which have no relation to the singular curves that were excluded from our definition of a non-singular) and the anomalous curves[9]. Identifying features of EC that create efficient ECDLP computation is vital for secure implementations.
Section 3: Protocol
ECC can be split into two parts: the protocol and the problem. In this section we will generalize this way and as well as give an example, called ECDH. In our discussion of protocols, we will assume that the proof of security of the protocol already exists, so any flaws in the secure exchange will result from the elliptic curve group component and implementation. Because of that, we will first describe the protocol and assume the security of their design before we begin an attack analysis in the next section.
Parameters
Parties must preemptively arrange parameters before secure communication is possible. The public exchange will consist of the two coefficients $a,b$ for their elliptical curve, a base point $G$, the cofactor $h$, a prime $p$, and a subgroup order $n$ denoted as domain parameters $(a, b, G, p, n, h)$[5].
The coefficients describe the elliptic curve to be used in the formula $y^2\pmod p = x^3$ $+ ax + b$. $G$ is the base point of the subgroup of order $n =\# \mathbf{E}^* (F_p)$. Prime $p$ specifies the finite field $F_p$. The order of the elliptic curve $N =\# \mathbf{E}(F_p)$ can be derived through cofactor $h$ or the elliptic curve equation, so $h$ is optionally sent. Hence, the public key consists of tuple $(a, b, G, p, n, h)$ and only a private key $d \in \{1,..,n-1\}$ is necessary to be kept secret.
ECDH
Elliptical Curve Diffie-Helman (ECDH) is the use of elliptic curve groups in the Diffie-Hellman key-exchange protocol.
Let a uniformly chosen integer $d \in \{1,…,n-1\}$ be multiplied with the base point $G \in \mathbf{E}^*(F_p)$ to compute $H = dG$ where $H \in \mathbf{E}^* (F_p)$. Because $G$ is a generator of the subgroup, $H$ will be as described in the ECDLP Section 3. Knowing $<H,G>$, would not give away information on $d$. Similarly, if we let $H'= cH$ where $c \in \{1,…,n-1 \}$, then $H'$ is also in the same subgroup as $H$ and $G$, according to our group laws.
Thus, knowing $<H',H,G>$ would not reveal information about $c$ nor $d$. This is the foundation of the DH/ECDH protocol.
Let’s explain ECDH through way of a formal example in the face of an eavesdropper. We define a key-exchange between two parties, Alice and Bob, and an eavesdropper – Eve. Alice and Bob must agree on domain parameters through a public channel where Eve can also listen and obtain them as well. Afterwards, Alice and Bob will generate their own private key $d$, and communicate a secret using the Diffie-Hellman key exchange. The exchange will be denoted as $ECDH_{Eve}^{Eav}$:
- Alice and Bob exchange parameters $(a,b,G,n,p,h)$ and each call a random number generator to obtain their own $ d \text{ where, } d_a \text{ and } d_b $ are kept secret. Both compute $H_A\, H_B$ where $H_\gamma = d_{\gamma} G$ and transmit $H_A,H_B$ through a public channel.
- Eve observes both $H_A$ and $H_B$.
- Alice and Bob receive the other's $H$ then multiply their private key to the received $H$ to compute secret $S: S=d_B(H_A)=d_A(H_B)$.
The proof for eavesdropper (as well as Chosen Plaintext Attack) security of this experiment would depend the Elliptic Curve variant of the Computational Diffie-Hellman problem that we described, such that given $H_\gamma $ the computation of $S$ is hard because of the hardness assumption of the ECDLP.
Because no information has been revealed from the public parameters, Eve must solve the discrete-logarithm problem to compute $d_A$ or $ d_B $ from any $H_{\gamma} $ and Eve cannot learn the secret $S$ in $ECDH_{Eve}^{Eav}$ in a feasible amount of time. Alice and Bob now have a point $S \in \mathbf{E}* (F_p)$ where they use, for example, $S_x$ to for future private-key communication.
Section 4: Attack
If we assume correct implementation of an elliptic curve encryption scheme, then the security is reliant on our definition of the ECDLP hardness assumption. Because the discrete logarithm has been well studied for integer subgroups, multiple algorithms already exist for attacking the DLP, and thus there are many algorithms that are more efficient than brute-force iteration through all possible values of $k$.
Proposition
Let two integers be $c,d \in \mathbb{Z}$ and two points in the same subgroup be $P,Q \in E* (F_p)$. Because $cP$ results in a point with the same subgroup as $P$, then $cP + dQ$ also produces a point within the same subgroup addition $cP + dQ = \gamma$, where the operator represents point addition. This is intuitive, as $cP,dQ \in \mathbf{E}^* (F_p)$ so this is simple point addition within the same subgroup.
The Floyd's Cycle-finding Algorithm
We formally define an equality using the earlier proposition to solve for $x$. For some arbitrary integers $p,q$ are group elements, $a,b,A,B$ are integers, and n is the order of the group and exponentiation represents the group operation. The goal of an eavesdropper is to find this equality described by Floyd [10]:
$p^a q^b \equiv p^A q^B \pmod n$ (5)
For our elliptic curve, $p,q$ are points on the elliptic curve and $a,b,A,B$ denote any integers which give us $aP+bQ \equiv AP+BQ \pmod p$. We can show our attack by way of example. Given a key exchange $ECDH_{Eve}^{Eav}$ as described in Section 3, Eve can solve $d \in \{1,..,n-1 \}$ using this equation, the properties of abelian groups as described in our Proposition.
$$ \begin{matrix} aP+bQ \equiv AP+BQ\\ aP+b(dP) \equiv AP+B(dP)\\ P(a+bd) \equiv P(A+Bd)\\ \phantom{+} a-A \equiv d(B-b)\\\ \phantom{+23456789012} d \equiv (a-A)(B-b)^{-1} & \pmod n \\ \end{matrix} $$
We take the method described by Pollard to obtain (5), as a brute force approach would instead lead to O(n^2). First, we define a pseudorandom sequence of x,y pairs such that $x,y \in \{1,..,n-1\}$. The Birthday Paradox is the reason why we "mix" the sequence, however, we shall explain this later. Then, we choose a pair of points $a,b$ at the start of the sequence and points $A,B$ right next to $a,b$. Then, we run Floyd’s cycle algorithm such that $a,b$ chooses the next $a,b$ pair of the sequence and $A,B$ choose every second pair of the sequence. Note that each subsequent item is pseudorandom. We define this algorithm $Poll_{<s>}$:
- Generate an algorithmic function $<s>$ to produce a sequence of pairs $(x,y)$ length $n$, the order of our subgroup. $<s>$ must also return the subsequent pair of a given pair $<s,(x,y)>$.
- Pseudorandomly choose a starting pair from $<s>$ for $a,b$ and $A$ calls $<s,a>$ to obtain the a's subsequent item. Similarly, $B$ calls $<s,b>$.
- Define a loop such that in each iteration:
- $aP + bQ = AP + BQ$ is tested. If true, then terminate the loop.
- Exit the loop if the number of iterations has reached $n$.
- Call $ <s, ( a,b )> $ to obtain the subsequent pair and set $a,b$ to that new pair.
- Call $ <s, ( A,B )> $ twice to obtain the second-in-sequence pair and set $A,B$ to that new pair. Increment the iteration counter.
Given that our function $<s>$ allows for a sequence to be generated without being stored, then our hypothesis is that we will have a feasible algorithm in terms of space complexity that it is seemingly constant big-O. We will investigate the running time of the algorithm later on.
Pollard's Function
We formally define our algorithmic function $<s>$ that pseudorandomly chooses the next pairs as $f(x)$ where $x$ is a seed value [10].
$$ f(x) = \begin{cases} bx \; \text{if} \; 0 \leq x < n/3 \\ x^2 \; \text{if} \; n/3 \leq x < 2n/3 \\ ax \; \text{if} \; 2n/3 \leq x < n \\ \end{cases} $$
We let $a,b$ denote randomly generator integers as before, and $x$ is always taken in the range $0 < x_i < n$. The primitive root is defined to be $n$ in this case, but we know our subgroup order is indeed a primitive root relative to a generator point-multiplied with a scalar, thus $n$ is our subgroup order.
This function was introduced by John Pollard to solve discrete logarithms so a full explanation on its behavior would be from the [12]. Through an example with the seed $x=1$, we see the next iteration will be dependent on the value of $b$, and so on. The idea is that the three possibilities are sufficiently random and the sequence is sufficiently complicated to result in random behavior that we need for our algorithm.
Birthday Paradox
The probability $Pr[No Birthday]$ of a class of $r$ students that none of the days in the 365-day course has a collision of two or more students is given by [11]:
$$Pr[No Birthday]=- \left( 1-\frac{1}{365} \right) \left( 1- \frac{2}{365} \right ) \dots \left( 1- \frac{r-1}{365} \right )$$
If each parenthetical is approximated to be $e^{-\frac{1}{365}}$ then the sequence of products yields an estimated probability $Pr[No Birthday]=e^{-\frac{r(r-1)}{2x365}}$ (because the $\sum_{r=1}^{n-1}{x} = \frac{x(x-1)}{2})$. This gives us the intuition that for a subgroup order $n$, the probability of our algorithm of obtaining $a,b,A,B$ that satisfies (5) is $ Pr[ Poll_{<s>}^{Collision} ] = e^{-\frac{r(r-1)}{2n}}$ where $r$ is the number of iterations our algorithm runs. As well, for $Pr[ Poll_{<s>}^{Collision} ] = 0.5$, then $r≈1.7741\sqrt{n}$ [11] which can be problematic to secure against. However, this is great news for an attacker as this means that the Birthday Paradox allows our algorithm to find collisions at $\mathcal{O}(\sqrt{n})$ iterations.
Analysis
Because of the Pollard’s function, we have a random sequence that induces the Birthday Paradox. While our algorithm itself has a worst case of traversing $n$ nodes (where $n$ is still the subgroup order), it has expected case of $\mathcal{O}(\sqrt{n})$. Also by defining an algorithm to generate the sequence, we have an attack that does not use an infeasible amount of storage. This is still rooted to implementation, but we can expect that correct implementations to yield $\mathcal{O}(1)$ in space complexity.
References
[0]Co-Author on Group theory: Ivankov, L.
[1] J. Katz, Introduction to modern cryptography, Second edition.. Boca Raton: CRC Press/Taylor & Francis, 2015.
[2] Nick Sullivan, “A (relatively easy to understand) primer on elliptic curve cryptography,” Ars Technica.
[3] P. W. Shor, “Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer,” ArXivquant-Ph9508027, Aug. 1995.
[4] “Elliptic Curve Cryptography: a gentle introduction - Andrea Corbellini.”.
[5] M. Anoop, “Elliptic Curve Cryptography An Implementation Guide.”
[6] “Elliptic curve cryptography,” Wikipedia, the free encyclopedia. 10-Nov-2015.
[7] Federal Information Processing Standard (FIPS) 186-4, Digital Signature Standard (DSS).
[8] Certicom Research. “Standards for efficient cryptography, SEC 1: Elliptic Curve Cryptography Version 2.0”. May 2009.
[9] Juhas, Tibor. “The Use of Elliptic Curves in Cryptography”. May 2007.
[10] Batten, Lynn Margaret. “Public Key Crypto App and Attacks.” In Public Key Cryptography, Chapter 2. John Wiley & Sons, Inc., 2013.
[11] Konheim, Alan G. “Computer Security and Cryptography.” In Computer Security and Cryptography, Chapter 13. John Wiley & Sons, Inc., 2007.
[12] Pollard, J. M. “Monte Carlo Methods for Index Computation (mod p).” Mathematics of Computation 32, no. 143 (1978): 918–24. doi:10.2307/2006496.
Last updated: January 20th, 2016