PASSWORD PARTITIONING

In this section we show how to perform password partition attacks, using the information obtained from our timing and cache attacks. This enables an adversary to recover the password of a target. 8.1 Partitioning a Dictionary In the first attack variant, our goal is to recover the password from a given dictionary. We accomplish this by repeatedly partitioning the dictionary into correct and incorrect password candidates. Practically, this is implemented by removing incorrect passwords from the dictionary during each partitioning step. If the dictionary becomes empty, this means the target’s password was not in it. However, if after the partitioning steps only one password remains, then with high probability this is the target’s password. The result of every element test that is performed in a (passworddependent) iteration of Listing 1 or 2 can be used to partition the dictionary. We use the term element test to refer to both the quadratic residue test for elliptic curves, and the if-test that checks whether the prime of the MODP group is bigger than the hash output. Recall that with one timing measurement against the MODP algorithm, we learn on average the result of multiple (failed) element tests. Considering element tests separately also has the advantage that, if for example we are unsure whether a spoofed address resulted in 4 or 5 iterations, this info still enables us to determine that the first three element tests failed. By representing our timing and cache attack measurements as a set of element tests, we can now use the same partitioning attack algorithm in both attack scenarios. The algorithm illustrated in Listing 3 implements the password partitioning algorithm. As input it receives a dictionary, the set of element tests and their result, and the MAC address of the target. The algorithm uses this information to partition the dictionary by removing passwords that lead to a different result for the element test compared to the result that we measuring this the timing or cache-based attack. More importantly, this algorithm can be run offline, i.e., without requiring any interactions with the target device. 8.2 Prerequisites and Success Analysis To determine the performance of the password partitioning algorithm, we first calculate how many element tests are required to uniquely recover the password with high probability. Note that every element test is independent, because in each iteration the hash inputs are different, resulting in independent hash outputs. Let pe denote the probability that the group element is not found, meaning another iteration and element test has to be performed. For the elliptic curve algorithm, pe is close to 50%, and for MODP groups the values for pe are listed in in Table 3 under Pr[value ≥ p]. We want to know the probability of eliminating d incorrect passwords, when given the result of n element tests. Let Z denote a random variable that equals the number of element tests that are required to eliminate d incorrect passwords. This means that if a dictionary of size d + 1 contains the correct password, and we use n element tests, the probability of uniquely recovering the password is Pr[Z ≤ n]. To derive this probability, we first introduce random variable Y as being the number of element tests where the password element was found. We have: Pr[Y = k] =  n k  · (1 − pe ) k · p n−k e (2) This is because the result of an element test does not eliminate an incorrect password when the incorrect password has the same result under the given MAC addresses and iteration count. For example, if the real password in a given iteration did not have a quadratic residue, and the incorrect password also did not, then the results of this element test does not eliminate the password. Given that in k out of n measured element tests the password element was found, the probability that all element tests do not eliminate a random password equals (1 −pe ) k · p n−k e . Now let random variable E denote the number of eliminated passwords. The probability that all d incorrect passwords are eliminated equals: Pr[E = d | Y = k] =  1 − (1 − pe ) k · p n−k e d (3) Finally, given the result of n random element tests, the probability that all d incorrect password are eliminated equals: Pr[Z ≤ n] = Õn k=0 Pr[Y = k] · Pr[E = d | Y = k] (4) We tested the above formula by running 100 000 runs of the partitioning algorithm on 1 000 passwords. Each run used random simulated element test results (i.e. simulated timing measurements). Our experimental results closely matched that of formula 4. By trying various values for n with the RockYou password dump, we find that for MODP group 22, having n ≥ 35 element tests gives us a probability above 95% of uniquely recovering the correct password. On average, we need to perform 35/1.44 = 24.3 timing measurements to obtain 35 element tests (recall Table 3). For elliptic curve P-256, an adversary needs to obtain 29 element test results to uniquely recover the password with a probability above 95%. Given that our cache-based side-channel attack can detect a QR with 100% accuracy, and a non-QR with 99.5% accuracy, the probability of that on average all used non-QR measurement are correct equals 0.99512.5 = 0.939. The probability of uniquely recovering the password then becomes at least 0.95 · 0.939 = 0.892. In other words, using 25 cache-based element test results, the probability of recovering the password from the RockYou dump is close to 90%. Using formula 4, we can also determine the average number of element tests that are needed to eliminate all d incorrect passwords: ℓ = Õ∞ i=1 i · Pr[Z = i] = Õ∞ i=1 i · (Pr[Z ≤ i] − Pr[Z ≤ i − 1]) (5) We tested the above formula by running 100 000 runs of the partitioning algorithm with each time 1 000 random passwords, assuming pe equals 0.3084. Results closely matched the predicted ones. Assuming the RockYou dump is used for the dictionary, and MODP group 22 as the target, formula 5 teaches us that on average an adversary needs to obtain 28.28 element tests to uniquely recover the password. For elliptic curve P-256, the adversary would need on average 25.11 element tests (i.e. quadratic residue test results). 8.3 Computational Requirements To estimate the computational costs of running the partitioning algorithm in function of the dictionary size d, we derive the expected number of element tests that have to be simulated (see line 9 in Listing 3). From formula 5 we already know the expected number of element tests that are needed to eliminate all d incorrect passwords. In other words, the partitioning algorithm will execute on average ℓ iterations. During each of these iterations, a percentage of passwords are eliminated from the dictionary. More precisely, when taking a random element test as reference, and comparing it with an incorrect password, the chance of not being able to remove the incorrect password as a candidate is pf = p 2 e + (1 −pe ) 2 . Hence, the amount of element tests that are performed on average is: d + p 1 f d + p 2 f d + . . . + p − ⌈ℓ⌉ f d = d 1 − p ⌈ℓ⌉ f 1 − pf (6) We again tested the above formula by running 100 000 runs of the partitioning algorithm, with each time 1 000 random passwords, assuming pe = 0.3084. Results closely matched the predictions. For MODP group 22 and the RockYou dictionary, this would mean that on average we have to perform 33 627 714 element tests. With elliptic curve P-256 this results in 28 689 748 element tests on average. On a laptop with an Intel i7-8650U CPU running at 1.90GHz, performing an attack using the RockYou password list takes on average around 11 minutes. This means ordinary users can perform this attack on their existing off-the-shelf hardware. We can further optimize the partitioning algorithm if pe differs from 0.5. That is, when attacking a MODP group, we can first process the dictionary using an element test result where the target did not found the password in the given iteration. Recall that this happens with probability pe . A random incorrect password then has a probability 1 − pe of being eliminated. Since on average ℓ · pe element tests can be used to eliminate an incorrect password with a probability of 1 − pe , formula 6 can be modified in the obvious way to take our new strategy into account. When using the RockYou dictionary under this new strategy, we need to perform only 20 742 225 element tests using MODP group 22, a reduction by 38%. 8.4 Brute-Force Attacks in the Cloud In our second variant of the partitioning attack, we essentially perform an offline brute-force attack on the password. More concretely, our goal is to test all possible 8-character lowercase passwords. Using formula 5 we know that on average this requires 38.36 element tests for MODP group 22, and 38.92 for elliptic curve P-256. Recall that for the MODP case, this means we need to make on average 38.36/1.44 = 26.64 timing measurements. These are relatively modest requirements. For example, in our demonstration of the timing attack we already performed 20 timing measurements. As a result, we must assume an adversary can obtain the required number of element test results.

We now calculate the costs of running the offline partitioning phase on Amazon EC2 instances. For both the MODP and elliptic curve cases, we first performed repeated microbenchmarks where we simulated one million element tests on a single EC2 vCPU. On average, the MODP test took 3.04 microseconds, and the quadratic residue test took 23.25 microseconds. In macrobenchmarks of the partitioning algorithm on the RockYou dictionary, close to identical running times were observed. We now multiply these timings with the result from formula 6. In particular, for MODP group 22 we need to perform on average 301 947 836 620 element tests, and for curve P-256 we need to perform 417 654 129 151 tests. Fortunately, we can parallelize the code by splitting the brute-force search out over several workers. Every worker gets access to all the element test results, meaning if sufficient element tests were obtained, every worker will discard all incorrect passwords. Only the real password will be detected as potentially valid. Assuming we rent c5.18xlarge instances having 72 vCPUs, which costs $3.06 an hour, we can perform the brute-force search against the MODP case for $10.63 on average (within e.g. an hour). The brute-force attack against elliptic curves would cost $125 on average, which although more costly, is still a worryingly low amount. 9 RELATED WORK After the introduction of WPA, it was quickly found to be vulnerable to dictionary attacks. Later, He and Mitchell formally analyzed WPA’s 4-way handshake, and discovered a DoS vulnerability . This resulted in the standardization of a slightly improved 4-way handshake. He et al. continued to analyze the 4-way handshake, and proved its correctness. However, implementations of the 4-way handshake were nevertheless vulnerable to downgrade attacks. Recently, Vanhoef and Piessens discovered that WPA2 was vulnerable to key reinstallation attacks . Finally, Kohlios and Hayajneh provide an overview of WPA2 and the differences with WPA3. Researchers also discovered several DoS attack against Wi-Fi networks. The most well-known is the deauthentication attack . Other DoS attacks exploit weaknesses in TKIP. Additionally, Könings et al. found several DoS vulnerabilities in the physical and MAC layer of 802.11, and other researchers constructed jammers using commodity hardware. A detailed survey of DoS attacks at the physical and MAC layer is given by Bicakci and Tavli. Aiello et al. show how susceptibility to denial-ofservice attacks can be balanced with the need for perfect forward secrecy. To the best of our knowledge, our clogging attack against WPA3 is the first that overloads the CPU of the victim. An initial version of Dragonfly was vulnerable to an offline dictionary attack. A modified variant was then specified in 2008. Several close variants of it have been defined over the years, and are commonly referred to as Dragonfly-type handshakes . Trevor Perrin did a review of an improved draft of the handshake, and later provided an overview of other people’s comments on the handshake . Struik reviewed a draft of the handshake . Clarke and Hao discovered a small subgroup attack against a draft of the handshake, which was mitigated in a new draft. Lancrenon and Skrobot provided a security proof of a close variant of the Dragonfly handshake . Finally, Alharbi et al. designed a variant of Dragonfly that attempts to keep computational costs low . Other types of PAKEs have also been proposed by researchers over the years , of which some have been submitted as RFCs. Finally, there is also research into post-quantum PAKEs. 10 CONCLUSION AND RECOMMENDATIONS In light of our presented attacks, we believe that WPA3 does not meet the standards of a modern security protocol. Moreover, we believe that our attacks could have been avoided if the Wi-Fi Alliance created the WPA3 certification in a more open manner. Notable is also that nearly all of our attacks are against SAE’s password encoding method, i.e., against its hash-to-group and hash-to-curve algorithm. Interestingly, a simple change to this algorithm would have prevented most of our attacks. In particular, the peer’s MAC addresses can be excluded from SAE’s password encoding algorithm, and instead included later on in the handshake itself. This allows the password element to be computed offline, meaning an adversary can no longer actively trigger executions of the password encoding method. Moreover, this would mean that for a given password, the execution time of the password encoding method would always be identical, limiting the amount of information being leaked. Surprisingly, when the CFRG was reviewing a minor variant of Dragonfly, they actually discussed these type of modifications. However, to our surprise, this change was not incorporated into any of the Dragonfly variants. We also conjecture that resource-constrained devices may not implement all the side-channel countermeasures, as these may be too costly on lightweight processors. Additionally, correctly implementing our suggested backwards-compatible side-channel countermeasures is non-trivial. This is worrisome, because security protocols are normally designed to reduce the change of implementation vulnerabilities. Finally, we believe that a more open process would have prevented (or clarified) the possibility of downgrade attacks against WPA3-Transition mode. Nevertheless, although WPA3 has its flaws, we still consider it an improvement over WPA2.