CACHE-BASED ATTACKS ON ECC GROUPS

In this section we demonstrate that implementations of the hashto-curve algorithm of SAE may be vulnerable to cache-based sidechannel attacks. Similar to the timing attack against MODP groups, this will later on enable an adversary to recover a target’s password. 7.1 Background and Attack Goal The goal of our attack is to learn if the Quadratic Residue (QR) test in the first iteration of the hash-to-curve algorithm succeeded or not. This information will be used in the offline password partitioning attack of Section 8 to recover the target’s password. Unlike the case of the MODP groups described in Section 6, the implementation of the hash-to-curve algorithm for ECC groups does include mitigations against side-channel attacks. Those mitigations include performing extra dummy iterations on random data, and blinding of the underlying cryptographic calculation of the quadratic residue test . The resulting code of wpa_supplicant and hostapd implementation we reviewed is pseudo-constant time, i.e., there might be some minor variation in run time, but they are too minute to be measured by an adversary. However, in many cases such pseudo-constant time implementations are still vulnerable to different types micro-architectural side-channel attacks. 7.1.1 Micro-Architectural Side-Channel Attacks. Modern processors try to optimize their behavior (e.g. memory access, branch prediction) by saving an internal state that depends on the past. Micro-architectural side-channel attacks exploit leaked information about the running of other programs due to sharing of this state . Cache-based side-channel attacks exploit the state of the memory case (either instructions or data) and have been widely used to break cryptographic primitives . In the different variants of the Flush+Reload attack  the attacker starts by evicting (or flushing) a memory location from the cache. After waiting for a predetermined interval, he measures the time it takes to reload the flushed location. If during the interval the victim accesses this memory location, it will be cached, and the reload time for the attacker will be short. Otherwise, the reloading of the flushed memory location will be much slower. In this way, the attacker can trace the victim’s memory access patterns. 7.2 Attack Scenario Our attack requires the ability to monitor cache access patterns on the victim machine. However, unlike many cache attacks against TLS implementations , we can also target the client side. We can run our attack from any unprivileged user-mode process (or application on android). Oren et al. even showed how to perform such attacks from the browser using JavaScript code (although browsers are now implementing mitigation for these types of attacks). For our password partitioning attack, we need to record several handshakes with different MAC addresses. We can get handshakes with different MAC addresses by targeting multiple clients in the same network (e.g. convince multiple users to download the same malicious application). If we are only able to attack one client, we can set up rogue APs with the same SSID but a spoofed MAC address. We can force victims into connecting to our rogue AP by using a higher signal strength, or jamming the legitimate AP. 7.3 Attacking the hostap Implementation Our target implementation is the sae_derive_pwe_ecc function in the latest hostap code (commit 0eb34f8f2 from Sat Jan 26) with the default curve P-256. Our test machine uses a 4-core Intel Core i7-7500 processor, with a 4 MiB cache and 16 GiB memory, running Ubuntu 18.04.1. To monitor access to the instruction cache we use the Flush+Reload attack, as implemented in the Mastik toolkit. To learn the result of the first QR test, we can either attack the blinded QR test implementation, or the branch in the iteration loop that checks the result of the test. A simple cache attack against the blinded QR test is infeasible as the two possible code paths (see Listing 4 line 19) are compiled into a single cache line.1 The two code paths of the branch inside the iteration loop (see Listing 5 line 28) are compiled into two separate cache lines. Therefore we can monitor cache access to nQR cache line which is the target of the conditional jump (see Listing 6 line 9). To differentiate between the branches taken in the first and subsequent iterations, we created a synchronization “clock” by monitoring another cache line that is accessed once every iteration (similarly to what is done in ). On our test platform, monitoring two cache lines repeatedly over time caused a high rate of false positives (i.e. false detection of access to cache lines). This error rate increases considerably if the monitored cache lines are close. Consequently, for our “clock” we choose to monitor a cache line far away from the nQR cache line (in our case the function sha256_prf_bits). 7.3.1 Classification of Cache Access Patterns. We want to classify our cache traces as one of two cases depending on the results of the QR test in the first iteration (non-QR or QR). The measured cache access patterns to the two monitored cache lines show a high variance between different traces of the same case. This might be due to OS related noise, speculative execution, or the way that the random dummy iterations affect the branch predictor behavior in the next run of the function. To overcome this we perform a simplified variant of a cache template attack. That is, we measure a trace of the cache access pattern by monitoring the two addresses (the “clock” and the non-QR case) in fixed intervals of 50000 clock cycles (each iteration takes ≈ 200000 clock cycles on our test machine). We encode each trace into two bits that correspond to the two memory locations. In each trace a bit is set to one if its corresponding memory location was accessed, and to zero otherwise. In our attack we only keep active measurements with at least one non-zero bit. Our attack returns the value of the first two active measurements, meaning the return value consists of four bits (resulting in 9 possible return values). Figure 7 shows the distribution of these return values when the first iteration of the hash-to-curve algorithm results in a non-QR number (nQR), and when the first iteration results in a QR number (QR). For our classification we repeat the attack 20 times for each MAC address. We created a training set for the non-QR and the QR cases using 100 · 20 traces each. We used this training set to build a simple linear classifier that receives 20 traces as an input, and returns the input classification, namely either non-QR or QR. We then tested our attack and linear classifier on a larger test set of 400 · 20 traces for each case. For the non-QR case we have achieved a 100% success rate (400 out of 400), and for the QR case we have achieved a 99.5% success rate (398 out of 400). We can conclude that an adversary can reliably abuse cache-based side-channels to determine whether the password element was found in the first iteration or not. 7.4 Countermeasures and Discussion As in the MODP case, the ideal solution is to modify the SAE handshake such that the password element is independent of MAC addresses, and use a constant-time hash-to-curve algorithm from the new standard. Even if the attacker can attack the one-time offline calculation, and exploit some residual side-channel leakage, the expected number of password bits leaked is only two. A backwards-compatible countermeasure is to replace the two vulnerable branches with a constant-time select utility, and use constant time Legendre symbol computation as defined in.