TLBLEED
Overview
TLBleed is a new side channel attack that has been proven to work on Intel CPU’s with Hyperthreading (generally Simultaneous Multi-threading, or SMT, or HT on Intel) enabled. It relies on concurrent access to the TLB, and it being shared between threads. We find that the L1dtlb and the STLB (L2 TLB) is shared between threads on Intel CPU cores.
Result
This means that co-resident hyperthreads can get a certain amount of information on the data memory accesses of the other HT, without needing any shared cache. Whereas the cache can be partitioned to protect against cache attacks (such as in Cloak using TSX, or using CAT, or using page coloring), this is practically impossible for the TLB, and no such systems have been proposed. Thus in the presence of cache defenses, TLBleed remains a risk in this threat model.
Requirements (a.k.a. threat model)
The threat model is identical to Colin Pervical’s seminal 2005 work, Cache Missing for Fun and Profit, which arguably introduced practical cache side channels. Different from TLBleed, it requires concurrent access to the cache shared between Hyperthreads. TLBleed gets all information through the shared TLB, which can’t be partitioned between processes in software (either application or OS).
Impact & Coverage
As a result of seeing this work, OpenBSD decided to disable Hyperthreading by default. This has prompted some speculation that TLBleed is a spectre-like attack, but that is not the case. OpenBSD also realizes the exact impact of TLBleed. Red Hat has a very thoughtful piece here. There has been significant news coverage: TheRegister (and this one), ArsTechnica, ZDnet, Techrepublic, TechTarget, ITwire, tweakers, and a personal favorite, the SecurityNow Podcast episode 669 (mp3, show notes, youtube).
Overview of results
We briefly tour the results of our paper with some technical details that a technical audience might be interested in. Full technical details can be found in the paper, linked below.
libgcrypt ECC point multiplication
To demonstrate the effectiveness of TLBleed, we side-channel the non-side-channel resistant version of the libgcrypt EdDSA ECC scalar-point multiplication function. In later versions it has been hardened against this attack. We demonstrate that even if cache protections are in place, when this code would be safe, it would still be vulnerable from TLB leakage. The chart below shows the progression of analysis. The spy process acquires the TLB usage signal while the other hyperthread is executing the point-scalar multiplication using a secret key (this would happen e.g. when a signing operation takes place). The raw TLB signal looks noisy, but we are able to apply a machine learning technique to distinguish when a 0 or a 1 secret key bit is being processed.
The latency is the raw TLB usage signal as the spy process acquires it. The shaded regions show ground truth corresponding to which secret key bit the other HT is processing. The spy process wants to decide which of the two (dup or mul, corresponding to a certain secret key bit) it is. The moving average shows the signal contains a distinguishing characteristic. The SVM classifier output shows that a SVM classifier can reliably learn to tell the difference between the two.
After we capture the signal, to reconstruct the key sometimes we have to guess some values (sometimes the SVM classifier output makes no sense, sometimes it’s wrong in 1 or 2 bits), so we apply some heuristics and brute force until we guess the right key. We try this on 3 different microarchitectures and reach the following success rates, while analyzing the data from just a single TLB capture. We call it a ‘success’ if we can guess the key after a limited amount of brute force effort (see below).
We try our attack on 3 different microarchitectures and report on the reliability results. The success rate is from 0 to 1, i.e. 0.998 means 99.8% success rate.
This is the distribution of brute force effort needed for the 3 different cases.
Brute force effort needed for 3 different microarchitectures where we eventually were successful in guessing the right key using our heuristics and brute force algorithm.
RSA
We also apply our technique to an older implementation of RSA, using square-and-multiply, in libgcrypt, which was written to be side-channel-resistant. (Updates since that time have improved its side channel resistance.) Its side channel resistance stems from having a near-constant control flow. TLBleed however relies on the data access pattern and can see the difference between the multiply result being used (1-bit in exponent) or not (0-bit in exponent). The error rate for a 1024-bit key was much higher than for a 256-bit EdDSA key however, and we could not brute force our way to a full reliable key recovery. We do believe that with some advanced number theoretical techniques (see paper and previous work, e.g. Cachebleed), this number of unknown bits out of 1024 can lead to a reliable key recovery with just a single capture.
Error rate of 1024-bit RSA key recovery after a single capture of TLBleed data from the RSA implementation.
Also: defense bypass, covert channel
The paper contains more material: we implement a covert channel over the TLB, and demonstrate bypasses of cache protections. See the paper for full details.
Presentations & Full Paper
A copy of the paper, published at and to be presented at Usenix Security this year, as well as at a Blackhat Briefing, is online now: paper is here.