Meltdown (security vulnerability)


Meltdown is a hardware vulnerability affecting Intel x86 microprocessors, IBM POWER processors, and some ARM-based microprocessors. It allows a rogue process to read all memory, even when it is not authorized to do so.

Meltdown affects a wide range of systems. At the time of disclosure, this included all devices running any but the most recent and patched versions of iOS, Linux, macOS, or Windows. Accordingly, many servers and cloud services were impacted, as well as a potential majority of smart devices and embedded devices using ARM-based processors (mobile devices, smart TVs, printers and others), including a wide range of networking equipment. A purely software workaround to Meltdown has been assessed as slowing computers by between 5 and 30 percent in certain specialized workloads,[8] although companies responsible for software correction of the exploit reported minimal impact from general benchmark testing.

Meltdown was issued a Common Vulnerabilities and Exposures ID of CVE-2017-5754, also known as Rogue Data Cache Load (RDCL), in January 2018. It was disclosed in conjunction with another exploit, Spectre, with which it shares some, but not all, characteristics. The Meltdown and Spectre vulnerabilities are considered "catastrophic" by security analysts. The vulnerabilities are so severe that security researchers initially believed the reports to be false.

Several procedures to help protect home computers and related devices from the Meltdown and Spectre security vulnerabilities have been published. Meltdown patches may produce performance loss. Spectre patches have been reported to significantly reduce performance, especially on older computers; on the newer eighth-generation Core platforms, benchmark performance drops of 2–14 percent have been measured. On January 18, 2018, unwanted reboots due to Meltdown and Spectre patches were reported, even for newer Intel chips. Nonetheless, according to Dell: "No 'real-world' exploits of these vulnerabilities [i.e., Meltdown and Spectre] have been reported to date [January 26, 2018], though researchers have produced proof-of-concepts." Further, recommended preventions include: "promptly adopting software updates, avoiding unrecognized hyperlinks and websites, not downloading files or applications from unknown sources ... following secure password protocols ... [using] security software to help protect against malware (advanced threat prevention software or anti-virus)."

On January 25, 2018, the current status and possible future considerations in solving the Meltdown and Spectre vulnerabilities were presented.

On March 15, 2018, Intel reported that it would redesign its processors (performance losses to be determined) to help protect against the Meltdown and related Spectre vulnerabilities (specifically Meltdown and Spectre-V2, but not Spectre-V1), and that it expected to release the redesigned processors later in 2018.
Overview
Meltdown exploits a race condition inherent in the design of many modern CPUs. The race occurs between memory access and privilege checking during instruction processing. Combined with a cache side-channel attack, this vulnerability allows a process to bypass the normal privilege checks that isolate the exploit process from accessing data belonging to the operating system and other running processes. The vulnerability allows an unauthorized process to read data from any address that is mapped to the current process's memory space. Because the affected processors use instruction pipelining, the data from an unauthorized address will almost always be temporarily loaded into the CPU's cache during out-of-order execution, from which the data can be recovered. This can occur even if the original read instruction fails due to privilege checking, or if it never produces a readable result.

Since many operating systems map physical memory, kernel processes, and other running user space processes into the address space of every process, Meltdown effectively makes it possible for a rogue process to read any physical, kernel or other processes' mapped memory—regardless of whether it should be able to do so. Defenses against Meltdown would require avoiding the use of memory mapping in a manner vulnerable to such exploits (i.e. a software-based solution) or avoidance of the underlying race condition (i.e. a modification to the CPUs' microcode and/or execution path).

The vulnerability is viable on any operating system in which privileged data is mapped into virtual memory for unprivileged processes—which includes many present-day operating systems. Meltdown could potentially impact a wider range of computers than presently identified, as there is little to no variation in the microprocessor families used by these computers.

A Meltdown attack cannot be detected if it is carried out.

History

On May 8, 1995, a paper called "The Intel 80x86 Processor Architecture: Pitfalls for Secure Systems" published at the 1995 IEEE Symposium on Security and Privacy warned against a covert timing channel in the CPU cache and translation lookaside buffer (TLB). This analysis was performed under the auspices of the National Security Agency's Trusted Products Evaluation Program (TPEP).

In July 2012, Apple's XNU kernel (used in macOS, iOS and tvOS, among others) adopted kernel address space layout randomization (KASLR) with the release of OS X Mountain Lion 10.8. In essence, the base of the system, including its kernel extensions (kexts) and memory zones, is randomly relocated during the boot process in an effort to reduce the operating system's vulnerability to attacks. In March 2014, the Linux kernel adopted KASLR to mitigate address leaks.

On August 8, 2016, Anders Fogh and Daniel Gruss presented "Using Undocumented CPU Behavior to See Into Kernel Mode and Break KASLR in the Process" at the Black Hat 2016 conference. On August 10, 2016, Moritz Lipp et al. of TU Graz published "ARMageddon: Cache Attacks on Mobile Devices" in the proceedings of the 25th USENIX Security Symposium. Even though it focused on ARM, it laid the groundwork for the attack vector.

On December 27, 2016, at 33C3, Clémentine Maurice and Moritz Lipp of TU Graz presented their talk "What could possibly go wrong with <insert x86 instruction here>? Side effects include side-channel attacks and bypassing kernel ASLR", which already outlined what was to come.

On February 1, 2017, the CVE numbers 2017-5715, 2017-5753 and 2017-5754 were assigned to Intel.

On February 27, 2017, Bosman et al. of Vrije Universiteit Amsterdam published their findings on how address space layout randomization (ASLR) could be abused on cache-based architectures at the NDSS Symposium.

On March 27, 2017, researchers at Austria's Graz University of Technology developed a proof-of-concept that could grab RSA keys from Intel SGX enclaves running on the same system within five minutes by using certain CPU instructions in lieu of a fine-grained timer to exploit cache DRAM side-channels.

In June 2017, KASLR was found to have a large class of new vulnerabilities. Research at Graz University of Technology showed how to solve these vulnerabilities by preventing all access to unauthorized pages. A presentation on the resulting KAISER technique was submitted for the Black Hat congress in July 2017, but was rejected by the organizers. Nevertheless, this work led to kernel page-table isolation (KPTI, originally known as KAISER) in 2017, which was found to eliminate a large class of security bugs, including the not-yet-discovered Meltdown, a fact confirmed by the Meltdown authors.

In July 2017, research made public on the CyberWTF website by security researcher Anders Fogh outlined the use of a cache timing attack to read kernel space data by observing the results of speculative operations conditioned on data fetched with invalid privileges.

Meltdown was discovered independently by Jann Horn from Google's Project Zero, Werner Haas and Thomas Prescher from Cyberus Technology, as well as Daniel Gruss, Moritz Lipp, Stefan Mangard and Michael Schwarz from Graz University of Technology. The same research teams that discovered Meltdown also discovered a related CPU security vulnerability now called Spectre.

In October 2017, Kernel ASLR support on amd64 was added to NetBSD-current, making NetBSD the first totally open-source BSD system to support kernel address space layout randomization (KASLR). However, the partially open-source Apple Darwin, which forms the foundation of macOS and iOS (among others), is based on FreeBSD; KASLR was added to its XNU kernel in 2012 as noted above.

On November 14, 2017, security researcher Alex Ionescu publicly mentioned changes in the new version of Windows 10 that would cause some speed degradation without explaining the necessity for the changes, just referring to similar changes in Linux.

After affected hardware and software vendors had been made aware of the issue on July 28, 2017, the two vulnerabilities were made public jointly on January 3, 2018, several days ahead of the coordinated release date of January 9, 2018, as news sites had started reporting on commits to the Linux kernel and mails to its mailing list. As a result, patches were not available for some platforms, such as Ubuntu, when the vulnerabilities were disclosed.

On January 28, 2018, Intel was reported to have shared news of the Meltdown and Spectre security vulnerabilities with Chinese technology companies before notifying the U.S. government of the flaws.

The security vulnerability was called Meltdown because "the vulnerability basically melts security boundaries which are normally enforced by the hardware."

Affected hardware
The Meltdown vulnerability primarily affects Intel microprocessors but some ARM microprocessors are also affected. The vulnerability does not affect AMD microprocessors. Intel has countered that the flaws affect all processors, but AMD has denied this, saying "we believe AMD processors are not susceptible due to our use of privilege level protections within paging architecture".

Researchers have indicated that the Meltdown vulnerability is exclusive to Intel processors, while the Spectre vulnerability can possibly affect some Intel, AMD, and ARM processors. However, ARM announced that some of their processors were vulnerable to Meltdown. Google has reported that any Intel processor since 1995 with out-of-order execution is potentially vulnerable to the Meltdown vulnerability (this excludes Itanium and pre-2013 Intel Atom CPUs). Intel introduced speculative execution to their processors with Intel's P6 family microarchitecture with the Pentium Pro IA-32 microprocessor in 1995.

ARM has reported that the majority of their processors are not vulnerable, and published a list of the specific processors that are affected. The ARM Cortex-A75 core is affected directly by both Meltdown and Spectre vulnerabilities, and Cortex-R7, Cortex-R8, Cortex-A8, Cortex-A9, Cortex-A15, Cortex-A17, Cortex-A57, Cortex-A72 and Cortex-A73 cores are affected only by the Spectre vulnerability. This contradicts some early statements made about the Meltdown vulnerability as being Intel-only.

A large portion of the current mid-range Android handsets use the Cortex-A53 or Cortex-A55 in an octa-core arrangement and are not affected by either the Meltdown or Spectre vulnerability as they do not perform out-of-order execution. This includes devices with the Qualcomm Snapdragon 630, Snapdragon 626, Snapdragon 625, and all Snapdragon 4xx processors based on A53 or A55 cores. Also, no Raspberry Pi computers are vulnerable to either Meltdown or Spectre.

IBM has also confirmed that its Power CPUs are affected by both CPU attacks. Red Hat publicly announced in its January 3 advisory that the exploits also affect IBM System Z, Power Architecture, POWER8, and POWER9 systems.

Oracle has stated that V9 based SPARC systems (T5, M5, M6, S7, M7, M8, M10, M12 processors) are not affected by Meltdown, though older SPARC processors that are no longer supported may be impacted.

Mechanism
Meltdown relies on a CPU race condition that can arise between instruction execution and privilege checking. Put briefly, the instruction execution leaves side effects that constitute information not hidden to the process by the privilege check. The process carrying out Meltdown then uses these side effects to infer the values of memory mapped data, bypassing the privilege check. The following provides an overview of the exploit, and the memory mapping that is its target. The attack is described in terms of an Intel processor running Microsoft Windows or Linux, the main test targets used in the original paper, but it also affects other processors and operating systems.

Background – modern CPU design
Modern computer processors use a variety of techniques to gain high levels of efficiency. Four widely used features are particularly relevant to Meltdown:

Virtual (paged) memory, also known as memory mapping – used to make memory access more efficient and to control which processes can access which areas of memory.
A modern computer usually runs many processes in parallel. In an operating system such as Windows or Linux, each process is given the impression that it alone has complete use of the computer's physical memory, and may do with it as it likes. In reality it will be allocated memory to use from the physical memory, which acts as a "pool" of available memory, when it first tries to use any given memory address (by trying to read or write to it). This allows multiple processes, including the kernel or operating system itself, to co-habit on the same system, but retain their individual activity and integrity without being affected by other running processes, and without being vulnerable to interference or unauthorized data leaks caused by a rogue process.
Privilege levels, or protection domains – provide a means by which the operating system can control which processes are authorized to read which areas of virtual memory.
As virtual memory permits a computer to refer to vastly more memory than it will ever physically contain, the system can be greatly sped up by "mapping" every process and their in-use memory – in effect all memory of all active processes – into every process's virtual memory. In some systems all physical memory is mapped as well, for further speed and efficiency. This is usually considered safe, because the operating system can rely on privilege controls built into the processor itself, to limit which areas of memory any given process is permitted to access. An attempt to access authorized memory will immediately succeed, and an attempt to access unauthorized memory will cause an exception and void the read instruction, which will fail. Either the calling process or the operating system directs what will happen if an attempt is made to read from unauthorized memory – typically it causes an error condition and the process that attempted to execute the read will be terminated. As unauthorized reads are usually not part of normal program execution, it is much faster to use this approach than to pause the process every time it executes some function that requires privileged memory to be accessed, to allow that memory to be mapped into a readable address space.
Instruction pipelining and speculative execution – used to allow instructions to execute in the most efficient manner possible – if necessary allowing them to run out of order or in parallel across various processing units within the CPU – so long as the final outcome is correct.
Modern processors commonly contain numerous separate execution units, and a scheduler that decodes instructions and decides, at the time they are executed, the most efficient way to execute them. This might involve the decision that two instructions can execute at the same time, or even out of order, on different execution units (known as "instruction pipelining"). So long as the correct outcome is still achieved, this maximizes efficiency by keeping all of the processor's execution units in use as much as possible. Some instructions, such as conditional branches, will lead to one of two different outcomes, depending on a condition. For example, if a value is 0, it will take one action, and otherwise will take a different action. In some cases, the CPU may not yet know which branch to take. This may be because a value is uncached. Rather than wait to learn the correct option, the CPU may proceed immediately (speculative execution). If so, it can either guess the correct option (predictive execution) or even take both (eager execution). If it executes the incorrect option, the CPU will attempt to discard all effects of its incorrect guess. (See also: branch predictor)
CPU cache – a modest amount of memory within the CPU used to ensure it can work at high speed, to speed up memory access, and to facilitate "intelligent" execution of instructions in an efficient manner.
From the perspective of a CPU, the computer's physical memory is slow to access. Also the instructions a CPU runs are very often repetitive, or access the same or similar memory numerous times. To maximize efficient use of the CPU's resources, modern CPUs often have a modest amount of very fast on-chip memory, known as "CPU cache". When data is accessed or an instruction is read from physical memory, a copy of that information is routinely saved in the CPU cache at the same time. If the CPU later needs the same instruction or memory contents again, it can obtain it with minimal delay from its own cache rather than waiting for a request related to physical memory to take place.
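The timing difference that a cache introduces can be illustrated with a toy model. The following Python sketch is a simulation only: the class name `ToyCache` and the latency figures are assumptions chosen for illustration, not measurements of any real processor.

```python
# Toy model of a CPU cache. The latencies below are invented,
# illustrative values, not figures for any real processor.
CACHE_HIT_CYCLES = 4
MEMORY_CYCLES = 200


class ToyCache:
    """Remembers which addresses have been accessed recently."""

    def __init__(self):
        self.lines = set()

    def access(self, address):
        """Return the simulated latency of a read, then cache the address."""
        if address in self.lines:
            return CACHE_HIT_CYCLES
        self.lines.add(address)
        return MEMORY_CYCLES

    def flush(self, address):
        """Evict one address so that the next read is slow again."""
        self.lines.discard(address)


cache = ToyCache()
first = cache.access(0x1000)   # slow: must go to physical memory
second = cache.access(0x1000)  # fast: served from the cache
cache.flush(0x1000)
third = cache.access(0x1000)   # slow again after eviction
```

In this model, a program that can time its own reads can tell whether an address was accessed recently, which is the property a cache side-channel attack relies on.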
Meltdown exploit
Ordinarily, the mechanisms described above are considered secure. They provide the basis for most modern operating systems and processors. Meltdown exploits the way these features interact, to bypass the CPU's fundamental privilege controls and access privileged and sensitive data from the operating system and other processes. To understand Meltdown, we consider the data that is mapped in virtual memory (much of which the process is not supposed to be able to access), and look at how the CPU responds when a process attempts to access unauthorized memory. The process is running on a vulnerable version of Windows, Linux, or macOS, on a 64-bit processor of a vulnerable type. (This is a very common combination across almost all desktop computers, notebooks, laptops, servers and mobile devices.)

The CPU attempts to execute an instruction referencing a memory operand. The addressing mode requires the operand's address, Base+A, to be calculated using the value at an address, A, forbidden to the process by the virtual memory system and privilege check. The instruction is scheduled and dispatched to an execution unit. This execution unit then schedules both the privilege check and the memory access.
The privilege check informs the execution unit that the address, A, involved in the access is forbidden to the process (per the information stored by the virtual memory system), and thus the instruction should fail. The execution unit must then discard the effects of the memory read. One of those effects, however, can be caching of the data at Base+A, which may have been completed as a side effect of the memory access before the privilege check – and may not have been undone by the execution unit (or any other part of the CPU). If this is indeed the case, the mere act of caching constitutes a leak of information in and of itself. At this point, Meltdown intervenes.
The process executes a timing attack by executing instructions referencing memory operands directly. To be effective, the operands of these instructions must be at addresses which cover the possible address, Base+A, of the rejected instruction's operand. Because the data at the address referred to by the rejected instruction, Base+A, was cached nevertheless, an instruction referencing the same address directly will execute faster. The process can detect this timing difference and determine the address, Base+A, that was calculated for the rejected instruction – and thus determine the value at the forbidden memory address A.
Meltdown uses this technique in sequence to read every address of interest at high speed, and depending on other running processes, the result may contain passwords, encryption data, and any other sensitive information, from any address of any process that exists in its memory map. In practice, because cache side-channel attacks are slow, it is faster to extract data one bit at a time (only 2 × 8 = 16 cache attacks are needed to read a byte, rather than 256 steps if it tried to read all 8 bits at once).
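The steps above can be sketched as a pure simulation. The Python model below does not touch real hardware and cannot leak anything. The class `ToyCPU`, the latency values, the probe base address and the 4096-byte stride between probe addresses are all illustrative assumptions. The helper `probes_needed` generalizes the 2 × 8 = 16 versus 256 arithmetic to other chunk sizes, which is also an assumption made for illustration.

```python
PAGE = 4096           # assumed stride between probe addresses
HIT, MISS = 4, 200    # invented latencies, in cycles
THRESHOLD = 100       # latencies below this count as a cache hit


class ToyCPU:
    """Simulated processor whose cache survives a rejected read."""

    def __init__(self, kernel_memory):
        self._kernel = kernel_memory  # forbidden to the "process"
        self._cached = set()

    def flush(self, address):
        self._cached.discard(address)

    def timed_read(self, address):
        latency = HIT if address in self._cached else MISS
        self._cached.add(address)
        return latency

    def transient_read(self, kernel_address, probe_base):
        # Modelled flaw: the dependent load reaches the cache before
        # the privilege check rejects the instruction.
        secret = self._kernel[kernel_address]
        self._cached.add(probe_base + secret * PAGE)
        raise PermissionError("privilege check failed")


def leak_byte(cpu, kernel_address, probe_base=0x100000):
    # Step 1: evict every candidate probe address.
    for value in range(256):
        cpu.flush(probe_base + value * PAGE)
    # Step 2: attempt the forbidden access. The result is discarded,
    # but the cache side effect remains.
    try:
        cpu.transient_read(kernel_address, probe_base)
    except PermissionError:
        pass
    # Step 3: time a read of each candidate; the fast one reveals the value.
    for value in range(256):
        if cpu.timed_read(probe_base + value * PAGE) < THRESHOLD:
            return value
    return None


def probes_needed(bits_per_step):
    """Cache probes per byte when leaking bits_per_step bits at a time."""
    return (2 ** bits_per_step) * (8 // bits_per_step)


cpu = ToyCPU({0: ord("k"), 1: ord("e"), 2: ord("y")})
leaked = bytes(leak_byte(cpu, address) for address in range(3))
```

Here `leaked` ends up holding the three simulated kernel bytes even though every `transient_read` call raised an error. `probes_needed(8)` gives 256 and `probes_needed(1)` gives 16, matching the figures quoted above.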

Impact
The impact of Meltdown depends on the design of the CPU, the design of the operating system (specifically how it uses memory paging), and the ability of a malicious party to get any code run on that system, as well as the value of any data it could read if able to execute.

CPU – Many of the most widely used modern CPUs from the late 1990s until early 2018 have the required exploitable design. However, it is possible to mitigate it within CPU design. A CPU that could detect and avoid memory access for unprivileged instructions, or was not susceptible to cache timing attacks or similar probes, or removed cache entries upon non-privilege detection (and did not allow other processes to access them until authorized) as part of abandoning the instruction, would not be able to be exploited in this manner. Some observers consider that all software solutions will be "workarounds" and the only true solution is to update affected CPU designs and remove the underlying weakness.
Operating system – Most of the widely used and general-purpose operating systems use privilege levels and virtual memory mapping as part of their design. Meltdown can access only those pages that are memory mapped so the impact will be greatest if all active memory and processes are memory mapped in every process and have the least impact if the operating system is designed so that almost nothing can be reached in this manner. An operating system might also be able to mitigate in software to an extent by ensuring that probe attempts of this kind will not reveal anything useful. Modern operating systems use memory mapping to increase speed so this could lead to performance loss.
Virtual machine – A Meltdown attack cannot be used to break out of a virtual machine, i.e., in fully virtualized machines guest user space can still read from guest kernel space, but not from host kernel space. The bug enables reading memory from the address space represented by the same page table, meaning the bug does not work between virtual tables. That is, guest-to-host page tables are unaffected, only guest-to-same-guest or host-to-host, and of course host-to-guest since the host can already access the guest pages. This means different VMs on the same fully virtualized hypervisor cannot access each other's data, but different users on the same guest instance can access each other's data.
Embedded device – Among the vulnerable chips are those made by ARM and Intel designed for standalone and embedded devices, such as mobile phones, smart TVs, networking equipment, vehicles, hard drives, industrial control, and the like. As with all vulnerabilities, if a third party cannot run code on the device, its internal vulnerabilities remain unexploitable. For example, an ARM processor in a cellphone or Internet of Things "smart" device may be vulnerable, but the same processor used in a device that cannot download and run new code, such as a kitchen appliance or hard drive controller, is believed to not be exploitable.

The impact itself depends on the implementation of the address translation mechanism in the OS and the underlying hardware architecture. The attack can reveal the content of any memory that is mapped into a user address space, even if otherwise protected. For example, before kernel page-table isolation was introduced, most versions of Linux mapped all physical memory into the address space of every user-space process; the mapped addresses are (mostly) protected, making them unreadable from user space and accessible only when transitioned into the kernel. The existence of these mappings makes transitioning to and from the kernel faster, but is unsafe in the presence of the Meltdown vulnerability, as the contents of all physical memory (which may contain sensitive information such as passwords belonging to other processes or the kernel) can then be obtained via the above method by any unprivileged process from user space.
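The difference between what a process may read and what is merely mapped into its address space can be sketched as follows. This is a minimal model, not a description of any real kernel: the `Mapping` type, the region names and the "kernel entry stub" that stays mapped in the isolated layout are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    name: str
    user_accessible: bool  # outcome of the privilege check for user code


def architecturally_readable(page_table):
    """Regions an unprivileged process is permitted to read."""
    return {m.name for m in page_table if m.user_accessible}


def exposed_to_meltdown(page_table):
    """Regions reachable by the attack: everything mapped, whatever
    the privilege check says."""
    return {m.name for m in page_table}


# Hypothetical layout with kernel and physical memory mapped everywhere.
shared = [
    Mapping("user heap", True),
    Mapping("kernel text", False),
    Mapping("physical memory map", False),
]

# Hypothetical layout in which user mode keeps only a minimal kernel stub.
isolated = [
    Mapping("user heap", True),
    Mapping("kernel entry stub", False),
]

extra_shared = exposed_to_meltdown(shared) - architecturally_readable(shared)
extra_isolated = exposed_to_meltdown(isolated) - architecturally_readable(isolated)
```

In this sketch `extra_shared` contains the kernel text and the physical memory map, while `extra_isolated` contains only the small stub, which illustrates why the impact depends on how much the operating system maps into each process.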

According to researchers, "every Intel processor that implements out-of-order execution is potentially affected, which is effectively every processor since 1995 (except Intel Itanium and Intel Atom before 2013)." Intel responded to the reported security vulnerabilities with an official statement.

The vulnerability is expected to impact major cloud providers, such as Amazon Web Services (AWS) and Google Cloud Platform. Cloud providers allow users to execute programs on the same physical servers where sensitive data might be stored, and rely on safeguards provided by the CPU to prevent unauthorized access to the privileged memory locations where that data is stored, a feature that the Meltdown exploit circumvents.

One of the paper's authors reports that paravirtualization (Xen) and containers such as Docker, LXC, and OpenVZ, are affected. They report that the attack on a fully virtualized machine allows the guest user space to read from the guest kernel memory, but not read from the host kernel space.

Mitigation
Further information: Kernel page-table isolation
Mitigation of this vulnerability requires changes to operating system kernel code, including increased isolation of kernel memory from user-mode processes. Linux kernel developers have referred to this measure as kernel page-table isolation (KPTI). KPTI patches have been developed for Linux kernel 4.15, and have been released as a backport in kernels 4.14.11 and 4.9.75. Red Hat released kernel updates to their Red Hat Enterprise Linux distributions version 6[80] and version 7. CentOS has also released kernel updates for CentOS 6 and CentOS 7.
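On Linux kernels that include these patches, the kernel reports its own assessment through a sysfs entry. The following Python sketch reads that entry and classifies it. The helper names `classify` and `meltdown_status` are hypothetical, and the sketch assumes the status line begins with "Mitigation", "Not affected" or "Vulnerable"; anything else, including a missing file on older kernels or other operating systems, is reported as unknown.

```python
from pathlib import Path

# sysfs entry exposed by Linux kernels that carry the KPTI patches.
SYSFS_ENTRY = Path("/sys/devices/system/cpu/vulnerabilities/meltdown")


def classify(status):
    """Map the kernel's status line to a coarse label."""
    text = status.strip()
    if text.startswith("Mitigation"):
        return "mitigated"
    if text.startswith("Not affected"):
        return "not affected"
    if text.startswith("Vulnerable"):
        return "vulnerable"
    return "unknown"


def meltdown_status(path=SYSFS_ENTRY):
    """Return the label for this machine, or "unknown" if the entry
    cannot be read."""
    try:
        return classify(path.read_text())
    except OSError:
        return "unknown"
```

Calling `meltdown_status()` on a patched Linux system with an affected CPU would be expected to return "mitigated". The result reflects only what the kernel reports and is not an independent test of the hardware.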

Apple included mitigations in macOS 10.13.2, iOS 11.2, and tvOS 11.2. These were released a month before the vulnerabilities were made public. Apple has stated that watchOS and the Apple Watch are not affected.[88] Additional mitigations were included in a Safari update as well as a supplemental update to macOS 10.13, and iOS 11.2.2.

Microsoft released an emergency update to Windows 10, 8.1, and 7 SP1 to address the vulnerability on January 3, 2018, as well as to Windows Server (including Server 2008 R2, Server 2012 R2, and Server 2016) and Windows Embedded Industry. These patches are incompatible with third-party antivirus software that uses unsupported kernel calls; systems running incompatible antivirus software will not receive this or any future Windows security updates until the software is patched and adds a special registry key affirming its compatibility. The update was found to have caused issues on systems running certain AMD CPUs, with some users reporting that their Windows installations did not boot at all after installation. On January 9, 2018, Microsoft paused the distribution of the update to systems with affected CPUs while it investigated and addressed this bug.

It was reported that implementation of KPTI may lead to a reduction in CPU performance, with some researchers claiming up to 30% loss in performance, depending on usage, though Intel considered this to be an exaggeration. It was reported that Intel processor generations that support process-context identifiers (PCID), a feature introduced with Westmere and available on all chips from the Haswell architecture onward, were not as susceptible to performance losses under KPTI as older generations that lack it. This is because the selective translation lookaside buffer (TLB) flushing enabled by PCID (also called address space number or ASN under the Alpha architecture) enables the shared TLB behavior crucial to the exploit to be isolated across processes, without constantly flushing the entire cache, which is the primary reason for the cost of mitigation.
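Why tagged TLB entries reduce the cost can be shown with a toy model. The Python sketch below is an illustration under simplifying assumptions: the class `ToyTLB`, the context identifiers and the page numbers are invented, and a real TLB has limited capacity and more complex behavior.

```python
class ToyTLB:
    """Counts translation misses with and without context tags."""

    def __init__(self, use_pcid):
        self.use_pcid = use_pcid
        self.entries = {}   # (tag, virtual_page) -> translation
        self.tag = 0
        self.misses = 0

    def switch(self, context_id):
        if self.use_pcid:
            # Tagged: entries of other contexts stay but are not matched.
            self.tag = context_id
        else:
            # Untagged: the whole TLB must be flushed on every switch.
            self.entries.clear()

    def translate(self, virtual_page):
        key = (self.tag, virtual_page)
        if key not in self.entries:
            self.misses += 1  # stands in for a slow page-table walk
            self.entries[key] = ("frame", virtual_page)
        return self.entries[key]


def run(use_pcid, round_trips=3):
    """Simulate user-to-kernel round trips and return the miss count."""
    user, kernel = 1, 2   # hypothetical context identifiers
    tlb = ToyTLB(use_pcid)
    for _ in range(round_trips):
        tlb.switch(user)
        tlb.translate(0x10)
        tlb.switch(kernel)
        tlb.translate(0x80)
    return tlb.misses


misses_without_pcid = run(use_pcid=False)
misses_with_pcid = run(use_pcid=True)
```

In this model, three round trips cost six misses without tags and two with them, because tagged entries survive the switch between the user and kernel page tables.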

A statement by Intel said that "any performance impacts are workload-dependent, and, for the average computer user, should not be significant and will be mitigated over time". Phoronix benchmarked several popular PC games on a Linux system with Intel's Coffee Lake Core i7-8700K CPU and KPTI patches installed, and found that any performance impact was small to non-existent. In other tests, including synthetic I/O benchmarks and databases such as PostgreSQL and Redis, a measurable impact on performance was found.

Several procedures to help protect home computers and related devices from the Meltdown and Spectre security vulnerabilities have been published. Meltdown patches may produce performance loss. On January 18, 2018, unwanted reboots due to Meltdown and Spectre patches were reported, even for newer Intel chips. According to Dell: "No 'real-world' exploits of these vulnerabilities [i.e., Meltdown and Spectre] have been reported to date [January 26, 2018], though researchers have produced proof-of-concepts." Further, recommended preventions include: "promptly adopting software updates, avoiding unrecognized hyperlinks and websites, not downloading files or applications from unknown sources ... following secure password protocols ... [using] security software to help protect against malware (advanced threat prevention software or anti-virus)."

On January 25, 2018, the current status and possible future considerations in solving the Meltdown and Spectre vulnerabilities were presented. In March 2018, Intel announced that it had designed hardware fixes for future processors for Meltdown and Spectre-V2 only, but not Spectre-V1. The vulnerabilities were mitigated by a new partitioning system that improves process and privilege-level separation. Intel has developed microcode workarounds for processors dating back to 2013, and as of March 2018 had plans to develop them for processors dating back to 2007.

Summary of mitigations on Microsoft Windows
| Vulnerability | CVE | Exploit name | Public vulnerability name | Windows changes | Firmware changes |
| Spectre | 2017-5753 | Variant 1 | Bounds Check Bypass (BCB) | Recompiling with a new compiler; hardened browser to prevent exploit from JavaScript | No |
| Spectre | 2017-5715 | Variant 2 | Branch Target Injection (BTI) | New CPU instructions eliminating branch speculation | Yes |
| Meltdown | 2017-5754 | Variant 3 | Rogue Data Cache Load (RDCL) | Isolate kernel and user mode page tables | No |