May 16, 2019
Joel Hruska

What Is Speculative Execution?

As discussion of the Spectre and Meltdown flaws continues to dominate the tech news cycle, there has been repeated reference to a specific feature of high-end CPUs: speculative execution. It’s a key capability of higher-end ARM products, Apple’s custom ARM cores, IBM’s POWER family, and the vast majority of the x86 processors produced by Intel and AMD. Here’s what speculative execution is, how it relates to other key capabilities of modern microprocessors, and how the recent Meltdown bug targets Intel CPUs in particular.

What Is Speculative Execution?

Speculative execution is a technique CPU designers use to improve CPU performance. It’s one of three components of out-of-order execution, also known as dynamic execution. Along with multiple branch prediction (used to predict the instructions most likely to be needed in the near future) and dataflow analysis (used to align instructions for optimal execution, as opposed to executing them in the order they came in), speculative execution delivered a dramatic performance improvement over previous Intel processors. Because these techniques worked so well, they were quickly adopted by AMD, which used out-of-order processing beginning with the K5. ARM’s focus on low-power mobile processors initially kept it out of the OOoE playing field, but the company adopted out-of-order execution when it built the Cortex A9 and has continued to expand its use of the technique with later, more powerful Cortex-branded CPUs.

Here’s how it works. Modern CPUs are all pipelined, which means they work on several instructions at once, each at a different stage of completion, as shown in the diagram below.

[Image: pipeline diagram]

Image by Wikipedia. This is a general diagram of a pipelined CPU, showing how instructions move through the processor from clock cycle to clock cycle.

Imagine that the green block represents an if-then-else branch. The branch predictor calculates which branch is more likely to be taken, fetches the next set of instructions associated with that branch, and begins speculatively executing them before it knows which of the two code branches it’ll be using. In the diagram above, these speculative instructions are represented as the purple box. If the branch predictor guessed correctly, then the next set of instructions the CPU needed are lined up and ready to go, with no pipeline stall or execution delay.

Without branch prediction and speculative execution, the CPU doesn’t know which branch it will take until the first instruction in the pipeline (the green box) finishes executing and moves to Stage 4. Instead of moving straight from one set of instructions to the next, the CPU has to wait for the appropriate instructions to arrive. This hurts system performance, since the CPU spends that time waiting when it could be performing useful work.

It’s called “speculative” execution, of course, because the CPU might be wrong. If it is, the CPU discards the speculative work, fetches the appropriate instructions, and executes those instead. But branch predictors aren’t wrong very often; accuracy rates are typically above 95 percent.
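To make the accuracy figure more concrete, here is a minimal sketch of one classic prediction scheme, the two-bit saturating counter. It is an illustration only and not the predictor any particular Intel, AMD, or ARM core uses; the function names and the loop workload are assumptions chosen for the example.

```python
def simulate_two_bit_predictor(outcomes):
    """Return the fraction of branches predicted correctly.

    outcomes: sequence of booleans, True meaning the branch was taken.
    """
    # Counter states 0-1 predict "not taken", states 2-3 predict "taken".
    counter = 2  # assumed starting state: weakly taken
    correct = 0
    total = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken == taken:
            correct += 1
        # Nudge the counter toward the real outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
        total += 1
    return correct / total if total else 0.0


def loop_branch(iterations, trips):
    """Outcomes of a loop's closing branch: taken until the final iteration."""
    pattern = [True] * (iterations - 1) + [False]
    return pattern * trips


# Hypothetical workload: a 100-iteration loop that is entered 50 times.
accuracy = simulate_two_bit_predictor(loop_branch(iterations=100, trips=50))
print(f"accuracy: {accuracy:.2f}")
```

In this toy workload the predictor only misses the loop exit, so it is right 99 times out of every 100. A branch that strictly alternates between taken and not taken would drop the same predictor to 50 percent, which is one reason real designs use much more elaborate schemes.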

Why Use Speculative Execution?

Decades ago, before out-of-order execution was invented, CPUs were what we today call “in order” designs. Instructions executed in the order they were received, with no attempt to reorder them or execute them more efficiently. One of the major problems with in-order execution is that a pipeline stall stops the entire CPU until the issue is resolved.

The other problem that drove the development of speculative execution was the gap between CPU and main memory speeds. The graph below shows the gap between CPU and memory clocks. As the gap grew, the amount of time the CPU spent waiting on main memory to deliver information grew as well. Features like L1, L2, and L3 caches and speculative execution were designed to keep the CPU busy and minimize the time it spent idling.

[Image: graph of the gap between CPU and memory performance]

If memory could match the performance of the CPU there would be no need for caches.
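A rough back-of-the-envelope model shows why caches matter so much here. The sketch below computes an average memory access time for a cache hierarchy; every latency and hit rate in it is an assumed, illustrative number rather than a measurement of any real CPU.

```python
def average_access_time(levels, memory_latency):
    """Expected cycles per memory access.

    levels: list of (hit_latency_cycles, hit_rate) pairs, ordered from L1 outward.
    memory_latency: cycles to reach main memory after missing every cache level.
    """
    expected = 0.0
    reach_probability = 1.0  # chance that an access gets as far as this level
    for hit_latency, hit_rate in levels:
        # Every access that reaches a level pays that level's lookup latency.
        expected += reach_probability * hit_latency
        # Only the misses continue on to the next level.
        reach_probability *= 1.0 - hit_rate
    expected += reach_probability * memory_latency
    return expected


# Assumed figures for illustration only.
hierarchy = [(4, 0.95), (12, 0.80), (40, 0.75)]  # L1, L2, L3
with_caches = average_access_time(hierarchy, memory_latency=200)
without_caches = average_access_time([], memory_latency=200)
print(f"with caches: {with_caches:.1f} cycles, without: {without_caches:.1f} cycles")
```

With these assumed numbers, the average access costs about 5.5 cycles with the cache hierarchy and 200 cycles without it. The wider the gap between CPU and memory speeds, the larger that second number becomes.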

It worked. The combination of large off-die caches and out-of-order execution gave Intel’s Pentium Pro and Pentium II opportunities to stretch their legs in ways previous chips couldn’t match. This graph from a 1997 Anandtech article shows the advantage clearly.

[Image: benchmark chart from a 1997 Anandtech article]

Thanks to the combination of speculative execution and large caches, the Pentium II 166 decisively outperforms a Pentium 250 MMX, despite the fact that the latter has a 1.51x clock speed advantage over the former.

Ultimately, it was the Pentium II that delivered the benefits of out-of-order execution to most consumers. The Pentium II was a fast microprocessor relative to the Pentium systems that had been top-end just a short while before. AMD was an absolutely capable second-tier option, but until the original Athlon launched, Intel had a lock on the absolute performance crown.

The Pentium Pro and the later Pentium II were far faster than the earlier architectures Intel used. This wasn’t guaranteed. When Intel designed the Pentium Pro, it spent a significant amount of its die and power budget enabling out-of-order execution. But the bet paid off, big time.

There are differences between how Intel, AMD, and ARM implement speculative execution, and those differences are part of why Intel is exposed to some of these attacks in ways that the other vendors aren’t. But speculative execution, as a technique, is simply far too valuable to stop using. Every single high-end CPU architecture today — AMD, ARM, IBM, Intel, SPARC — uses out-of-order execution. And speculative execution, while implemented differently from company to company, is used by each of them. Without speculative execution, out-of-order execution as we know it wouldn’t function.

Why Is Meltdown Such a Problem for Intel?

Meltdown causes such unique headaches for Intel because Intel allows speculative execution to access privileged memory that a user-space application would never be allowed to touch. Here’s how MarkCC of Goodmath.org describes the problem:

Code that’s running under speculative execution doesn’t do the check whether or not memory accesses from cache are accessing privileged memory. It starts running the instructions without the privilege check, and when it’s time to commit to whether or not the speculative execution should be continued, the check will occur. But during that window, you’ve got the opportunity to run a batch of instructions against the cache without privilege checks. So you can write code with the right sequence of branch instructions to get branch prediction to work the way you want it to; and then you can use that to read memory that you shouldn’t be able to read.
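The window described above can be illustrated with a toy model. The sketch below is a simulation in plain Python, not working exploit code and not a description of any real processor’s internals; `ToyCPU`, `leak_byte`, and the "cache" (a simple set of addresses) are all hypothetical stand-ins. It assumes the widely reported approach of recovering the value through which cache line the speculative work touched.

```python
class ToyCPU:
    """A deliberately simplified model, not a description of any real processor."""

    PAGE = 4096  # assumed spacing so each possible byte value gets its own probe slot

    def __init__(self, kernel_memory):
        self.kernel_memory = kernel_memory  # bytes that user code must never read
        self.cached_lines = set()           # probe-array addresses currently "in cache"

    def flush_cache(self):
        self.cached_lines.clear()

    def user_read_kernel(self, address):
        """Model a user-mode load from a kernel address plus one dependent load."""
        # Speculative window: the value is used before the privilege check completes.
        secret = self.kernel_memory[address]
        # The dependent access leaves a footprint in the cache.
        self.cached_lines.add(secret * self.PAGE)
        # The check finally happens: the architectural result is thrown away.
        raise PermissionError("user-mode access to kernel memory")

    def is_fast(self, probe_address):
        """Stand-in for timing a load: True models a cache hit."""
        return probe_address in self.cached_lines


def leak_byte(cpu, address):
    """Recover one byte by checking which probe slot ended up cached."""
    cpu.flush_cache()
    try:
        cpu.user_read_kernel(address)
    except PermissionError:
        pass  # the attacker expects the fault and carries on
    for guess in range(256):
        if cpu.is_fast(guess * ToyCPU.PAGE):
            return guess
    return None


cpu = ToyCPU(kernel_memory=b"secret")
recovered = bytes(leak_byte(cpu, i) for i in range(len(b"secret")))
print(recovered)
```

The point of the model is that the fault does its job, since the attacker never receives the value directly, yet the cache state left behind by the discarded work is enough to reconstruct it.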

The speculative execution implementations of other CPU vendors don’t allow user-space applications to probe the contents of kernel space memory at any point. The only way to mitigate Meltdown in software is to force the system to perform a full context switch every time it switches between kernel and user memory space. The performance impact from Meltdown varies so widely because the cost of this patch depends on how often an application has to context switch. The performance issues, however, appear to be limited largely to servers and have not generally been seen on the consumer side.
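As a rough sketch of why the impact depends on the workload, the snippet below estimates the slowdown from the number of kernel/user transitions per second and an extra cost per transition. Both workloads and the half-microsecond cost are assumed, illustrative figures rather than measurements.

```python
def slowdown_percent(transitions_per_second, extra_seconds_per_transition):
    """Extra run time, as a percentage, added by a fixed cost on every transition."""
    added_fraction = transitions_per_second * extra_seconds_per_transition
    return 100.0 * added_fraction


EXTRA_COST = 0.5e-6  # assumed: half a microsecond of added work per transition

# Hypothetical workloads: a desktop application and a busy database server.
desktop = slowdown_percent(10_000, EXTRA_COST)
database = slowdown_percent(500_000, EXTRA_COST)
print(f"desktop: {desktop:.1f}%  database: {database:.1f}%")
```

Under these assumptions the desktop workload loses about half a percent while the database loses about 25 percent, which matches the general pattern of servers feeling the patch far more than consumer systems.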

Some Mitigation Strategies Have Performance Impacts

One of the mitigation strategies we’ve seen proposed, particularly more recently, is disabling Hyper-Threading. Apple has issued an update related to MDS, notifying its users that they can disable HT if they want to limit the ability of data to leak between multiple threads within the same CPU core. They’ve also stated that this can hit performance by up to 40 percent. That’s an extreme case because HT isn’t generally “worth” that much performance to an Intel CPU — we’d expect the typical impact to be in the 20-30 percent range — but it’s still a significant whack and far more performance than we typically see from a new CPU version.

There has been genuine expert disagreement on the degree to which people need to do this in order to protect themselves. Some, like Theo de Raadt, who runs the OpenBSD project, have disabled HT by default. Other operating systems have yet to take this step. Companies like Apple have shied away from telling customers to do this as well, writing: “Although there are no known exploits affecting customers at the time of this writing, customers who believe their computer is at heightened risk of attack [can disable HT].” Some of the patches associated with fixing Spectre and Meltdown have also had performance impacts, though some of the impacts were then reduced by further patches, and the degree of slowdown depends on the workload and, to some extent, on the CPU architecture in the first place.

In the long run, we expect AMD, Intel, and other vendors to continue patching these issues as they arise, with a combination of hardware, software, and firmware updates. Conceptually, side channel attacks like these are extremely difficult, if not impossible, to prevent. Specific issues can be mitigated or worked around, but the nature of speculative execution means that a certain amount of data is going to leak under specific circumstances. It may not be possible to prevent it without giving up far more performance than most users would ever want to accept.
