Why you should wait on upgrading to .Net 4.6
Update (August 11th): A patch for this bug has been released by Microsoft. Here’s their update to the advisory:
We released an updated version of RyuJIT today, which resolves this advisory. The update was released as Microsoft Security Bulletin MS15-092 and is available on Windows Update or via direct download as KB3086251. The update resolves: CoreCLR #1296, CoreCLR #1299, and VisualFSharp #536. Major thanks to the developers who reported these issues. Thanks to everyone for their patience.
Original Post
What follows is the work of several people: Marc Gravell and I have taken lead on this at Stack Overflow and we continue to coordinate with Microsoft on a resolution. They have fixed the bug internally, but not for users. Given the severity, we can’t in good conscience let such a subtle yet high-impact bug linger silently. We are not upgrading Stack Overflow to .Net 4.6, and you shouldn’t upgrade yet either. You can find the issue we opened on GitHub (for public awareness) here. A fix has been released, see Update 5 below.
Update #1 (July 27th): A pull request has been posted by Matt Michell (Microsoft).
Update #2 (July 28th): There are several smaller repros now (including a small console app). Microsoft has confirmed they are working on an expedited hotfix release but we don’t have details yet.
Update #3 (July 28th): Microsoft’s Rich Lander has posted an update: RyuJIT Bug Advisory in the .NET Framework 4.6.
Update #4 (July 29th): There’s another subtle bug found by Andrey Akinshin and the F# Engine Exception is confirmed to be a separate issue. I still recommend disabling RyuJIT in production given the increasing bug count.
Update #5 (August 11th): A patch for this bug has been released by Microsoft, see above.
This critical bug is specific to .Net 4.6 and RyuJIT (64-bit). I’ll make this big and bold so we get to the point quickly:
The methods you call can get different parameter values than you passed in.
The JIT (Just-in-Time compiler) in .Net (and many platforms) does something called Tail Call optimization. This happens to alleviate stack load on the last-called method in a chain. I won’t go into what a tail call is because there’s already an excellent write up by David Broman.
The issue here is a bug in how RyuJIT x64 implements this optimization in certain situations. Let’s look at the specific example we hit at Stack Overflow (we have uploaded a minimal version of this reproduction to GitHub).
We noticed that MiniProfiler (which we use to track performance) was showing only on the first page load. The profiler then failed to show again until an application recycle. This turned out to be a caching bug based on the HTTP Cache usage locally. HTTP Cache is our “L1” cache at Stack Overflow; redis is typically the “L2.” After over a day of debugging (and sanity checking), we tracked the crazy to here:
What was happening? We were setting the MiniProfiler cache duration (passed to Set<T>
) as 3600 seconds. But often (~98% of the time), we were seeing it immediately expire from HTTP cache. Next we narrowed this down to being a bug only when optimizations are enabled (the “Optimize Code” checkbox on your project’s build properties). At this point sanity is out the window and you debug everything.
Here’s what that code looks like now. Note: I have slightly shortened it to fit this page. The unaltered code is on GitHub here.
This is nothing fancy, all we have is some methods calling each other. Here’s the scary result of those LocalCache.OnLogDuration
calls:
- LocalCache.Set: 3600
- LocalCache.SetWithPriority: 3600
- RawSet: null, or 114, or 97, or some other seemingly random value
Here’s an example test run from the GitHub repo:
The method we called did not get the parameters we passed. That’s it. The net result of this is that local cache (which we use very heavily) is either unreliable or non-existent. This would add a tremendous amount of load to our entire infrastructure, making Stack Overflow much slower and likely leading to a full outage.
That’s not why we’re telling you about this though. Let’s step back and look at the big picture. What are some other variable names we could use?
amountToWithdraw
qtyOfStockToBuy
carbonCopyToAccountId
voltage
targetAltitude
rotorVelocity
oxygenPressure
dosageMilliliters
Does that help put things in perspective?
This bug is not obvious for several reasons:
- It only happens with optimizations enabled. For most developers and projects, that’s not in
DEBUG
and won’t show locally.- That means you’ll only see this in
RELEASE
, which for most people is only production.
- That means you’ll only see this in
- Attaching a debugger alters the behavior. This almost always hides the issue.
- Adding a
Debug.WriteLine()
will often fix the issue because of the tail change. - It won’t reproduce in certain scenarios (e.g. we can’t repro this in a console application or VS hosting, only IIS).
- Given the nature of the bug, as far as we can tell, it can equally affect any framework library as well.
- It can happen in a NuGet library (most of which are
RELEASE
); the issue may not be in your code at all.
To address an obvious question: is this a security issue? Answer: it can be. It’s not something you could actively exploit in almost all cases, since stack variables are the things being swapped around here. However, it can be an issue indirectly. For example, if your code makes an assumption with a null
param like if (user == null) { user = SystemUser; }
, then a null
passed in will certainly be a problem, giving other users that access sporadically. A more common example of this would be a value for a Role
enum being passed incorrectly.
Recommendations
- Do not install .Net 4.6 in production.
- If you have installed .Net 4.6, disable RyuJIT immediately (this is a temporary fix and should be removed when an update is released). You can disable RyuJIT via a registry setting (note: this requires an application pool recycle to take effect).
- Under
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework
add auseLegacyJit
DWORD
with a value of1
- Or via PowerShell:
- Under
Be aware, the web.config
method (#3) of disabling RyuJIT does not work. Outside of IIS hosting, applying this fix via app.config
does work.
We are talking with and pushing Microsoft to get a fix for this shipped ASAP. We recognize releasing a fix for .Net on the Microsoft side isn’t a small deal. Our disclosure is prompted by the reality that a fix cannot be released, distributed, and applied immediately.