Out of the Blue

Here's a question for those of you more technically apt than myself, which includes my grandma, but I forgot to bring this up at brunch yesterday.

I've been trying to run Battlefield 2 in piggish 1600x1200 on my bust-out retail super-duper 6800 Ultra-OC video card. The card itself seems perfectly capable of running at such high resolution, but I have been getting unexplainable Blue Screen of Death errors in Windows XP with an infinite loop in the NV_DISP driver. The problem does not seem to occur at lower resolutions, but at 16X12 it is intermittent, but inevitable, often preceded by flashes of texture corruption. After a go-through with tech support I cleaned out the old drivers with driver cleaner, reinstalled the latest 77.72 FORCEWARE drivers, and confirmed that if the card's absurdly high 120 degree centigrade heat alarm was not being set off, the problem was probably not thermal.

I then tried again, with the same result.

The BSoD included a message to the effect that the problem was likely with either the driver or the card, so I called back tech support, and was told that it was possible that this was a problem with the application itself. I was then told that in order to demonstrate that the card itself was defective, I was in for the nightmarish prospect of repeatedly reproducing the problem after, a) reinstalling the game, b) using a different 3D app, and c) repeating (a) and (b) on a second machine altogether. Now if that's the process I really must go through to determine the problem, then fair enough, but a Google search on NV_DISP infinite loop errors seems to indicate that this is a problem that's been mysteriously plaguing NVIDIA users for literally years now, and troubleshooting tips range from the useful, like testing your RAM (seven passes by memtest 86 says this wasn't the problem), to the worthless (almost every time someone asks this on a forum he is inundated with driver rollback suggestions, but this has happened with three different driver revisions now), to the spooky (I don't think the most desperate or reckless of users would implement all the different registry hacks I've seen suggested to address this). The one bit of video card related voodoo I still plan on trying here is backing off the AGP speed from 8X to 4X, which worked for me on a different problem once before.

So anyway, my question is simply this (I bet you had almost forgotten by now that I started off promising a question here). Before I embark on the lengthy path set out for me by Mr. Tech Support: is his assertion that the BSoD could be caused by the game code itself accurate? I know that misbehaving apps are not supposed to be able to crash the system (which to my recollection was genuinely true for me in several years of running Win2K Professional), but I also know that just because something is not supposed to happen doesn't mean it is impossible (I remember seeing proof-of-concept BSoD code for WinNT that was all of three lines).

So... no application-triggered BSoDs in WinXP... fact, or fiction?

Blue Links of Death! Thanks Mike Martinez, Ant, and EvilToast.
Links: How To Make Your Own Custom Body Kit.
Stories: Coke tries to can Indian poster.
14-Pound Baby Girl Born in Kentucky.
Internet provides instant Harry Potter reviews.
Science! Small Earthquake Shakes Mount St. Helens.
Oklo: Natural Nuclear Reactors. Thanks Flying Penguin.
Poaching making China elephants evolve tuskless.
Media: JCB Ballet. Thanks Lost Dragon.
Follow-up: Thousands mark first atomic blast.
Shuttles Dogged by Aging Parts.
Teh Funny: FoxTrot.
40. Re: BSoD Jul 17, 2005, 20:46
I've been a Windows programmer since 1991, and I've been using Windows-NT-based OSes since they were first released... 1993 iirc. (Windows 2000 and Windows XP are NT-based.) And I can safely say your diagnostician is living in a fool's paradise.

Theoretically, neither BF2 nor any other program should be able to BSoD the system. The design of Windows NT should preclude that. But there are two ways in which it could.

First of all, in Windows NT, programs run in "user-space", not in "kernel-space". This uses facilities on the CPU (user space is "ring 3", kernel space is "ring 0") to ensure protection. Code in user-space isn't allowed to do a lot; if it tries to do something malicious, it gets stopped cold. In order for a well-behaved user-space program to do anything *interesting*, like open a file or play a sound, it has to ask the kernel to do it; because the kernel is in "ring 0", it can do anything it likes. So theoretically the only way a user-space program could crash the OS is by asking the kernel to do something that results in the *kernel* crashing. A malicious request with a deliberate buffer overflow, or just a bug in the kernel. This isn't supposed to ever happen, as the "take requests from user-space" part of the kernel is ten years old and battle-hardened. Honestly I doubt this is the problem; this is what your diagnostician pal is thinking of when he says it's "not possible".
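To make the "ask the kernel" idea concrete, here is a minimal sketch in Python. It is an analogy, not Windows NT internals: the helper name ask_kernel_to_write is made up for illustration, and it assumes a POSIX-style system where os.write ends up as a write request to the kernel. The point is only that a bad request from user-space comes back as an error to the program; it does not take the system down.

```python
import errno
import os


def ask_kernel_to_write(fd, data):
    """Hypothetical helper: request a write and report how it went.

    Returns (bytes_written, None) on success, or (0, errno_code) when
    the request is rejected.
    """
    try:
        return os.write(fd, data), None
    except OSError as err:
        # The request was refused; the program just gets an error code.
        return 0, err.errno


# A well-behaved request: write five bytes into a pipe.
read_end, write_end = os.pipe()
ok = ask_kernel_to_write(write_end, b"hello")

# A misbehaving request: the descriptor is closed and no longer valid.
os.close(write_end)
os.close(read_end)
bad = ask_kernel_to_write(write_end, b"hello")

print("good request:", ok)
print("bad request rejected with EBADF:", bad[1] == errno.EBADF)
```

The interesting failure mode is the other one: a request that looks valid but trips a bug in code running at ring 0.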

But there's a second concern. Since the original NT, many drivers have been moved out of "user space" to "kernel space". Windows NT's original design was "microkernel"-ish, in that the graphics subsystem, display driver included, was forced to live outside the kernel. That way, again, it wasn't *possible* even for a bad graphics *driver* to crash the system! (You can perhaps appreciate in what ways this is a good design.)

However, in order to actually get anything *done*, you had to suffer a large number of "ring transitions", going into and out of the "kernel" dozens of times. For example, if BF2 wants to send a texture to the card, it calls the kernel and says "send this to the graphics card". We've now gone from ring 3 to ring 0, which I'll mark as R3->R0 henceforth. The kernel now turns around and calls the graphics driver (R0->R3), which says "ah, yes, to do that you need to poke at the card *this* way (R3->R0 and back) and *that* way (R3->R0 and back)" and so on.
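The bookkeeping above can be sketched as a toy model in Python. Nothing here touches real hardware or real Windows APIs; the function names are invented for illustration, and the two return transitions at the end of the user-space case (driver back to kernel, kernel back to the game) are my assumption about how the round trip completes.

```python
def user_space_driver_transitions(pokes):
    """Toy model: the driver lives in ring 3, so every poke at the card
    costs a trip into the kernel and back."""
    trace = ["R3->R0"]           # game asks the kernel to send a texture
    trace.append("R0->R3")       # kernel calls the user-space driver
    for _ in range(pokes):
        trace.append("R3->R0")   # driver asks the kernel to poke the card
        trace.append("R0->R3")   # ...and control comes back to the driver
    trace.append("R3->R0")       # assumption: driver reports back to the kernel
    trace.append("R0->R3")       # assumption: kernel returns to the game
    return trace


def kernel_space_driver_transitions(pokes):
    """Toy model: the driver lives in ring 0, so the pokes happen
    without leaving the kernel."""
    return ["R3->R0", "R0->R3"]  # one trip in, one trip out


if __name__ == "__main__":
    for pokes in (1, 5, 20):
        slow = len(user_space_driver_transitions(pokes))
        fast = len(kernel_space_driver_transitions(pokes))
        print(pokes, "pokes:", slow, "transitions vs", fast)
```

In this model the user-space driver costs 2 * pokes + 4 transitions per request, while the kernel-space driver stays at 2 no matter how many pokes the request needs.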

All those ring transitions add up pretty quickly. In the world of application programming, they are shockingly slow, and you wind up hitting a *lot* of 'em very quickly. So, over time, the NT group has been moving more things out of user-space and into kernel-space. Graphics drivers were moved into the kernel with NT 4.0, back in 1996.

And guess what *that* means. Now your graphics driver runs at ring level 0, in kernel-space, so it can do anything it wants. Including crash your whole OS. Graphics drivers are notoriously buggy; they are thrown together to support a new card, and banged on enough until they work relatively well, then shipped. Everyone here has seen more than their share of graphics-driver bugs.

I find it *very* easy to believe that BF2 is calling the graphics driver in such a way that it incurs a BSoD'ing graphics-driver bug.
