Just to clarify all this precision stuff that seems to be confusing people:
The Pixel Shader 2.0 spec calls for 24-bit precision minimum, with support for up to 128 bits (that technology's a year or two off, at least). This precision question is a hardware issue; it goes beyond DirectX vs. OpenGL. It's more evident on DirectX because DirectX does other things that expose the disparity more than OpenGL does - partly because DirectX evolves a LOT faster these days than OpenGL, so it gets standard features that take advantage of new hardware a lot sooner. (MS makes all the decisions it wants, which is good and bad; OpenGL is run by a consortium, which means committees have to debate and compromise for months or years before new features or code get approved as "standard".)
Anyway, you can run ATI hardware right at the 24-bit spec. From what I understand, nVidia hardware runs at either 16-bit or 32-bit (16 "doubled up"). PS 2.0 / DirectX 9 stuff requires 24-bit precision or higher, and since the nVidia cards can't run at 24 bits natively, they have to "double up" their 16-bit functionality to 32-bit and take the performance hit that this entails.
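To put rough numbers on what those bit widths mean: the useful precision of a floating-point format comes from its mantissa width. Here's a quick sketch comparing the three formats - the sign/exponent/mantissa splits (FP16 = 1/5/10, FP24 = 1/7/16, FP32 = 1/8/23) are my understanding of the published formats, not anything from Carmack, so treat them as assumptions:

# Rough comparison of the three shader float formats under discussion.
# Bit splits (sign/exponent/mantissa) are assumed to be:
# FP16 = 1/5/10, FP24 = 1/7/16, FP32 = 1/8/23.

FORMATS = {
    "FP16 (nVidia 'partial precision')": (5, 10),
    "FP24 (ATI / DX9 PS 2.0 minimum)":   (7, 16),
    "FP32 (nVidia 'full precision')":    (8, 23),
}

for name, (exp_bits, mant_bits) in FORMATS.items():
    # Relative error is roughly 2^-(mantissa bits); range is set by the exponent width.
    rel_error = 2.0 ** -mant_bits
    max_exponent = 2 ** (exp_bits - 1)
    print(f"{name}: ~{mant_bits + 1} significant bits, "
          f"relative error ~{rel_error:.1e}, range roughly up to 2^{max_exponent}")

The point is just that 16-bit leaves you far less headroom for long shader calculations than 24- or 32-bit does, which is why DX9 drew the line at 24.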
What Carmack has said is that if you code things specifically for "lower precision" (i.e. the 16-bit mode that nVidia defaults to), the card performs just as well as the ATI - but running in "full precision" (24- or 32-bit mode), the nVidia hardware suffers a performance penalty.
In a nutshell, Carmack is saying that to get equal performance out of the two cards, he's had to go out of his way to write a special code path for the nVidia hardware, while the ATI can run the "standard code" at a decent level. The size of the performance disparity isn't specified - but if it were only 1 or 2 frames per second, I'd bet money that Carmack wouldn't have taken the time to write a custom graphics routine for it!
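For illustration only - this isn't Carmack's actual code, just a hypothetical sketch of what "a special set of code for the nVidia hardware" amounts to in practice. The pick_render_path function and the path names are made up for the example:

# Hypothetical sketch of vendor-specific render path selection.
# Function and path names are invented for illustration.

def pick_render_path(vendor: str, prefer_speed: bool = True) -> str:
    """Return which shader code path to use for a given GPU vendor."""
    if vendor == "ATI":
        # ATI's DX9 parts run the generic path at full (24-bit) precision
        # with no special handling needed.
        return "standard path (24-bit full precision)"
    if vendor == "NVIDIA":
        if prefer_speed:
            # A hand-tuned low-precision path trades precision for speed.
            return "custom path (16-bit partial precision)"
        # Forcing the generic path means 32-bit registers and a speed hit.
        return "standard path (32-bit full precision, slower)"
    # Anything else just gets the generic path.
    return "standard path"

print(pick_render_path("ATI"))                          # standard path
print(pick_render_path("NVIDIA"))                       # custom low-precision path
print(pick_render_path("NVIDIA", prefer_speed=False))   # slower full precision

The extra maintenance burden of carrying a separate path like that is exactly why the disparity matters even if the raw frame-rate gap were small.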
Glad I've got a little time to sit back and watch before I update my aging GeForce2!!
Take care,
--Noel "HB" Wade