Another Timing (QPC) gotcha
I've searched the forum on this and haven't found a mention, so I thought I'd let others who haven't encountered this problem before know that it exists.
The story is that the performance timer (QueryPerformanceCounter) may return unreliable values on dual core machines. I stumbled across it when a dual core setup showed sluggish player input and an uncapped frame rate (when it should have been capped).
The quick fix is to call SetThreadAffinityMask(GetCurrentThread(), 1); on the main game thread.
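For anyone who wants that call with a basic error check, here is a minimal sketch. The helper name PinCurrentThreadToFirstProcessor is made up for illustration, and the non-Windows branch is only there so the snippet compiles elsewhere; it is not part of any API.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: pins the calling thread to logical processor 0.
// Returns true on success. Intended to be called once, early, from the
// main game thread (the thread that does all the timing).
bool PinCurrentThreadToFirstProcessor()
{
#ifdef _WIN32
    // Bit 0 of the mask selects logical processor 0.
    // SetThreadAffinityMask returns the previous affinity mask,
    // or 0 if the call failed.
    const DWORD_PTR previousMask = SetThreadAffinityMask(GetCurrentThread(), 1);
    return previousMask != 0;
#else
    // Nothing to do on other platforms; the issue is specific to QPC on Windows.
    return false;
#endif
}
```

If the call fails (for example because the process affinity excludes processor 0), the thread keeps its old affinity, so it is probably worth logging the result rather than ignoring it.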
"Finally, while the QueryPerformanceCounter / QueryPerformanceFrequency API is intended to be multiprocessor aware, bugs in the BIOS or motherboard drivers may result in these routines returning different values as the thread moves from one processor to another. We recommend that all game timing be computed on a single thread, and that thread is set to stay running on a single processor through the SetThreadAffinityMask Windows API. Typically this would be the main game thread. All other threads should be designed to operate without gathering their own timer data. We do not recommend using a 'worker' thread to compute timing as this will become a synchronization bottleneck. It is recommended that worker threads are designed to read timestamps from the main thread. Since the worker threads only read the timestamp there is no need to use critical sections. We also do not recommend having multiple threads compute timing and associating each with a specific processor, as this will greatly reduce the throughput on multicore systems."
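The "workers read timestamps from the main thread" part of that quote could look roughly like the sketch below. FrameClock and its member names are my own invention, not anything from the quoted text. I've used std::atomic instead of a plain variable on the assumption that some builds are 32-bit, where a plain 64-bit read is not guaranteed to be atomic; there is still no critical section involved.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical shared frame clock: written by the main thread only,
// read by any number of worker threads.
class FrameClock
{
public:
    // Main thread: publish the timestamp sampled for this frame, for example
    // a value derived from QueryPerformanceCounter, in microseconds.
    void Publish(std::int64_t microseconds)
    {
        m_microseconds.store(microseconds, std::memory_order_release);
    }

    // Worker threads: read the most recently published timestamp.
    // Workers never query the hardware timer themselves.
    std::int64_t Now() const
    {
        return m_microseconds.load(std::memory_order_acquire);
    }

private:
    std::atomic<std::int64_t> m_microseconds{0};
};
```

The main thread would call Publish once per frame after sampling the timer, and workers would call Now whenever they need the time. Workers see time at frame granularity, which is the trade-off this approach accepts.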
This is another facet of the same problem.
The QPC timer is implemented in NT by the HAL, the "Hardware Abstraction Layer". That's the binary handling all the details of how NT talks to the hardware--what hardware is available, what memory locations map to what functions, and so on.
These days most HALs use a timer on the PCI south bridge chipset with that famous frequency of 3.579545 MHz. However, on some multiprocessor machines, the HAL vendor decided not to use this timer (I'm guessing because they were worried about reentrant access) and instead used the TSC (the "Time Stamp Counter") on the CPU. This counter holds the number of cycles the CPU clock has executed since reset.
So now, using the TSC to implement QPC has two problems:
- The TSC counters on multiple cores will drift apart, meaning that you will get strangely inconsistent timings as the thread jumps back and forth between the two cores. This is easily solved by setting thread affinity.
- As discussed in that other thread you cite, machines which have multiple cores and SpeedStep will change the CPU frequency at effectively random intervals, which means the frequency of QPC changes at effectively random intervals, and there is no direct way to detect it.
Solving that problem would require a complicated heuristic, most likely a state machine, and it's rare enough that I haven't been willing to invest the time.
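Short of a real fix, one defensive measure is to sanity-check each frame delta before using it. The sketch below is only that: GuardedDeltaSeconds and its maxDeltaSeconds parameter are hypothetical names, and the cap is an assumption you would tune per game. It hides backwards jumps and huge spikes from cross-core drift, but it does not detect a frequency change caused by SpeedStep.

```cpp
#include <cstdint>

// Hypothetical helper: converts two raw counter readings into a frame delta
// in seconds, rejecting values that are obviously wrong.
//   previousTicks, currentTicks: successive QueryPerformanceCounter readings
//   ticksPerSecond:              value reported by QueryPerformanceFrequency
//   maxDeltaSeconds:             assumed upper bound for a single frame
double GuardedDeltaSeconds(std::int64_t previousTicks,
                           std::int64_t currentTicks,
                           std::int64_t ticksPerSecond,
                           double maxDeltaSeconds)
{
    // A non-positive frequency means the timer is unusable.
    if (ticksPerSecond <= 0)
        return 0.0;

    // The counter appeared to stand still or run backwards (for example,
    // the thread moved to a core whose counter lags behind): treat the
    // frame as taking no time rather than feeding a negative delta onward.
    const std::int64_t elapsedTicks = currentTicks - previousTicks;
    if (elapsedTicks <= 0)
        return 0.0;

    const double seconds =
        static_cast<double>(elapsedTicks) / static_cast<double>(ticksPerSecond);

    // Clamp implausibly large forward jumps to the assumed maximum.
    return seconds > maxDeltaSeconds ? maxDeltaSeconds : seconds;
}
```

With a cap of 0.25 seconds, for example, a reading that jumps forward by several seconds is treated as a single 0.25 second frame, so game logic stutters once instead of teleporting everything.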