Heap corruption Bug. Help!

Discussion in 'Game Development (Technical)' started by cliffski, Apr 13, 2007.

  1. cliffski

    Moderator Original Member

    Joined:
    Jul 27, 2004
    Messages:
    3,897
    Likes Received:
    0
    I have a bug I can't find. I'm aware that many of you are better coders than me and understand all this 'heap' malarkey more than me.

    I have a bug that takes ages to trigger. It basically crashes in the middle of nowhere (like ntdll) with this message:

    Code:
    HEAP[KudosRockLegend.exe]: HEAP: Free Heap block 5f2a4b0 modified at 5f2a7ec after it was freed
    Windows has triggered a breakpoint in KudosRockLegend.exe.
    
    (address varies).
    It only happens if I play the game for around an hour or more, and it always happens a few seconds into the 'gig screen', which has lots of particles. However, if I try to test it by really hammering the gig screen without playing the rest of the game, it doesn't happen :(

    Every tick of the game I make a call like this:

    Code:
    		// _heapchk() comes from <malloc.h> in the MSVC CRT; GERROR is the game's own error macro.
    		if(_heapchk() != _HEAPOK)
    		{
    			GERROR("Heap invalid");
    		}
    
    and that does NOT trigger.

    The thing is, I'm not really sure what causes this. It doesn't seem to be a NULL pointer being accessed, or any dangling pointer; it's something I've never seen before. I doubt it's a simple buffer overrun, as I use STL lists and vectors for that kind of thing.
    I've checked for memory leaks and so on, and that side should be fine. The game uses about 90MB of RAM at the point when it triggers. I'm running it in debug mode on a multi-core CPU under Vista.
    Any advice on this kind of bug and how I might have caused it is much appreciated.
     
  2. zoombapup

    Moderator Original Member

    Joined:
    Nov 25, 2004
    Messages:
    2,890
    Likes Received:
    0
    I'd almost ALWAYS put this down to a buffer overrun or a write to memory you don't own. That's why it's such a nightmare to debug these things and why they are difficult to reproduce.

    I'd make sure you add guard checks after allocations (override operator new and add extra protection).

    I'd also recommend downloading an "evaluation" copy of BoundsChecker and running that (I think you can get a 30 day eval). Also, try adding some journalling to your app to see if you can get a reproducible case that doesn't involve you hammering on it all day.

    http://sairama.weblogs.us/archives/Docs/bugslayerTips.html

    Also, I'd suggest getting Robbins' book, the name of which escapes me right now. Google for "Bugslayer". Scott Bilas was saying that there is a really nice stack trace library you can get with the book that gives you much more than just a simple crash address. Highly recommended.
     
  3. TimS

    Original Member

    Joined:
    Feb 9, 2005
    Messages:
    686
    Likes Received:
    0
    Do you by any chance have your particle effects in a <vector>?

    This just seems familiar to me...
     
  4. cliffski

    Moderator Original Member

    Joined:
    Jul 27, 2004
    Messages:
    3,897
    Likes Received:
    0
    I have a fixed array of particles per emitter, and a central manager maintains an STL list of emitters.
    It's not an iterator being invalidated though, is it? As I recall, you get a different error there.
    And it's weird that in my test app, the same particle engine won't trigger it ;(
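
    For reference on the iterator question: erasing from a std::list only invalidates iterators to the erased element, so the usual slip is to keep using that one iterator after the erase. A minimal sketch of the safe pattern follows. The Emitter struct and its alive flag are hypothetical stand-ins, not the actual game code.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical stand-in for a particle emitter; only the 'alive' flag matters here.
struct Emitter {
    int id;
    bool alive;
};

// Removes dead emitters and returns how many were removed.
// list::erase returns the iterator following the erased element, so the loop
// never dereferences or increments an iterator that was just invalidated.
inline std::size_t removeDeadEmitters(std::list<Emitter>& emitters) {
    std::size_t removed = 0;
    for (std::list<Emitter>::iterator it = emitters.begin(); it != emitters.end(); ) {
        if (!it->alive) {
            it = emitters.erase(it);  // do not ++it after this
            ++removed;
        } else {
            ++it;
        }
    }
    return removed;
}

// Usage example: two of the four emitters are dead.
inline void removeDeadEmittersExample() {
    std::list<Emitter> emitters = { {1, true}, {2, false}, {3, false}, {4, true} };
    std::size_t removed = removeDeadEmitters(emitters);
    assert(removed == 2);
    assert(emitters.size() == 2);
}
```

    Calling erase(it) and then ++it on the same iterator is the broken variant, and it touches a freed list node, which is one way to modify a heap block after it was freed.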
     
  5. Michael Flad

    Indie Author

    Joined:
    Aug 4, 2004
    Messages:
    190
    Likes Received:
    0
    Just to trigger thinking in different directions, as it's hard to imagine where this kind of bug may happen without knowing the code.

    Do you use timers (timeSetEvent IIRC)? I once (more than half a decade ago, so I don't remember exact details) had to track a bug in my employer's framework caused by threading problems created by code handling an event/frame queue using a timer. Timers use multithreading, and in our case back then the framework programmer used his own "semaphores" but not critical sections to protect the code. In the end it probably made the error even harder to track, as it almost never happened on a single CPU PC, but once we got dual CPU PCs we got the bug/crash every now and then (but still speaking about once every few hours).

    So this may not be of any help, but if you use timers you should have a look at the code of your callbacks and whether there is even a small chance of accessing (i.e. writing to) memory your game may not own anymore.
     
  6. cliffski

    Moderator Original Member

    Joined:
    Jul 27, 2004
    Messages:
    3,897
    Likes Received:
    0
    I don't do any multithreading, partly because of such nightmares. The sound engine I use *is* multithreaded, but it won't be that, as it's been used in loads of other games, including Kudos.
    Thanks for the advice, and the links that have been posted. I have some serious investigating to do.
    Here's hoping it's not some obscure Vista thing ;(
     
  7. ZuluBoy

    Indie Author

    Joined:
    Jan 28, 2005
    Messages:
    236
    Likes Received:
    0
    Try the free gflags tool (I don't know if it works on Vista, but it should).
    I have used it once, and I was able to fix a *lot* of heap corruption bugs (even ones that had never triggered before).

    Go to this page and look for Example 12. If you are using Visual C++, you can skip steps 3, 4, 7 and 8.

    Basically this is how to use it:

    1 - From the command line, run gflags /p /enable KudosRockLegend.exe /full to enable heap verification for your application.
    2 - Run your application under a debugger (Visual C++ or WinDbg); be aware that your application will be slower than usual.
    3 - Ideally, whenever there is a heap corruption or something like that, your application will break, and under Visual C++ the call stack will be available.

    Don't forget to manually disable heap verification when you are done. The command for that is: gflags /p /disable KudosRockLegend.exe /full.
     
  8. Pyabo

    Original Member

    Joined:
    Jul 27, 2004
    Messages:
    1,315
    Likes Received:
    0
    I second the recommendation of BoundsChecker... very useful tool.
     
  9. Ferdi

    Indie Author

    Joined:
    Nov 5, 2004
    Messages:
    25
    Likes Received:
    0
    Here is a tip I use when I am looking for hard bugs in my game:

    I have an input recording system. All the input is sampled every frame and saved to a file. For example, for a mouse I have the x, y co-ordinates plus buttons saved every frame. You may also need to save the time, depending on what type of main loop you have. I also have a playback system, which replays that recording.

    Now if you can reproduce the problem once, and you save your inputs into the recording file, in theory you should be able to replay the recording over and over again and the problem will be 100% reproducible. Well, it did work for my game.

    You could even fast forward the recording (depending on the speed of your computer and whether it is a timing bug or not), and get to the problem as quick as possible. In addition, you could stop before the frame that failed and see what it was doing.

    But unfortunately, writing the recording and playback system may take a while. I was lucky in that I already had a recording / playback system going.
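
    The record/replay idea above can be sketched roughly as follows. This is only an illustration: the InputFrame fields, the one-line-per-frame text format, and the function names are assumptions, and a real game would write to a file stream rather than a string stream.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical per-frame input sample: mouse position, button bitmask,
// and the frame's timestep in milliseconds.
struct InputFrame {
    int mouseX;
    int mouseY;
    unsigned buttons;
    unsigned dtMs;
};

inline bool operator==(const InputFrame& a, const InputFrame& b) {
    return a.mouseX == b.mouseX && a.mouseY == b.mouseY &&
           a.buttons == b.buttons && a.dtMs == b.dtMs;
}

// Appends one frame per line to the recording stream.
inline void recordFrame(std::ostream& out, const InputFrame& f) {
    out << f.mouseX << ' ' << f.mouseY << ' ' << f.buttons << ' ' << f.dtMs << '\n';
}

// Reads the next recorded frame; returns false once the recording is exhausted.
inline bool playbackFrame(std::istream& in, InputFrame& f) {
    return static_cast<bool>(in >> f.mouseX >> f.mouseY >> f.buttons >> f.dtMs);
}

// Usage example: record three frames, then replay them in order.
inline void recordReplayExample() {
    std::vector<InputFrame> live = { {10, 20, 0u, 16u}, {12, 25, 1u, 16u}, {-3, 0, 3u, 17u} };
    std::stringstream recording;  // stands in for the recording file
    for (std::size_t i = 0; i < live.size(); ++i) {
        recordFrame(recording, live[i]);
    }
    InputFrame f = {0, 0, 0u, 0u};
    std::size_t n = 0;
    while (playbackFrame(recording, f)) {
        assert(f == live[n]);  // the game would consume f instead of polling devices
        ++n;
    }
    assert(n == live.size());
}
```

    Storing the timestep with each frame is what lets the replay feed the game the same frame times as the original run, which matters if the main loop uses a variable timestep.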

    Hope you solve your problem soon.

    Ferdi
     
  10. Applewood

    Moderator Original Member Indie Author

    Joined:
    Jul 29, 2004
    Messages:
    3,858
    Likes Received:
    2
    I've got a number of self-defence widgets built into my own malloc/free system (that new/delete also uses).

    The single most "hit" debug function is to add 4K of (repeatably) random shit in front of and behind the pointer returned (add 8Kb to the alloc size and move the ptr on by 4K when you return it). On the free routine, check that the shit is as it should be. Just use a for loop and store the counter in it or something.

    I also keep a list of all these allocations so I can periodically call a routine that checks that ALL allocs still have their shit intact. Keeping that list around also means I can use this StackWalker at app exit to find all mem leaks easily.

    It's worth time expanding your mem routines - they'll look after you and save you time and hair pulling in the long run.
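
    A minimal sketch of the guard-block idea, assuming a malloc-style interface: a repeatable pattern is written 4K before and 4K after the user block and verified on free. The namespace, function names, and pattern formula are hypothetical, and the list of live allocations mentioned above is left out to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

namespace guarded {

const std::size_t kGuardBytes = 4096;                     // 4K on each side
const std::size_t kHeaderBytes = alignof(std::max_align_t);
static_assert(sizeof(std::size_t) <= kHeaderBytes, "header must hold the size");

// Repeatable fill: each byte depends only on its offset within the guard.
inline unsigned char patternByte(std::size_t i) {
    return static_cast<unsigned char>((i * 31u + 7u) & 0xFFu);
}

inline void fillGuard(unsigned char* p) {
    for (std::size_t i = 0; i < kGuardBytes; ++i) p[i] = patternByte(i);
}

inline bool guardIntact(const unsigned char* p) {
    for (std::size_t i = 0; i < kGuardBytes; ++i) {
        if (p[i] != patternByte(i)) return false;
    }
    return true;
}

// Layout: [header holding user size][front guard][user bytes][rear guard]
inline void* alloc(std::size_t size) {
    unsigned char* raw = static_cast<unsigned char*>(
        std::malloc(kHeaderBytes + kGuardBytes + size + kGuardBytes));
    if (!raw) return 0;
    std::memcpy(raw, &size, sizeof(size));
    fillGuard(raw + kHeaderBytes);
    fillGuard(raw + kHeaderBytes + kGuardBytes + size);
    return raw + kHeaderBytes + kGuardBytes;
}

// Returns true if both guards around 'user' are untouched.
inline bool check(const void* user) {
    const unsigned char* u = static_cast<const unsigned char*>(user);
    const unsigned char* raw = u - kGuardBytes - kHeaderBytes;
    std::size_t size = 0;
    std::memcpy(&size, raw, sizeof(size));
    return guardIntact(raw + kHeaderBytes) && guardIntact(u + size);
}

// Verifies the guards, then frees. Returns false if corruption was detected.
inline bool release(void* user) {
    if (!user) return true;
    bool ok = check(user);
    std::free(static_cast<unsigned char*>(user) - kGuardBytes - kHeaderBytes);
    return ok;
}

// Usage example: a one-byte overrun lands in the rear guard and is reported.
inline void overrunExample() {
    char* p = static_cast<char*>(alloc(32));
    assert(p && check(p));
    p[32] = 0x55;  // one past the end of the 32 user bytes
    assert(!release(p));
}

}  // namespace guarded
```

    Note that guards like these catch overruns and underruns near a live block. A write into a block after it has been freed, which is what the original error message reports, would need something extra, such as filling freed blocks with a pattern and delaying their reuse.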
     
  11. GBGames

    Indie Author

    Joined:
    Jul 27, 2004
    Messages:
    1,255
    Likes Received:
    0
    If the bug is only surfacing when you play the game, perhaps you can find out a certain sequence of states that you are entering and leaving that cause the crash.

    Clearly the problem is not the particle engine by itself, but perhaps there is a problem due to some interaction between it and other parts of your code. For instance, perhaps it is something like a destructor not being declared virtual when it should be, or it could be as simple as not deleting something that you should have. Perhaps there is slicing?
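
    To illustrate the destructor point: deleting a derived object through a base pointer is only well defined when the base destructor is virtual. A small sketch with hypothetical class names, with counters added purely so the behaviour can be checked:

```cpp
#include <cassert>

// Hypothetical base class. The virtual destructor is the important part.
class Effect {
public:
    virtual ~Effect() { ++destroyed; }
    static int destroyed;  // counts base destructor calls (for the example only)
};
int Effect::destroyed = 0;

// Hypothetical derived class that owns heap memory.
class ParticleEffect : public Effect {
public:
    ParticleEffect() : particles_(new float[256]) {}
    ~ParticleEffect() {
        delete[] particles_;  // array form matches the new[] above
        ++destroyed;
    }
    ParticleEffect(const ParticleEffect&) = delete;             // no accidental copies
    ParticleEffect& operator=(const ParticleEffect&) = delete;  // of the raw pointer
    static int destroyed;  // counts derived destructor calls (for the example only)
private:
    float* particles_;
};
int ParticleEffect::destroyed = 0;

// Deleting through the base pointer runs ~ParticleEffect and then ~Effect,
// because ~Effect is virtual.
inline void destroyThroughBase() {
    Effect* e = new ParticleEffect();
    delete e;
}

// Usage example.
inline void virtualDestructorExample() {
    int baseBefore = Effect::destroyed;
    int derivedBefore = ParticleEffect::destroyed;
    destroyThroughBase();
    assert(Effect::destroyed == baseBefore + 1);
    assert(ParticleEffect::destroyed == derivedBefore + 1);
}
```

    Without the virtual keyword on ~Effect the same delete is undefined behaviour, and in practice the derived destructor is often skipped or the wrong block size is handed back to the heap.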

    I'd use Valgrind, but you're most likely not developing this game on GNU/Linux. B-)
     
  12. cliffski

    Moderator Original Member

    Joined:
    Jul 27, 2004
    Messages:
    3,897
    Likes Received:
    0
    Much to my distress I can't trigger it at all on Windows XP. Grrrrr.
    Lots of great ideas here for me to try, much appreciated. I never use malloc and free, or delete[]; all I use is new and delete, so I will have to fiddle with those as suggested.
     
  13. jessechounard

    Original Member

    Joined:
    Apr 4, 2006
    Messages:
    70
    Likes Received:
    0
    You don't use delete[]? Does that mean you never allocate arrays dynamically?

    It doesn't help you much now, but maybe for your next title you should try using smart pointers a bit more. I very rarely call delete anymore.
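
    As a rough picture of what that looks like, here is a sketch that uses std::vector in place of new[]/delete[] and std::unique_ptr for ownership. std::unique_ptr is a C++11 facility that postdates this thread (Boost smart pointers filled the same role at the time), and the class names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Particle {
    float x, y, life;
};

// Hypothetical emitter: the vector owns the particle storage, so there is
// no new[]/delete[] pair to get wrong.
class ParticleEmitter {
public:
    explicit ParticleEmitter(std::size_t count) : particles_(count) {}
    std::size_t size() const { return particles_.size(); }
private:
    std::vector<Particle> particles_;
};

// Hypothetical manager: each emitter is owned by a unique_ptr, so removing
// it from the container is the only way it gets destroyed, exactly once.
class EmitterManager {
public:
    ParticleEmitter& add(std::size_t count) {
        emitters_.push_back(std::unique_ptr<ParticleEmitter>(new ParticleEmitter(count)));
        return *emitters_.back();
    }
    void clear() { emitters_.clear(); }
    std::size_t count() const { return emitters_.size(); }
private:
    std::vector<std::unique_ptr<ParticleEmitter> > emitters_;
};

// Usage example: no explicit delete anywhere.
inline void smartPointerExample() {
    EmitterManager manager;
    ParticleEmitter& e = manager.add(64);
    assert(e.size() == 64);
    manager.add(8);
    assert(manager.count() == 2);
    manager.clear();  // destroys both emitters and their particles
    assert(manager.count() == 0);
}
```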
     
  14. gmcbay

    Indie Author

    Joined:
    Jul 28, 2004
    Messages:
    280
    Likes Received:
    0
    I also recommend downloading eval versions of BoundsChecker and/or Purify and seeing if they come up with anything.

    Is your project using DLLs? A majority of the time when I've seen unexpected odd heap issues like this that aren't easy to track down the reason has turned out to be data sharing across the main executable and one or more DLLs where the exe and the DLLs (or just one DLL) have been linked against different versions of the C runtime.

    Example: the exe is linked against the multithreaded debug version of the CRT. Some DLL it uses is linked against the single-threaded debug version of the CRT. If you allocate memory in the exe and pass that to the DLL, bad things will eventually happen because the DLL will have a different idea of the size of the memory than the executable does. And the result will often be as baffling as what you're seeing.

    Anyway, if you're using DLLs, double check that they are all linked to the same version of the CRT and any other dependent libs. And then triple check!
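
    Besides matching CRT versions, a common way to sidestep the problem is to make sure memory is always freed by the same module that allocated it. The sketch below assumes a hypothetical DLL that exports a create/destroy pair; here both halves sit in one file only so that the example compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>

// --- Would live inside the DLL (hypothetical names) ---
struct SoundBuffer {
    std::size_t bytes;
    unsigned char* data;
};

// Allocates with the DLL's own CRT heap.
extern "C" SoundBuffer* SoundBuffer_Create(std::size_t bytes) {
    SoundBuffer* b = static_cast<SoundBuffer*>(std::malloc(sizeof(SoundBuffer)));
    if (!b) return 0;
    b->bytes = bytes;
    b->data = static_cast<unsigned char*>(std::malloc(bytes));
    if (!b->data) {
        std::free(b);
        return 0;
    }
    return b;
}

// Frees with the same CRT heap that SoundBuffer_Create used.
extern "C" void SoundBuffer_Destroy(SoundBuffer* b) {
    if (!b) return;
    std::free(b->data);
    std::free(b);
}

// --- Would live in the exe ---
// The deleter routes destruction back into the DLL instead of calling
// the exe's own free/delete on memory it did not allocate.
struct SoundBufferDeleter {
    void operator()(SoundBuffer* b) const { SoundBuffer_Destroy(b); }
};
typedef std::unique_ptr<SoundBuffer, SoundBufferDeleter> SoundBufferPtr;

inline SoundBufferPtr makeSoundBuffer(std::size_t bytes) {
    return SoundBufferPtr(SoundBuffer_Create(bytes));
}

// Usage example.
inline void moduleBoundaryExample() {
    SoundBufferPtr buffer = makeSoundBuffer(1024);
    assert(buffer && buffer->bytes == 1024);
    buffer->data[0] = 1;
}   // SoundBuffer_Destroy runs here, on the allocating side
```

    This does not fix a library that already mixes heaps internally, but it keeps your own code from freeing a block on a heap it never came from.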
     
  15. Thorbrian

    Original Member

    Joined:
    Dec 9, 2004
    Messages:
    65
    Likes Received:
    0
    I bet it is crashing as a side effect of a heap free or heap alloc call from your code. I bet Windows is doing a check on the free pool. If you'd like, you can most likely get a stack trace that points all the way back into your code with kernel symbols. A search for "kernel symbols" and the name of your IDE should probably be enough to figure out how to hook up the kernel symbols. Getting the stack trace back into your code probably won't point to the cause of the error, because it's being detected after the fact, but occasionally it does help (with double deletes or the like, anyway).
     
  16. cliffski

    Moderator Original Member

    Joined:
    Jul 27, 2004
    Messages:
    3,897
    Likes Received:
    0
    Well... I haven't been able to trigger it since, and never on XP. I also found a weird thing, in that another game (might have been COD2) triggered exactly the same error, but only under Vista. I'm wondering if Vista is reporting some totally random issue with this semi-default error message.
    Has anyone else had anything report a heap error under Vista?
    The joy of a new O/S :D
    The game is currently checking the heap each frame AND logging every allocation to a file. Typically, it won't happen now ;(
     
  17. Applewood

    Moderator Original Member Indie Author

    Joined:
    Jul 29, 2004
    Messages:
    3,858
    Likes Received:
    2
    I think Heisenberg was tracking down a heap corruption bug when he had his epiphany!
     
  18. woo

    Original Member

    Joined:
    Jun 21, 2006
    Messages:
    135
    Likes Received:
    0
    Were you by any chance running a release/optimized build at the time of the bug? It may be that you can't reproduce it because you're using a different build config from when it was crashing. Reminds me of this one time at an ACM programming competition, when the machine we were on wouldn't let us access an array inside a loop without crashing. We tried everything we could think of, but time was of the essence, so we tried to just not use arrays inside loops. NOT FUN.

    Anyway, when we got back home, we ran the same code that would crash on the computer at the competition, used the same compiler and everything worked the very first time. The only thing we could think of that would make sense was some odd setting in the IDE that was causing the optimizer to remove important stuff.

    Good luck!
    -Andrew Douglas
    https://theoreticalgames.com
     
  19. Slayerizer

    Original Member

    Joined:
    Dec 13, 2004
    Messages:
    103
    Likes Received:
    0
    I also had my share of problems in Vista. explorer.exe was crashing on me about five times each night (I use XP at work). I also had crashes with other applications, like MSN Messenger.

    I did some googling and found a permanent solution: http://www.tech-recipes.com/rx/1261/vista_disable_dep_noexecute_protection_fix_explorer_crashing

    I've had no crashes since...


    related stuff:
    http://blogs.msdn.com/michael_howard/archive/2006/12/12/update-on-internet-explorer-7-dep-and-adobe-software.aspx
    http://forums.whirlpool.net.au/forum-replies-archive.cfm/717575.html

    more threads can be found by googling : vista crash error dep
     
  20. lennard

    Moderator Original Member Indie Author

    Joined:
    Jan 12, 2006
    Messages:
    2,391
    Likes Received:
    12
    I appreciated ZuluBoy's recommendation of gflags - I downloaded it and tried it myself. It didn't solve my problem, but that is because I was looking in the wrong place. I will use it again when getting near to release on a new title, just to comb for potential problems.

    My own horror story seems to have come to an end - by replacing Audiere with FMOD it seems that my crash bug has been fixed. It never happened on my machine, and went on for about 5 weeks of off and on looking at the thing, having an idea, and firing it off to the three people I knew of who could crash the beast. What a relief that is over.
     
