Hand optimization.. worth it?
sparkyboy
08-22-2005, 05:02 AM
Hi guys,
A recent discussion involving myself and a few others, regarding the best ways to implement collision detection in a CELL based world, got onto the subject of pre-calculated paths etc. (I started it, sorry) :D
So I'd like to discuss and gauge your opinions on the merits of hand optimization on today's computer beasts.
Is it worthwhile anymore using arrays, either for movement or to pre-calculate the sin and cos of angles instead of doing it in real time? How important is it to implement 'ASSEMBLY' for the blitting procedure instead of your chosen language? Is LOOP UNROLLING just a distant memory? Does BIT SHIFTING have a place anymore or has multiplication/division surpassed it?
Is the biggest bottleneck the problem of video cards and pipelines, so no matter what your code does, IT'S UP TO THE GRAPHICS CARD????
Remember I'm talking 2D here, whether accelerated or not!!!
Your thoughts and insight much appreciated.
Edit:
Also should we put complete faith in today's so-called optimizing compilers to do the work for us?
All the best
Mark.
svero
08-22-2005, 05:15 AM
Well the general rule for optimization is first to write the code in the clearest fashion and then, if there seem to be speed issues, to identify any bottlenecks using precise timers and optimize only the smallest part of the code you can.
I'd say that nowadays most of the bottlenecks have to do with video writing and reading and not too often with software. Processors are really fast, and unless you're doing a really complex calculation or iterating through something in a non-linear fashion, in most cases you probably won't notice any speed problems. There may be some very special cases where you want to carefully optimize stuff, but most of the time, I'd say you just want really clear code that's easy to maintain and edit.
The one exception might be with how you deal with hardware. That is ... you might want to "optimize" the number of reads or writes you do to video memory or something of that nature. That tends to be more about the overall structure of how your engine blts stuff than the kind of optimizations you described in your post.
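As a rough illustration of "identify bottlenecks using precise timers", here is a minimal Python sketch. The two functions being timed are hypothetical stand-ins, not anything from a real engine; the only real API used is time.perf_counter.

```python
import time

def time_call(fn, repeats=5):
    """Return the best wall-clock time in seconds over several runs of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

def update_bugs():
    # Hypothetical stand-in for a game update step.
    total = 0
    for i in range(10_000):
        total += i * i
    return total

def draw_frame():
    # Hypothetical stand-in for a rendering step.
    return sum(range(1_000))

# Time each candidate separately, then look at the slowest one first.
timings = {fn.__name__: time_call(fn) for fn in (update_bugs, draw_frame)}
slowest = max(timings, key=timings.get)
print(timings, "-> look at", slowest, "first")
```

Taking the best of several runs is one common way to reduce noise from whatever else the machine is doing; a real profiler gives a finer breakdown.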
Nikster
08-22-2005, 05:24 AM
I agree with Steve, it's important to not even think about optimising at all until you know your methods are somewhat bug free, as trying to bug fix hand optimised code is a PITA and you tend to optimise the wrong areas. And as Luggage and others said in the other thread, compilers do usually optimize to shifts or whatever method is the quickest for the target platform. I have seen so-called optimised code that uses shifts and all sorts of things, only for the result to be piped into a GetPixel function :)
Adrian Cummings
08-22-2005, 05:28 AM
Hi (again)
If you are writing a 2D game as stated, you would want to write the code so it's as easy to read as pos first, yes.
But... then if you notice slowdown you might want to hand optimize - for example a 2D particle system with over a hundred individual sprites for an explosion would begin to chug on most mid range PCs if you didn't optimise later.
Having said all that tho, to be honest machines are so fast now that 2D is not even a prob anymore really in terms of speed, say against the old 8 or 16 bit machines of course :)
I like to optimise tho as then I know I've got as much out of the code as I can but then I'm an old git! :)
Adrian.
sparkyboy
08-22-2005, 05:40 AM
Thanks guys, I agree that you need to identify where the problem is first by profiling...and Steve I agree that a lot of problems are more or less graphics card related as you say, processors are blisteringly fast ( well more often than not).
Being old school if you will, I just can't help looking at FOR....NEXT loops and wringing a few extra cycles out of them if possible.
As for calculating movements of 100 or more 'BOBS' ;), I love to try and use pre-calculated tables for that if the situation warrants, and is at all possible.
All the best
A zombified Mark :D
PeterM
08-22-2005, 05:54 AM
If optimising I'd say the biggest thing to look out for is cache misses. I wouldn't bother putting anything in a table, since you're purposefully making the CPU fetch from a wider memory area.
The only thing worth putting in a table are things which take a long time to calculate. For example, finding the nearest entries in a palette for various levels of brightness. Doing that per pixel would be bad, and a 256 byte array isn't going to bust the cache.
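A sketch of that kind of table in Python, under made-up assumptions: a tiny five-colour palette and three brightness levels, with nearest_index and build_brightness_table as hypothetical helper names. The expensive nearest-colour search runs once per table entry, and per-pixel work becomes a single lookup.

```python
def nearest_index(palette, rgb):
    """Index of the palette colour closest to rgb (squared RGB distance)."""
    return min(range(len(palette)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(palette[i], rgb)))

def build_brightness_table(palette, levels):
    """table[level][i] = palette index for colour i dimmed to level / (levels - 1)."""
    table = []
    for level in range(levels):
        scale = level / (levels - 1)
        row = [nearest_index(palette, tuple(round(c * scale) for c in colour))
               for colour in palette]
        table.append(row)
    return table

# Hypothetical palette: black, grey, white, red, dark red.
PALETTE = [(0, 0, 0), (128, 128, 128), (255, 255, 255), (255, 0, 0), (128, 0, 0)]
TABLE = build_brightness_table(PALETTE, levels=3)

def shade(pixel_index, level):
    """Per-pixel work is now just one table lookup."""
    return TABLE[level][pixel_index]
```

With a 256-colour palette each row is 256 entries, which is the size of table the post above has in mind.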
Things like sinf and cosf are faster (or at least as fast) when just called. Don't bother with a table. It's not worth the extra coding time and potential bugs.
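For comparison, this is roughly what the old-style table looks like, sketched in Python rather than C. The one-entry-per-degree resolution is an assumption made for this example. Whether the lookup beats the direct call depends on the machine, so measure both before believing either.

```python
import math
import timeit

TABLE_SIZE = 360  # one entry per whole degree (an assumption for this sketch)
SIN_TABLE = [math.sin(math.radians(d)) for d in range(TABLE_SIZE)]

def table_sin(degrees):
    """Look up sin for a whole number of degrees, wrapping around the circle."""
    return SIN_TABLE[int(degrees) % TABLE_SIZE]

def direct_sin(degrees):
    """Just call the library function."""
    return math.sin(math.radians(degrees))

if __name__ == "__main__":
    # Time both on the target machine instead of guessing.
    print("table :", timeit.timeit(lambda: table_sin(30), number=100_000))
    print("direct:", timeit.timeit(lambda: direct_sin(30), number=100_000))
```

Note the extra costs of the table: fractional angles get truncated (30.5 degrees reads the entry for 30), and the wrap-around indexing is one more place for bugs.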
Also, nice to hear you say "bobs" - reminds me of the Amiga days!
Pete
Adrian Cummings
08-22-2005, 06:16 AM
On the other hand you don't have some of the luxury of all the tasty extra cpu speed etc. on mobile phone games (currently what I'm doing right now) and it's a constant battle with frame rate against memory on many of the older devices you have to support - hence why I prolly got involved in this topic in the first place regards optimising your code - I forgot I'm not coding on PC at the moment - GBA was just as bad really as was GBC before it etc.
I also happen to think that, amongst other things, learning to hand optimise your code is a good way of becoming a more proficient programmer even if it does not make your game better.
Cheers,
Adrian.
Savant
08-22-2005, 06:38 AM
Being old school if you will, I just can't help looking at FOR....NEXT loops and wringing a few extra cycles out of them if possible.
A lot of coders who come from the old days have this problem. You need to let that go. Unless that for/next loop is inside a critical piece of code that runs every frame - it's not worth it. Spend your time somewhere else. Your game will be better for it.
Mike Boeh
08-22-2005, 07:25 AM
I just went through a very sad time. Why? Because I retired my sin/cos lookup tables forever. They served me well over the years :-)
Seriously, I barely optimize anything anymore. My order of importance these days for a chunk of code is:
1. Functionality
2. Ease of re-use
3. Time to develop
4. Cleanliness
.
100. Performance
Everyone is right, you have to let go of your old-school mentality of writing the fastest code possible. It just isn't worth it anymore.
svero
08-22-2005, 07:29 AM
What amazes me is just how fast processors are. Like many I have a pretty fast update loop (125 hz) and I do all kinds of crazy code in there. For beetle I must walk the list of bugs a zillion times just to check small things. It could be optimized to just walk the list once, but in terms of clarity that's not as good. I've not noticed any slowdown. Seems I can stick anything at all in my update loop. It's like an infinite black hole of processing power even on the older crappier systems.
dntoll
08-22-2005, 07:56 AM
just wanted to share something and this seems like the right thread...
About a year ago I bought MS Visual studio C++ .net Standard edition, which does not contain an optimizing compiler!!! They save that for the enterprise edition I guess... But they also let us download MS Visual studio Toolkit 2003 (http://msdn.microsoft.com/visualc/vctoolkit2003/) for free, which contains the optimizing compiler... so all you have to do is to copy some files from that into the right folder and you can use the optimizing compiler! :D
Well I don't know if the speed is much improved (not noticeable on P4) but the .exe gets much smaller!!! (1.5Mb unoptimized -> 1.0Mb optimized) and 500kb can be a lot of content...
:rolleyes:
Emmanuel
08-22-2005, 08:18 AM
For Atlantis, I compiled everything in debug mode (no compiler optimization at all), including PTK itself, for 95% of the development time, and the game ran at full frame rate even though there is no specific optimization of anything. Once compiled in release mode with full fledged optimizing, there was no speed difference whatsoever, on a target machine (1 GHz p3 with 8 MB VRAM).
Like svero, I walk the list of balls a zillion times during the update cycle and it doesn't seem to matter much. The main CPU is mainly counting its fingers while the graphics card blits and rotates textures around.
I don't know about SDL based games, but for D3D or OpenGL based games, focus on readability and fast development instead.
Best regards,
Emmanuel
Ska Software
08-22-2005, 08:30 AM
In my opinion, in this business optimization is overrated. But what do I know about this business?
Adrian Cummings
08-22-2005, 09:43 AM
Oh well there you go then - basically who cares as long as it works eh!.
Lazy coder brain goits the lot of ya! :)
Adrian.
tentons
08-22-2005, 09:45 AM
This is where I can't resist chiming in with a suggestion to use something like dearest, beloved Python for development instead of that nasty old C++ stuff. :) If we aren't CPU constrained anymore, why the hell are you guys still using C++?? Ok, for core rendering and timing, I totally agree C/C++ is the way to go. But for everything else...?
I'll be quiet now.
princec
08-22-2005, 10:21 AM
I get a machine to do all the optimising for me, at runtime.
Actually I don't even bother with that now either.
Cas :)
Savant
08-22-2005, 10:23 AM
Oh well there you go then - basically who cares as long as it works eh!.
Well ... yeah, actually. Spending time on pointless optimization for the purposes of coder pride is wasted time (which == money, if you're trying to run a business).
Adrian Cummings
08-22-2005, 10:28 AM
Well forgive me but I was talking mobiles and gba rather than pc games yeah so that is utter cobblers in my opinion - yes.
Adrian.
It's not so much that optimization is gone completely these days, it's just that the face of it has changed.
While there's not a lot to be gained through hand instruction scheduling, there are other things you can do. As PeterM said, most apps are memory bound anyway. Massaging your data (and sometimes, code) to be cache/bus friendly can often bring massive performance gains, not to mention is quite rewarding.
Also, while sin/cos tables are gone, it's not like lookup tables are gone forever. Don't forget that every time you use a texture on a 3D card, you are using a lookup table :)
soniCron
08-22-2005, 10:56 AM
With the exception of obviously CPU intensive code, like the experimental realtime synthesizer in my game engine, optimization is a waste until you know where the juice is going. Profile, then optimize. But, by the same token, I'm now forced to optimize in new ways. All my coding is now done entirely in scripts, which run hella slow compared to natively compiled code. Now I'm forced to be a little more clever about how I approach certain problems, and I think it's a valuable experience; it's like programming on a 486, except I don't have the luxury of assembly optimization. Of course, I'm also not blitting 64,000 pixels each frame with the script, either. ;)
illume
08-22-2005, 04:32 PM
Optimization is really worth it in libraries I think.
For example, with SDL. Writing mmx blitters that will be used in thousands of games really was worth it. But, that didn't come until the original ones worked fine, and the blitters were identified as often being the major slow down. Same with the mixers being heavily optimized etc. Same with the opengl function caching system in SDL. etc etc.
For optimizing opengl calls, anything that can speed up your iterations at trying out different approaches can be really worth it. Reducing state changes, and how much is drawn to screen, using triangle strips, and display lists of display lists etc.
Only if it is slow!
For optimizing opengl calls, anything that can speed up your iterations at trying out different approaches can be really worth it. Reducing state changes, and how much is drawn to screen, using triangle strips, and display lists of display lists etc.
Actually, these are great examples of how quickly optimization strategies can change.
On any GPU with a post TnL cache, you usually DON'T want to use tri strips. And optimizing "how much is drawn to the screen" must always be balanced with "how much CPU time do I want to blow away just to determine the PVS?".
Not really arguing with you here, just showing how the point of "profiling first" can be made even when these examples are looked at in a different perspective.
ggambett
08-22-2005, 09:09 PM
Except for very specific stuff, I don't think writing assembly code or even messy C++ code is good in terms of benefits - the small increase in performance results in greater ugliness and unmaintainability of the code. Worthwhile optimization often involves rethinking the way you do things - to put it another way, a mediocre Quicksort will probably beat a hand-optimized assembly Bubblesort every time.
Graphic algorithms and other data-processing-intensive tasks are a different beast. There's a limited number of ways to implement an image rotation, so yes, you can get a performance improvement by hand-optimizing the code; but that should be done as a last resort, for example after you've considered and implemented a good image caching strategy. And of course, don't optimize what you think is slow, but what the profiler says is slow. And lastly, don't optimize if you don't have to - premature optimization is often worse than no optimization.
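The Quicksort versus Bubblesort remark can be made concrete by counting comparisons instead of timing anything. This is a sketch in Python with deliberately plain implementations; the input size of 1000 and the fixed shuffle seed are arbitrary choices for the example.

```python
import random

def bubble_sort(items):
    """Plain bubble sort; returns (sorted_list, number_of_comparisons)."""
    data = list(items)
    comparisons = 0
    for end in range(len(data) - 1, 0, -1):
        for i in range(end):
            comparisons += 1
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
    return data, comparisons

def quicksort(items):
    """Unremarkable quicksort; returns (sorted_list, number_of_comparisons)."""
    if len(items) <= 1:
        return list(items), 0
    pivot = items[len(items) // 2]
    less, equal, greater = [], [], []
    comparisons = 0
    for x in items:
        comparisons += 1  # one three-way comparison against the pivot
        if x < pivot:
            less.append(x)
        elif x > pivot:
            greater.append(x)
        else:
            equal.append(x)
    left, c_left = quicksort(less)
    right, c_right = quicksort(greater)
    return left + equal + right, comparisons + c_left + c_right

data = list(range(1000))
random.Random(42).shuffle(data)
bubble_result, bubble_cmp = bubble_sort(data)
quick_result, quick_cmp = quicksort(data)
print("bubble:", bubble_cmp, "quick:", quick_cmp)
```

No amount of shaving cycles off the inner loop of the bubble sort changes the fact that it does n(n-1)/2 comparisons; picking the better algorithm changes the count itself.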
ggambett
08-22-2005, 09:26 PM
This is where I can't resist chiming in with a suggestion to use something like dearest, beloved Python for development instead of that nasty old C++ stuff.
Oh, I would. I love Python. But for me the show-stopper is the lack of a clean, cross-platform way of making an executable. Maybe there is one now? It's been a while since I last searched.
svero
08-22-2005, 09:31 PM
I actually use C++ because you can write much cleaner, more maintainable code with a proper language. The idea of using a script of any kind.. especially python based is.. well baffling to me. I suspect the people who think of C++ as a complicated language that leads to any kind of trouble just aren't familiar enough with it, or don't have the proper training and background. In that case it probably is better to use something higher level since C++ can be a messy complicated nightmare for a beginner.
tentons
08-22-2005, 10:34 PM
@Gabriel: There is pygame (http://www.pygame.org/news.html), which sits on SDL and has quite a few extensions now, including OpenGL renderer (http://lgt.berlios.de/) and other interesting stuff (http://www.pygame.org/projects/9). There's also something called pyrex (http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/) which seems to take slightly specialized python code and produces compiled output. It isn't a simple translator/converter, as it sounds at first. I haven't used that but it sounds really cool, and I will be looking into it soon.
Pyrex lets you write code that mixes Python and C data types any way you want, and compiles it into a C extension for Python.
@svero: That's flamebait if ever I saw it. :) Any tool that gets the job done at the desired level of quality is worthy, in my opinion.
ggambett
08-22-2005, 11:12 PM
@tentons : Thanks, I'll check that pyrex. About pygame, I don't think I'd use it... I'd more likely generate bindings to my framework
@svero : I consider myself very proficient with C++, I write very bug-free code very quickly,... and yet I love Python. Have you given it a serious workout? It's amazing what you can do with it. Its functional programming thingies opened my mind to a new way of thinking, more than the functional programming courses at the university. Really, give it a try!
PeterM
08-23-2005, 12:48 AM
On any GPU with a post TnL cache, you usually DON'T want to use tri strips.
This interests me because I currently use triangle strips in IGE for blitting and bitmap font rendering. Do you know of anywhere which explains why triangle strips are less efficient on these GPUs?
IIRC, GPU caches only take effect anyway when using indices, which I don't (since I don't reuse enough vertices to make the extra processing worthwhile).
Pete
princec
08-23-2005, 04:24 AM
That's why I prefer Java because it's got the best bits of both. Actually there's a point to this post - it's often misunderstood how Java works, I keep hearing from people even now 10 years on that it's interpreted - but the way it works is this: your bytecode is machine-optimised at runtime into pretty fast machine code, taking advantage of whatever special instructions are available for the architecture at hand. Even the memory layout of your objects is optimised for you. The very latest Java 6.0 stuff has some very powerful SSE2 optimisation built in taking Java performance ahead of even the revered Intel compiler in some tests. The inlining and malloc/free is already way more powerful than C++ can achieve.
All of which would have been irrelevant to you if you'd coded in C++, because you'd have probably picked the lowest common denominator and compiled against that :) One day svero will see the light and give it a go.
See, let a machine do the job of optimising for you! Your task now as a programmer is to optimise the algorithms not the instructions.
Cas :)
Yea. High-level optimisations yield the biggest gains anyway. E.g. using a somewhat smarter algorithm and blam... 20% more speed. Or spending 15 minutes to save 10-20% in filesize. But smaller gains... well, just pull the brake where it starts getting silly. I know it's tough, but everyone (especially me) should spend their time on things which actually make a difference.
Savant
08-23-2005, 04:49 AM
See, let a machine do the job of optimising for you! Your task now as a programmer is to optimise the algorithms not the instructions.
That sums up the job of the modern day programmer perfectly. The days of worrying about machine instruction counts are over - worry about your algorithms and class designs instead.
Adrian Cummings
08-23-2005, 04:50 AM
Please let this thread be over now :)
I'm stuck on j2me right now and I still have had to optimise the buggery out of my code to get a decent sprite system and game engine working on loads of phones.
I'm not talking bit shifts in place of divides here, I'm talking keeping the code nice and tight with regards to sprite (bobs) and tile engine to run at a steady 10fps, and then there's the memory sandbox of 64K on the lower devices I have to code for! - it's as tight as you like and bugger all like a PC! more like a C64!
PC?... been there and done that - easy peasy! ;)
Adrian the old git.
Savant
08-23-2005, 05:06 AM
Right, in your situation, it makes sense. I don't think anyone is arguing against that.
The problem comes when people start carrying that attitude and work style forward to the PC. That's what most people are talking about.
Adrian Cummings
08-23-2005, 05:11 AM
OK all cool there then I agree :)
I think this all started when I pointed out a bit shift was faster than a divide to a fellow programmer who was writing a tile collision routine.
My mistake, as I pointed out earlier: I 'forgot' I was working on mobiles and not PC at this moment in my life, and should have just shown him how the code should work rather than going on about fiddling with it once it did.
Everybody here has my full agreement of course!
Group hug then anyone :)
Adrian.
Jim Buck
08-23-2005, 09:35 AM
I see people still using bitshifts instead of divides in the console world. I just want to let the world know this - most compilers will convert divides/multiplies to adds/bitshifts automatically these days! So, it's better to keep the more-readable divide in your code. :)
sparkyboy
08-23-2005, 10:17 AM
Hehe, great discussion guys. Very interesting indeed. All you guys who never had to use an AMIGA, ST or even 386 PC (ARRRGH!! :D ) are really lucky to not have to worry about tight loops and bit shifts or even maybe lookup tables on the PC these days....oh how times have changed!! :(
I think basically the consensus is that if you program for the PC of today, don't worry about hand optimization until your profiler points you to the bottleneck, whereas slower machines, i.e. mobiles, may require code as tight as possible!! :cool:
With that gentlemen I shall bring this thread to a close!! :p (but feel free to add your 2 cents worth if you want to.....I aint a MOD!!)
All the best
Mark.
This interests me because I currently use triangle strips in IGE for blitting and bitmap font rendering. Do you know of anywhere which explains why triangle strips are less efficient on these GPUs?
IIRC, GPU caches only take effect anyway when using indices, which I don't (since I don't reuse enough vertices to make the extra processing worthwhile).
Pete
Not to keep the thread alive, but yes, post TnL cache only takes effect with indexed tri lists :)
For the purposes you mentioned, either should be fine; the main reason to switch to indexed tri lists/utilize the cache is when you are rendering a few hundred thousand tris and are trying to increase batching and avoid creating bubbles in the pixel pipe.
gmcbay
08-24-2005, 01:31 PM
This is where I can't resist chiming in with a suggestion to use something like dearest, beloved Python for development instead of that nasty old C++ stuff. :) If we aren't CPU constrained anymore, why the hell are you guys still using C++??
Every time I attempt to use a non-C++ language to implement a large project, or even just act as a scripting solution within a larger C++ framework, I become hopelessly frustrated by the lack of a decent debugger. I know saying this will likely open me up to people claiming that language XYZ actually has a perfectly usable debugger, but chances are I've tried using it already and found it vastly inferior to the debugging experience with C++ (using Visual Studio .NET 2003 as my primary environment).
There is one exception .... I'd absolutely love to use C# and I think the .NET languages have great debugging in the Visual Studio .NET IDE but I'm extremely put off by the size of the .NET runtime and all the problems associated with that issue that have been discussed ad nauseam here and on other indie game developer forums.
princec
08-24-2005, 01:34 PM
So try Java :) Just like C# except a little more mature, and you can get round the runtime distribution problems with a little guile.
Cas :)
PeterM
08-24-2005, 03:13 PM
I've found debugging Java using Eclipse about as good (give or take) as debugging C++ in VC++ 2002. It works well, and is painless. A lot better than trying to debug something like (insert your least favourite higher level language here).
It's where you have to debug DLLs written in C or C++ which are called from a Java app, that things get more cumbersome. Unless there's a neat way to do it that I've missed.
princec
08-24-2005, 03:45 PM
There is - and, I know it sounds cheesy, but... just don't use native code apart from the barest bones wrappers to essential stuff. The less there is to go wrong the less goes wrong! If you've got some legacy dll / so / dylib you're trying to use.. just find a port to purest Java.
Cas :)
PeterM
08-24-2005, 03:53 PM
In other words,
Man: "Doctor, doctor, it hurts when I do this!"
Doctor: "So don't do that!"
Sometimes it just has to hurt...
ggambett
08-24-2005, 04:07 PM
Do you all really use the debugger that much? I rarely use it, and when I must, it's just a matter of seeing where I used a null reference or something as simple as that. gdb (command line and ugly) does the job just fine. So I don't need a fancy debugger at all. And this is with C++ which is supposed to be an error-prone language, at least by its detractors.
IMHO, if you find the debugger is such an important tool, you should consider improving your coding practices to write correct and bug-free code in the first place.
Please don't take this as an "I'm a better coder than you and my father has more hit points than yours", I do think you can have a tremendous improvement in productivity if you pick up a few good habits - short and to-the-point functions, liberal use of asserts to enforce algorithm invariants, that kind of thing.
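To show what "short functions with asserts enforcing invariants" can look like, here is a tiny Python sketch. clamp_to_screen is a hypothetical helper invented for this example.

```python
def clamp_to_screen(x, width):
    """Clamp a pixel column into the range [0, width - 1]."""
    # Precondition: a zero or negative width means the caller has a bug.
    assert width > 0, "screen width must be positive"
    result = max(0, min(x, width - 1))
    # Postcondition: the result is always a valid column index.
    assert 0 <= result < width
    return result

print(clamp_to_screen(-5, 640), clamp_to_screen(700, 640), clamp_to_screen(100, 640))
```

When an assert like this fires, it points at the broken assumption directly, which is the kind of bug that otherwise ends in a session of stepping through code.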
princec
08-24-2005, 04:17 PM
When you need to use it... you need to use it. And the easier and more painless it is the better, because usually at that stage you're already about to throw the machine out of the window...
Cas :)
Mike Boeh
08-24-2005, 05:57 PM
IMHO, if you find the debugger is such an important tool, you should consider improving your coding practices to write correct and bug-free code in the first place.
I like being able to press ctrl-f10 to have my program run to whatever line my cursor is on... I like being able to put my mouse over a variable in my code, and see a tooltip of its value. I like adding them to a watch, stepping through code line by line, and seeing how things change. All this from an easy to use, intuitive environment.
I would like to think I write good code, with small, readable functions, and carefully planned logic. But there are many times when a great debugger comes in handy. I can't imagine this not being extremely useful. Could I get by without a debugger? Of course, but it would lengthen my development time, and probably shorten my life due to frustration :-)
gcarlton
08-24-2005, 11:01 PM
I see people still using bitshifts instead of divides in the console world. I just want to let the world know this - most compilers will convert divides/multiplies to adds/bitshifts automatically these days! So, it's better to keep the more-readable divide in your code. :)
Technically, in some cases, they can't. If the number is potentially negative, then a shift may give a different result to a divide and so the compiler can't just replace it. Casting an int to an unsigned int before the divide allows the compiler to optimise to its heart's content.
Of course unless you are fixing a hotspot found by a profiler, and unless the offending algorithm is optimised first, it is surely wasted effort. :)
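The negative-number case is easy to see with a few values. This sketch uses Python ints to model the arithmetic, so it is not C code: c_divide imitates the truncate-toward-zero division of C and C++, and Python's own >> stands in for an arithmetic right shift. All three function names are made up for the example.

```python
def c_divide(a, b):
    """Integer division the way C and C++ define it: truncate toward zero."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q

def plain_shift(a, k):
    """Arithmetic right shift; rounds toward minus infinity."""
    return a >> k

def biased_shift(a, k):
    """Shift with a fix-up for negatives so it matches truncating division by 2**k."""
    bias = (1 << k) - 1 if a < 0 else 0
    return (a + bias) >> k

# Columns: value, divide by 2, plain shift by 1, biased shift by 1.
for a in (7, -7, -8):
    print(a, c_divide(a, 2), plain_shift(a, 1), biased_shift(a, 1))
```

For -7 the divide gives -3 but the plain shift gives -4. The biased version shows the sort of extra work needed to make a shift agree with a signed divide; with unsigned values no fix-up is needed, which is why the cast helps.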
James C. Smith
08-25-2005, 08:26 AM
And that only works if you are dividing by a constant. If you want to divide by a variable the compiler can't figure out to do the shift even if you know the value of the variable will always be a power of two.
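If the divisor really is always a power of two, one option is to store the shift count alongside it, worked out once. A Python sketch of the idea for a tile engine follows; TILE_SIZE, shift_for and pixel_to_tile are hypothetical names, and the coordinates are assumed non-negative.

```python
def shift_for(size):
    """Return k such that size == 2**k, or raise if size is not a power of two."""
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a power of two")
    return size.bit_length() - 1

TILE_SIZE = 16                     # hypothetical tile width in pixels
TILE_SHIFT = shift_for(TILE_SIZE)  # computed once, e.g. at load time

def pixel_to_tile(x):
    """Tile column for a non-negative pixel coordinate."""
    return x >> TILE_SHIFT

print(pixel_to_tile(100), 100 // TILE_SIZE)
```

The check in shift_for matters: the shift silently gives wrong answers the day someone changes the tile size to 24.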
But very low level optimizations like this are only relevant if they are in the inner loop. As others have stated, you will always get the biggest gains with high level optimizations. But never assume “the compiler will figure it out” or “computers are fast these days”. There are always ways you can make your code go much faster. It just usually isn’t twiddling with little low level stuff like this.
I just did some optimization for the Wik Xbox 360 port. We had some issues with “scripts gone wild”. The scripting system stores all values in variants and maintains a string version of every value. Worse, it was using a very inefficient function for converting numbers to strings. Doing something like executing the script “X = 5.2” was causing a string search and replace, because it was doing string template substitutions to insert a number into a string. A simple level was doing over two thousand string search and replace operations per frame! 3 of them were necessary and the rest were silly. Finding and optimizing that kind of stuff is where it is at. The compiler can never optimize it for you.
Jim Buck
08-25-2005, 09:49 AM
Another suggestion - if you ever find yourself wanting to do asm, maybe instead keep the C version but massage it to help your particular compiler's optimizer. This way you still have relatively readable code. Just check what your compiler compiles your C code to, and rearrange as needed (pull common subexpressions out of loops, keep pointers in registers to avoid aliasing safeties, etc).
This probably won't work too well for pixel-pushing functions since even 1 added instruction could kill the inner loop, but for stuff like collision detection, culling, etc, this should be sufficient.
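"Pull common subexpressions out of loops" looks like this in miniature. The sketch is in Python rather than C, and the point rotation is just a convenient example; both function names are made up here.

```python
import math

def rotate_points_naive(points, angle):
    """Recomputes sin and cos of the same angle for every point."""
    out = []
    for x, y in points:
        out.append((x * math.cos(angle) - y * math.sin(angle),
                    x * math.sin(angle) + y * math.cos(angle)))
    return out

def rotate_points_hoisted(points, angle):
    """Computes the loop-invariant sin and cos once, outside the loop."""
    c, s = math.cos(angle), math.sin(angle)
    return [(x * c - y * s, x * s + y * c) for x, y in points]

points = [(1.0, 0.0), (0.0, 2.0), (-3.0, 4.0)]
print(rotate_points_hoisted(points, math.pi / 2))
```

The second version still reads like the maths it implements, which is the appeal of rearranging the source over dropping to asm. A C compiler may well do this hoisting by itself, so check the generated code before assuming it helps.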
Adrian Cummings
08-25-2005, 09:56 AM
I see this thread is still going then heh crazy :)