View Full Version : SDL general speed up tips?


Robert Cummings
05-19-2006, 04:30 AM
Hiya,

I'm looking for ways to work with SDL to speed it up other than dirty rects. I am also looking to use Additive Blending in software SDL - is this possible? I can't find it in the SDL docs.

I currently convert each image I load into the same pixel format as the display but I don't do much more than that.

Any suggestions to speed stuff up?

ggambett
05-19-2006, 07:51 AM
I'll leave this thread alone because it isn't offtopic, but you should definitely join the SDL mailing list or at least browse the archives. Unlike here, everyone in that list uses SDL :)

As for your questions, 1) convert to display format, 2) use SDL_RLEACCEL when possible, 3) there's no additive blending but you can do it yourself.
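A minimal sketch of what "doing it yourself" for additive blending might look like, assuming 32-bit 0xAARRGGBB pixels and pitches counted in pixels. The function names are made up for illustration and are not part of SDL.

```c
#include <stdint.h>
#include <stddef.h>

/* Saturating add of one 8-bit channel: clamps at 255 instead of wrapping. */
static uint8_t add_sat8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;
    return (uint8_t)(sum > 255u ? 255u : sum);
}

/* Additively blend one source pixel onto one destination pixel.
   Assumption: the destination alpha is kept unchanged. */
static uint32_t add_blend_argb(uint32_t dst, uint32_t src)
{
    uint8_t r = add_sat8((uint8_t)(dst >> 16), (uint8_t)(src >> 16));
    uint8_t g = add_sat8((uint8_t)(dst >> 8), (uint8_t)(src >> 8));
    uint8_t b = add_sat8((uint8_t)dst, (uint8_t)src);
    return (dst & 0xFF000000u) | ((uint32_t)r << 16) | ((uint32_t)g << 8) | b;
}

/* Blend a w*h block of pixels. Pitches are in pixels, not bytes. */
static void add_blit_argb(uint32_t *dst, size_t dst_pitch,
                          const uint32_t *src, size_t src_pitch,
                          size_t w, size_t h)
{
    for (size_t y = 0; y < h; ++y) {
        for (size_t x = 0; x < w; ++x) {
            uint32_t *d = &dst[y * dst_pitch + x];
            *d = add_blend_argb(*d, src[y * src_pitch + x]);
        }
    }
}
```

With an SDL 1.2 surface you would presumably lock it with SDL_LockSurface, cast surface->pixels to uint32_t *, and pass surface->pitch / 4 as the pitch, since SDL stores the pitch in bytes.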

Robert Cummings
05-19-2006, 08:03 AM
Thanks Gabriel :) I'll join that list and have a poke about with manual drawing.

Phil Steinmeyer
05-19-2006, 09:07 AM
I use SDL, but mainly to set up video modes. I do all of my blitting myself using hand-rolled routines. That gives me the ability to add any functionality I need, and to optimize hot-spots.

Also, it's easiest to pick a single pixel format (ARGB 32 bit), and write all your routines for that. For the handful of PCs not running in 32 bit and/or not capable, you can have a side-path just before you blit to the screen/flip page that converts down to the appropriate rez. I think SDL even does this automatically.

I've used 16 bit in the past (565 or 555) and that works too, but these days, it's easiest to just go 32 bit.
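The "side-path" for 16-bit displays could be as small as the sketch below. It assumes 0xAARRGGBB source pixels and a 565 target, and simply drops the low bits of each channel. The function names are hypothetical, and SDL may well do this conversion for you.

```c
#include <stdint.h>
#include <stddef.h>

/* Pack one 0xAARRGGBB pixel into RGB565 by dropping the low bits. */
static uint16_t argb8888_to_rgb565(uint32_t p)
{
    uint32_t r = (p >> 16) & 0xFFu;
    uint32_t g = (p >> 8) & 0xFFu;
    uint32_t b = p & 0xFFu;
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Convert one row of pixels just before it goes to the screen. */
static void convert_row_to_565(uint16_t *dst, const uint32_t *src, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        dst[i] = argb8888_to_rgb565(src[i]);
}
```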

HairyTroll
05-19-2006, 10:15 AM
I think additive blending is supported by sdl_gfx (http://www.ferzkopp.net/Software/SDL_gfx-2.0/). At least it seems to be in this demo that I wrote (http://svn.sourceforge.net/viewcvs.cgi/*checkout*/lispbuilder/trunk/lispbuilder-sdl-gfx/documentation/index.html). See the screenshot at the bottom of the page.

Robert Cummings
05-19-2006, 10:36 AM
Hi,

I'm using SDL_GFX myself with rotations. Can you elaborate on how you got it to use additive blending? Btw - that shot looks like normal alpha blending to me.

illume
05-19-2006, 11:22 AM
Pygame, which uses SDL, has additive blending now.

Have a look at the alphablit.c in cvs.

Or I've uploaded it here: http://rene.f0o.com/~rene/stuff/alphablit.c
The blit_blend_THEM function.


Tips for speed? Profile first. Test on old hardware. If you can, use hardware surfaces. Convert your surfaces to the display format. Don't update the screen more often than you need to: a jitter-free 25 or 30 fps is better than a frame rate that fluctuates between 67, 56, 56 and 70. Plus laptop users will like you better.

Note that the latest SDL no longer uses DirectX by default on Windows, for better compatibility with newer machines and the unreleased Windows Vista.
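One way to get a steady 25 or 30 fps is to work out how long to sleep at the end of each frame. This is only a sketch: frame_delay_ms is a made-up helper, timestamps are assumed to be milliseconds from a wrapping 32-bit counter, and target_fps must not be zero.

```c
#include <stdint.h>

/* Milliseconds still to wait so that the current frame lasts 1000/target_fps ms.
   Returns 0 when the frame has already used up its budget. */
static uint32_t frame_delay_ms(uint32_t frame_start_ms, uint32_t now_ms,
                               uint32_t target_fps)
{
    uint32_t budget = 1000u / target_fps;
    /* Unsigned subtraction stays correct when the counter wraps around. */
    uint32_t elapsed = now_ms - frame_start_ms;
    return elapsed >= budget ? 0u : budget - elapsed;
}
```

In an SDL 1.2 main loop the two timestamps would come from SDL_GetTicks() and the result would go to SDL_Delay().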

HairyTroll
05-19-2006, 11:40 AM
> Hi,
>
> I'm using SDL_GFX myself with rotations. Can you elaborate on how you got it to use additive blending? Btw - that shot looks like normal alpha blending to me.

Oops. You are correct, that's alpha not additive blending.

-Luke

Robert Cummings
05-19-2006, 12:01 PM
Thanks a lot! Will take a look at that .c file (I'm not coding it in C but might be able to convert with minimal hassle)

ggambett
05-19-2006, 02:01 PM
About additive blending, I believe I sent my blit code to the SDL mailing list.

Robert Cummings
05-20-2006, 03:59 PM
Hi Gabriel - I searched the list but was unable to find your blit code. Could you tell me what terms to search for? Thanks :)

Jason Chong
05-22-2006, 02:41 AM
Another thing to remember is that, as long as you run SDL in windowed mode, the backbuffers will ALWAYS be in system memory.

With DirectX itself this is not the case: you can have backbuffers in video memory even in windowed mode. The only difference is that you don't flip but blit from the backbuffer to the primary surface, so it's still aided by hardware.

SDL, for whatever reason, decided to always use system memory for the backbuffer when running in windowed mode.

So to match the performance of DirectX/DirectDraw applications, you must run SDL in fullscreen in order to make use of video memory backbuffers.

Robert Cummings
05-22-2006, 03:07 AM
Great tip - thanks! I have the latest SDL, which I believe defaults to GDI or some such, thanks to Vista compatibility?

Jason Chong
05-22-2006, 06:19 AM
Try using fullscreen and check the surface settings to see if it's using video or system memory surfaces.

As long as you're running in a window, everything is in software mode even if you forced a video_memory flag while creating a surface.

Only in fullscreen will SDL use any hardware acceleration.

I started a thread on this earlier this year.

http://forums.indiegamer.com/showthread.php?t=5723

Robert Cummings
05-22-2006, 06:45 AM
Ah yes - sorry for the confusion - I actually want software mode to be as fast as possible as I have my own hardware renderer. This is for pcs without acceleration :)

Any more tips on software SDL speed ups?

Jason Chong
05-22-2006, 07:17 AM
Aligned blits? Especially for the destination buffer.

Back in the old days when I coded in asm, I made sure EDI was dword aligned during a rep movsd.

Not sure if VC++ or MinGW will optimize the memcpy function, but for RLE sprites, aligning helped boost performance by tons.

Check the SDL source to see how it implements RLE sprite blitting, then disassemble it to see whether it uses memcpy-style functions for copying the bytes and whether EDI is double word aligned.

If you have a large amount of data to blit, maybe consider some of the FPU, SSE or MMX instructions for blitting large chunks of data. :D

Check out some samples here at Paul Hsieh's site
http://www.azillionmonkeys.com/qed/asmexample.html
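In C, the destination-alignment idea might look like the sketch below. copy_dst_aligned is a hypothetical name, and a modern memcpy very likely does something similar (or better) already, so treat it as an illustration rather than an optimization to drop in blindly.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Copy n bytes so that the bulk of the stores hit a 4-byte aligned destination. */
static void copy_dst_aligned(uint8_t *dst, const uint8_t *src, size_t n)
{
    /* Head: single bytes until dst sits on a 4-byte boundary. */
    while (n > 0 && ((uintptr_t)dst & 3u) != 0) {
        *dst++ = *src++;
        --n;
    }
    /* Body: 4 bytes at a time. The fixed-size memcpy calls let the compiler
       emit plain 32-bit moves without assuming that src is aligned too. */
    while (n >= 4) {
        uint32_t word;
        memcpy(&word, src, sizeof word);
        memcpy(dst, &word, sizeof word);
        dst += 4;
        src += 4;
        n -= 4;
    }
    /* Tail: whatever is left over. */
    while (n > 0) {
        *dst++ = *src++;
        --n;
    }
}
```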

illume
05-22-2006, 03:47 PM
SDL uses mmx, 3dnow for blitting and mixing when they are available. The latest .10 release has been optimized even more from the previous releases with mmx/3dnow. SDL also uses altivec instructions on ppc machines.

Jason Chong
05-23-2006, 04:17 AM
> SDL uses mmx, 3dnow for blitting and mixing when they are available. The latest .10 release has been optimized even more from the previous releases with mmx/3dnow. SDL also uses altivec instructions on ppc machines.


Pointless. I just tested 1.2.10.

It is slower than the previous version, because it no longer allows the SDL_HWSURFACE flag to be set.

The same sample that got SDL_HWSURFACE in fullscreen before now always uses SDL_SWSURFACE with the same parameters after upgrading to the latest sdl.dll for 1.2.10, even after recompiling with the latest libs and headers.

testblitspeed --srchwsurface --dsthwsurface


Is there anyone I can report this bug to? What's the point of using libSDL if it does not accelerate at all, even in fullscreen?

EDIT: never mind, did a putenv to change default windib driver to directx. Back to good speed.

Robert Cummings
05-23-2006, 05:23 AM
isn't windib there for a reason? for example, vista compatibility?

Jason Chong
05-23-2006, 06:35 AM
> isn't windib there for a reason? for example, vista compatibility?

From the latest documentation for 1.2.10:

> The "windib" video driver is the default now, to prevent problems with certain laptops, 64-bit Windows, and Windows Vista. The DirectX driver is still available, and can be selected by setting the environment variable SDL_VIDEODRIVER to "directx".


I don't know what problems will occur, but here's the code I used to switch it back to directx.

putenv("SDL_VIDEODRIVER=directx");


It has to be called before SDL_Init() or else it won't work.

Previous versions of SDL defaulted to directx.

Robert Cummings
05-23-2006, 06:58 AM
Aha thanks! I'll try that. The thing about laptops disturbs me though, so I think I'll ship it with windib to reach the most people.

illume
05-23-2006, 05:29 PM
I think it is mainly with a series of damned intel integrated chipsets. Those are the only reports I've had from the .8 .9 series of SDL.

A driver update fixes it thankfully. So hopefully it won't be long until lots of these win XP machines get auto updated.

Again, damn Intel. I think SDL was too hasty in switching to windib, although it is mostly more compatible, which is SDL's aim.

Robert Cummings
05-24-2006, 03:22 AM
Also I had some pretty disturbing findings last night with the new .10.

* I noticed that converting your surfaces to the display format resulted in a HUGE performance drop. Why would that be?
* Also, using SetAlpha before drawing (a tip from Gabriel Gambetta) resulted in a further speed drop...

Why on earth is this occurring?

PeterM
05-24-2006, 03:42 AM
Have you tried palettised surfaces?

(I'll get my coat...)

Gilzu
05-24-2006, 03:55 AM
> Also I had some pretty disturbing findings last night with the new .10.
>
> * I noticed that converting your surfaces to the display format resulted in a HUGE performance drop. Why would that be?
> * Also, using SetAlpha before drawing (a tip from Gabriel Gambetta) resulted in a further speed drop...
>
> Why on earth is this occurring?

I think that every time you blit a surface that is not in the destination's format, a new temporary destination-format-compatible surface is created, each pixel is translated to this new surface, and only then does the blit take place from the temp surface to the destination surface. Memory and time inefficiency :( .

mahlzeit
05-24-2006, 04:03 AM
> Think that every time you blit the surface, a new temporary destination-format-compatible surface is created,

Obviously you do the conversion once after loading, not for every blit... Not sure why the new version would slow this down, though. Of course, you can always revert to an older version: they still work fine.

Robert Cummings
05-24-2006, 04:42 AM
I do the conversion when I load the files. I found that it dramatically slowed things down. I don't want to use an earlier version but am hoping to find out the reason why :)

Posted to sdl mailing list but sometimes you'll go several days without a reply...

Gilzu
05-24-2006, 06:39 AM
> Obviously you do the conversion once after loading, not for every blit... Not sure why the new version would slow this down, though. Of course, you can always revert to an older version: they still work fine.

I was describing what happens when you blit to a surface of a different format. Obviously it's best to get EVERY calculation you can out of the main loop.

@Robert Cummings:
I use this code. I think I've posted it before, but since it might help you sort out your problem, I thought I'd post it again:

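#include <stdio.h>
#include "SDL.h"

/* Load a BMP, set its colour key, and convert it once to the display format.
   Call this after the video mode has been set. */
SDL_Surface *LoadSpriteDF(const char *FileName)
{
    SDL_Surface *tempSurface;
    SDL_Surface *AnotherSurface;

    tempSurface = SDL_LoadBMP(FileName);
    if (tempSurface == NULL) {
        fprintf(stderr, "|%s,%d| Error Loading %s: %s\n", __FILE__, __LINE__, FileName, SDL_GetError());
        return NULL;
    }
    /* RGB (255,128,0) is treated as transparent; RLE speeds up colour-keyed blits. */
    SDL_SetColorKey(tempSurface, SDL_SRCCOLORKEY | SDL_RLEACCEL, SDL_MapRGB(tempSurface->format, 255, 128, 0));
    /* Convert to the display's pixel format so blits need no per-pixel conversion. */
    AnotherSurface = SDL_DisplayFormat(tempSurface);
    SDL_FreeSurface(tempSurface);
    if (AnotherSurface == NULL) {
        fprintf(stderr, "|%s,%d| Error Converting %s: %s\n", __FILE__, __LINE__, FileName, SDL_GetError());
        return NULL;
    }
    return AnotherSurface;
}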

Spaceman Spiff
05-24-2006, 09:36 AM
Robert,
A couple more items to ponder:

Alpha blending to a surface - If you are doing any significant amount of blending per frame, make sure the surface is in System memory, not Video memory. Reading pixels from video memory surfaces can be vastly slower than reading from a system memory surface. It doesn't take that many reads before the cost of copying/blitting the finished surface to video memory is cheaper than the reads from video memory would be.

Also, remember it's not just about Alpha blending -- any reads by the CPU from a video memory surface apply.

Also, if you have rolled your own routines, you can get a big speedup by using cache prefetch instructions. They are a bit tricky (got to use them in the right amount, and the right temporal distance), and only available on later processors (Pentium 3 & Up, Athlon & up, and the opcodes may not have been unified until the Athlon XP - gotta check that). When you get them right though... WOW..

And they are not just for reading pixels from the surface. If all you are doing is drawing/writing, the way the CPU works, it still pulls in the whole cache line on the first write to an address in that line, because the CPU has no way to know if you are going to fill the whole line or not. (Well, on the PPC and Xbox360 you can specify that and skip the cache line read...).

Finally, if you are drawing a bunch of irregularly shaped sprites/whatever, you can see a big improvement by switching from dirty rectangles to dirty line strips. It's more complicated to manage, but can make a big difference. Either system is also very vulnerable to cache issues - making your dirty XYZ manager cache line friendly is very important.

For what it's worth, I implemented all these things, and more, in my 2d commercial games, and their graphic performance was the best. (Wish the same could be said for AI perf...)
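As a rough idea of what a prefetch hint looks like in a hand-rolled row routine, here is a sketch using the GCC/Clang builtin __builtin_prefetch (other compilers get a no-op). The routine name, the 64-pixel look-ahead and the 64-byte cache line are all assumptions to be tuned by measurement, and on CPUs with good hardware prefetchers a plain linear loop may show no gain at all.

```c
#include <stdint.h>
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_READ(p) __builtin_prefetch((p), 0, 0)
#else
#define PREFETCH_READ(p) ((void)0)
#endif

/* How many pixels ahead of the current one to request. A guess. */
enum { PREFETCH_AHEAD = 64 };

/* Copy one row of 32-bit pixels, skipping pixels equal to the colour key. */
static void copy_row_colorkey(uint32_t *dst, const uint32_t *src,
                              size_t count, uint32_t key)
{
    for (size_t i = 0; i < count; ++i) {
        /* One hint per 16 pixels (64 bytes, a common cache line size). */
        if ((i & 15u) == 0 && i + PREFETCH_AHEAD < count)
            PREFETCH_READ(&src[i + PREFETCH_AHEAD]);
        if (src[i] != key)
            dst[i] = src[i];
    }
}
```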

Robert Cummings
05-24-2006, 10:31 AM
Wow thanks guys! Will check out your suggestions. Gilzu - are you certain you've tried that with the new .10 release? It is chopping my framerate in half when I do....

Spaceman - I am not sure what you mean by dirty line strips? It sounds intriguing...

Gilzu
05-24-2006, 11:23 AM
> Wow thanks guys! Will check out your suggestions. Gilzu - are you certain you've tried that with the new .10 release? It is chopping my framerate in half when I do....

Nope, but I've just installed it and recompiled - no change in framerate (around 100).

You might want to check the following:
1. I don't use alpha blending - could that be the cause of the low framerate?
2. Check absolutely *every* surface - they should all be in the same memory, and in the same format.
3. Are you releasing every surface?
4. If all of the above doesn't help, I'd go with elimination: /* comment out segments of code */ and see what exactly causes the problem. I used to have a similar problem, which was caused by my home-made font unit (!)

And as a side note - they finally implemented:
Pressing ALT-F4 now delivers an SDL_QUIT event to SDL applications.

mahlzeit
05-24-2006, 11:26 AM
I would link with the previous SDL release and see if it's still so slow. Maybe you're looking in the wrong place for the cause of the problem.

Robert Cummings
05-24-2006, 11:31 AM
Hiya,

I tried the previous .9 SDL release and I found it to be even slower. .10 is actually optimised.

I've checked every single surface and no matter what I do, it's faster when I don't optimise! This is true for the p2-400 with S3, and the x1900XT on the Dual core pc.

All surfaces however are alpha blended, apart from the background which is solid blended. I am not sure if I got SolidBlend correct, I think it is SDL_SetColorKey( surface , SDL_SRCCOLORKEY|SDL_RLEACCEL , 0 ) with SDL_SetAlpha( surf , SDL_SRCALPHA | SDL_RLEACCEL , 255 ) ?

In any case - why not try for yourselves - I am finding surfaces with alpha information rendering way faster without optimizing (converting to display type).

Spaceman Spiff
05-24-2006, 12:08 PM
Robert,
I'm not sure of the exact name for it. Basically, it is for updating surfaces with lots of irregularly shaped sprites/whatever. Instead of keeping a list of rectangular areas to update/invalidate, you keep a list of horizontal spans for each line of pixels on the surface. Let's call it "span lists".

To make this useful, you need to have the horizontal bounds of each line of each item drawn.

Imagine you have a tree sprite with a tall trunk and wide top. The rest is transparent/ not drawn. I'd draw an ASCII representation, but this forum editor removes multiple spaces/indents. Now, draw a bounding box around it. Ok, on either side of the trunk is a whole lot of 'dead' space where nothing is going to be drawn, but is going to be updated / redrawn / invalidated / whatever anyway.

So instead of a simple bounding box, you record the x offset of the first and last pixel drawn on each line. For the top of the tree, this would represent a span width nearly the width of the bounding box. For the trunk, it would be just a tiny, narrow strip.

Do this for everything and you have a more accurate picture of which pixels on the surface are going to be updated (holes in objects excepted) with a smaller total area/ number of pixels than using simple bounding boxes / dirty rectangles.

This alone can be useful, but you can take it a step further if you feel like writing the drawing routines. What you do is generate the span lists in a pre-pass (keep info on every item drawn in the previous frame. If it is unchanged this frame, don't add it to the span lists). Then you draw *everything* clipped against the span lists. That is, the span lists represent the valid update area. Really -- this works. If you have a large visual that is unchanged, but overlaps a tiny changed area, redrawing just that tiny portion of it can be a huge performance savings.

Ultimately, this all depends on the type of visuals you are drawing. If everything is rectangular, then this won't help much. If you have plenty of irregular shapes, it can help a lot. Clipping partially involved images can help a lot in both approaches.
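To make the per-line bounds concrete, here is a small sketch that derives one span per row from a sprite's alpha channel. It assumes 0xAARRGGBB pixels stored row by row with no padding, and the Span type and build_spans name are invented for this example.

```c
#include <stdint.h>
#include <stddef.h>

/* One horizontal span per sprite row, inclusive on both ends.
   first > last marks a row with nothing drawn. */
typedef struct {
    int first;
    int last;
} Span;

/* Fill spans[0..h-1] from the alpha channel of a w*h sprite.
   Returns the total number of pixels covered by the spans. */
static size_t build_spans(const uint32_t *pixels, int w, int h, Span *spans)
{
    size_t covered = 0;
    for (int y = 0; y < h; ++y) {
        int first = w;
        int last = -1;
        for (int x = 0; x < w; ++x) {
            uint32_t p = pixels[(size_t)y * (size_t)w + (size_t)x];
            if ((p >> 24) != 0) {       /* any non-zero alpha counts as drawn */
                if (x < first)
                    first = x;
                last = x;
            }
        }
        spans[y].first = first;
        spans[y].last = last;
        if (last >= first)
            covered += (size_t)(last - first + 1);
    }
    return covered;
}
```

For a 5x4 "tree" with two full-width rows of leaves and a one-pixel trunk below, the spans cover 12 pixels where the bounding box covers 20. Holes inside a row still count as covered, as noted above.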

mahlzeit
05-24-2006, 01:19 PM
> I am not sure if I got SolidBlend correct

EDIT: What I said here doesn't make any sense. Maybe try without the RLE flag, though.