#define MYMAX( a, b, c ) ( (c) = ( (a) & ~( ( (a) - (b) ) >> 31 ) ) | ( (b) & ( ( (a) - (b) ) >> 31 ) ) )

An interesting thing that can be done is this:

#define MYMAX( a, b, c, d, e, f ) ( (c) = ( (a) & ~( ( (a) - (b) ) >> 31 ) ) | ( (b) & ( ( (a) - (b) ) >> 31 ) ), (f) = ( (d) & ~( ( (a) - (b) ) >> 31 ) ) | ( (e) & ( ( (a) - (b) ) >> 31 ) ) )

The above macro returns the maximum of a,b in c but also returns one of d or e in f based on the max of a,b: f gets d when a is the max (or a and b are equal) and e when b is the max. Handy for setting flags based on the result of the operation.
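Here is a minimal sketch of that two-output idea in use. The macro name MYMAX_SELECT, the flag values and the helper functions are made up for illustration. The two assignments are chained with the comma operator so the whole thing stays one expression. It assumes a 32-bit int, a compiler that does an arithmetic right shift on negative values, and inputs where a - b does not overflow.

```c
#include <assert.h>

/* Hypothetical name, chosen so it does not clash with the three-argument MYMAX.
   Assumes 32-bit int, arithmetic right shift, and no overflow in a - b. */
#define MYMAX_SELECT( a, b, c, d, e, f ) \
    ( (c) = ( (a) & ~( ( (a) - (b) ) >> 31 ) ) | ( (b) & ( ( (a) - (b) ) >> 31 ) ), \
      (f) = ( (d) & ~( ( (a) - (b) ) >> 31 ) ) | ( (e) & ( ( (a) - (b) ) >> 31 ) ) )

/* Illustrative flag values, not from the original code. */
enum { FLAG_FIRST_WON = 1, FLAG_SECOND_WON = 2 };

/* Stores max(a, b) in *max_out and returns the flag that goes with the winner. */
static int max_with_flag( int a, int b, int *max_out )
{
    int flag;
    MYMAX_SELECT( a, b, *max_out, FLAG_FIRST_WON, FLAG_SECOND_WON, flag );
    return flag;
}

/* Small self-check; call it from main() to try the macro. */
static void demo_max_with_flag( void )
{
    int m;
    assert( max_with_flag( 3, 9, &m ) == FLAG_SECOND_WON && m == 9 );
    assert( max_with_flag( 9, 3, &m ) == FLAG_FIRST_WON && m == 9 );
    assert( max_with_flag( 5, 5, &m ) == FLAG_FIRST_WON && m == 5 ); /* tie picks d */
}
```

Note that a tie between a and b selects d, because the mask is only all ones when a - b is negative.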

YES, the macro could just return a value and not store it in c. I did it this way because I was doing some other stuff to see if I could make it faster and left it this way.
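For comparison, a value-returning version might look like the sketch below. The function name max_branchless is hypothetical, and the same assumptions apply: 32-bit int, arithmetic right shift of negative values, and no overflow in a - b. Writing it as an inline function also means a and b are evaluated only once.

```c
#include <assert.h>

/* Hypothetical helper, not from the original code. */
static inline int max_branchless( int a, int b )
{
    int mask = ( a - b ) >> 31;            /* all ones if a < b, else zero */
    return ( a & ~mask ) | ( b & mask );   /* b when a < b, otherwise a */
}

/* Small self-check; call it from main() to try the function. */
static void demo_max_branchless( void )
{
    assert( max_branchless( 3, 9 ) == 9 );
    assert( max_branchless( -4, -7 ) == -4 );
    /* Because the result is a value, calls can be nested directly. */
    assert( max_branchless( max_branchless( 1, 8 ), max_branchless( 6, 2 ) ) == 8 );
}
```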

I can’t guarantee that I got the a - b part right and it might need to be b - a to work properly. I worked on code with this today and six of these were about 5% to 10% faster than code with if() tests. My code does something like this:

int a = somevalue1;
int b = somevalue2;
int x = somevalue3;
int y = somevalue4;

int Temp1;
int Temp2;

MYMAX( a, b, Temp1 );
MYMAX( x, y, Temp2 );

int Result;
MYMAX( Temp1, Temp2, Result );
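
One way to settle the a - b versus b - a question is to compare the macro against a plain comparison-based maximum over a handful of values. The sketch below does that for the four-value case. The function names and sample values are made up for illustration, and the samples are kept small so a - b cannot overflow. It also assumes a 32-bit int and an arithmetic right shift on negative values.

```c
#include <assert.h>
#include <stddef.h>

#define MYMAX( a, b, c ) ( (c) = ( (a) & ~( ( (a) - (b) ) >> 31 ) ) | ( (b) & ( ( (a) - (b) ) >> 31 ) ) )

/* Reference version using ordinary comparisons. */
static int max4_reference( int a, int b, int x, int y )
{
    int ab = ( a > b ) ? a : b;
    int xy = ( x > y ) ? x : y;
    return ( ab > xy ) ? ab : xy;
}

/* Same cascade as in the snippet above: two pairwise maxima, then the final one. */
static int max4_branchless( int a, int b, int x, int y )
{
    int Temp1, Temp2, Result;
    MYMAX( a, b, Temp1 );
    MYMAX( x, y, Temp2 );
    MYMAX( Temp1, Temp2, Result );
    return Result;
}

/* Asserts that both versions agree on every combination of the sample values. */
static void check_max4( void )
{
    static const int samples[] = { -1000, -1, 0, 1, 7, 1000 };
    const size_t n = sizeof samples / sizeof samples[0];
    for ( size_t i = 0; i < n; i++ )
        for ( size_t j = 0; j < n; j++ )
            for ( size_t k = 0; k < n; k++ )
                for ( size_t l = 0; l < n; l++ )
                    assert( max4_branchless( samples[i], samples[j], samples[k], samples[l] )
                         == max4_reference( samples[i], samples[j], samples[k], samples[l] ) );
}
```

With a - b the mask is all ones exactly when a < b, so b is selected in that case and a otherwise, which is the maximum.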

It was faster but not by much. Now if I could just get it into some SSE instructions and have it go faster, I’d be a hero. Too bad the SSE2 max instruction only works on 8- and 16-bit values and not 32-bit values. Of course I’m using integers, not floating point. I tried it with the SSE2 intrinsic functions but there was too much work involved in initializing the SSE data types from the integers used in the rest of the code. I’ll have to revisit this and try to keep things in the SSE data types for the whole algorithm and see if that’s workable.
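
As a rough sketch of what the SSE2 route could look like, the same mask-and-select trick can be built from the 32-bit compare instruction, which SSE2 does have. The function names here are hypothetical, and this is not the code I tried. It is only an illustration of the idea, with a scalar fallback so it still builds on targets without SSE2. Using a compare instead of a subtraction also avoids the overflow concern of a - b. (SSE4.1 later added a direct 32-bit max, _mm_max_epi32, for processors that support it.)

```c
#include <assert.h>
#include <stdint.h>

#if defined( __SSE2__ ) || defined( _M_X64 ) || ( defined( _M_IX86_FP ) && _M_IX86_FP >= 2 )
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Lane-wise signed 32-bit maximum using only SSE2 operations. */
static __m128i max_epi32_sse2( __m128i a, __m128i b )
{
    __m128i a_gt_b = _mm_cmpgt_epi32( a, b );              /* all ones where a > b */
    return _mm_or_si128( _mm_and_si128( a_gt_b, a ),       /* keep a where a > b */
                         _mm_andnot_si128( a_gt_b, b ) );  /* keep b everywhere else */
}

/* Hypothetical wrapper: out[i] = max( a[i], b[i] ) for four values at once. */
static void max4_lanes( const int32_t a[4], const int32_t b[4], int32_t out[4] )
{
    __m128i va = _mm_loadu_si128( (const __m128i *) a );
    __m128i vb = _mm_loadu_si128( (const __m128i *) b );
    _mm_storeu_si128( (__m128i *) out, max_epi32_sse2( va, vb ) );
}
#else
/* Scalar fallback so the sketch still builds on targets without SSE2. */
static void max4_lanes( const int32_t a[4], const int32_t b[4], int32_t out[4] )
{
    for ( int i = 0; i < 4; i++ )
        out[i] = ( a[i] > b[i] ) ? a[i] : b[i];
}
#endif

/* Small self-check; call it from main() to try the function. */
static void demo_max4_lanes( void )
{
    const int32_t a[4] = { 1, -5, 7, 0 };
    const int32_t b[4] = { 2, -9, 7, -1 };
    int32_t out[4];
    max4_lanes( a, b, out );
    assert( out[0] == 2 && out[1] == -5 && out[2] == 7 && out[3] == 0 );
}
```

The loads and stores in max4_lanes are exactly the conversion overhead mentioned above, so this only pays off if the data can stay in SSE registers across several operations.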