Recently I’ve been writing an implementation of OpenSHMEM for the SiCortex platform. OpenSHMEM is a communications API that lets you write PGAS programs in C or FORTRAN. PGAS stands for “partitioned global address space.” A PGAS program is a parallel program that can run on many cores or cluster nodes at once. Each processing element (PE) can read and write the address spaces of the other PEs, but the program “knows” that the global address space is partitioned into a set of local address spaces. The partitioning is handled either by the language, as in UPC (Unified Parallel C), or explicitly by the programmer using a library such as OpenSHMEM.
All this is just introduction. OpenSHMEM derives from the Cray/SGI SHMEM API, and one of the things it inherits is a large number of API calls that differ only in the datatypes of their arguments.
For example, OpenSHMEM has
extern void shmem_short_wait(short *var, short value);
extern void shmem_int_wait(int *var, int value);
extern void shmem_long_wait(long *var, long value);
extern void shmem_longlong_wait(long long *var, long long value);
all of which block until some remote PE changes the local variable *var so that it no longer equals value.
I am a lazy programmer. Should I really write the same routine four times? Instead, I used some C preprocessor magic and wrote this:
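typedef long long longlong;        /* single-token name, so the macro can paste it */
void shmem_progress(void);         /* the implementation's progress routine (signature assumed) */
#define EmitWait(type)                                          \
void shmem_##type##_wait(type *var, type value)                 \
{                                                               \
    while (*((volatile type *) var) == value) shmem_progress(); \
}
EmitWait(short)
EmitWait(int)
EmitWait(long)
EmitWait(longlong)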
This uses the “token pasting” operator (##) of the C preprocessor (CPP) to build the proper function name for each version of the routine.
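To see what the pasting produces, here is the same pattern applied to a hypothetical sum routine. EmitSum and the sum_* functions are illustrative names, not part of OpenSHMEM; each EmitSum(T) expands to a complete function definition whose name is sum_ pasted onto T.

```c
#include <stddef.h>

/* Hypothetical example: one macro body, one generated function per type.
   sum_##type pastes the type name onto the function name. */
#define EmitSum(type)                                   \
type sum_##type(const type *a, size_t n)                \
{                                                       \
    type total = 0;                                     \
    for (size_t i = 0; i < n; i++)                      \
        total += a[i];                                  \
    return total;                                       \
}

EmitSum(int)      /* defines: int sum_int(const int *, size_t) */
EmitSum(double)   /* defines: double sum_double(const double *, size_t) */
```

Callers then use sum_int and sum_double as if they had been written out by hand.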
A bit later, I learned of the Global Address Space Performance initiative, a project at the University of Florida. By putting the performance analyzer library ahead of the OpenSHMEM library in your library search path, you can instrument the communication calls in your program without recompiling. This works through dynamic linking. The program calls shmem_long_wait, and the call is intercepted by the performance library. The library does its bookkeeping, then passes the call on to pshmem_long_wait, which is provided by the OpenSHMEM implementation.
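The interception can be sketched as a wrapper of the kind a profiling library might export. This is a hypothetical sketch, not the GASP interface: prof_wait_calls and prof_wait_seconds are made-up bookkeeping, and pshmem_long_wait is a single-threaded stand-in so the file compiles on its own. In a real setup the wrapper and the OpenSHMEM library live in separate shared libraries, and the library search order decides which shmem_long_wait the program binds to.

```c
#include <time.h>

/* Stand-in for the OpenSHMEM library's routine: returns once *var != value.
   (Single-threaded, so it only returns if the values already differ.) */
void pshmem_long_wait(long *var, long value)
{
    while (*((volatile long *) var) == value) {
        /* a real implementation would drive its progress engine here */
    }
}

/* Hypothetical profiler counters. */
unsigned long long prof_wait_calls = 0;
double prof_wait_seconds = 0.0;

/* The profiler's version of the public name: measure, then pass the call on. */
void shmem_long_wait(long *var, long value)
{
    clock_t start = clock();                 /* processor time, good enough for a spin-wait */
    pshmem_long_wait(var, value);            /* forward to the real routine */
    prof_wait_seconds += (double) (clock() - start) / CLOCKS_PER_SEC;
    prof_wait_calls++;
}
```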
The OpenSHMEM library can support this by shipping two implementations: one with names like shmem_long_wait, which is used when you are not profiling, and one with names like pshmem_long_wait, which is used when you are. Alternatively, you can use the “weak symbol” feature of the GNU toolchain. A weak symbol is a definition that gives way, without a duplicate-definition error, when another definition of the same name is present. To make this work, you write all the functions as pshmem_long_wait and so on, then add a weak alias for each standard name, like this:
#pragma weak shmem_long_wait = pshmem_long_wait
Now everything is in one library, and there is no performance penalty when you aren’t using the instrumentation library.
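A minimal single-file sketch of the weak-alias technique, assuming GCC or Clang on an ELF platform such as Linux. pcounter_value and counter_value are hypothetical stand-ins for pshmem_long_wait and shmem_long_wait.

```c
/* The real code lives under the p-prefixed name. */
long pcounter_value(const long *p)
{
    return *p;
}

/* Declare the public name so callers can use it ... */
long counter_value(const long *p);

/* ... and make it a weak alias: a second name for the same code. A strong
   definition of counter_value elsewhere takes precedence at link time. */
#pragma weak counter_value = pcounter_value
```

Both names refer to the same machine code, which is why the un-instrumented path costs nothing extra.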
Well, the obvious way for a lazy programmer to do this is:
#define EmitWait(type)                                          \
#pragma weak shmem_##type##_wait = pshmem_##type##_wait         \
void pshmem_##type##_wait(type *var, type value)                \
{                                                               \
    while (*((volatile type *) var) == value) shmem_progress(); \
}
but this fails, because a macro cannot expand into a preprocessor directive such as #pragma (inside a function-like macro body, # is the stringification operator). No problem! C99 provides an alternate form of pragma for exactly this reason. The GNU cpp info documentation says:
C99 introduces the `_Pragma’ operator. This feature addresses a major problem with `#pragma’: being a directive, it cannot be produced as the result of macro expansion. `_Pragma’ is an operator, much like `sizeof’ or `defined’, and can be embedded in a macro.
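As a minimal sketch of that property, here a macro expands to _Pragma, which a #pragma line inside a macro could not do. The names psquare, square, and WEAK_ALIAS_SQUARE are hypothetical, and this again assumes GCC or Clang on an ELF target. Note that the pragma text is one complete string literal with nothing pasted into it.

```c
int psquare(int x)
{
    return x * x;
}

int square(int x);   /* declare the public name so callers can use it */

/* Same effect as the directive: #pragma weak square = psquare */
#define WEAK_ALIAS_SQUARE _Pragma("weak square = psquare")
WEAK_ALIAS_SQUARE
```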
Now I can write my macro with
_Pragma("weak shmem_##type##_wait=pshmem_##type##_wait")
right? Well, no. Token pasting doesn’t work inside string literals, so you can’t build up a string constant this way. No problem! The compiler automatically concatenates adjacent string literals into a single longer string (this is standard C, not just GCC). It is mostly used to split long strings across source lines, but whatever. I can write this:
_Pragma("weak shmem_" #type "_wait=pshmem_" #type "_wait")
This uses a different preprocessor feature, called “stringification”, in which #type is replaced by the macro argument converted into a string literal.
Unfortunately, this doesn’t work either. _Pragma is handled by the preprocessor, in an earlier translation phase than the one that concatenates adjacent string literals, so _Pragma has to be given exactly one string literal as its argument.
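Stringification and concatenation themselves work anywhere an ordinary string literal is allowed; only _Pragma, which runs before concatenation, rejects them. A small sketch with a hypothetical WAIT_NAME macro and is_long_wait helper:

```c
#include <string.h>

/* Hypothetical macro: #type turns the argument into a string literal, and
   the three adjacent literals are joined into one by the compiler. */
#define WAIT_NAME(type) "shmem_" #type "_wait"

/* Hypothetical helper: nonzero if s is the name of the long wait routine. */
int is_long_wait(const char *s)
{
    return strcmp(s, WAIT_NAME(long)) == 0;   /* compares against "shmem_long_wait" */
}
```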
How much time have I spent on this? How many cases of _Pragma do I have to write by hand? I give up. The final version is
typedef long long longlong;        /* single-token name, so the macro can paste it */
#define EmitWait(type)                                          \
void pshmem_##type##_wait(type *var, type value)                \
{                                                               \
    while (*((volatile type *) var) == value) shmem_progress(); \
}
_Pragma("weak shmem_short_wait=pshmem_short_wait")
_Pragma("weak shmem_int_wait=pshmem_int_wait")
_Pragma("weak shmem_long_wait=pshmem_long_wait")
_Pragma("weak shmem_longlong_wait=pshmem_longlong_wait")
_Pragma("weak shmem_wait=pshmem_wait")    /* pshmem_wait must also be defined in this file */
EmitWait(short)
EmitWait(int)
EmitWait(long)
EmitWait(longlong)
It takes more typing than ought to be necessary, but I got over it.
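For comparison, the GCC preprocessor manual’s section on pragmas describes a helper-macro idiom (DO_PRAGMA) that could remove the hand-written list: stringize the entire pragma text in one step, so _Pragma receives a single string literal. This is a sketch assuming GCC or Clang on an ELF platform such as Linux; the shmem_progress stand-in and wait_flag are hypothetical, there only so the file compiles and a wait can finish in a single thread.

```c
typedef long long longlong;   /* single-token name, so the macro can paste it */

/* Hypothetical stand-in for the implementation's progress routine. In this
   single-threaded sketch it plays the remote PE by changing wait_flag. */
int wait_flag = 0;
void shmem_progress(void)
{
    wait_flag = 1;
}

/* Stringize the whole argument, so _Pragma gets exactly one string literal. */
#define DO_PRAGMA(x) _Pragma(#x)

/* Declares the public name, defines the p-prefixed implementation, and makes
   the public name a weak alias for it. */
#define EmitWait(type)                                          \
void shmem_##type##_wait(type *var, type value);                \
void pshmem_##type##_wait(type *var, type value)                \
{                                                               \
    while (*((volatile type *) var) == value) shmem_progress(); \
}                                                               \
DO_PRAGMA(weak shmem_##type##_wait = pshmem_##type##_wait)

EmitWait(short)
EmitWait(int)
EmitWait(long)
EmitWait(longlong)
```

The ## pastes happen while EmitWait expands, so DO_PRAGMA receives already-pasted names such as shmem_int_wait, and #x then turns its whole argument into one string literal.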