Using Atomic Operations in C on Solaris 10

Atomic instructions ensure that an atomically modified variable is globally visible when the instruction completes. On a system with a relaxed memory order, this does not guarantee that changes to other variables become visible in step with the completion of the atomic instruction. If that kind of ordering is required, memory barrier instructions must be used (see membar_ops(3C)).
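To illustrate why a barrier is needed, here is a minimal sketch of the classic "publish" pattern: one thread writes ordinary data, then sets an atomic flag, and another thread waits for the flag before reading the data. It uses portable C11 <stdatomic.h> and POSIX threads instead of the Solaris API, because the Solaris functions only exist on Solaris; on Solaris 10 the release and acquire fences would play roughly the role of membar_producer() and membar_consumer() from membar_ops(3C). The names payload, ready, producer and run_publish_demo are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical shared data: an ordinary variable plus an atomic flag. */
static int payload;
static atomic_int ready;

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;                                   /* plain store */
    /* Release fence: the payload store is ordered before the flag store
     * below (roughly membar_producer() on Solaris). */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
    return NULL;
}

/* Starts the producer, waits for the flag, and returns the payload value
 * this (consumer) thread observes, or -1 if the thread cannot be created. */
int run_publish_demo(void)
{
    pthread_t t;
    int seen;

    payload = 0;
    atomic_store(&ready, 0);
    if (pthread_create(&t, NULL, producer, NULL) != 0)
        return -1;

    /* The atomic flag by itself says nothing about when payload
     * becomes visible; it only tells us the producer got this far. */
    while (atomic_load_explicit(&ready, memory_order_relaxed) == 0) {
        /* spin */
    }
    /* Acquire fence: loads after this point cannot see data older than
     * what the flag published (roughly membar_consumer() on Solaris). */
    atomic_thread_fence(memory_order_acquire);
    seen = payload;

    pthread_join(t, NULL);
    return seen;
}
```

Without the two fences, a relaxed-order machine could let the consumer see ready == 1 and still read the old payload.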

Atomic instructions (such as atomic_add_int(), documented in atomic_add(3C)) can be expensive, because they require synchronization at the hardware level. Use them with care, so that hardware-level synchronization is forced as few times as possible. For example, if several variables must be incremented together as a group, and each increment has to be atomic, protect the whole group with a single mutex lock instead of calling atomic_inc(3C) on each variable.
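A minimal sketch of that advice, using POSIX threads: three hypothetical counters that must change together are updated under one pthread_mutex_t. One lock/unlock pair covers the whole group instead of three separate atomic operations, and a reader that takes the same lock never sees a half-finished update. The struct and function names are illustrative assumptions, not part of any Solaris API.

```c
#include <pthread.h>

/* Hypothetical group of statistics that must always change together. */
struct stats {
    pthread_mutex_t lock;
    unsigned long requests;
    unsigned long bytes;
    unsigned long errors;
};

void stats_init(struct stats *s)
{
    pthread_mutex_init(&s->lock, NULL);
    s->requests = s->bytes = s->errors = 0;
}

/* One lock/unlock pair protects all three updates. */
void stats_record(struct stats *s, unsigned long nbytes, int failed)
{
    pthread_mutex_lock(&s->lock);
    s->requests++;
    s->bytes += nbytes;
    if (failed)
        s->errors++;
    pthread_mutex_unlock(&s->lock);
}

/* Copies a consistent snapshot of all three counters into out[0..2]. */
void stats_snapshot(struct stats *s, unsigned long out[3])
{
    pthread_mutex_lock(&s->lock);
    out[0] = s->requests;
    out[1] = s->bytes;
    out[2] = s->errors;
    pthread_mutex_unlock(&s->lock);
}
```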

Solaris 10 comes with a complete set of atomic functions, declared in <atomic.h>:

  • atomic_add(3C) – atomically adds delta to the value stored in target.
  • atomic_and(3C) – atomically performs a bitwise AND of bits with the value stored in target.
  • atomic_bits(3C) – atomically performs an exclusive bit set or clear operation.
  • atomic_cas(3C) – atomically performs a compare-and-swap operation.
  • atomic_dec(3C) – atomically decrements (by one) the value stored in target.
  • atomic_inc(3C) – atomically increments (by one) the value stored in target.
  • atomic_or(3C) – atomically performs a bitwise OR of bits with the value stored in target.
  • atomic_swap(3C) – atomically performs a swap operation.
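Since the Solaris functions only exist on Solaris, here is a portable sketch of what each kind of operation does, written with C11 <stdatomic.h>. The Solaris name in each comment is an approximate counterpart, not an exact equivalent: for instance, atomic_cas_uint() returns the previous value of *target, whereas C11's atomic_compare_exchange_strong returns a success flag. C11 has no exclusive bit operation, so set_bit_excl is a hypothetical helper that mimics the atomic_bits(3C) idea of failing when the bit is already set.

```c
#include <stdatomic.h>

/* Hypothetical exclusive bit set: returns 0 if the bit was clear and is
 * now set, -1 if it was already set (in the spirit of atomic_bits(3C)). */
int set_bit_excl(atomic_ulong *target, unsigned bit)
{
    unsigned long mask = 1UL << bit;
    unsigned long old = atomic_fetch_or(target, mask);
    return (old & mask) ? -1 : 0;
}

/* Applies one of each operation to a single variable and returns the
 * final value; *swapped_out receives the value replaced by the swap. */
unsigned demo_ops(unsigned *swapped_out)
{
    atomic_uint v = 10;
    unsigned expected;

    atomic_fetch_add(&v, 5);       /* ~atomic_add_int:  10 + 5   -> 15 */
    atomic_fetch_add(&v, 1);       /* ~atomic_inc_uint: 15 + 1   -> 16 */
    atomic_fetch_sub(&v, 1);       /* ~atomic_dec_uint: 16 - 1   -> 15 */
    atomic_fetch_and(&v, 0x0CU);   /* ~atomic_and_uint: 15 & 12  -> 12 */
    atomic_fetch_or(&v, 0x01U);    /* ~atomic_or_uint:  12 | 1   -> 13 */
    *swapped_out = atomic_exchange(&v, 7);  /* ~atomic_swap_uint: 13 -> 7 */

    expected = 7;                  /* ~atomic_cas_uint: 7 matches -> 100 */
    atomic_compare_exchange_strong(&v, &expected, 100);

    return atomic_load(&v);
}
```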

An atomic operation completes without any other thread being able to change the variable partway through, so to other threads it appears to happen in a single, indivisible step. This makes atomic operations extremely useful for data shared among threads.
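A minimal sketch of why this matters: four worker threads each increment one shared counter 100000 times. Because every increment is a single indivisible read-modify-write, no update is lost and the total is exact; with a plain count++, two threads could read the same old value and one increment would vanish. The sketch uses C11 atomic_fetch_add and POSIX threads as a portable stand-in for Solaris atomic_inc_uint(); the thread count, iteration count and function names are arbitrary assumptions.

```c
#include <pthread.h>
#include <stdatomic.h>

#define NTHREADS 4
#define NITER    100000

static atomic_uint counter;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITER; i++)
        atomic_fetch_add(&counter, 1);   /* indivisible read-modify-write */
    return NULL;
}

/* Runs the workers and returns the final count (NTHREADS * NITER when
 * every thread starts successfully). */
unsigned run_counter_demo(void)
{
    pthread_t t[NTHREADS];
    int n;

    atomic_store(&counter, 0);
    for (n = 0; n < NTHREADS; n++)
        if (pthread_create(&t[n], NULL, worker, NULL) != 0)
            break;                       /* join only the threads we started */
    for (int i = 0; i < n; i++)
        pthread_join(t[i], NULL);
    return atomic_load(&counter);
}
```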

Quick Example

In the following example I’ll show how these functions are used, taking atomic_add_int as a representative case.

sol10 ~ % cat atomic_add_int_test.c
#include <atomic.h>

volatile unsigned int test;

int main(void) {
    int i;
    test = 0;
    for (i = 0; i < 1000; i++) {
        atomic_add_int(&test, i);   /* test += i, performed atomically */
    }
    return 0;
}

sol10 ~ % cc -O atomic_add_int_test.c -o atest
sol10 ~ % ./atest

Here’s the disassembly for atomic_add_int from libc:

sol10 ~ % uname -a
SunOS s10box 5.10 s10_60 sun4u sparc SUNW,Ultra-60
sol10 ~ % dis -F _atomic_add_int /usr/lib/

disassembly for /usr/lib/

section .text
	2dc18:    ld        	[%o0], %o2
	2dc1c:    add       	%o2, %o1, %o3  <---+ [perform operation]
	2dc20:    cas       	[%o0] , %o2, %o3   | [compare & swap]
	2dc24:    cmp       	%o2, %o3           | [see if it worked]
	2dc28:    bne,a,pn  	%icc, 0x2dc1c      | [branch if it did not work]
	2dc2c:    mov       	%o3, %o2       <---+ [delay slot]
	2dc30:    retl      	
	2dc34:    add       	%o2, %o1, %o0

The variable is loaded with the ld instruction, the add is performed, and the cas (compare and swap) instruction attempts to store the result back. The cmp then checks whether the operation succeeded. If it did not, because another thread changed the variable in the meantime, the bne (branch on not equal) instruction jumps back and the add is repeated.
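The same load / add / compare-and-swap / retry sequence can be written in portable C11. This is a sketch of the technique, not the Solaris libc source; the name cas_add_uint is hypothetical. Like the assembly, which leaves the sum in %o0, it returns the new value.

```c
#include <stdatomic.h>

/* Adds delta to *target with a compare-and-swap retry loop and returns
 * the new value. Hypothetical name; mirrors the ld/add/cas/cmp/bne loop. */
unsigned cas_add_uint(atomic_uint *target, int delta)
{
    unsigned old = atomic_load(target);      /* ld   [%o0], %o2      */
    unsigned sum;

    do {
        sum = old + (unsigned)delta;         /* add  %o2, %o1, %o3   */
        /* cas + cmp: store sum only if *target still equals old.
         * On failure, old is refreshed with the current value,
         * much like the mov %o3, %o2 in the branch delay slot. */
    } while (!atomic_compare_exchange_weak(target, &old, sum));

    return sum;
}
```

The weak form may fail spuriously on some architectures, which is harmless here because the loop simply retries.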

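The cas instruction compares the value held at [%o0] with %o2; if they are equal, it replaces the value in memory with %o3. Either way, %o3 receives the value that was in [%o0] when the operation was attempted, so the following cmp finds %o2 and %o3 equal only if the swap took place. The branch is annulled (the ,a suffix), which means the mov in its delay slot is executed only when the branch is taken: on a retry, %o2 is refreshed with the value just observed in memory. Finally, the add in the delay slot of retl computes the new value into %o0, the return-value register.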
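The contract described above (store the new value only if memory matches the expected value, and always hand back what was in memory) is also the contract of Solaris atomic_cas_uint(), which returns the previous value of *target. A portable sketch of that contract on top of C11, under the hypothetical name cas_uint_old:

```c
#include <stdatomic.h>

/* If *target == cmp, store newval. Either way, return the value that
 * was in *target when the operation was attempted (like %o3 after cas).
 * The caller detects success by comparing the result with cmp, which is
 * what the cmp %o2, %o3 instruction does. */
unsigned cas_uint_old(atomic_uint *target, unsigned cmp, unsigned newval)
{
    unsigned expected = cmp;
    /* On failure, C11 writes the observed value into expected;
     * on success, expected still equals cmp, which was the old value. */
    atomic_compare_exchange_strong(target, &expected, newval);
    return expected;
}
```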

When you write such code yourself, for example as a Sun Studio inline template, it is possible to put instructions in the delay slot of branches. This can be useful if you wish to use the processor support for annulled instructions, as the library routine above does, but doing so causes the code to be late-inlined and may result in sub-optimal performance.
