Sunday 29 January 2012

Triple Buffering as a Concurrency Mechanism

Revised (less buggy) implementation in part II!

Triple Buffering is a way of passing data between a producer and a consumer running at different rates. It ensures that the consumer only sees complete data with minimal lag, but the consumer doesn't expect to see every piece of data. The technique is most commonly used between images produced by a graphics card and a monitor; the graphics card is free to render hundreds of frames per second while the monitor consumes a fixed 60 fps.

This technique is also applicable as a small-scale lock-free concurrency mechanism; many applications consume real-time data but want to operate on fixed snapshots, or alternatively the data-processing operation performed takes longer than the time between each new piece of input data (and missing input data is acceptable). The following C macros declare a generic triple-buffer data structure:

#define TRIPLE_BUFFER_TYPE(TYPENAME, TYPE) \
  struct TYPENAME { \
    TYPE* dirty; \
    TYPE* clean; \
    TYPE* snap; \
    TYPE buffer[3]; \
  }
#define TRIPLE_BUFFER_NEW(NAME,TYPENAME) \
  struct TYPENAME NAME; \
  NAME.dirty = &NAME.buffer[0]; \
  NAME.clean = &NAME.buffer[1]; \
  NAME.snap  = &NAME.buffer[2];

I've made the type declaration a macro so the buffer member of the struct will be the correct size - C++ implementations should use templates. If the producer and consumer are on different threads it makes it less obvious whether I should put the three buffers next to each other in memory. If the three buffers are on the same cache line when the writer writes to one, then this may invalidate the buffers in the cache of the reader's core unnecesarily. The writer needs to access the "dirty" buffer and the reader needs to access the "snap" buffer:

#define TRIPLE_BUFFER_SNAP_PTR(BUFFER) BUFFER.snap
#define TRIPLE_BUFFER_WRITE_PTR(BUFFER) BUFFER.dirty

The "dirty" pointer points to the buffer that the writer is currently writing to, and the "clean" pointer is the buffer with the most recent complete set of data. When the writer has finished modifying the dirty buffer it flips the two pointers with the following macro:

#ifdef _GLIBCXX_ATOMIC_BUILTINS
  #define TRIPLE_BUFFER_FLIP_WRITER(BUFFER) \
    do { \
      void* tmp = BUFFER.clean; \
      while(!__sync_bool_compare_and_swap( \
       &BUFFER.clean, \
       BUFFER.clean, \
       BUFFER.dirty)){}; \
      while(!__sync_bool_compare_and_swap( \
        &BUFFER.dirty, \
        BUFFER.dirty, \
        tmp)){}; \
    } while(0)
#else
  #error no atomic operations available!
#endif

This macro is lock-free (it uses GCC built-ins - it should be changed to use C11 atomic operations when compiler support for them is mature enough). The order of the two operations is important as it guarantees that the clean and snap points are complete, immutable sets of data. The clean pointer is set to the dirty pointer first (as this macro has been called it is assumed that the dirty pointer points to a now-complete set of data). The dirty pointer is then updated - but as the clean pointer is the one that is snapped by the reader only its consistency is important. Note that this macro doesn't memset or otherwise clear the new dirty buffer before returning, so writers should do this themselves if they only intend to write to parts of the buffer.

#ifdef _GLIBCXX_ATOMIC_BUILTINS
  #define TRIPLE_BUFFER_NEW_SNAP(BUFFER) \
    do { \
      void* tmp = BUFFER.clean; \
      while(!__sync_bool_compare_and_swap( \
        &BUFFER.clean, \
        BUFFER.clean, \
        BUFFER.snap)){}; \
      while(!__sync_bool_compare_and_swap( \
        &BUFFER.snap, \ 
        BUFFER.snap, \
        tmp)){}; \
    } while(0)
#else
  #error no atomic operations available!
#endif

The reader should call this macro when they want to take a snapshot of the current clean buffer. As above, this is lock-free and uses GCC built-ins. The order of the two atomic operations is important; switching the clean one first effectively isolates the current clean buffer from the writer, preserving its contents even if the writer continues to write and flip. I assume this method is called by the reader, as between the two atomic operations the snap pointer points to a changing set of data. However, the next atomic operation points the snap to the isolated buffer, and when control flow returns from the macro the snap pointer is pointing at what was the clean buffer when it was called. After calling this macro the clean buffer contains the last snap, so snapping twice without an intervening write-and-flip will just return old data. The triple buffer (in its current form) is therefore only useful if the writer always has higher frequency of writes than the reader has of reads. Also, there's no intrinsic mechanism to detect if a snap has changed. If the reader must know this, then the data must have a sequence number or timestamp (either from its producer or inserted by the writer) which the reader can compare with that of the previous snap.

The below unit test shows a typical use case: a writer streams integers into a triple buffer, and the reader periodically snaps them to do a computation.

TRIPLE_BUFFER_TYPE(TestStruct, int);
TRIPLE_BUFFER_NEW(it, TestStruct);

/* Test 1 */

*TRIPLE_BUFFER_WRITE_PTR(it) = 3;
TRIPLE_BUFFER_FLIP_WRITER(it);

TRIPLE_BUFFER_NEW_SNAP(it);
assert(*TRIPLE_BUFFER_SNAP_PTR(it) == 3);

/* Test 2 */

*TRIPLE_BUFFER_WRITE_PTR(it) = 4;
TRIPLE_BUFFER_FLIP_WRITER(it);
*TRIPLE_BUFFER_WRITE_PTR(it) = 5;
TRIPLE_BUFFER_FLIP_WRITER(it);
*TRIPLE_BUFFER_WRITE_PTR(it) = 6;
TRIPLE_BUFFER_FLIP_WRITER(it);

TRIPLE_BUFFER_NEW_SNAP(it);

*TRIPLE_BUFFER_WRITE_PTR(it) = 7;
TRIPLE_BUFFER_FLIP_WRITER(it);
*TRIPLE_BUFFER_WRITE_PTR(it) = 8;
TRIPLE_BUFFER_FLIP_WRITER(it);

assert(*TRIPLE_BUFFER_SNAP_PTR(it) == 6);

Please let me know if you find a use for this data structure or have any improvements...

Revised (less buggy) implementation in part II!

6 comments:

  1. I recently implemented something similiar…involving database tables. An unbounded number of writers are allowed to insert rows into table A. A periodic process locks table A briefly and does a swap/rename of A to TEMP and B to A and TEMP to B. That process then releases the lock and processes the contents of B.

    ReplyDelete
  2. Do you mean sth like this:

    #define TRIPLE_BUFFER_FLIP_WRITER(BUFFER) \
    BUFFER.dirty = __sync_lock_test_and_set(&BUFFER.clean, BUFFER.dirty);

    #define TRIPLE_BUFFER_NEW_SNAP(BUFFER) \
    BUFFER.snap = __sync_lock_test_and_set(&BUFFER.clean, BUFFER.snap);

    Regards :)

    ReplyDelete
    Replies
    1. That's a better solution that my Part I implementation, but you don't solve the problem I fix in Part II - that is if you call the snap method twice you snap a new value and then snap the old value again...

      Delete
    2. Well, IMHO this not the point of the solution. The mechanism must allow for safe concurrent behaviour and this is provided. Now next layer can be added to provide some identification of the "messages" e.g. some incremented ID inside data.

      Delete
  3. Why to swap the buffers instead of just passing the dirty to the clean one and the clean to the snap?

    ReplyDelete
  4. Thanks for writing this article and the updated bug free version.

    ReplyDelete