template<class BoostedCpp>

A personal blog about C++ programming and Boost libraries

Inside Boost.Container: comparing different deque implementations

Many C++ developers have used std::deque, and most assume all implementations are roughly equivalent. In practice, they differ quite a bit. The three major standard library implementations — libc++ (Clang/LLVM), libstdc++ (GCC), and Microsoft STL (Visual Studio) — each make different design choices about block sizes, iterator layout, map management, and memory reclamation. Boost.Container 1.90 adds a fourth option with a recently rewritten implementation that improves performance and reduces significantly the size of an empty deque.

The common architecture

All four implementations share the same high-level idea: a map (an array of pointers) pointing to fixed-size blocks, each holding a contiguous chunk of elements. This two-level indirection is what gives deque its characteristic properties — O(1) push at both ends without invalidating references. The differences are in how each library manages those blocks.

Internal data structure diagrams

The key differences lie in how each implementation stores its metadata: what the map looks like, how iterators track their position, and what block size strategy they use. The diagrams below show the internal layout of each implementation for 64-bit systems.

libstdc++ (GCC)

libstdc++ (GCC) Block size: 512 bytes / sizeof(T), min 1 · Map: centered counted array _M_map (T**) null null blk0 blk1 blk2 null null null _M_map_size = 8 Blocks centered in map [ elem₀ | elem₁ | … | elemₙ ] 512 / sizeof(T) elements Iterator (4 pointers = 32 bytes): _M_cur _M_first _M_last _M_node Members: _M_map, _M_map_size, _M_start (iter), _M_finish (iter)

libstdc++ stores the valid range as a pair of fat iterators. Blocks are allocated centered in the map; when one side fills up, in-use blocks are slid to re-center.

libc++ (Clang/LLVM)

libc++ (Clang/LLVM) Block size: 4096 bytes / sizeof(T), min 16 · Map: __split_buffer (a deque-of-pointers) __map (__split_buffer<T*>) __first_ / __begin_ / __end_ / __end_cap_ blk0 blk1 blk2 spare Map itself is a mini-deque Keeps 1 spare block per end 4096 / sizeof(T) elements per block Iterator (2 pointers = 16 bytes): __ptr_ (T*) __m_iter_ (block**) Members: __map_ (split_buffer), __start_ (offset), size_

libc++ uses a __split_buffer for the map itself — effectively a deque of block pointers. Iterators are lean (2 pointers). The valid range is tracked by start offset + size.

MSVC STL (Microsoft)

MSVC STL (Microsoft) Block size: max(16 / sizeof(T), 1) · Map: circular counted array · Keeps ALL spare blocks _Map (circular buffer of T**) spare blk0 blk1 blk2 spare spare Wraps around _Mapsize = 8, _Myoff = start 16-byte blocks (quite small) 16 bytes per block e.g. 4 ints, 2 longs, or 1 large struct Iterator (proxy: deque* + index = 16 bytes): _Myproxy (deque*) _Myoff (size_t) Members: _Map, _Mapsize, _Myoff, _Mysize (+proxy)

MSVC treats the map as a circular buffer — no sliding needed. The 16-byte block size is quite small compared to other implementations, and is a legacy decision the team has said they plan to revisit at the next ABI break. Iterators are proxies back to the parent deque.

Boost.Container 1.90 deque (rewritten)

Boost.Container 1.90 deque (rewritten) Block size: 1024 bytes (64-bit) / 512 (32-bit), configurable · Map: center + slide · sizeof = 32 bytes m_map (ptr_alloc_ptr) — double-ended offsets, no fat iterators null blk0 blk1 blk2 null Configurable via deque_options<> block_size<N>, block_bytes<N> 1024-byte blocks = same range as libc++ Deque members (4 words = 32 bytes on 64-bit): m_map (T**) m_map_size m_start_off m_finish_off Iterator (2 pointers = 16 bytes — same as libc++/MSVC): cur (T*) node (block**) Previously 80 bytes with fat iterators — now 32 bytes, using double-ended offsets into the block map.

Boost was rewritten for version 1.90. The previous SGI-derived layout (10 words, fat iterators) was replaced with a 4-word structure using double-ended offsets (m_start_off / m_finish_off) into the block map, and 2-pointer iterators. The default block size was also increased to 1024 bytes on 64-bit.

sizeof(deque<int>) and dynamic allocation comparison

The in-memory footprint of an empty deque object varies quite a bit across implementations. On a 64-bit platform:

Implementation sizeof (bytes) Breakdown
Boost 32 m_map (8) + m_map_size (8) + m_start_off (8) + m_finish_off (8) = 32
MSVC STL 40 map ptr (8) + map_size (8) + offset (8) + size (8) + proxy (8) = 40
libc++ 48 __split_buffer (4 ptrs = 32) + start offset (8) + size (8) = 48
libstdc++ 80 map ptr (8) + map_size (8) + 2 iterators × 4 ptrs × 8 = 64 → total 80

Boost’s reimplemented deque comes in at 32 bytes — the smallest among the four implementations tested. The previous Boost layout was 80 bytes, the same as libstdc++. The reduction comes from replacing the two stored fat iterators with a pair of offsets (m_start_off / m_finish_off) that index into the block map.

Apart from this static size, implementations differ whether dynamic allocation is required when default constructing or move constructing a deque. It’s important to note that operations that allocate memory can’t be noexcept.

Implementation Dynamic memory allocation and noexcept-ness on
default or move construction
Boost No allocation. noexcept constructors.
MSVC STL 16 byte allocation if _ITERATOR_DEBUG_LEVEL ==2 (typically in Debug).
Constructors are not marked as noexcept
libc++ No allocation. noexcept constructors.
libstdc++

One block (512 bytes) + minimal map (64 bytes). Constructors are not marked as noexcept

Having noexcept default and move constructors can be important for those users wanting to replace vector members with deques. Implementations having potentially throwing constructors could require code changes (default and move constructors for user-defined types using deques could not longer be noexcept).

Key design trade-offs

Property libstdc++ libc++ MSVC Boost
Block bytes 512 4096 16 1024 (64-bit)
512 (32-bit)*
Initial map size 8 2 8 8
Map strategy Center + slide split_buffer Circular buffer Center + slide
Spare block policy Free immediately Keep 1 per end Keep all Free immediately
noexcept default/
move constructor
No Yes No Yes
Dynamic allocation on
default/move constructor
One block (512B)
+ minimal map (64B)
No When debug iterators are
enabled (e.g. Debug builds)
No
Iterator size 32 bytes 16 bytes 16 bytes 16 bytes
sizeof(deque) 80 48 40 32
Configurable No No No Yes

* Boost defaults; configurable via block_size<N> and block_bytes<N> deque options.

Benchmarks

The following deque implementations were measured:

  • libstdc++ 14.2 (GCC 14)
  • libc++ 18.1 (Clang 18)
  • Microsoft STL shipped with MSVC 19.40 (VS 2022 17.10)
  • Boost was compiled with GCC 14 in three block-size configurations (512B, 1K, 4K).

Tested were executed on x86-64 systems with deque<int>at N = 100,000 elements:

  • The range-insert tests use d.insert(pos, src.begin(), src.end()) with a pre-populated 128 element size source vector into a pre-populated 100K-element deque;
  • The “at back” variant inserts at d.end()
  • The “at position ¾” variant inserts at d.begin() + d.size()*3/4 128 elements, requiring the implementation to shift about 25K elements.

Notes:

  • Times are in nanoseconds per operation (or per inserted element where noted), lower is better.
  • Std implementation bars are ordered by block size –> smallest(MSVC, 16Byte) to biggest(4KByte, libc++).
  • Boost deque is tested with 3 block sizes (512B, 1KB (default), 4KB)
push_back — ns per operation 0 5 10 14 19 24 MSVC 24.1 ns libstdc++ 8.2 ns libc++ 6.5 ns Boost (512B) 7.6 ns Boost (1K) 6.9 ns Boost (4K) 6.3 ns
Sequential iteration (sum all elements) — ns per element 0.0 0.8 1.6 2.3 3.1 3.9 MSVC 3.9 ns libstdc++ 1.1 ns libc++ 0.80 ns Boost (512B) 1.0 ns Boost (1K) 0.90 ns Boost (4K) 0.70 ns
Random access (operator[]) — ns per access 0.0 1.0 2.1 3.1 4.2 5.2 MSVC 5.2 ns libstdc++ 3.4 ns libc++ 2.8 ns Boost (512B) 3.1 ns Boost (1K) 2.9 ns Boost (4K) 2.6 ns
push_front / pop_front churn — ns per operation pair 0 6 13 19 25 32 MSVC 31.6 ns libstdc++ 12.4 ns libc++ 9.8 ns Boost (512B) 11.6 ns Boost (1K) 10.5 ns Boost (4K) 9.2 ns
Range insert 128 at back — ns per inserted element 0.0 1.1 2.2 3.3 4.4 5.5 MSVC 5.5 ns libstdc++ 1.1 ns libc++ 0.53 ns Boost (512B) 0.90 ns Boost (1K) 0.74 ns Boost (4K) 0.48 ns
Range insert 128 at position ¾ — ns per inserted element 0 10 19 29 39 48 MSVC 48.4 ns libstdc++ 10.2 ns libc++ 5.9 ns Boost (512B) 8.8 ns Boost (1K) 7.1 ns Boost (4K) 5.5 ns

Analysis

Block size appears to be the most significant factor in these results. The three Boost configurations help illustrate this — as block size increases from 512B to 1K to 4K, performance improves consistently across all benchmarks. Note that a big block size increases the memory consumption of a deque with very few elements, big block sizes have also downsides. Generally speaking, deque is not a good container when the most common case is having a few elements in the container.

Comparing libstdc++ and Boost (512B) is useful because both use the same 512-byte blocks (128 ints each), which isolates the effect of the different internal algorithms. Boost shows a slight advantage here and the difference likely comes from the newly reimplemented segmented algorithms and (maybe) the smaller 32-byte deque object (vs libstdc++’s 80 bytes), which could mean fewer cache lines are touched when the deque metadata is accessed.

Range insertion at the back is where block size differences show up most clearly. MSVC’s per-element cost reflects the overhead of crossing 32 block boundaries (128 ints ÷ 4 ints per block) for every insert call. Boost (4K) and libc++, with their large blocks, can often absorb 128 ints (512 bytes) into a single contiguous region, bringing the cost down.

The near-back insertion benchmark (position ¾) adds a new dimension: the cost of shifting existing elements. When inserting at position ¾ of a 100K-element deque, implementations need to move roughly 25,000 elements (the shorter back quarter) to make room. This shift dominates the cost and is also sensitive to block size — larger blocks allow the shift to proceed as contiguous memmove operations within blocks rather than element-by-element across block boundaries.

It’s worth noting that MSVC’s circular map design has the advantage of avoiding block sliding, but the small block size leads to frequent cache misses and allocation overhead in bulk operations. The MSVC team has noted this is something they plan to address at the next ABI break.

Takeaways

Among the implementations tested, this shows that the object size reduction achieved by Boost has been compatible with an improved perfomance. When configured with 512-byte blocks to match libstdc++’s geometry, it shows pretty good performance whereas with the 4KB block option ( deque_options<block_bytes<4096>>), its throughput is comparable to libc++’s.

The main takeaway is that std::deque is not a single data structure — these four implementations make substantially different trade-offs, and the choice of block size alone can account for large performance differences in iteration and bulk insertion. It’s worth understanding which implementation you’re using and what its defaults are, especially if deque performance is relevant to your workload.

If you need a portable high-performance deque implementation that offers noexcept guarantees for essential constructors or you want to take advantage of a configurable block size, take a look at Boost.Container’s new deque implementation.

Responses

  1. Howard Hinnant Avatar

    Nice write up! Love the comparisons.

    I’d love to see a “queue benchmark” comparison. That is, how does std::queue (uses deque under the covers) perform under a series of pushes and pops that keeps the average size() about the same over a time?

    This might be done by alternating push and pop. Or it might be done by randomly calling push and pop with equal probability.

    This benchmark is designed to highlight a potential strong point of deque. deque should be able to pull off this use pattern better than other std::containers.

  2. Ion Gaztañaga Avatar

    Thanks Howard!

    Certainly std::queue use cases would be interesting to benchmark. I expect the libc++ implementation will shine on some of them, as it keeps a spare block on each end, avoiding allocator calls in some cases.

    I also noticed that I should add one additional data to the comparison table regarding the noexceptness of the default constructor, as it can make a difference if users want to achieve noexcept default constructors on types holding deques.

    1. Howard Hinnant Avatar

      Here is a comparison I made about a decade ago on the noexcept issue. I don’t know if it is still accurate today or not.

      https://howardhinnant.github.io/container_summary.html

  3. Ion Gaztañaga Avatar

    Thanks Howard, I’ll update the article adding the noexcept info, it’s certainly relevant.

  4. Ion Gaztañaga Avatar

    Hi, I reviewed a bit the default and move constructor guarantees for existing implementations (let’s hope there are no mistakes) and updated the article accordingly. Thanks again Howard!

Leave a Reply

Discover more from template<class BoostedCpp>

Subscribe now to keep reading and get access to the full archive.

Continue reading