Many C++ developers have used std::deque, and most assume all implementations are roughly equivalent. In practice, they differ quite a bit. The three major standard library implementations — libc++ (Clang/LLVM), libstdc++ (GCC), and Microsoft STL (Visual Studio) — each make different design choices about block sizes, iterator layout, map management, and memory reclamation. Boost.Container 1.90 adds a fourth option with a recently rewritten implementation that improves performance and reduces significantly the size of an empty deque.
The common architecture
All four implementations share the same high-level idea: a map (an array of pointers) pointing to fixed-size blocks, each holding a contiguous chunk of elements. This two-level indirection is what gives deque its characteristic properties — O(1) push at both ends without invalidating references. The differences are in how each library manages those blocks.
Internal data structure diagrams
The key differences lie in how each implementation stores its metadata: what the map looks like, how iterators track their position, and what block size strategy they use. The diagrams below show the internal layout of each implementation for 64-bit systems.
libstdc++ (GCC)
libstdc++ stores the valid range as a pair of fat iterators. Blocks are allocated centered in the map; when one side fills up, in-use blocks are slid to re-center.
libc++ (Clang/LLVM)
libc++ uses a __split_buffer for the map itself — effectively a deque of block pointers. Iterators are lean (2 pointers). The valid range is tracked by start offset + size.
MSVC STL (Microsoft)
MSVC treats the map as a circular buffer — no sliding needed. The 16-byte block size is quite small compared to other implementations, and is a legacy decision the team has said they plan to revisit at the next ABI break. Iterators are proxies back to the parent deque.
Boost.Container 1.90 deque (rewritten)
Boost was rewritten for version 1.90. The previous SGI-derived layout (10 words, fat iterators) was replaced with a 4-word structure using double-ended offsets (m_start_off / m_finish_off) into the block map, and 2-pointer iterators. The default block size was also increased to 1024 bytes on 64-bit.
sizeof(deque<int>) and dynamic allocation comparison
The in-memory footprint of an empty deque object varies quite a bit across implementations. On a 64-bit platform:
| Implementation | sizeof (bytes) | Breakdown |
|---|---|---|
| Boost | 32 | m_map (8) + m_map_size (8) + m_start_off (8) + m_finish_off (8) = 32 |
| MSVC STL | 40 | map ptr (8) + map_size (8) + offset (8) + size (8) + proxy (8) = 40 |
| libc++ | 48 | __split_buffer (4 ptrs = 32) + start offset (8) + size (8) = 48 |
| libstdc++ | 80 | map ptr (8) + map_size (8) + 2 iterators × 4 ptrs × 8 = 64 → total 80 |
Boost’s reimplemented deque comes in at 32 bytes — the smallest among the four implementations tested. The previous Boost layout was 80 bytes, the same as libstdc++. The reduction comes from replacing the two stored fat iterators with a pair of offsets (m_start_off / m_finish_off) that index into the block map.
Apart from this static size, implementations differ whether dynamic allocation is required when default constructing or move constructing a deque. It’s important to note that operations that allocate memory can’t be noexcept.
| Implementation | Dynamic memory allocation and noexcept-ness on default or move construction |
|---|---|
| Boost | No allocation. noexcept constructors. |
| MSVC STL | 16 byte allocation if _ITERATOR_DEBUG_LEVEL ==2 (typically in Debug). Constructors are not marked as noexcept. |
| libc++ | No allocation. noexcept constructors. |
| libstdc++ |
One block (512 bytes) + minimal map (64 bytes). Constructors are not marked as noexcept. |
Having noexcept default and move constructors can be important for those users wanting to replace vector members with deques. Implementations having potentially throwing constructors could require code changes (default and move constructors for user-defined types using deques could not longer be noexcept).
Key design trade-offs
| Property | libstdc++ | libc++ | MSVC | Boost |
|---|---|---|---|---|
| Block bytes | 512 | 4096 | 16 | 1024 (64-bit) 512 (32-bit)* |
| Initial map size | 8 | 2 | 8 | 8 |
| Map strategy | Center + slide | split_buffer | Circular buffer | Center + slide |
| Spare block policy | Free immediately | Keep 1 per end | Keep all | Free immediately |
| noexcept default/ move constructor |
No | Yes | No | Yes |
| Dynamic allocation on default/move constructor |
One block (512B) + minimal map (64B) |
No | When debug iterators are enabled (e.g. Debug builds) |
No |
| Iterator size | 32 bytes | 16 bytes | 16 bytes | 16 bytes |
| sizeof(deque) | 80 | 48 | 40 | 32 |
| Configurable | No | No | No | Yes |
* Boost defaults; configurable via block_size<N> and block_bytes<N> deque options.
Benchmarks
The following deque implementations were measured:
- libstdc++ 14.2 (GCC 14)
- libc++ 18.1 (Clang 18)
- Microsoft STL shipped with MSVC 19.40 (VS 2022 17.10)
- Boost was compiled with GCC 14 in three block-size configurations (512B, 1K, 4K).
Tested were executed on x86-64 systems with deque<int>at N = 100,000 elements:
- The range-insert tests use
d.insert(pos, src.begin(), src.end())with a pre-populated 128 element size source vector into a pre-populated 100K-element deque; - The “at back” variant inserts at
d.end() - The “at position ¾” variant inserts at
d.begin() + d.size()*3/4128 elements, requiring the implementation to shift about 25K elements.
Notes:
- Times are in nanoseconds per operation (or per inserted element where noted), lower is better.
- Std implementation bars are ordered by block size –> smallest(MSVC, 16Byte) to biggest(4KByte, libc++).
- Boost deque is tested with 3 block sizes (512B, 1KB (default), 4KB)
Analysis
Block size appears to be the most significant factor in these results. The three Boost configurations help illustrate this — as block size increases from 512B to 1K to 4K, performance improves consistently across all benchmarks. Note that a big block size increases the memory consumption of a deque with very few elements, big block sizes have also downsides. Generally speaking, deque is not a good container when the most common case is having a few elements in the container.
Comparing libstdc++ and Boost (512B) is useful because both use the same 512-byte blocks (128 ints each), which isolates the effect of the different internal algorithms. Boost shows a slight advantage here and the difference likely comes from the newly reimplemented segmented algorithms and (maybe) the smaller 32-byte deque object (vs libstdc++’s 80 bytes), which could mean fewer cache lines are touched when the deque metadata is accessed.
Range insertion at the back is where block size differences show up most clearly. MSVC’s per-element cost reflects the overhead of crossing 32 block boundaries (128 ints ÷ 4 ints per block) for every insert call. Boost (4K) and libc++, with their large blocks, can often absorb 128 ints (512 bytes) into a single contiguous region, bringing the cost down.
The near-back insertion benchmark (position ¾) adds a new dimension: the cost of shifting existing elements. When inserting at position ¾ of a 100K-element deque, implementations need to move roughly 25,000 elements (the shorter back quarter) to make room. This shift dominates the cost and is also sensitive to block size — larger blocks allow the shift to proceed as contiguous memmove operations within blocks rather than element-by-element across block boundaries.
It’s worth noting that MSVC’s circular map design has the advantage of avoiding block sliding, but the small block size leads to frequent cache misses and allocation overhead in bulk operations. The MSVC team has noted this is something they plan to address at the next ABI break.
Takeaways
Among the implementations tested, this shows that the object size reduction achieved by Boost has been compatible with an improved perfomance. When configured with 512-byte blocks to match libstdc++’s geometry, it shows pretty good performance whereas with the 4KB block option ( deque_options<block_bytes<4096>>), its throughput is comparable to libc++’s.
The main takeaway is that std::deque is not a single data structure — these four implementations make substantially different trade-offs, and the choice of block size alone can account for large performance differences in iteration and bulk insertion. It’s worth understanding which implementation you’re using and what its defaults are, especially if deque performance is relevant to your workload.
If you need a portable high-performance deque implementation that offers noexcept guarantees for essential constructors or you want to take advantage of a configurable block size, take a look at Boost.Container’s new deque implementation.

Leave a Reply