pub struct FixedSizeInterner<const SHARD_FACTOR: usize> { /* private fields */ }
Expand description
A string interner based on a single, fixed-size backing buffer with support for reclamation.
§Overview
This interner uses a single, fixed-size backing buffer where interned strings are stored contiguously. This provides bounded memory usage, and the interner will not allocate additional memory for new strings once the buffer is full. Since interned strings are not likely to need to live for the life of the program, the interner supports reclamation. Once all references to an interned string have been dropped, the storage for that string is reclaimed and can be used to hold new strings.
§Storage layout
The backing buffer stores strings contiguously, with an entry “header” prepended to each string. The header contains relevant data – hash of the string, reference count, and length of the string – needed to work with the entry either when searching for existing entries or when using the entry itself.
The layout of an entry is as follows:
┌───────────────────────── entry #1 ──────────────────────────┐ ┌─ entry #2 ─┐ ┌─ entry .. ─┐
▼ ▼ ▼ ▼ ▼ ▼
┏━━━━━━━━━━━┯━━━━━━━━━━━┯━━━━━━━━━━━┯━━━━━━━━━━━┯━━━━━━━━━━━━━┓ ┏━━━━━━━━━━━━┓ ┏━━━━━━━━━━━━┓
┃ str hash │ ref cnt │ str len │ str data │ padding ┃ ┃ header ┃ ┃ header ┃
┃ (8 bytes) │ (8 bytes) │ (8 bytes) │ (N bytes) │ (1-7 bytes) ┃ ┃ & string ┃ ┃ & string ┃
┗━━━━━━━━━━━┷━━━━━━━━━━━┷━━━━━━━━━━━┷━━━━━━━━━━━┷━━━━━━━━━━━━━┛ ┗━━━━━━━━━━━━┛ ┗━━━━━━━━━━━━┛
▲ ▲ ▲
└────────── `EntryHeader` ──────────┘ └── aligned for `EntryHeader`
(8 byte alignment) via trailing padding
The backing buffer is always aligned properly for EntryHeader
, so that the first entry can be referenced
correctly. However, when appending additional entries to the buffer, we need to ensure that those entries also have
an aligned start for accessing the header. This is complicated due to the variable number of bytes for the string
data.
Alignment padding is added to the end of the entry to ensure that when appending the next entry, the start of the
entry is properly aligned for EntryHeader
. In the worst case, up to 7 bytes could be added (and thus wasted) on
this alignment padding.
§InternedString
The InternedString
type is a handle to the entry header, and thus the string data, for an interned string. It is
designed to be small – 8 bytes! – and cheap to clone, as it contains an atomic reference to the entry header and a
reference to the interner that owns the string. It dereferences to the underlying string with relatively low
overhead: two pointer indirections.
When an InternedString
is dropped, it decrements the reference count for the entry it points to. If the reference
count drops to zero, it will attempt to mark the entry for reclamation.
§Reclamation
As we want to bound the memory used by the interner, but also not allow it to be filled up with strings that eventually end up going entirely unused, we need a way to remove those unused strings so their underlying storage can be used for new strings. This is where reclamation comes in.
When a string is interned, the entry header tracks how many active references there are to it. When that reference count drops to zero, the last reference to the string attempts to mark the entry for reclamation. Assuming no other reference has been taken out on the entry in the meantime, the entry gets added to a list of “reclaimed” entries.
Reclaimed entries are simple markers – start and end position in the data buffer – which are stored in a freelist. When attempting to intern a new string, this freelist is searched to see if there’s an entry large enough to fit the new string, and if so, it is used.
Additionally, when entries are reclaimed, adjacent entries are merged together where possible. This helps to avoid unnecessary fragmentation over time, although not as effectively as reconstructing the data buffer to re-pack entries.
Implementations§
Source§impl<const SHARD_FACTOR: usize> FixedSizeInterner<SHARD_FACTOR>
impl<const SHARD_FACTOR: usize> FixedSizeInterner<SHARD_FACTOR>
Sourcepub fn new(capacity: NonZeroUsize) -> Self
pub fn new(capacity: NonZeroUsize) -> Self
Creates a new FixedSizeInterner
with the given capacity.
The given capacity will potentially be rounded up by a small number of bytes (up to 7) in order to ensure the backing buffer is properly aligned.
Trait Implementations§
Source§impl<const SHARD_FACTOR: usize> Clone for FixedSizeInterner<SHARD_FACTOR>
impl<const SHARD_FACTOR: usize> Clone for FixedSizeInterner<SHARD_FACTOR>
Source§fn clone(&self) -> FixedSizeInterner<SHARD_FACTOR>
fn clone(&self) -> FixedSizeInterner<SHARD_FACTOR>
1.0.0 · Source§fn clone_from(&mut self, source: &Self)
fn clone_from(&mut self, source: &Self)
source
. Read more