GenericMapInterner

Struct GenericMapInterner 

Source
pub struct GenericMapInterner { /* private fields */ }
Expand description

A string interner based on a single, fixed-size backing buffer with support for reclamation.

§Overview

This interner uses a single, fixed-size backing buffer where interned strings are stored contiguously. This provides bounded memory usage, and the interner will not allocate additional memory for new strings once the buffer is full. Since interned strings are not likely to need to live for the life of the program, the interner supports reclamation. Once all references to an interned string have been dropped, the storage for that string is reclaimed and can be used to hold new strings.

§Storage layout

The backing buffer stores strings contiguously, with an entry “header” prepended to each string. The header contains relevant data – hash of the string, reference count, and length of the string – needed to work with the entry either when searching for existing entries or when using the entry itself.

The layout of an entry is as follows:

┌───────────────────────── entry #1 ──────────────────────────┐ ┌─ entry #2 ─┐ ┌─ entry .. ─┐
▼                                                             ▼ ▼            ▼ ▼            ▼
┏━━━━━━━━━━━┯━━━━━━━━━━━┯━━━━━━━━━━━┯━━━━━━━━━━━┯━━━━━━━━━━━━━┓ ┏━━━━━━━━━━━━┓ ┏━━━━━━━━━━━━┓
┃ str hash  │  ref cnt  │  str len  │ str data  │   padding   ┃ ┃   header   ┃ ┃   header   ┃
┃ (8 bytes) │ (8 bytes) │ (8 bytes) │ (N bytes) │ (1-7 bytes) ┃ ┃  & string  ┃ ┃  & string  ┃
┗━━━━━━━━━━━┷━━━━━━━━━━━┷━━━━━━━━━━━┷━━━━━━━━━━━┷━━━━━━━━━━━━━┛ ┗━━━━━━━━━━━━┛ ┗━━━━━━━━━━━━┛
▲                                   ▲                           ▲
└────────── `EntryHeader` ──────────┘                           └── aligned for `EntryHeader`
         (8 byte alignment)                                         via trailing padding

The backing buffer is always aligned properly for EntryHeader, so that the first entry can be referenced correctly. However, when appending additional entries to the buffer, we need to ensure that those entries also have an aligned start for accessing the header. This is complicated due to the variable number of bytes for the string data.

Alignment padding is added to the end of the entry to ensure that when appending the next entry, the start of the entry is properly aligned for EntryHeader. In the worst case, up to 7 bytes could be added (and thus wasted) on this alignment padding.

§InternedString

The InternedString type is a handle to the entry header, and thus the string data, for an interned string. It is designed to be small – 8 bytes! – and cheap to clone, as it contains an atomic reference to the entry header and a reference to the interner that owns the string. It dereferences to the underlying string with relatively low overhead: two pointer indirections.

When an InternedString is dropped, it decrements the reference count for the entry it points to. If the reference count drops to zero, it will attempt to mark the entry for reclamation.

§Reclamation

As we want to bound the memory used by the interner, but also not allow it to be filled up with strings that eventually end up going entirely unused, we need a way to remove those unused strings so their underlying storage can be used for new strings. This is where reclamation comes in.

When a string is interned, the entry header tracks how many active references there are to it. When that reference count drops to zero, the last reference to the string attempts to mark the entry for reclamation. Assuming no other reference has been taken out on the entry in the meantime, the entry gets added to a list of “reclaimed” entries.

Reclaimed entries are simple markers – start and end position in the data buffer – which are stored in a freelist. When attempting to intern a new string, this freelist is searched to see if there’s an entry large enough to fit the new string, and if so, it is used.

Additionally, when entries are reclaimed, adjacent entries are merged together where possible. This helps to avoid unnecessary fragmentation over time, although not as effectively as reconstructing the data buffer to re-pack entries.

Implementations§

Source§

impl GenericMapInterner

Source

pub fn new(capacity: NonZeroUsize) -> Self

Creates a new GenericMapInterner with the given capacity.

The given capacity will potentially be rounded up by a small number of bytes (up to 7) in order to ensure the backing buffer is properly aligned.

Trait Implementations§

Source§

impl Clone for GenericMapInterner

Source§

fn clone(&self) -> GenericMapInterner

Returns a duplicate of the value. Read more
1.0.0 · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
Source§

impl Debug for GenericMapInterner

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
Source§

impl Interner for GenericMapInterner

Source§

fn is_empty(&self) -> bool

Returns true if the interner contains no strings.
Source§

fn len(&self) -> usize

Returns the number of strings in the interner.
Source§

fn len_bytes(&self) -> usize

Returns the total number of bytes in the interner.
Source§

fn capacity_bytes(&self) -> usize

Returns the total number of bytes the interner can hold.
Source§

fn try_intern(&self, s: &str) -> Option<InternedString>

Attempts to intern the given string. Read more

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.