Kongming HV
Kongming is a hyperdimensional computing library implementing sparse binary hypervectors for cognitive computing applications.
The core engine is implemented in Rust for maximum efficiency, while ergonomic APIs are open-sourced in Python for better usability.
See Hypervectors for an introduction to hyperdimensional computing and the sparse binary representation.
License
The Python source code, examples, and documentation in this repository are licensed under the MIT License.
The compiled engine distributed via PyPI (kongming-rs-hv) is proprietary.
Install
pip install kongming-rs-hv
See Installation for supported platforms and verification steps.
Published notebooks
See Notebook Platforms for all available notebooks and platform details.
Guides
| Guide | Description |
|---|---|
| Python Quick Start | Installation, examples, and walkthrough |
| Notebook Quick Start | Platform setup, interactive notebooks, cell-by-cell walkthrough |
Language Support
This documentation covers code snippets in multiple languages (if available) side by side.
- Python: bindings to the underlying Rust implementation (public `kongming-rs-hv` on PyPI);
- Go: canonical / reference implementation in a proprietary package;
- Rust: parallel implementation, carefully maintained at feature parity.
Docs versioning
The documentation on yangzh.github.io/hv is
deployed from release tags (v*) and stays in lockstep with the latest
kongming-rs-hv release on PyPI. Whatever you read there matches what
pip install kongming-rs-hv gives you.
The main branch of this
repository is the working head — it may describe APIs or examples that haven’t
been released yet. If you browse the raw markdown on GitHub, expect it to
occasionally be ahead of the published site.
Reference
The work was initially outlined in the arXiv paper below, which builds on the work of many others. The citation is:
Yang, Zhonghao (2023). Cognitive modeling and learning with sparse binary hypervectors. arXiv:2310.18316v1 [cs.AI]
Feedback
Found a bug, have a question, or want to suggest an improvement? Open an issue on GitHub.
Hypervectors
What is Hyperdimensional Computing?
Hyperdimensional computing (HDC) represents concepts as high-dimensional vectors and manipulates them with simple algebraic operations. The dimension of these vectors is typically in the thousands or more.
The key insight is that random vectors in high-dimensional spaces are nearly orthogonal — giving each concept a unique, distributed, and robust representation that tolerates potential ambiguity and interference.
In that sense, the traditional curse of dimensionality becomes the blessing of dimensionality.
Motivated readers should perform their own background research on this topic.
Sparse Binary Representation
Kongming uses sparse binary hypervectors. Each vector has a fixed, large number of dimensions (e.g., 65,536/64K or 1,048,576/1M), but only a very small fraction of them are “on” (set to 1). This sparsity is controlled by the Model configuration.
Furthermore, we focus on a special sparse binary configuration: SparseSegmented where each vector is divided into equal-sized segments, and exactly one bit is ON per segment.
Conceptually, you can think of each SparseSegmented hypervector as a list of phasors, where the offset of the ON bit within its host segment represents the discretized phase.
In general, this unique constraint enables:
- Compact storage: only the offset of the ON bit within its host segment needs to be stored
- Efficient operations: unlike neural nets, where weights are stored as floating-point numbers, binary vectors can be stored and manipulated very efficiently with modern memory and CPUs.
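To make the representation concrete, here is a minimal pure-Python sketch that stores one offset per segment and expands it to the absolute positions of the ON bits. It does not use the library; `random_offsets` and `to_dense_indices` are hypothetical helper names, and the sizes match the 64K model (256 segments of 256 dimensions).

```python
import random

SEGMENT_SIZE = 256   # dimensions per segment
CARDINALITY = 256    # number of segments, one ON bit each


def random_offsets(seed: int) -> list[int]:
    """One offset per segment: the position of the ON bit inside that segment."""
    rng = random.Random(seed)
    return [rng.randrange(SEGMENT_SIZE) for _ in range(CARDINALITY)]


def to_dense_indices(offsets: list[int]) -> list[int]:
    """Absolute indices of the ON bits in the full-width binary vector."""
    return [seg * SEGMENT_SIZE + off for seg, off in enumerate(offsets)]


offsets = random_offsets(seed=42)
on_bits = to_dense_indices(offsets)
```

Only `offsets` would need to be stored; the dense positions can always be recomputed from it.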
Identity and Inverses
- The identity vector has all offsets set to 0. Binding with identity is a no-op. As a special case, the identity vector incurs no storage cost for offsets;
- Binding a vector with its inverse yields the identity.
Similarity and distance measure
Two vectors are compared via overlap: the count of segments where both have the same ON bit. This amounts to a bitwise AND followed by a bit count, which modern CPUs perform very efficiently.
For a model with cardinality $C$ and segment size $S$, the expected overlap between two random vectors $A$ and $B$ is:

$$\mathbb{E}[\text{overlap}(A, B)] = \frac{C}{S}$$

For the supported models $C = S$, so the expectation is 1 and the observed overlap is typically 0, 1 or 2.
Semantically-related vectors have significantly higher overlap. A vector's overlap with itself equals its cardinality $C$.
The commonly used distance (dissimilarity) measure for binary vectors is the Hamming distance, computed via a bitwise XOR. As discussed (and proved) in the paper, the overlap and Hamming distance between sparse binary hypervectors are two sides of the same coin: two vectors with $C$ ON bits each differ exactly in the ON bits they do not share, so

$$d_H(A, B) = 2\,\bigl(C - \text{overlap}(A, B)\bigr)$$
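Both measures can be sketched on the offset representation. This is a pure-Python illustration, not the library's implementation: `overlap` counts segments with matching offsets, and `hamming` counts differing bits of the dense binary vectors. Every vector has one ON bit per segment, so the bit-level Hamming distance is twice the number of non-matching segments.

```python
import random

SEGMENT_SIZE = 256
CARDINALITY = 256


def random_offsets(rng: random.Random) -> list[int]:
    return [rng.randrange(SEGMENT_SIZE) for _ in range(CARDINALITY)]


def overlap(a: list[int], b: list[int]) -> int:
    """Number of segments whose ON bit sits at the same offset."""
    return sum(1 for x, y in zip(a, b) if x == y)


def hamming(a: list[int], b: list[int]) -> int:
    """Bit-level Hamming distance between the dense binary vectors."""
    bits_a = {seg * SEGMENT_SIZE + off for seg, off in enumerate(a)}
    bits_b = {seg * SEGMENT_SIZE + off for seg, off in enumerate(b)}
    return len(bits_a ^ bits_b)  # symmetric difference = XOR of the bit sets


rng = random.Random(7)
a, b = random_offsets(rng), random_offsets(rng)
```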
Supported Models
A Model determines the total number of dimensions (width), how those dimensions are divided into segments (cardinality and sparsity), and therefore implies critical storage and compute characteristics.
| Model | Width | Sparsity Bits | Segment Size | Cardinality (ON bits) |
|---|---|---|---|---|
| `MODEL_64K_8BIT` | 65,536 | 8 | 256 | 256 |
| `MODEL_1M_10BIT` | 1,048,576 | 10 | 1,024 | 1,024 |
| `MODEL_16M_12BIT` | 16,777,216 | 12 | 4,096 | 4,096 |
| `MODEL_256M_14BIT` | 268,435,456 | 14 | 16,384 | 16,384 |
| `MODEL_4G_16BIT` | 4,294,967,296 | 16 | 65,536 | 65,536 |
Model properties
All model functions take a Model enum value and return the derived property:
For simplicity, we use function names from Python. The counterparts from Go / Rust can be found by consulting their respective references.
| Function | Description |
|---|---|
| `width` | Total dimension count (2^width_bits) |
| `sparsity` | Fraction of ON bits (1 / segment_size) |
| `cardinality` | Number of ON bits (= number of segments) |
| `segment_size` | Dimensions per segment |
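The derived properties follow from two numbers per model: the width in bits and the sparsity bits. The sketch below recomputes them in plain Python. `ModelSketch` and its `packed_bytes` property are hypothetical and not part of the library; the storage figure assumes offsets are bit-packed at `sparsity_bits` bits per segment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSketch:
    """Hypothetical stand-in for a Model enum value."""
    width_bits: int      # log2 of the total dimension count
    sparsity_bits: int   # log2 of the segment size

    @property
    def width(self) -> int:
        return 1 << self.width_bits

    @property
    def segment_size(self) -> int:
        return 1 << self.sparsity_bits

    @property
    def cardinality(self) -> int:
        # One ON bit per segment, so cardinality = number of segments.
        return self.width // self.segment_size

    @property
    def sparsity(self) -> float:
        return 1 / self.segment_size

    @property
    def packed_bytes(self) -> int:
        # Assumed storage: cardinality offsets of sparsity_bits bits each.
        return (self.cardinality * self.sparsity_bits + 7) // 8


SKETCH_64K = ModelSketch(width_bits=16, sparsity_bits=8)
SKETCH_1M = ModelSketch(width_bits=20, sparsity_bits=10)
```

Under these assumptions a raw 64K vector takes 256 bytes and a raw 1M vector takes 1,280 bytes.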
How to Choose a Model
- `MODEL_64K_8BIT`: Fast prototyping, tiny memory footprint. Good for tests and small-scale experiments.
- `MODEL_1M_10BIT`: General-purpose, balances performance and storage.
- `MODEL_16M_12BIT`: General-purpose, for the adventurous.
- `MODEL_256M_14BIT` / `MODEL_4G_16BIT`: Very high capacity, not there yet.
Larger models provide more orthogonal space (lower collision probability) at the cost of more memory per vector.
The per-hypervector storage estimate only applies to SparseSegmented (and a few other types) where raw offsets are needed. For certain scenarios, optimizations can dramatically reduce storage requirements. Sparkle, for example, only stores the random seeds, so that the offsets can be recovered on the fly at deserialization time. Composite types (such as Set and Sequence) typically contain references to member Sparkle instances, and usually cost much less storage than a single SparseSegmented instance.
Operators
Kongming provides two core algebraic operations on hypervectors.
Bind
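Binding ($\otimes$) combines two vectors into a result that is dissimilar to both inputs. It is the multiplicative operation in the HDC algebra.
Mathematically, binding adds offsets segment by segment, modulo the segment size. Writing $a_i$ and $b_i$ for the offsets of $A$ and $B$ in segment $i$, and $S$ for the segment size:

$$(A \otimes B)_i = (a_i + b_i) \bmod S$$

See the original paper for details.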
Check out code snippets from the API reference.
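As an illustration of segment-wise offset addition, here is a minimal pure-Python sketch. It is not the library's `hv.bind`; the helper names are hypothetical and vectors are plain lists of offsets. It shows that binding with the identity is a no-op, that the operation is commutative, and that the result has only chance-level overlap with an input.

```python
import random

SEGMENT_SIZE = 1024
CARDINALITY = 1024
IDENTITY = [0] * CARDINALITY  # all offsets zero


def random_offsets(rng: random.Random) -> list[int]:
    return [rng.randrange(SEGMENT_SIZE) for _ in range(CARDINALITY)]


def bind(a: list[int], b: list[int]) -> list[int]:
    """Segment-wise offset addition modulo the segment size."""
    return [(x + y) % SEGMENT_SIZE for x, y in zip(a, b)]


def overlap(a: list[int], b: list[int]) -> int:
    return sum(1 for x, y in zip(a, b) if x == y)


rng = random.Random(1)
a, b = random_offsets(rng), random_offsets(rng)
ab = bind(a, b)
```

In this sketch `overlap(ab, a)` counts only the segments where `b` happens to have offset 0, which is why the bound vector looks unrelated to `a`.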
Release
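Occasionally we use release, which is derived from bind, as the equivalent of division (where bind is the equivalent of multiplication).
Note that release is anti-commutative: with $\text{release}(A, B) = A \otimes B^{-1}$,

$$\text{release}(B, A) = B \otimes A^{-1} = \bigl(\text{release}(A, B)\bigr)^{-1}$$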
Check out code snippets from the API reference.
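A pure-Python sketch of release on offset lists, assuming release is binding with the inverse and the inverse negates each offset modulo the segment size. The helpers are hypothetical and do not call the library. It recovers an operand from a bound pair and checks that swapping the arguments yields the inverse result.

```python
import random

SEGMENT_SIZE = 1024
CARDINALITY = 1024
IDENTITY = [0] * CARDINALITY


def random_offsets(rng: random.Random) -> list[int]:
    return [rng.randrange(SEGMENT_SIZE) for _ in range(CARDINALITY)]


def bind(a: list[int], b: list[int]) -> list[int]:
    return [(x + y) % SEGMENT_SIZE for x, y in zip(a, b)]


def inverse(a: list[int]) -> list[int]:
    """Negate every offset modulo the segment size."""
    return [(-x) % SEGMENT_SIZE for x in a]


def release(a: list[int], b: list[int]) -> list[int]:
    """Bind a with the inverse of b (the 'division' counterpart of bind)."""
    return bind(a, inverse(b))


rng = random.Random(2)
a, b = random_offsets(rng), random_offsets(rng)
pointer = release(a, b)                # a bound with the inverse of b
recovered_a = bind(pointer, b)         # pointer bound with b gives back a
recovered_b = bind(a, inverse(pointer))
```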
Bundle
Bundling ($\oplus$) creates a superposition of vectors: the result is similar to all inputs. It is the additive operation in the HDC algebra.
See the original paper for the mathematical definition of the bundle operator.
Check out code snippets from the API reference.
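One way to picture bundling for segmented vectors is per-segment selection: for each segment, a seeded random generator picks the offset of one member. That selection rule is an assumption made for illustration and is not taken from the library or verified against the paper. The helper names are hypothetical.

```python
import random

SEGMENT_SIZE = 1024
CARDINALITY = 1024


def random_offsets(rng: random.Random) -> list[int]:
    return [rng.randrange(SEGMENT_SIZE) for _ in range(CARDINALITY)]


def overlap(a: list[int], b: list[int]) -> int:
    return sum(1 for x, y in zip(a, b) if x == y)


def bundle(seed: int, *members: list[int]) -> list[int]:
    """Assumed rule: each segment copies the offset of one randomly chosen member."""
    rng = random.Random(seed)
    return [rng.choice(members)[seg] for seg in range(CARDINALITY)]


rng = random.Random(3)
a, b, c = (random_offsets(rng) for _ in range(3))
bundled = bundle(9, a, b, c)
```

With three members, each one contributes roughly a third of the segments, so the bundled vector keeps a high overlap with every input. The same seed reproduces the same result.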
Composites
Composites combine multiple hypervectors into higher-level structures. Each composite type uses a different combination strategy, preserving different kinds of relationships between its members.
All composites follow the same contract (an interface in Go, a trait in Rust) and can be nested: a Set can contain Sparkles, Knots, or even other Sets.
Set
An unordered collection of concepts.
The members are bundled together with a special marker that distinguishes a set from its individual members.
This marker is tuned for the domain, so it is shared among all sets within the same domain.
Use when: you need to represent “these things together” without order.
Check out code snippets from the API reference.
Sequence
An ordered collection.
Each member is combined with a positional encoding, derived from a generic hypervector, before bundling.
A special marker distinguishes a sequence from its individual members. This marker is tuned for the domain, so it is shared among all sequences within the same domain.
Use when: order matters (e.g., words in a sentence, events in time).
Check out code snippets from the API reference.
Octopus
A key-value structure. Each key (a string) is converted to a Sparkle and bound with its corresponding value before bundling.
Use when: you need to represent structured records with named attributes.
Check out code snippets from the API reference.
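The key-value idea can be sketched in pure Python: bind each value with a vector derived from its key, bundle the pairs, then query by binding the record with the inverse of a key. All helpers are hypothetical. The bundle rule (per-segment random selection) and the key hashing (SHA-256) are assumptions for illustration, not the library's behavior.

```python
import hashlib
import random

SEGMENT_SIZE = 1024
CARDINALITY = 1024


def from_word(word: str) -> list[int]:
    """Deterministic vector from a string (assumed hashing scheme)."""
    seed = int.from_bytes(hashlib.sha256(word.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return [rng.randrange(SEGMENT_SIZE) for _ in range(CARDINALITY)]


def bind(a, b):
    return [(x + y) % SEGMENT_SIZE for x, y in zip(a, b)]


def inverse(a):
    return [(-x) % SEGMENT_SIZE for x in a]


def bundle(seed, *members):
    rng = random.Random(seed)
    return [rng.choice(members)[seg] for seg in range(CARDINALITY)]


def overlap(a, b):
    return sum(1 for x, y in zip(a, b) if x == y)


red, circle = from_word("red"), from_word("circle")
record = bundle(
    42,
    bind(from_word("color"), red),
    bind(from_word("shape"), circle),
)
# Query by key: unbind "color" and compare against candidate values.
probe = bind(record, inverse(from_word("color")))
```

The probe is a noisy copy of `red`: it overlaps strongly with `red` and only by chance with `circle`, so a similarity lookup identifies the stored value.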
Knot
The result of binding (multiplicative composition) of hypervectors.
Binding is reversible: given a Knot of A and B, you can recover A by releasing B (binding with B’s inverse).
Use when: you need a reversible association between concepts.
Check out code snippets from the API reference.
Parcel
The result of bundling (additive composition).
Unlike direct bundling, Parcel keeps track of its members for serialization and introspection.
Use when: you need a superposition of concepts, with optional weights.
Check out code snippets from the API reference.
Pointer
A one-directional reference between two hypervectors.
A Pointer encodes a directed link from a source A to a destination B. Given the pointer and either endpoint, the other endpoint can be recovered: P ⊗ B recovers A, and A ⊗ P^{-1} recovers B.
Pointer is the structured wrapper for the release operation — Release(A, B) returns a Pointer with A as source and B as destination.
Use when: you need a reversible directional link (e.g., representing edges, mappings, or “from→to” relations).
Check out code snippets from the API reference.
Summary
| Type | Composition | Order? | Use Case |
|---|---|---|---|
| Set | Bundle + marker | No | Unordered groups |
| Sequence | Positional-bind + bundle + marker | Yes | Ordered lists |
| Octopus | Key-bind + bundle | Partial (by key) | Key-value records |
| Knot | Bind (multiply) | No | Reversible associations |
| Parcel | Bundle (add) | No | superpositions, weighted or unweighted |
| Pointer | Bind A with Inv(B) | Directional | One-directional references |
Near Neighbor Search
Near Neighbor Search (NNS) retrieves chunks from the storage substrate in increasing order of Hamming distance from a query.
As mentioned earlier, this is equivalent to decreasing order of overlap between query and candidate. If overlap encodes semantic relevance, the result is a list of semantically similar candidates.
It leverages an underlying Associative Index for efficient recovery of candidates. The Associative Index is a semantic index that enables fast similarity-based lookup over stored hypervectors. Conceptually it turns a key-value substrate (item memory) into an associative memory — one where retrieval is by content similarity, not by exact content or key match.
With help from the associative index, the NNS module has constant time complexity: query time remains bounded, independent of the number of entries in the storage system. The secret sauce is efficient random access to the underlying associative index.
Unlike approximate nearest neighbor methods (LSH, HNSW, etc.), the NNS module computes exact overlap counts via the associative index. There is no approximation error and there are no index-specific parameters to tune.
Jump to the API reference for Near-Neighbor Search.
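To illustrate how an index can return exact overlap counts, here is a toy inverted index keyed by `(segment, offset)`. This is a hypothetical sketch and not the library's Associative Index; `AssociativeIndexSketch` is an invented name, and the toy makes no claim about the real module's data layout or complexity.

```python
import random
from collections import Counter, defaultdict

SEGMENT_SIZE = 256
CARDINALITY = 256


class AssociativeIndexSketch:
    """Toy inverted index: (segment, offset) -> keys of vectors with that ON bit."""

    def __init__(self) -> None:
        self._postings = defaultdict(list)

    def add(self, key: str, offsets: list[int]) -> None:
        for seg, off in enumerate(offsets):
            self._postings[(seg, off)].append(key)

    def query(self, offsets: list[int], top_k: int = 3) -> list[tuple[str, int]]:
        """Return (key, exact overlap) pairs, highest overlap first."""
        votes = Counter()
        for seg, off in enumerate(offsets):
            votes.update(self._postings.get((seg, off), ()))
        return votes.most_common(top_k)


rng = random.Random(5)
index = AssociativeIndexSketch()
vectors = {}
for i in range(100):
    key = f"item-{i}"
    vectors[key] = [rng.randrange(SEGMENT_SIZE) for _ in range(CARDINALITY)]
    index.add(key, vectors[key])

# A noisy query: keep even segments of item-7, randomize the odd ones.
target = vectors["item-7"]
noisy = [off if seg % 2 == 0 else rng.randrange(SEGMENT_SIZE)
         for seg, off in enumerate(target)]
hits = index.query(noisy)
```

Each query touches one posting list per segment, and the vote count for a key is exactly its overlap with the query.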
HV
The core hypervector API. This module provides the building blocks for hyperdimensional computing: vector types, algebraic operators, and model configuration.
| Section | Description |
|---|---|
| Common Utilities | Model, SparseOperation, similarity, identity, hashing |
| Operators | Bind, bundle, and BindDirect |
| HyperBinary Types | Interface + concrete types (Sparkle, Set, Sequence, etc.) |
| Customizing Run-time Behavior | Environment variables |
| Misc | Display, serialization |
Common Utilities
Functions and types used across all HyperBinary types.
| Section | Description |
|---|---|
| Models | Model enum and model functions |
| SparseOperation | Model + seeded RNG for deterministic vector generation |
| Seed128 | 128-bit seed embedding Domain + Pod |
| Domain & Pod | Semantic grouping (Domain) and slot identifier (Pod) |
| Utilities | Similarity, identity check, hashing |
Models
See Concepts: Hypervectors for the full overview.
Model Enum
model0 = hv.MODEL_64K_8BIT
model1 = hv.MODEL_1M_10BIT
Model Functions
hv.width(hv.MODEL_1M_10BIT) # total dimensions
hv.cardinality(hv.MODEL_1M_10BIT) # ON bit count
hv.sparsity(hv.MODEL_1M_10BIT) # sparsity
hv.segment_size(hv.MODEL_1M_10BIT) # dimensions per segment
See also: SparseOperation — Model + seeded RNG for deterministic vector generation.
Domain & Pod
A Domain models the semantic grouping for hypervectors, providing the high 64-bit half of a Seed128. A Pod is a slot within a Domain, providing the low 64-bit half. The (Domain, Pod) pair uniquely identifies a Sparkle.
Domain Constructors
# From a name string (hashed to a 64-bit id)
d = hv.Domain("animals")
# Same as above
d = hv.Domain.from_name("animals")
# From a raw 64-bit id
d = hv.Domain.from_id(0x1234567890abcdef)
# From a domain prefix enum and a name suffix
# The id is computed as xxhash(prefix_label + "." + name)
d = hv.Domain.from_prefix_and_name(hv.DOMAIN_PREFIX_NLP, "concept")
# Accessors
d.id() # u64
d.name() # str (empty if constructed from id)
d.domain_prefix() # int (0 = UNKNOWN if no prefix was set)
d.is_default() # True if id == 0
Domain Prefix Constants
| Constant | Label |
|---|---|
| `hv.DOMAIN_PREFIX_USER` | 🎭 |
| `hv.DOMAIN_PREFIX_NLP` | 💬 |
Domain prefixes provide namespacing for domains. When a prefix is set, the domain id is derived from the prefix label (and optional name), ensuring consistent hashing across languages.
Pod Constructors
Pods can be seeded by a string word, a raw uint64, or a prewired enum value.
# From a word string (hashed to a 64-bit seed)
p = hv.Pod("cat")
# Same as above
p = hv.Pod.from_word("cat")
# From a raw 64-bit seed
p = hv.Pod.from_seed(42)
# From a prewired enum value
p = hv.Pod.from_prewired(hv.PREWIRED_SET_MARKER)
p = hv.Pod.from_prewired(hv.PREWIRED_STEP)
# Accessors
p.seed() # u64
p.word() # str (empty if constructed from seed or prewired)
p.prewired() # int (0 if not prewired)
p.is_default() # True if seed == 0
Prewired Constants
Prewired pods are infrastructure-level constants with fixed seeds:
| Constant | Label |
|---|---|
| `hv.PREWIRED_NIL` | ∅ |
| `hv.PREWIRED_FALSE` | ❎ |
| `hv.PREWIRED_TRUE` | ✅ |
| `hv.PREWIRED_BEGIN` | 🚀 |
| `hv.PREWIRED_END` | 🏁 |
| `hv.PREWIRED_LEFT` | ⬅️ |
| `hv.PREWIRED_RIGHT` | ➡️ |
| `hv.PREWIRED_UP` | ⬆️ |
| `hv.PREWIRED_DOWN` | ⬇️ |
| `hv.PREWIRED_MIDDLE` | ⏺️ |
| `hv.PREWIRED_STEP` | 𓊍 |
| `hv.PREWIRED_SET_MARKER` | 🫧 |
| `hv.PREWIRED_SEQUENCE_MARKER` | 📿 |
Polymorphic arguments (Python-only)
Most Python factories that take a Domain or Pod accept the
underlying primitives directly — you rarely need to wrap them
explicitly:
| Parameter type | Accepted Python forms |
|---|---|
| `Domain` | `Domain` instance, `str`, `int`, `(DomainPrefix, str)` tuple |
| `Pod` | `Pod` instance, `Prewired` enum, `str`, `int` |
# Domain — four equivalent forms in any factory expecting a Domain:
memory.by_item_key("animals", "cat")
memory.by_item_key(hv.Domain.from_name("animals"), "cat")
memory.by_item_key(0x1234, "cat") # from numeric id
memory.by_item_key((hv.DOMAIN_PREFIX_NLP, "concept"), "p") # from (prefix, name)
# Pod — Prewired enum is recognized:
memory.new_terminal("internal", hv.PREWIRED_STEP) # Pod from Prewired
memory.new_terminal("animals", "cat") # Pod from word
memory.new_terminal("animals", 0xCAFE_BABE) # Pod from raw seed
For the parallel polymorphism on Seed128, see
Seed128 → Polymorphic arguments.
Seed128
A Seed128 is a 128-bit seed to drive a random number generator.
The current random number generator expects two 64-bit seeds: the same (seed_high, seed_low) pair always produces the same sequence of random numbers, enabling reproducible and deterministic vector generation across runs and languages.
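The determinism property can be illustrated with Python's standard library. This sketch combines the two 64-bit halves into one 128-bit integer and seeds `random.Random` with it. The standard generator is a Mersenne Twister, not the library's backend, so the actual offsets will differ from what the library produces; `offsets_from_seed` is a hypothetical helper.

```python
import random


def offsets_from_seed(seed_high: int, seed_low: int,
                      cardinality: int = 256, segment_size: int = 256) -> list[int]:
    """Derive per-segment offsets from a (high, low) 64-bit seed pair."""
    seed128 = (seed_high << 64) | seed_low   # high half = domain, low half = pod
    rng = random.Random(seed128)
    return [rng.randrange(segment_size) for _ in range(cardinality)]


first = offsets_from_seed(0x1234, 42)
again = offsets_from_seed(0x1234, 42)
other = offsets_from_seed(0x1234, 43)
```

The same pair always regenerates the same offsets, which is what makes it possible to store only the seeds.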
Constructors
# From Domain and Pod arguments (each accepts Domain/Pod, int, or str)
seed = hv.Seed128("animals", "cat") # domain name + pod word
seed = hv.Seed128(0, 42) # default domain + raw pod seed
seed = hv.Seed128("animals", 42) # domain name + raw pod seed
seed = hv.Seed128(hv.Domain("animals"), hv.Pod("cat")) # explicit Domain/Pod objects
# Zero seed
seed_zero = hv.Seed128.zero() # (0, 0)
# Random seed from a SparseOperation
seed_rand = hv.Seed128.random(so) # consumes two u64 from the RNG
# Accessors
seed.domain() # Domain object
seed.pod() # Pod object
seed.high() # u64 (domain id)
seed.low() # u64 (pod seed)
Usage
All composite constructors take a Seed128, as seed for the bundle operator:
seed = hv.Seed128("fruits", "fruit_set")
s = hv.Set(seed, a, b, c)
seq = hv.Sequence(seed, a, b, c)
Polymorphic arguments (Python-only)
Anywhere a Python factory expects a Seed128 (composite constructors
like hv.Set / hv.Sequence / hv.Octopus, the hv.bundle operator,
etc.) you can pass either a Seed128 instance or a (domain, pod)
tuple — the binding extracts and constructs the seed for you.
| Parameter type | Accepted Python forms |
|---|---|
| `Seed128` | `Seed128` instance, or a `(domain, pod)` tuple |
The tuple composes with the polymorphic forms accepted by Domain
and Pod (see
Domain & Pod → Polymorphic arguments),
so each side can itself be a string / int / Prewired enum / (prefix, name)
tuple — letting you skip the hv.Seed128(...) wrap entirely:
# Equivalent to hv.Sequence(hv.Seed128("words", "hi"), m1, m2):
seq = hv.Sequence(("words", "hi"), m1, m2)
# Tuple form composes with Domain's (DomainPrefix, str) tuple:
seq = hv.Sequence(((hv.DOMAIN_PREFIX_NLP, "concept"), "myseq"), m1, m2)
# And with Pod's Prewired enum:
seq = hv.Sequence(("internal", hv.PREWIRED_STEP), m1, m2)
SparseOperation
A SparseOperation instance wraps a Model, a random number generator, and potentially other information related to the sparse operation in general.
Constructor
so = hv.SparseOperation(hv.MODEL_1M_10BIT, 0, 42)
so1 = hv.SparseOperation(hv.MODEL_1M_10BIT, "domain", "pod")
Methods
so.model() # Model enum
so.width() # width for this model
so.cardinality() # cardinality for this model
so.sparsity() # sparsity for this model
so.uint64() # next random number
Usage: Generating Random Vectors
so = hv.SparseOperation(hv.MODEL_1M_10BIT, 0, 42)
sparkle = hv.Sparkle.random(hv.Domain("domain"), so)
Utilities
Similarity
hv.overlap(a, b) # Overlap
hv.hamming(a, b) # Hamming distance
hv.equal(a, b) # Equality check
Identity Check
v = hv.Sparkle.identity(model)
hv.is_identity(v) # True if v is an identity vector
Hash Utilities
hv.hash64_from_string("hello") # deterministic u64 hash from string
hv.hash64_from_bytes(b"\x01\x02") # deterministic u64 hash from bytes
hv.curr_time_as_seed() # current time as a u64 seed
hv.kongming_studio_seed() # fixed studio seed constant
HyperBinary Types
All vector types conform to a common interface. In Go this is the HyperBinary interface; in Rust it is the HyperBinary trait. The two implementations are kept at feature parity.
Python doesn’t have the concept of interface/trait, but all HyperBinary derived types share a common set of methods.
v.model() # Model enum
v.width()
v.cardinality()
v.hint()
v.stable_hash() # int
v.seed128()
v.exponent()
v.core() # SparseSegmented
v.power(p) # HyperBinary
Concrete Types
| Type | Description |
|---|---|
| SparseSegmented 🍡 | Foundational vector — packed per-segment offsets |
| Sparkle ✨ | Seeded, deterministic hypervector |
| Learner 💫 | Online Hebbian learning |
| Set 🫧 | Unordered collection |
| Sequence 📿 | Ordered collection with positional encoding |
| Octopus 🐙 | Key-value composite |
| Knot 🪢 | Bound (multiplied) group |
| Parcel 🎁 | Bundled (added) group |
SparseSegmented 🍡
The most foundational vector type — a sparse binary hypervector where each segment has exactly one ON bit at the recorded offset location. All other types (Sparkle, Set, Sequence, etc.) ultimately contain a SparseSegmented in memory for processing.
Structure
| Field | Description |
|---|---|
| `model` | Sparsity configuration (Model) |
| `offsets` | Packed bit array of per-segment ON offsets. nil/None = identity vector |
| `hash` | Lazy-computed stable hash for equality checks |
The offsets are bit-packed according to the model’s sparsity bits — they do not align to byte boundaries. This trades a small CPU cost for compact, uniform storage that works both in memory and on disk.
Identity vector: when offsets is blank, the vector is the identity vector where all offsets are 0. Binding with identity is a no-op, and identity requires zero storage for offsets.
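Bit-packing offsets that do not align to byte boundaries can be sketched with Python integers. The layout below (first offset in the least significant bits, little-endian bytes) is an assumption for illustration; the library's actual wire layout is not specified here, and `pack_offsets` / `unpack_offsets` are hypothetical helpers.

```python
def pack_offsets(offsets: list[int], bits: int) -> bytes:
    """Pack each offset into `bits` bits, back to back, with no byte alignment."""
    acc = 0
    for i, off in enumerate(offsets):
        acc |= off << (i * bits)
    nbytes = (len(offsets) * bits + 7) // 8
    return acc.to_bytes(nbytes, "little")


def unpack_offsets(data: bytes, bits: int, count: int) -> list[int]:
    """Inverse of pack_offsets: read `count` fields of `bits` bits each."""
    acc = int.from_bytes(data, "little")
    mask = (1 << bits) - 1
    return [(acc >> (i * bits)) & mask for i in range(count)]


offsets = [0, 1023, 5, 512]          # four 10-bit offsets
packed = pack_offsets(offsets, bits=10)
restored = unpack_offsets(packed, bits=10, count=len(offsets))
```

Four 10-bit offsets fit in 5 bytes instead of the 8 bytes that 16-bit alignment would need.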
Constructors
# Identity
ss = hv.SparseSegmented.identity(model)
# From per-segment offsets, typically discouraged...
ss = hv.SparseSegmented.from_offsets(model, [off0, off1, ...])
Key Methods
ss.is_identity() # True if identity vector
ss2 = ss.power(2)
inv = ss.power(-1)
# Similarity
hv.overlap(a, b) # Count of matching ON bits
hv.hamming(a, b) # Count of differing segments
ss.offsets() # returns all offsets
Serialization
SparseSegmented serializes to HyperBinaryProto with hint SPARSE_SEGMENTED. The offsets field carries the raw packed bytes. Identity vectors serialize with empty offsets.
Sparkle ✨
Sparkles are the atomic building block for higher-level constructs: essentially SparseSegmented annotated with domain and pod.
Domain is a logical namespace that groups related Sparkle instances. Pod acts as the secondary identifier for a Sparkle instance.
Sparkle is deterministic: the same (domain, pod) pair always produces the same offsets. For this reason, the (domain, pod) pair uniquely identifies a Sparkle.
Sparkle Constructors
# From a word string
s0 = hv.Sparkle.from_word(model, "animals", "cat")
# From a numeric seed
s1 = hv.Sparkle.from_seed(model, "animals", 42)
# From a prewired enum
s2 = hv.Sparkle.from_prewired(model, "animals", hv.PREWIRED_SET_MARKER)
# Identity vector
s3 = hv.Sparkle.identity(model)
# Random (from SparseOperation)
so = hv.SparseOperation(hv.MODEL_1M_10BIT, "domain", "pod")
s4 = hv.Sparkle.random("animals", so)
Key Methods
s0.model() # Model enum
s0.stable_hash() # Deterministic hash
s0.exponent() # Current exponent (1 for base vector)
s0_square = s0.power(2) # Returns p-th power (new Sparkle)
hv.equal(s0, s0_square) # False: s0_square = s0^2 differs from the original s0
core0 = s0.core() # Returns underlying SparseSegmented
core0.offsets() # The raw offsets for each segment.
power(0) always returns the identity sparkle. power(-1) returns the inverse.
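If binding is offset addition modulo the segment size, then the p-th power corresponds to multiplying every offset by p modulo the segment size. The sketch below works under that assumption on plain offset lists; it is not the library's `power` method and the helper names are hypothetical.

```python
import random

SEGMENT_SIZE = 1024
CARDINALITY = 1024
IDENTITY = [0] * CARDINALITY


def bind(a: list[int], b: list[int]) -> list[int]:
    return [(x + y) % SEGMENT_SIZE for x, y in zip(a, b)]


def power(a: list[int], p: int) -> list[int]:
    """Assumed semantics: p-fold self-binding, i.e. offsets scaled by p."""
    return [(x * p) % SEGMENT_SIZE for x in a]


rng = random.Random(4)
a = [rng.randrange(SEGMENT_SIZE) for _ in range(CARDINALITY)]
```

Under this reading, `power(a, 0)` is the identity and `power(a, -1)` cancels `a` when bound with it.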
Pretty-printing
# Pretty-printing, or s.__str__()
print(s0)
# ✨:🔗animals,🌱cat
# More detailed information, or s.__repr__()
s0
# hint: SPARKLE
# model: MODEL_1M_10BIT
# stable_hash: 9725717137035622833
# domain:
# name: animals
# pod:
# word: cat
During pretty-printing of Sparkle instances, you may notice special emoji for domain / pods.
| Emoji | Variant | Example |
|---|---|---|
| 🔗 | named domain | 🔗animals, 🔗PREFIX.name |
| 🌐 | numeric domain | 🌐0x..c862 |
| 🌱 | named pod | 🌱cat |
| 🫛 | numeric pod | 🫛0x..80e4 |
| 🍀 | pre-defined pod | 🍀SET_MARKER |
| 💪 | Exponent / Power | 💪3, 💪-1 |
Identity vectors display as IDENT (e.g., ✨IDENT).
The underlying offsets are lazily generated from a seeded PRNG. Only the seeds are stored in serialization, which is a significant storage saving; offsets are recomputed during de-serialization.
Learner 💫
Learners are designed to perform online bundling for a stream of observations, in the form of Hebbian-style learning.
The total storage / processing budget is fixed — what matters is the distribution of weights among observed vectors.
Constructors
learner = hv.Learner(model, hv.Seed128(0, 42))
# a randomly-initialized learner.
learner = hv.Learner.random(so)
Feeding Observations
learner.bundle(a) # single observation
learner.bundle_multiple(b, 3) # with weight multiplier
Inspection
learner.age() # number of observations seen
learner.affinity(a) # raw overlap; returns RandomOverlap when age==0
learner.weight(a) # implicit weight for a probe vector; 0.0 when age==0
A fresh Learner (age == 0) has no offsets to overlap against. Affinity
short-circuits to a non-zero baseline so that Weight
yields exactly 0.0 for any probe — neutral, non-selecting, also non-rejecting.
Set 🫧
An unordered collection of hypervectors. See Composites: Set for the conceptual overview.
Constructor
s = hv.Set(hv.Seed128(0, 42), first, second, third)
Notable methods
# All these will be approximately 1/3 of the total cardinality.
hv.overlap(s.unmasked(), first)
hv.overlap(s.unmasked(), second)
hv.overlap(s.unmasked(), third)
Sequence 📿
An ordered collection of hypervectors with positional encoding. See Composites: Sequence for the conceptual overview.
Constructor
# Constructing a sequence, with logical index start at 1 (default to 0).
seq = hv.Sequence(hv.Seed128(0, 42), first, second, third, start=1)
In-place edits: Append / Prepend / Reset
Append, Prepend, and Reset all mutate the Sequence in place —
clone first if you need to preserve the original.
- `Append(more...)`: add members at the end. `start` is unchanged.
- `Prepend(more...)`: add members at the front; `start` decrements by `len(more)` so existing members keep their positional binding.
- `Reset(start)`: shift the starting index. No-op when `start` equals the current start.
After any of these, seq equals what you’d get by building a fresh
NewSequence(seed, new_start, all_members...) — the domain/pod seed is
preserved.
import copy
seq = hv.Sequence(hv.Seed128(0, 42), a, b, c)
# Append / Prepend are variadic and mutate in place.
seq.append(d, e) # seq now [a, b, c, d, e]
seq.prepend(x, y) # seq now [x, y, a, b, c, d, e], start -= 2
seq.reset(10) # shift the starting index to 10
# To preserve the original, clone first:
base = hv.Sequence(hv.Seed128(0, 42), a, b, c)
s1 = copy.copy(base)
s1.append(d) # base is untouched
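The index bookkeeping behind these edits can be sketched without any hypervectors. `SequenceSketch` is a hypothetical stand-in that only tracks members and the logical start index, to show why prepending lowers `start` while existing members keep their positions.

```python
class SequenceSketch:
    """Tracks members and the logical index of the first member."""

    def __init__(self, *members, start: int = 0) -> None:
        self.members = list(members)
        self.start = start

    def append(self, *more) -> None:
        self.members.extend(more)          # start is unchanged

    def prepend(self, *more) -> None:
        self.members[:0] = more            # insert at the front
        self.start -= len(more)            # existing members keep their index

    def reset(self, start: int) -> None:
        self.start = start                 # shift every member's index

    def positions(self) -> dict:
        return {self.start + i: m for i, m in enumerate(self.members)}


seq = SequenceSketch("a", "b", "c")
seq.append("d", "e")
seq.prepend("x", "y")
after_prepend = seq.positions()
seq.reset(10)
after_reset = seq.positions()
```

After the prepend, `"a"` is still at index 0 and `"x"` sits at index -2; after the reset, the whole sequence starts at 10.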
Octopus 🐙
A key-value composite where each value is bound with its key’s Sparkle. See Composites: Octopus for the conceptual overview.
Constructor
Keys are Pods. In Python, strings (and any value polymorphically convertible to Pod) are accepted and auto-converted.
octo = hv.Octopus(hv.Seed128(0, 42), ["color", "shape"], red, circle) # named octo to avoid shadowing the built-in oct()
Key Methods
octo.value_by_key("color") # accepts Pod | str | int | Prewired
Knot 🪢
The result of binding (multiplicative composition) of hypervectors. Unlike BindDirect, Knot tracks its member parts for serialization and debugging. See Composites: Knot.
Constructor
# Not directly constructed in Python. Use hv.bind() instead.
k = hv.bind(a, b)
Extending a Knot
An existing Knot can be extended with additional parts via
expand. This mutates the Knot
in place — equivalent to re-binding all parts from scratch but
without reconstructing the base.
k = hv.bind(a, b)
k.expand(c) # k is now equivalent to hv.bind(a, b, c)
If you need to preserve the original, clone before expanding — see the Expand operator section for full examples including clone-first patterns.
Parcel 🎁
The result of bundling (additive composition) of hypervectors. Unlike BundleDirect, Parcel tracks its members and bundling seed for serialization and debugging. See Composites: Parcel.
Constructors
p = hv.bundle(hv.Seed128(10, 1), a, b, c)
Pointer 👉
A one-directional reference between two hypervectors. A Pointer encodes a directed link from a source to a destination via P = source ⊗ Inv(destination). Given the pointer and either endpoint, the other endpoint can be recovered. See Composites: Pointer.
Constructor
p = hv.Pointer(hv.Seed128(0, 42), source, destination)
# Or via the release operator:
p = hv.release(source, destination)
Endpoints
A Pointer retains references to its source (A) and destination (B).
p.source() # → A
p.destination() # → B
Recovering endpoints
Given the pointer and one endpoint, the other can be recovered:
- `RDeref(B) = A`: recover the source given the destination, via `P ⊗ B`.
- `Deref(A) = B`: recover the destination given the source, via `A ⊗ Inv(P)`.
p = hv.Pointer(seed, a, b)
recovered_a = p.rderef(b) # ≈ a
recovered_b = p.deref(a) # ≈ b
Anti-commutativity
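Pointer (and the release operator that constructs it) is anti-commutative: swapping source and destination yields the inverse pointer, `Release(B, A) = B ⊗ Inv(A) = Inv(A ⊗ Inv(B)) = Inv(Release(A, B))`.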
Operators
See Concepts: Operators for the full overview.
Bind
bound = hv.bind(a, b)
released = hv.release(bound, b) # this will recover `a`
hv.equal(a, b) # hash equality
Release
Extracts one component from a binding: `release(composite, role) = composite ⊗ Inv(role)`.
release returns a Pointer — a directional reference from composite to role that retains both endpoints for inspection and serialization. The bit-level value is identical to bind(composite, inverse(role)).
bound = hv.bind(role, filler)
recovered = hv.release(bound, role) # Pointer; ≈ filler at the bit level
Expand (extend a Knot)
Extends an existing Knot with additional operands without
re-binding from scratch. k.expand(c) on k = bind(a, b) gives the
same result as bind(a, b, c) — but mutates k in place, so clone
first if you need the original.
import copy
k = hv.bind(a, b)
k.expand(c) # k is now equivalent to hv.bind(a, b, c)
# To preserve the original, clone first:
base = hv.bind(a, b)
k1 = copy.copy(base)
k1.expand(c) # base is untouched
BindDirect
Like Bind, but returns a raw SparseSegmented instead of a
Knot — no operand tracking. Cheaper for intermediate computations
where you don’t need to reverse the bind or inspect the operand list.
# domain/pod default to the zero Domain/Pod
ss = hv.bind_direct(a, b, c)
# Or supply an explicit seed (annotates the resulting SparseSegmented):
ss = hv.bind_direct(a, b, domain=d, pod=p)
Bundle
p = hv.bundle(hv.Seed128(10, 1), a, b, c)
Customizing runtime behavior
Environment Variables
All environment variables are read once on first access and cannot be changed at runtime. Unset variables use the documented default.
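The read-once behavior can be pictured with a cached lookup. This is an illustrative sketch using only the standard library, not the library's actual mechanism; `env_setting` is a hypothetical helper. Note that the demonstration lines modify `KONGMING_REPR_FORMAT` in the current process.

```python
import functools
import os


@functools.lru_cache(maxsize=None)
def env_setting(name: str, default: str) -> str:
    """Read an environment variable on first access and cache the result."""
    return os.environ.get(name, default)


os.environ.pop("KONGMING_REPR_FORMAT", None)        # start from "unset"
first = env_setting("KONGMING_REPR_FORMAT", "YAML")  # falls back to the default
os.environ["KONGMING_REPR_FORMAT"] = "PROTO"         # changed after first access
second = env_setting("KONGMING_REPR_FORMAT", "YAML") # still the cached value
```

This is why the variables should be exported before the process first touches the library.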
KONGMING_RNG
Selects the pseudo-random number generator backend used for hypervector generation.
| Value | Description |
|---|---|
| `xoshiro++` (default) | xoshiro256++: simple, fast, cross-language deterministic |
| `pcg` | PCG-DXSM: classic/compat mode (matches pre-v3.7.5 behavior) |
Changing this affects all generated vectors: Sparkle offsets, Learner bundling, Cyclone patterns. Vectors generated with different backends are not comparable.
KONGMING_REPR_FORMAT
Controls __repr__() / Repr() output format.
| Value | Description |
|---|---|
| `YAML` (default) | Multi-line YAML dump |
| `PROTO` | Multi-line protobuf debug string |
KONGMING_LEARNER_SAMPLING
Controls the bundling strategy used by Learner.
| Value | Description |
|---|---|
| `FISHER_YATES` (default) | Fisher-Yates shuffle: selects exactly the right number of segments per round |
| `CLASSIC` | Per-segment probabilistic sampling: each segment is independently sampled with a fixed probability |
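The difference between the two strategies can be sketched as two ways of choosing which segments a new observation overwrites. This is a hypothetical illustration of the descriptions in the table, not the Learner's code: a partial Fisher-Yates shuffle returns exactly `k` distinct segments, while independent sampling returns `k` only on average.

```python
import random


def fisher_yates_sample(rng: random.Random, cardinality: int, k: int) -> list[int]:
    """Partial Fisher-Yates shuffle: exactly k distinct segment indices."""
    segments = list(range(cardinality))
    for i in range(k):
        j = rng.randrange(i, cardinality)
        segments[i], segments[j] = segments[j], segments[i]
    return segments[:k]


def classic_sample(rng: random.Random, cardinality: int, probability: float) -> list[int]:
    """Each segment is chosen independently with a fixed probability."""
    return [seg for seg in range(cardinality) if rng.random() < probability]


rng = random.Random(11)
exact = fisher_yates_sample(rng, cardinality=1024, k=256)
approx = classic_sample(rng, cardinality=1024, probability=0.25)
```

The exact count removes the round-to-round variance that independent sampling introduces.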
# Example: use PCG for backward compatibility with pre-v3.7.5 vectors
export KONGMING_RNG=pcg
# Example: switch repr to protobuf debug format
export KONGMING_REPR_FORMAT=PROTO
# Example: use classic sampling in Learner
export KONGMING_LEARNER_SAMPLING=CLASSIC
Querying the Current Environment
Use global_env() to inspect all active settings at runtime. Returns a GlobalEnv protobuf message — new fields added to the proto automatically appear.
>>> hv.global_env()
rng_hint: XOSHIRO256PP
learner_sampling: FISHER_YATES
repr_format: YAML
Misc
Display
All HyperBinary types have a compact, emoji-prefixed string representation for quick visual inspection. See HyperBinary Types for type symbols and Sparkle for field labels.
Python __str__ and __repr__
__str__ (triggered by print()) returns the compact emoji form:
>>> a = hv.Sparkle.with_word(hv.MODEL_64K_8BIT, hv.d0(), "hello")
>>> print(a)
✨:🌐0x..c862,🫛0x..80e4
__repr__ (triggered by evaluating a variable in the shell or notebook) returns a detailed, developer-friendly YAML representation, controlled by the KONGMING_REPR_FORMAT environment variable:
>>> a
hint: SPARKLE
model: MODEL_64K_8BIT
stable_hash: 12345678
domain:
id: ...
pod:
seed: 12345
Set KONGMING_REPR_FORMAT=PROTO for protobuf debug output instead of the default YAML. See Environment Variables for all supported variables.
Go / Rust Display
print(sparkle) # compact emoji form via __str__
repr(sparkle) # detailed YAML/proto form via __repr__
Serialization
# HyperBinary → protobuf bytes
msg = hv.to_message(sparkle)
# protobuf bytes → HyperBinary
obj = hv.from_message(msg)
# raw proto bytes → HyperBinary
obj = hv.from_proto_bytes(data)
# proto bytes → YAML string (for debugging)
hv.format_to_yaml(data)
Memory
The memory package provides persistent and in-memory storage for hypervectors with semantic indexing and near-neighbor search.
The core abstraction is a Chunk — an immutable identity (Sparkle) paired with a mutable semantic code (any HyperBinary). Chunks are stored in a Substrate (pluggable storage backend), queried via ChunkSelectors, and created via ChunkProducers.
| Section | Description |
|---|---|
| Chunk | The fundamental storage unit |
| Substrate & Views | Storage backends and transactional views |
| Selectors | Query builders for reading chunks |
| Producers | Write builders for creating chunks |
Chunk
The fundamental storage unit in the memory system. A Chunk carries a semantic code (any HyperBinary type) along with its derived identity (a Sparkle as implied from the code’s domain/pod).
The identity determines the storage key and drives compositionality — a chunk is either present or absent. The code is potentially learnable, offering opportunities to adapt over time, just like weights from traditional neural nets.
Structure
| Field | Type | Description |
|---|---|---|
| code | HyperBinary | Semantic content (can be updated). Required: its domain/pod determines the chunk’s identity. |
| id | Sparkle | Identity vector, derived from the code’s domain/pod; determines the storage key. |
note | string | Human-readable annotation, primarily for debugging |
extra | protobuf Any | Extensible payload for application-specific data, primarily for debugging |
Inspection
Chunks are typically created via producers (see Producers), but can be inspected after retrieval (see Selectors).
# chunk = memory.first_picked_chunk(view, memory.by_item_key("animals", "cat"))
chunk.id # Sparkle
chunk.code # HyperBinary
chunk.note # str
chunk.extra # Optional[bytes]
Substrate & Views
A Substrate is a pluggable storage backend. It provides transactional views for reading and writing chunks.
View Pattern
All storage access goes through views:
- SubstrateView — read-only, supports key lookup and prefix scanning
- SubstrateMutableView — extends SubstrateView with write staging and atomic commit (to underlying storage)
# Read-only view (context manager)
with storage.new_view() as view:
    # Check if chunk exists, without actually reading it back.
    exists = view.chunk_exists("animals", "cat")
    cat_chunk = view.read_chunk("animals", "cat")
# Mutable view (auto-commits on clean exit, rollback on exception).
# Stage writes by running producers against the view via
# producer.produce(view) — the recommended path for batched writes.
with storage.new_mutable_view() as view:
    memory.new_terminal("words", "hi").produce(view)
    memory.from_sequence_members("words", "greet", members,
                                 semantic_indexing=True).produce(view)
    # commits automatically
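The commit-on-clean-exit, rollback-on-exception behavior can be pictured with a small stand-in built on Python's context manager protocol. `ToyStore` and `ToyMutableView` are hypothetical names for illustration; the real substrate is more involved and is not reproduced here.

```python
class ToyStore:
    """In-process stand-in for a substrate: a dict of committed chunks."""
    def __init__(self):
        self.committed = {}

    def new_mutable_view(self):
        return ToyMutableView(self)

class ToyMutableView:
    """Stages writes; applies them on clean exit, discards them on exception."""
    def __init__(self, store):
        self.store = store
        self.staged = {}

    def write(self, domain, pod, value):
        self.staged[(domain, pod)] = value

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.store.committed.update(self.staged)  # commit all staged writes together
        self.staged.clear()                           # otherwise the batch is dropped
        return False                                  # never swallow the exception

store = ToyStore()
with store.new_mutable_view() as view:
    view.write("words", "hi", "chunk-1")

try:
    with store.new_mutable_view() as view:
        view.write("words", "bye", "chunk-2")
        raise RuntimeError("something failed mid-batch")
except RuntimeError:
    pass

print(sorted(store.committed))  # [('words', 'hi')]
```

Staging writes in the view means a failed batch leaves the store exactly as it was before the `with` block started.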
Storage Backends
InMemory
Volatile, in-process storage. All data lost on exit. Best for testing and ephemeral caches.
storage = memory.InMemory(hv.MODEL_64K_8BIT, "my_store")
Embedded
Persistent, single-machine storage backed by an embedded key-value store. Suitable for local development and moderate-scale deployments.
storage = memory.Embedded(hv.MODEL_64K_8BIT, "/path/to/store")
ScyllaDB (Distributed)
Distributed storage via Cassandra-compatible ScyllaDB. For high-scale, multi-node deployments.
# Not exposed yet...
Selectors
ChunkSelectors are composable query builders for reading chunks from the substrate. Each selector defines how to locate and return matching chunks.
NNS (Near-Neighbor Search)
Wraps a single attractor to perform near-neighbor search. For multiple attractors, compose them with joiner(...) first.
result = memory.first_picked(
view, memory.nns(
memory.set_members(memory.by_item_key("sets", "my_set"))))
Each attractor conceptually provides the “center of attraction” for candidates. NNS takes that attractor (or a joiner of several) and performs the actual near-neighbor search against the underlying associative index.
Forward Attractors
Roughly, forward attractors find the parts of a given composite.
| Attractor | Modifier | Attracts |
|---|---|---|
| SetMembersAttractor | depends on selected.code.domain | All members of the Set |
| SequenceMemberAttractor | depends on selected.code.domain | Sequence member at a specific position |
| TentacleAttractor(octopus, key) | Inverse(Sparkle(model, "", key)) | Octopus value for that key |
memory.set_members(memory.by_item_key("sets", "my_set"))
memory.sequence_member(memory.by_item_key("seqs", "my_seq"), pos=2)
memory.tentacle(memory.by_item_key("records", "person"), "name")
Reverse Attractors
Roughly, reverse attractors locate the composites that contain a given part.
| Attractor | Modifier | Attracts |
|---|---|---|
| SetAttractor(member, candidate) | Sparkle(SET_MARKER @ candidate) | All Sets in candidate containing member |
| SequenceAttractor(member, pos, candidate) | Bind(SEQ_MARKER @ candidate, Step^pos) | All Sequences in candidate with member at pos |
| OctopusAttractor(key, value) | Sparkle(model, "", key) | Octopuses with that key/value pair |
memory.set_attractor(memory.by_item_key("animals", "cat"), "sets")
memory.sequence_attractor(memory.by_item_key("animals", "cat"), 0, "seqs")
memory.octopus_attractor("color", memory.by_item_key("colors", "red"))
Analogical Reasoning
AnalogicalReasoner(dst, src, feature) performs analogical reasoning (“A is to B as C is to ?”): for each chunk c yielded by dst, it computes Bind(c.code, feature, Inverse(src)) and forwards to NNS. Model is implicit in src / feature.
Given the analogy “king is to queen as man is to ?”:
king = hv.Sparkle(model, "role", "king")
queen = hv.Sparkle(model, "role", "queen")
man = hv.Sparkle(model, "role", "man")
# Analogy: "king is to queen as man is to ?"
# src = king (the known source of the relationship)
# feature = queen (the known feature/attribute of src)
# dst = man (the target; we want to find its corresponding feature)
#
# Modifier = queen ⊗ inverse(king); applied to man → "woman".
memory.nns(
memory.analogical_reasoner(memory.with_code(man), king, queen))
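The algebra behind the modifier can be checked on a toy model. The sketch assumes a simplified segmented sparse code (256 segments, one active offset per segment, binding as offset addition modulo the segment size), which is a common sparse-block construction and not necessarily the library's exact encoding. All helper names are hypothetical, and the four concepts are deliberately built from shared factors so that the analogy has an exact answer.

```python
import random

SEGMENTS, SIZE = 256, 256  # assumed toy layout: one active offset per segment

def atom(name):
    rng = random.Random(name)  # deterministic per name
    return tuple(rng.randrange(SIZE) for _ in range(SEGMENTS))

def bind(*vs):
    return tuple(sum(offs) % SIZE for offs in zip(*vs))

def inverse(v):
    return tuple((-o) % SIZE for o in v)

def overlap(a, b):
    return sum(x == y for x, y in zip(a, b))

# Build the concepts from shared factors so the analogy has structure.
royal, common, male, female = map(atom, ["royal", "common", "male", "female"])
king, queen = bind(royal, male), bind(royal, female)
man, woman = bind(common, male), bind(common, female)

modifier = bind(queen, inverse(king))  # feature ⊗ inverse(src)
answer = bind(man, modifier)           # applied to dst

print(overlap(answer, woman))  # 256: exact match in this noise-free toy
print(overlap(answer, queen))  # chance level
```

In a real store the answer is only approximately equal to a stored chunk, which is why the reasoner forwards the result to NNS instead of comparing for equality.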
Direct WithCodeModifier / WithIDModifier
For ad-hoc patterns that don’t fit a named attractor, use the primitives directly. They take a precomputed HyperBinary modifier and apply Bind(code, modifier) or Bind(id, modifier) to each yielded chunk:
memory.with_code_modifier(inner_selector, modifier_vec)
memory.with_id_modifier(inner_selector, modifier_vec)
Other Selectors
ByItemKey
Exact lookup by domain + pod.
sel = memory.by_item_key("animals", "cat")
ByItemDomain
All chunks in a given domain (prefix scan).
sel = memory.by_item_domain("animals")
WithCode / WithSparkle
Literal selector — returns a hypervector directly, no storage lookup.
sel = memory.with_code(some_hv)
sel = memory.with_sparkle("animals", "cat")
Joiner
Union of multiple selectors — returns results from each of the inner selectors.
sel = memory.joiner(
memory.by_item_key("animals", "cat"),
memory.by_item_key("animals", "dog"),
)
Range
Limits results to [start, start+limit).
limit=0 (default) means no limit: iteration continues until there are no more results.
sel = memory.range_sel(
memory.by_item_domain("animals"), start=0, limit=10)
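The half-open window and the `limit=0` convention can be sketched with `itertools.islice`. `toy_range` is a hypothetical stand-in that operates on plain Python iterables, not on selectors.

```python
from itertools import islice

def toy_range(results, start=0, limit=0):
    """Yield results in [start, start+limit); limit=0 means no upper bound."""
    stop = None if limit == 0 else start + limit
    return islice(results, start, stop)

animals = ["cat", "dog", "eel", "fox", "gnu"]
print(list(toy_range(animals, start=1, limit=2)))  # ['dog', 'eel']
print(list(toy_range(animals, start=3)))           # ['fox', 'gnu']
```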
OnlyDomain
Filters inner selector results by given domain.
sel = memory.only_domain(
"animals", inner_selector)
Working with Results
FirstPicked — get the first match
Returns the first chunk matching the selector. Returns an error if nothing is found.
# Returns the first matching Chunk (with .id, .code, .note, .extra)
chunk = memory.first_picked(view, selector)
print(chunk.id, chunk.code, chunk.note)
mem_get — eager batch read (Chunks only)
Returns every match as a list[Chunk]. No extras — any per-result SelectorExtra produced by the selector (e.g. NNS scores) is discarded. Use this when you only need the Chunks.
chunks = storage.mem_get(selector) # list[Chunk]
for chunk in chunks:
    print(chunk.id, chunk.note)
lazy_selector_iter — stream Chunks with extras
Yields (Chunk, Optional[SelectorExtra]) tuples one at a time. This is the only way to access per-result SelectorExtra in Python; mem_get drops it.
# Streaming — useful for large result sets or early termination
for chunk, extra in memory.lazy_selector_iter(view, selector):
    print(chunk.id, extra)
    if done():
        break
# Eager with extras — wrap in list()
results = list(memory.lazy_selector_iter(view, selector))
# results: list[tuple[Chunk, Optional[SelectorExtra]]]
Producers
ChunkProducers are write builders that create and persist chunks in the substrate. Each producer encapsulates the logic for constructing a specific type of chunk.
Some producers update existing chunks (e.g. ClusterUpdater) without creating new ones. In those cases, Produce returns the updated chunk rather than a newly created one.
Producer Options
Producer options are extra arguments passed to a producer constructor to tweak its behavior.
# `note` attaches an annotation to the new terminal chunk.
memory.new_terminal("d", "p", note="annotation")
# `semantic_indexing` indicates we need to index the semantic code
# (on top of the id vector).
memory.from_set_members("d", "p", members, semantic_indexing=True)
Concrete Producers
NewTerminal
Creates a chunk whose code equals its identity (a bare Sparkle). Useful for registering atoms/symbols.
with storage.new_mutable_view() as view:
    memory.mem_set(view, memory.new_terminal("fruits", "apple", note="an apple"))
NewLearner
Creates a fresh Learner chunk for online learning.
with storage.new_mutable_view() as view:
    memory.mem_set(view, memory.new_learner("learners", "my_learner", note="a learner"))
FromSetMembers
Creates a Set from stored members.
with storage.new_mutable_view() as view:
    memory.mem_set(
        view,
        memory.from_set_members(
            "sets",
            "fruit_set",
            memory.by_item_domain("fruits"),
        ))
FromSequenceMembers
Creates a Sequence from stored members with positional encoding.
with storage.new_mutable_view() as view:
    memory.mem_set(
        view,
        memory.from_sequence_members(
            "seqs",
            "greeting",
            memory.joiner(
                memory.by_item_key("words", "hello"),
                memory.by_item_key("words", "world"),
            ),
            start=0),
    )
FromKeyValues
Creates an Octopus (key-value composite) from keys and value selectors.
with storage.new_mutable_view() as view:
    memory.mem_set(
        view,
        memory.from_key_values(
            "records",
            "obj1",
            keys=["color", "shape"],
            values=memory.joiner(
                memory.by_item_key("colors", "red"),
                memory.by_item_key("shapes", "circle"),
            ),
        ))
FromSourceDest
Creates a Pointer 👉 chunk — a directional reference from a source chunk to a dest chunk. Both selectors must resolve to a single chunk; the produced Pointer’s bit-level value is source.id ⊗ Inv(dest.id).
with storage.new_mutable_view() as view:
    memory.mem_set(
        view,
        memory.from_source_dest(
            "edges", "earth_to_moon",
            memory.by_item_key("planets", "earth"),
            memory.by_item_key("planets", "moon"),
        ))
ClusterUpdater
Feeds an observed chunk into an existing Learner, updating its accumulated code via bundling. The bundle multiplier defaults to 1; pass an explicit override (multiple=N in Python) to fold the same observation in repeatedly.
with storage.new_mutable_view() as view:
    # With explicit multiplier:
    memory.mem_set(view,
        memory.cluster_updater(
            learner=memory.by_item_key("learners", "my_learner"),
            observed=memory.by_item_key("fruits", "apple"),
            multiple=3,
        ))
Train 🚂🚃🚃🚃
A doubly-linked, payload-carrying chunk data structure built on Octopus chunks. A train is a chain of carriages: each carriage 🚃 carries one payload chunk and couples to its neighbors through its LEFT ⬅️ / RIGHT ➡️ slots.
Shape
A train occupies its own 64-bit train_domain. The structure has two kinds of chunks:
- Sentinel locomotive 🚂: the chunk at (train_domain, P0). Anchors both ends of the train. Its LEFT slot points at the leftmost carriage, its RIGHT slot at the rightmost. Its PAYLOAD slot is identity (no payload).
- Carriage 🚃: a chunk at (train_domain, carriage_pod) for some non-zero pod. Each carriage is a specialized Octopus with four slots:
  - PAYLOAD 📨: points at the actual payload chunk in item memory (Sparkle(payload_domain, payload_pod)). Always non-identity.
  - LEFT ⬅️: id-Sparkle of the previous carriage, or identity if leftmost.
  - RIGHT ➡️: id-Sparkle of the next carriage, or identity if rightmost.
  - CHILDREN 👨👩👧👦: optional pointer at a child structure (typically the sentinel of a child train). Identity when no children are attached.
A 3-carriage train (payloads A, B, C in left-to-right order):
🚂 🚃 🚃 🚃
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ sentinel │ <--> │ A │ <--> │ B │ <--> │ C │
│ pod=P0 │ │ pod=p_A │ │ pod=p_B │ │ pod=p_C │
└──────────┘ └──────────┘ └──────────┘ └──────────┘
Each box is one Octopus chunk. Each <--> between adjacent boxes represents one bidirectional LEFT/RIGHT pointer pair. The slot values inside each chunk are:
| Chunk | PAYLOAD | LEFT | RIGHT | CHILDREN |
|---|---|---|---|---|
| sentinel 🚂 (pod=P0) | identity ⊥ | leftmost (A) | rightmost (C) | identity ⊥ (or a child structure) |
| carriage A | Sparkle(A*) | identity ⊥ | carriage B | identity ⊥ (or a child structure) |
| carriage B | Sparkle(B*) | carriage A | carriage C | identity ⊥ (or a child structure) |
| carriage C | Sparkle(C*) | carriage B | identity ⊥ | identity ⊥ (or a child structure) |
Where A*, B*, C* are the actual payload chunks living elsewhere in item memory.
The sentinel’s LEFT/RIGHT are anchor pointers (jumps directly to the leftmost / rightmost), not neighbor pointers. The carriages’ LEFT/RIGHT are conventional neighbor pointers.
Pushing
storage.mem_set(
memory.train_appender(
train_domain, memory.with_sparkle(payload_domain, "A")))
storage.mem_set(
memory.train_prepender(
train_domain, memory.with_sparkle(payload_domain, "Z")))
Each push rewrites up to three chunks atomically through one mutable view:
- The new carriage at (train_domain, fresh_pod).
- The previous tail/head carriage, with its tail-side neighbor link updated to the new carriage. (Skipped on the first push, when the train is empty.)
- The sentinel, with its tail-side anchor pointer updated to the new carriage. The head-side pointer is updated only when the train transitions from empty to one element.
The sentinel is auto-created on the first push — no separate “init train” step is needed.
⚠️ Single-writer per train. Concurrent pushes from independent mutable views on the same train WILL race on the sentinel and on the previous tail/head: last write wins, with dropped pushes possible. Callers expecting concurrent appenders must synchronize externally (e.g., serialize through a single goroutine/task, or take an external lock keyed on train_domain). Reads through a View are always snapshot-consistent per chunk.
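The three rewrites performed by an append can be traced on a dictionary-based toy train. This is a sketch of the pointer bookkeeping only: pods are small integers, `None` stands in for the identity pointer, and `append` is a hypothetical helper, not the library's `train_appender`.

```python
P0 = 0  # sentinel pod

def append(train, payload):
    """Append a carriage: writes the new carriage, the old tail, and the sentinel."""
    # The sentinel is created on the first push.
    sentinel = train.setdefault(P0, {"payload": None, "left": None, "right": None})
    new_pod = max(train) + 1          # toy stand-in for a fresh pod
    old_tail = sentinel["right"]
    train[new_pod] = {"payload": payload, "left": old_tail, "right": None}
    if old_tail is None:
        sentinel["left"] = new_pod    # empty to one element: set the head anchor too
    else:
        train[old_tail]["right"] = new_pod
    sentinel["right"] = new_pod       # the tail anchor always moves
    return new_pod

train = {}
a, b, c = (append(train, p) for p in "ABC")
print(train[P0])  # {'payload': None, 'left': 1, 'right': 3}
print(train[b])   # {'payload': 'B', 'left': 1, 'right': 3}
```

Because every append reads and then rewrites the sentinel, two unsynchronized writers can both observe the same old tail, which is the race the warning above describes.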
Mid-train insertion: TrainInsertAfter / TrainInsertBefore
TrainAppender and TrainPrepender are thin wrappers over the more general TrainInsertAfter / TrainInsertBefore, which take a reference carriage selector instead of a domain. The new carriage is wedged on the chosen side of the reference. The train domain is taken from the resolved reference carriage.
| Reference carriage | TrainInsertAfter | TrainInsertBefore |
|---|---|---|
| sentinel 🚂 (TrainCarriage(domain)) | append (new rightmost) | prepend (new leftmost) |
| member carriage X | wedge between X and X.right | wedge between X.left and X |
Sentinel-as-reference is the wraparound bookend case: since the sentinel sits on both ends of the train, “after the sentinel” means “right of the rightmost,” and “before the sentinel” means “left of the leftmost.” On an empty train, either becomes the first and only member.
Each insert rewrites:
- The new carriage.
- The carriage on each side whose neighbor link points at the new one (one rewrite for end-inserts, two for mid-train inserts).
- The sentinel, only when its anchor pointer changes (i.e., the new carriage becomes the leftmost or rightmost).
# Mid-train: wedge M between B and C.
storage.mem_set(memory.train_insert_after(
memory.train_carriage(train_domain, b_pod),
memory.with_sparkle(payload_domain, "M"),
))
# Sentinel-as-ref: equivalent to train_appender.
storage.mem_set(memory.train_insert_after(
memory.train_carriage(train_domain),
memory.with_sparkle(payload_domain, "tail"),
))
Empty check: TrainIsEmpty
A direct query that reads the sentinel and reports whether both LEFT and RIGHT anchors are identity. A train whose sentinel hasn’t been written yet (never touched) is also reported empty — no auto-init.
with storage.new_view() as view:
    if memory.train_is_empty(view, train_domain):
        print("nothing to process")
Iterating — cursor-straddle semantics
TrainForward / TrainBackward walk the train, yielding each carriage’s payload chunk. The starting point is itself a Selector — pass TrainCarriage(domain) (with no pod) to start from the sentinel for a full-train iteration, or TrainCarriage(domain, pod) to start from a specific carriage.
The cursor straddles between elements, similar to Java’s ListIterator or C++’s std::list::iterator:
- start is the cursor; iteration is exclusive of start’s own payload.
- When start resolves to the sentinel, iteration covers the whole train (sentinel acts as both before-leftmost and after-rightmost).
- When start resolves to a member carriage, iteration yields the elements strictly after it in the chosen direction; the start carriage’s own payload is NOT yielded.
| start | TrainForward yields | TrainBackward yields |
|---|---|---|
| sentinel | leftmost, …, rightmost | rightmost, …, leftmost |
| carriage X | X.right, X.right.right, … | X.left, X.left.left, … |
# Whole-train iteration from the sentinel.
for chunk in storage.mem_get(
        memory.train_forward(memory.train_carriage(train_domain))):
    print(chunk.note)
# Backward from a specific carriage (exclusive of that carriage).
for chunk in storage.mem_get(
        memory.train_backward(memory.train_carriage(train_domain, carriage_pod))):
    print(chunk.note)
Single-step shorthands: TrainNext / TrainPrev
TrainNext(start) is shorthand for Range(0, 1, TrainForward(start)) — the next single payload from the cursor’s position. TrainPrev(start) is the backward mirror. They follow the same exclusive-start rule.
| Cursor | TrainNext yields | TrainPrev yields |
|---|---|---|
| sentinel | leftmost carriage’s payload | rightmost carriage’s payload |
| carriage X | X’s right neighbor’s payload (or empty if X is rightmost) | X’s left neighbor’s payload (or empty if X is leftmost) |
# Peek at the front / back of the train.
front = next(iter(storage.mem_get(
memory.train_next(memory.train_carriage(train_domain))))).note
back = next(iter(storage.mem_get(
memory.train_prev(memory.train_carriage(train_domain))))).note
Iteration rule (implementation)
Each step:
- Integrity: the sentinel is identified by pod == P0. The sentinel MUST have an identity PAYLOAD; a carriage MUST have a non-identity PAYLOAD. Either violation returns FailedPrecondition (the train is malformed).
- Advance: on a carriage, follow the matching-direction key (RIGHT for forward, LEFT for backward) to the next neighbor. On the sentinel, follow the opposite-direction key: the sentinel’s LEFT/RIGHT are anchor pointers, so the opposite key takes you to the appropriate end.
- Yield: yield the chunk you advanced to. Stop when the next pointer is identity.
The advance-then-yield order is what makes the iteration exclusive of start.
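The rule can be traced on a toy train stored as a dictionary. The sketch mirrors the three steps under simplifying assumptions: pods are integers, `None` stands in for identity, and a malformed train raises `ValueError` instead of returning `FailedPrecondition`. `walk` and `check` are hypothetical names.

```python
P0 = 0  # sentinel pod; None stands in for the identity pointer

train = {
    0: {"payload": None, "left": 1, "right": 3},   # sentinel: anchor pointers
    1: {"payload": "A", "left": None, "right": 2},
    2: {"payload": "B", "left": 1, "right": 3},
    3: {"payload": "C", "left": 2, "right": None},
}

def check(train, pod):
    """Integrity: only the sentinel may carry an identity payload."""
    if (pod == P0) != (train[pod]["payload"] is None):
        raise ValueError(f"malformed train at pod {pod}")

def walk(train, start, forward=True):
    """Yield payloads strictly after `start` in the chosen direction."""
    check(train, start)
    key = "right" if forward else "left"
    # On the sentinel, the first hop uses the opposite-direction anchor.
    first = ("left" if forward else "right") if start == P0 else key
    pod = train[start][first]          # advance first ...
    while pod is not None:
        check(train, pod)
        yield train[pod]["payload"]    # ... then yield the chunk advanced to
        pod = train[pod][key]

print(list(walk(train, P0)))                 # ['A', 'B', 'C']
print(list(walk(train, P0, forward=False)))  # ['C', 'B', 'A']
print(list(walk(train, 2)))                  # ['C']
```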
Hierarchy: the CHILDREN slot
Each carriage’s CHILDREN 👨👩👧👦 slot can point at the sentinel of another train (or any other chunk). Attach one at push time via the children argument; query it via the TrainChildren(parent) selector.
# Push a parent carriage whose CHILDREN points at a child train's sentinel.
storage.mem_set(memory.train_appender(
parent_domain,
memory.with_sparkle(payload_domain, "B"),
children=memory.train_carriage(child_domain),
))
# Walk the child train from the parent.
for chunk in storage.mem_get(
        memory.train_forward(
            memory.train_children(
                memory.train_carriage(parent_domain, parent_carriage_pod)))):
    print(chunk.note)
TrainChildren(parent) yields zero chunks (no error) when the parent has identity CHILDREN or is missing entirely. Hierarchical traversal is the caller’s job — walk a parent train, recurse into TrainChildren on each carriage.
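A caller-driven traversal might look like the following sketch. It models each train as a plain list of carriages and resolves CHILDREN by name, which is a simplification; `trains`, `children_of`, and `walk_tree` are hypothetical names, and the real code would walk with `train_forward` and `train_children` instead.

```python
# Toy model: each train is a list of carriages; `children` names a child train or is None.
trains = {
    "doc": [{"payload": "chapter 1", "children": "ch1"},
            {"payload": "chapter 2", "children": None}],
    "ch1": [{"payload": "section 1.1", "children": None},
            {"payload": "section 1.2", "children": "missing"}],
}

def children_of(carriage):
    """Zero carriages (no error) when CHILDREN is identity or the child is missing."""
    return trains.get(carriage["children"], [])

def walk_tree(carriages, depth=0):
    """Depth-first traversal: walk a train, recurse into each carriage's children."""
    for carriage in carriages:
        yield depth, carriage["payload"]
        yield from walk_tree(children_of(carriage), depth + 1)

for depth, payload in walk_tree(trains["doc"]):
    print("  " * depth + payload)
```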
Notebook Quick Start
This guide walks through using Kongming HV in a Jupyter notebook, cell by cell.
| Section | Description |
|---|---|
| Notebook Platforms | Setup differences between Jupyter, Colab, and Binder |
| Interactive Notebooks | Links to existing notebooks |
| Walkthrough | Step-by-step: vocabulary, similarity, learning, binding |
Tips
- Reproducibility: Use fixed seeds in SparseOperation for deterministic results across reruns.
- Visualization: Use pandas DataFrames for overlap matrices; they render nicely in Jupyter.
- Performance: The Rust backend is fast. Building 10,000 vectors takes under a second on MODEL_64K_8BIT.
- Model choice: Start with MODEL_64K_8BIT for exploration. Switch to MODEL_1M_10BIT or larger for production workloads.
Notebook Platforms
Setup and behavior differ across Jupyter, Google Colab, and Binder. This page covers the key differences.
Try Online
| Notebook | Platform | Link |
|---|---|---|
| first.ipynb | Google Colab | |
| first.ipynb | Binder | |
| memory.ipynb | Google Colab | |
| lisp.ipynb | Google Colab | |
Comparison
| | Jupyter (local) | Google Colab | Binder |
|---|---|---|---|
| Account | None | Google account required | None |
| Install | pip install in terminal beforehand | !pip install in first cell | Pre-installed via requirements.txt |
| Restart needed | No | Yes — after first install | No |
| Startup time | Instant | Fast (~5s) | Slow (2–5 min cold start) |
| Persistence | Local filesystem | Google Drive (optional mount) | Ephemeral — lost on timeout |
| GPU | If available locally | Free tier available | Not available |
| Custom packages | Full control | !pip install per session | Via requirements.txt only |
Jupyter (Local)
Install once in your terminal, then use in any notebook:
pip install kongming-rs-hv
# Cell 1 — no restart needed
from kongming import hv
For development workflows with frequent code changes, use autoreload:
%load_ext autoreload
%autoreload 2
Google Colab
Colab runs in the cloud with a fresh environment each session. Install in the first cell:
# Cell 1 — install
!pip install kongming-rs-hv
After the first install, Colab requires a runtime restart:
- Go to Runtime → Restart runtime (or use the button Colab shows after install)
- Then run the remaining cells
# Cell 2 — after restart
from kongming import hv
model = hv.MODEL_64K_8BIT
Subsequent sessions on the same notebook will need the install cell again — Colab does not persist pip packages across sessions.
Saving work: Use google.colab.drive to mount Google Drive for persistent storage:
from google.colab import drive
drive.mount('/content/drive')
# Then use paths like /content/drive/MyDrive/...
Binder
Binder builds a Docker image from your repo’s requirements.txt and launches a Jupyter server. No account needed.
- First launch: Takes 2–5 minutes to build the environment
- Subsequent launches: Faster if the image is cached
- No install needed: kongming-rs-hv is pre-installed from requirements.txt
- Ephemeral: All work is lost when the session times out (~10 min idle)
# Cell 1 — works immediately, no install
from kongming import hv
Additional packages can only be added via requirements.txt (the environment is read-only).
Choosing a Platform
| Use case | Recommended |
|---|---|
| Daily development | Jupyter (local) |
| Quick demo / sharing | Google Colab |
| Zero-setup exploration | Binder |
| Teaching / workshops | Google Colab (students have accounts) |
| Persistent storage needed | Jupyter (local) or Colab + Drive |
Interactive Notebooks
For deeper walkthroughs, open these notebooks directly:
| Notebook | Description | Colab |
|---|---|---|
| first.ipynb | Introduction to hypervectors, bind/bundle operations, and composites | |
| memory.ipynb | In-memory and persistent storage, near-neighbor search with attractors, and export to disk | |
| lisp.ipynb | VSA-based LISP interpreter where every data structure is a hypervector | |
See also: LISP Interpreter — a full example built on the core API.
Walkthrough
A step-by-step introduction to Kongming HV in a notebook, cell by cell.
Setup
# Cell 1: Install and import
# !pip install kongming-rs-hv pandas
from kongming import hv
import pandas as pd
model = hv.MODEL_64K_8BIT
so = hv.SparseOperation(model, 0, 1)
Building a Vocabulary
# Cell 2: Create vectors for a set of words
words = ["cat", "dog", "fish", "bird", "tree", "rock"]
vectors = {w: hv.Sparkle.from_word(model, "vocab", w) for w in words}
print(f"Created {len(vectors)} vectors")
print(f"Model: {model}, Cardinality: {hv.cardinality(model)}")
Output:
Created 6 vectors
Model: 1, Cardinality: 256
Similarity Matrix
# Cell 3: Compute pairwise overlap
data = {}
for w1 in words:
    data[w1] = {w2: hv.overlap(vectors[w1], vectors[w2]) for w2 in words}
pd.DataFrame(data, index=words)
Output:
| | cat | dog | fish | bird | tree | rock |
|---|---|---|---|---|---|---|
| cat | 256 | 1 | 0 | 2 | 1 | 1 |
| dog | 1 | 256 | 1 | 0 | 1 | 2 |
| fish | 0 | 1 | 256 | 1 | 0 | 1 |
| bird | 2 | 0 | 1 | 256 | 1 | 0 |
| tree | 1 | 1 | 0 | 1 | 256 | 1 |
| rock | 1 | 2 | 1 | 0 | 1 | 256 |
The diagonal is 256 (cardinality = perfect self-overlap). Off-diagonal values stay between 0 and 2 (random noise), confirming the vectors are near-orthogonal.
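A back-of-the-envelope check explains why the noise level sits around 1. The sketch assumes that MODEL_64K_8BIT splits 65,536 dimensions into 256 segments of 256 positions with one active bit per segment. That layout is consistent with the model name and the cardinality of 256, but it is an assumption here. Under it, two independent random vectors agree in any given segment with probability 1/256, so the expected overlap is 256 × 1/256 = 1.

```python
import random

SEGMENTS, SIZE = 256, 256  # assumed layout: 256 segments of 256 positions

def random_vector(rng):
    return [rng.randrange(SIZE) for _ in range(SEGMENTS)]

def overlap(a, b):
    return sum(x == y for x, y in zip(a, b))

rng = random.Random(0)
vectors = [random_vector(rng) for _ in range(100)]
pairs = [(i, j) for i in range(100) for j in range(i + 1, 100)]
mean = sum(overlap(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)

print(SEGMENTS * (1 / SIZE))             # 1.0, the expected overlap
print(overlap(vectors[0], vectors[0]))   # 256, perfect self-overlap
print(round(mean, 2))                    # close to 1
```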
Learning from Observations
# Cell 4: Create a learner and feed it observations
learner = hv.Learner(model, hv.Seed128(0, so.uint64()))
# "cat" seen 3 times, "dog" once, "bird" once
for _ in range(3):
    learner.bundle(vectors["cat"])
learner.bundle(vectors["dog"])
learner.bundle(vectors["bird"])
print(f"Learner age: {learner.age()}")
Output:
Learner age: 5
Probing the Learner
# Cell 5: Check what the learner remembers
results = []
for w in words:
    ov = hv.overlap(learner, vectors[w])
    results.append({"word": w, "overlap": ov})
df = pd.DataFrame(results).sort_values("overlap", ascending=False)
df
Output:
| word | overlap |
|---|---|
| cat | ~75 |
| dog | ~30 |
| bird | ~30 |
| fish | ~1 |
| tree | ~1 |
| rock | ~1 |
“cat” has the highest overlap (seen 3x). “dog” and “bird” (seen 1x each) have moderate overlap. Unseen words are at noise level (~1).
Binding: Role-Filler Pairs
# Cell 6: Create a structured representation
# "a cat that is red"
color_role = hv.Sparkle.from_word(model, "role", "color")
animal_role = hv.Sparkle.from_word(model, "role", "animal")
red = hv.Sparkle.from_word(model, "color", "red")
blue = hv.Sparkle.from_word(model, "color", "blue")
cat = vectors["cat"]
# Bind role with filler, then bundle the pairs
learner2 = hv.Learner(model, hv.Seed128(0, so.uint64()))
learner2.bundle(hv.Sparkle.bind(color_role, red))
learner2.bundle(hv.Sparkle.bind(animal_role, cat))
# Probe: "what color?"
query = hv.Sparkle.bind(learner2, color_role.power(-1))
print(f"red overlap: {hv.overlap(query, red)}") # high
print(f"blue overlap: {hv.overlap(query, blue)}") # ~1
print(f"cat overlap: {hv.overlap(query, cat)}") # ~1
Python Quick Start
| Section | Description |
|---|---|
| Installation | PyPI install, supported platforms, import paths |
| Quick Example | Minimal code showing bind, bundle, and overlap |
| Walkthrough | Vectors, similarity, random generation, power, learning |
See also: Notebook Quick Start for interactive Jupyter walkthroughs.
Installation
PyPI
pip install kongming-rs-hv
Supported Platforms
| Platform | Architectures | Python Versions |
|---|---|---|
| Linux | x86_64 | 3.10–3.14 |
| macOS | Apple Silicon & Intel | 3.10–3.14 |
| Windows | x86_64 | 3.10–3.14 |
Verifying Installation
import kongming
print(kongming.__version__) # e.g. "3.6.5" as of Apr. 2026; yours may be newer
from kongming import hv
print(hv.MODEL_64K_8BIT) # should print 1
Import Paths
The package exposes two main modules:
from kongming import hv # hypervector operations
from kongming import memory # storage and selectors
Model constants are available directly on hv:
hv.MODEL_64K_8BIT # 1
hv.MODEL_1M_10BIT # 2
hv.MODEL_16M_12BIT # 3
hv.MODEL_256M_14BIT # 4
hv.MODEL_4G_16BIT # 5
Docker
If you’d rather not install anything on your host, you can run kongming-rs-hv
inside a container. This works on any system with Docker — no Python, no
virtualenv, no wheel compatibility to worry about.
One-liner: throwaway Python REPL
Drop straight into a Python shell with the package preinstalled:
docker run --rm -it python:3.12-slim sh -c "\
pip install --quiet --root-user-action=ignore \
--disable-pip-version-check kongming-rs-hv && python"
--rm removes the container on exit. Nothing is persisted. Re-running reinstalls
from PyPI, which takes a few seconds. The --root-user-action=ignore and
--disable-pip-version-check flags silence pip’s root-user and upgrade notices,
which are harmless inside a throwaway container.
Reusable image
For repeat use, build a small image once:
# Dockerfile
FROM python:3.12-slim
RUN pip install --no-cache-dir --disable-pip-version-check kongming-rs-hv
CMD ["python"]
docker build -t kongming-hv .
docker run --rm -it kongming-hv
To run a script from the host instead of an interactive REPL, mount the current directory:
docker run --rm -v "$PWD":/work -w /work kongming-hv python my_script.py
JupyterLab in a container
For interactive exploration with notebooks:
# Dockerfile.jupyter
FROM python:3.12-slim
RUN pip install --no-cache-dir --disable-pip-version-check \
kongming-rs-hv jupyterlab
WORKDIR /notebooks
EXPOSE 8888
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--no-browser", \
"--ServerApp.token=''", "--ServerApp.password=''"]
docker build -f Dockerfile.jupyter -t kongming-hv-jupyter .
docker run --rm -p 8888:8888 -v "$PWD":/notebooks kongming-hv-jupyter
Open http://localhost:8888 in your browser. Notebooks saved under /notebooks
are persisted to the mounted host directory.
The disabled token/password above is fine for local use. Do not expose this container on a public network without adding authentication.
Quick Example
A minimal example showing the core operations:
from kongming import hv
# Create hypervectors
a = hv.Sparkle.from_word(hv.MODEL_64K_8BIT, hv.d0(), "hello")
b = hv.Sparkle.from_word(hv.MODEL_64K_8BIT, hv.d0(), "world")
print(f'Overlap: {hv.overlap(a, b)}') # Near orthogonal (~1)
# Bind: result is dissimilar to both inputs
bound = hv.bind(a, b)
print(f'{hv.overlap(bound, a)=}, {hv.overlap(bound, b)=}') # ~1, ~1
# Bundle: result is similar to both inputs
bundled = hv.bundle(hv.Seed128(10, 1), a, b)
print(f'{hv.overlap(bundled, a)=}, {hv.overlap(bundled, b)=}') # high, high
What’s Happening
- Sparkle.from_word generates a deterministic hypervector from a word. Same word always produces the same vector.
- Two unrelated vectors have near-zero overlap (~1): random high-dimensional vectors are nearly orthogonal.
- hv.bind(a, b) produces a vector dissimilar to both (low overlap). Binding is reversible.
- hv.bundle(seed, a, b) produces a vector similar to both (high overlap). Different seeds produce different but equally valid results.
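These properties can be reproduced with a pure-Python toy. The sketch assumes a simplified segmented code (256 segments, one offset per segment), binding as segment-wise offset addition, and bundling as a per-segment random pick among the inputs. These are illustrative choices rather than the library's exact algorithms, and every function below is a hypothetical stand-in.

```python
import random

SEGMENTS, SIZE = 256, 256  # assumed toy layout

def atom(name):
    rng = random.Random(name)  # same name, same vector
    return tuple(rng.randrange(SIZE) for _ in range(SEGMENTS))

def overlap(a, b):
    return sum(x == y for x, y in zip(a, b))

def bind(a, b):
    """Segment-wise offset addition: the result resembles neither input."""
    return tuple((x + y) % SIZE for x, y in zip(a, b))

def release(bound, b):
    """Undo bind(a, b) given b by subtracting its offsets."""
    return tuple((x - y) % SIZE for x, y in zip(bound, b))

def bundle(seed, *vs):
    """Per segment, keep the offset of one randomly chosen input."""
    rng = random.Random(seed)
    return tuple(rng.choice(offs) for offs in zip(*vs))

a, b = atom("hello"), atom("world")
bound = bind(a, b)
bundled = bundle(10, a, b)
print(overlap(bound, a), overlap(bound, b))      # both at chance level
print(release(bound, b) == a)                    # True: binding is reversible
print(overlap(bundled, a), overlap(bundled, b))  # roughly half of 256 each
```

In this toy the seed only decides which input wins each segment, which is why different seeds give different but equally valid bundles.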
Walkthrough
A deeper exploration of the Python API, covering vector creation, similarity, random generation, power/permutation, and online learning.
Creating Vectors
from kongming import hv
model = hv.MODEL_64K_8BIT
# Create sparkles (atomic vectors) from words
cat = hv.Sparkle.from_word(model, "animals", "cat")
dog = hv.Sparkle.from_word(model, "animals", "dog")
# Same inputs always produce the same vector
cat2 = hv.Sparkle.from_word(model, "animals", "cat")
assert cat.stable_hash() == cat2.stable_hash()
Measuring Similarity
# Random vectors have ~1 overlap
print(hv.overlap(cat, dog)) # ≈ 1 (near-orthogonal)
# A vector is maximally similar to itself
print(hv.overlap(cat, cat)) # = 256 (= cardinality)
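The two numbers above follow from the sparse layout. As a minimal sketch, assume a vector is one offset per segment, with 256 segments of 256 positions each (inferred from the cardinality quoted above, not taken from the library source). The names random_vector and overlap below are illustrative only and use just the standard library:

```python
import random

CARDINALITY = 256    # number of segments (assumed for the 64K / 8-bit model)
SEGMENT_SIZE = 256   # positions per segment (8-bit offsets, assumed)

def random_vector(rng):
    """One 'on' bit per segment, stored as its offset inside the segment."""
    return [rng.randrange(SEGMENT_SIZE) for _ in range(CARDINALITY)]

def overlap(x, y):
    """Count the segments whose offsets coincide."""
    return sum(1 for p, q in zip(x, y) if p == q)

rng = random.Random(0)
cat = random_vector(rng)
dog = random_vector(rng)

self_overlap = overlap(cat, cat)    # every segment matches
cross_overlap = overlap(cat, dog)   # each segment matches with probability 1/256
print(self_overlap, cross_overlap)
```

Two independent vectors agree in a given segment with probability 1/256, so the expected overlap is 256 × 1/256 = 1, while a vector always agrees with itself in all 256 segments.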
Using SparseOperation for Random Generation
so = hv.SparseOperation(model, 123, 456)
# Generate random sparkles
a = hv.Sparkle.random("my_domain", so)
b = hv.Sparkle.random("my_domain", so)
# Each call to so produces a new random seed
print(hv.overlap(a, b)) # ≈ 1
Power and Permutation
# Power creates a permuted vector
s = hv.Sparkle.from_word(model, "pos", "step")
s2 = s.power(2)
s3 = s.power(3)
# Different powers are near-orthogonal
print(hv.overlap(s, s2)) # ≈ 1
print(hv.overlap(s, s3)) # ≈ 1
# Inverse: power(-1) undoes power(1)
s_inv = s.power(-1)
# bind(s, s_inv) ≈ identity
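One way to read these comments is to assume that power(k) means binding a vector with itself k times. The source does not spell out the definition, so treat this as an assumption: if bind adds per-segment offsets, then power multiplies them by k modulo the segment size. The toy model below works on plain lists and is not the library's implementation.

```python
import random

CARDINALITY = 256    # segments per vector (assumed)
SEGMENT_SIZE = 256   # positions per segment (assumed)

def bind(x, y):
    """Per-segment offset addition modulo segment size."""
    return [(p + q) % SEGMENT_SIZE for p, q in zip(x, y)]

def power(x, k):
    """Assumed model: k-fold self-binding multiplies every offset by k."""
    return [(p * k) % SEGMENT_SIZE for p in x]

def overlap(x, y):
    return sum(1 for p, q in zip(x, y) if p == q)

rng = random.Random(1)
s = [rng.randrange(SEGMENT_SIZE) for _ in range(CARDINALITY)]
identity = [0] * CARDINALITY   # all-zero offsets leave any vector unchanged under bind

s2 = power(s, 2)
s_inv = power(s, -1)

print(s2 == bind(s, s))             # power(2) is s bound with itself
print(bind(s, s_inv) == identity)   # power(-1) cancels s
print(overlap(s, s2))               # small: s and s2 agree only where the offset is 0
```

In this toy model the inverse is exact; the library comment above says "≈ identity", so the real behavior may differ in detail.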
Online Learning with Learner
learner = hv.Learner(model, hv.Seed128(0, 42))
# Feed observations one at a time
learner.bundle(cat)
learner.bundle(cat) # seen twice — stronger signal
learner.bundle(dog)
# The learned vector is more similar to cat (seen 2x)
print(hv.overlap(learner, cat)) # higher
print(hv.overlap(learner, dog)) # lower but above random
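To build intuition for why the vector seen twice dominates, here is a hypothetical online bundler based on per-segment reservoir sampling. ToyLearner is an illustrative class, not the library's Learner, and the sampling rule is an assumption chosen so that every observation ends up owning an equal share of segments.

```python
import random

CARDINALITY = 256    # segments per vector (assumed)
SEGMENT_SIZE = 256   # positions per segment (assumed)

def overlap(x, y):
    return sum(1 for p, q in zip(x, y) if p == q)

class ToyLearner:
    """Hypothetical online bundler using per-segment reservoir sampling."""

    def __init__(self, rng):
        self.rng = rng
        self.count = 0
        self.offsets = None

    def bundle(self, vec):
        self.count += 1
        if self.offsets is None:
            self.offsets = list(vec)
            return
        for seg in range(CARDINALITY):
            # Replace with probability 1/count, so each observation seen so
            # far is equally likely to own this segment.
            if self.rng.randrange(self.count) == 0:
                self.offsets[seg] = vec[seg]

rng = random.Random(42)
cat = [rng.randrange(SEGMENT_SIZE) for _ in range(CARDINALITY)]
dog = [rng.randrange(SEGMENT_SIZE) for _ in range(CARDINALITY)]

learner = ToyLearner(rng)
for observation in (cat, cat, dog):
    learner.bundle(observation)

cat_overlap = overlap(learner.offsets, cat)
dog_overlap = overlap(learner.offsets, dog)
print(cat_overlap, dog_overlap)
```

With cat observed twice and dog once, cat should own about two thirds of the 256 segments and dog about one third, which matches the "higher" and "lower but above random" comments above.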
Examples
Standalone runnable scripts under examples/ — each demonstrates a different facet of hypervector computing. Click through for the walkthrough.
| Example | What it shows |
|---|---|
| Mexican Dollar | Analogical reasoning of “What’s the Dollar of Mexico?”: bind/bundle as the math behind analogy. |
| Word Indexer | Encoding and novel queries for 5,000 English words. |
| Bulk Storage Benchmark | Populate various substrates with thousands of chunks and measure retrieval performance. |
| Operators from Scratch | Reimplement bind and bundle in pure Python — the core math underneath the library. |
| LISP Interpreter | A full LISP where every atom, cons cell, and environment is a hypervector. For the VSA-curious. |
Operators from Scratch
Standalone script:
operators.py
This example implements the bind, release, and bundle operators in pure Python using only the low-level offset API, then verifies correctness against the library’s built-in implementations.
The script does not call hv.bind(), hv.release(), or hv.bundle() for computation — it reimplements them to show how they work at the offset level.
Bind
Per-segment offset addition modulo segment size:
for seg in range(cardinality):
    result[seg] = (core_a.offset(seg) + core_b.offset(seg)) % segment_size
Properties:
- Result is nearly orthogonal to both inputs (overlap ≈ 1)
- Commutative: bind(a, b) == bind(b, a)
- Associative: bind(a, b, c) == bind(bind(a, b), c)
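These properties can be checked without the library. The sketch below stores a vector as a plain list of per-segment offsets (256 segments of size 256 is an assumption) and reimplements bind as described; the helper names are illustrative.

```python
import random

CARDINALITY = 256    # segments per vector (assumed)
SEGMENT_SIZE = 256   # positions per segment (assumed)

def bind(*vectors):
    """Per-segment offset addition modulo segment size, for any number of inputs."""
    return [sum(offsets) % SEGMENT_SIZE for offsets in zip(*vectors)]

def overlap(x, y):
    return sum(1 for p, q in zip(x, y) if p == q)

rng = random.Random(7)
a, b, c = ([rng.randrange(SEGMENT_SIZE) for _ in range(CARDINALITY)]
           for _ in range(3))

commutative = bind(a, b) == bind(b, a)
associative = bind(a, b, c) == bind(bind(a, b), c)
noise = overlap(bind(a, b), a)   # matches only where b's offset is 0

print(commutative, associative, noise)
```

Commutativity and associativity follow directly from modular addition; the bound vector matches an input only in segments where the other input's offset happens to be zero.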
Release (Unbind)
Per-segment offset subtraction modulo segment size:
for seg in range(cardinality):
    result[seg] = (core_c.offset(seg) - core_k.offset(seg)) % segment_size
Properties:
- release(bind(a, b), b) = a (exact recovery)
- Multi-release: release(release(bind(a, b, c), c), b) = a
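Exact recovery is easy to confirm on plain offset lists. As above, this is a toy model with assumed sizes and illustrative helper names, not the library code.

```python
import random

CARDINALITY = 256    # segments per vector (assumed)
SEGMENT_SIZE = 256   # positions per segment (assumed)

def bind(*vectors):
    """Per-segment offset addition modulo segment size."""
    return [sum(offsets) % SEGMENT_SIZE for offsets in zip(*vectors)]

def release(bound, key):
    """Per-segment offset subtraction modulo segment size."""
    return [(p - q) % SEGMENT_SIZE for p, q in zip(bound, key)]

rng = random.Random(11)
a, b, c = ([rng.randrange(SEGMENT_SIZE) for _ in range(CARDINALITY)]
           for _ in range(3))

recovered = release(bind(a, b), b)
recovered_multi = release(release(bind(a, b, c), c), b)

print(recovered == a, recovered_multi == a)
```

Because subtraction undoes addition segment by segment, no cleanup step is needed after releasing from a pure binding.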
Bundle
PRNG-based random selection among inputs. For each segment, a seeded PRNG picks which input vector contributes its offset. The selection probability is proportional to each input’s weight.
# Compute cumulative anchors from weights (weights sum to 1.0).
# For equal weights [1/3, 1/3, 1/3]: anchors ≈ [21845, 43690, 65535]
# For weighted [0.6, 0.2, 0.2]: anchors ≈ [39321, 52428, 65535]
cumulative = 0.0
anchors = []
for w in weights:
    cumulative += w
    anchors.append(int(cumulative * 65535))

for seg in range(0, cardinality, 4):
    r = so.uint64()  # one PRNG call → 4 × 16-bit values
    for j in range(4):
        dial = (r >> (48 - 16 * j)) & 0xFFFF  # extract 16-bit random value
        # Pick the first input whose anchor >= dial; fall back to the last
        # input if rounding left the final anchor just below 65535.
        chosen = next(
            (i for i, anchor in enumerate(anchors) if anchor >= dial),
            len(anchors) - 1,
        )
        result[seg + j] = cores[chosen].offset(seg + j)
Properties:
- Result is similar to all inputs (overlap ≈ weight × cardinality)
- Not reversible — information is lost
Note: The library supports two bundling strategies: classic (shown above) and fisher_yates (default). To verify exact match with the pure Python implementation, set:
KONGMING_LEARNER_SAMPLING=classic python operators.py
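The classic selection loop can also be exercised without the library. In the sketch below, random.Random.getrandbits(64) stands in for so.uint64(), vectors are plain offset lists, and anchors_for and bundle are illustrative names. It is not expected to reproduce the library's exact output, only the statistics.

```python
import bisect
import random

CARDINALITY = 256    # segments per vector (assumed)
SEGMENT_SIZE = 256   # positions per segment (assumed)

def anchors_for(weights):
    """Cumulative 16-bit anchors for weights that sum to 1.0."""
    anchors, cumulative = [], 0.0
    for w in weights:
        cumulative += w
        anchors.append(int(cumulative * 65535))
    anchors[-1] = 65535  # absorb rounding so every dial maps to some input
    return anchors

def bundle(rng, vectors, weights):
    """Classic-style bundling: each segment copies one input's offset."""
    anchors = anchors_for(weights)
    result = [0] * CARDINALITY
    for seg in range(0, CARDINALITY, 4):
        r = rng.getrandbits(64)  # stand-in for so.uint64()
        for j in range(4):
            dial = (r >> (48 - 16 * j)) & 0xFFFF
            chosen = bisect.bisect_left(anchors, dial)  # first anchor >= dial
            result[seg + j] = vectors[chosen][seg + j]
    return result

def overlap(x, y):
    return sum(1 for p, q in zip(x, y) if p == q)

rng = random.Random(5)
a, b, c = ([rng.randrange(SEGMENT_SIZE) for _ in range(CARDINALITY)]
           for _ in range(3))

mixed = bundle(rng, [a, b, c], [0.6, 0.2, 0.2])
overlaps = [overlap(mixed, v) for v in (a, b, c)]
print(overlaps)
```

With weights 0.6 / 0.2 / 0.2 the overlaps should land near 154, 51, and 51, in line with the "weight × cardinality" rule of thumb. bisect_left is used because the anchors are sorted, which makes the "first anchor >= dial" lookup a binary search.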
Running
pip install kongming-rs-hv
# Classic sampling — all operators match exactly
KONGMING_LEARNER_SAMPLING=classic python operators.py
Mexican Dollar
Standalone scripts:
mexican_dollar.py | mexican_dollar_memory.py
The “What’s the Dollar of Mexico?” problem is a classic demonstration of analogical reasoning with hypervectors. It shows how structured knowledge about countries can be encoded, and how algebraic operations can answer analogy questions without explicit programming.
The Problem
Given knowledge about three countries:
| Country | Code | Capital | Currency |
|---|---|---|---|
| USA | USA | Washington DC | Dollar |
| Mexico | MEX | Mexico City | Peso |
| Sweden | SWE | Stockholm | Krona |
We want to answer questions like:
- “What is the Dollar of Mexico?” → Peso
- “What is the Washington DC of Mexico?” → Mexico City
- “What is the Dollar of Sweden?” → Krona
How It Works
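Each country is encoded as a bundled set of role-filler bindings:
US = bundle(country_code ⊗ USA, capital ⊗ DC, currency ⊗ Dollar)
Mexico = bundle(country_code ⊗ MEX, capital ⊗ Mexico City, currency ⊗ Peso)
To find “the Dollar of Mexico”, we compute a transfer vector from US to Mexico:
T = Mexico ⊗ inverse(US)
Then apply it to Dollar:
Dollar ⊗ T ≈ Peso
The result has high overlap with Peso, the analogical answer.
The same transfer works for Sweden:
Dollar ⊗ Sweden ⊗ inverse(US) ≈ Krona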
Code (Manual)
The algebraic approach — compute the transfer vector directly:
from kongming import hv
model = hv.MODEL_64K_8BIT
so = hv.SparseOperation(model, "knowledge", 0)
# Create role markers
country_code = hv.Sparkle.from_word(model, "role", "country_code")
capital = hv.Sparkle.from_word(model, "role", "capital")
currency = hv.Sparkle.from_word(model, "role", "currency")
# Create fillers
usa = hv.Sparkle.from_word(model, "country", "usa")
mex = hv.Sparkle.from_word(model, "country", "mex")
swe = hv.Sparkle.from_word(model, "country", "swe")
dc = hv.Sparkle.from_word(model, "capital", "dc")
mexico_city = hv.Sparkle.from_word(model, "capital", "mexico_city")
stockholm = hv.Sparkle.from_word(model, "capital", "stockholm")
dollar = hv.Sparkle.from_word(model, "currency", "dollar")
peso = hv.Sparkle.from_word(model, "currency", "peso")
krona = hv.Sparkle.from_word(model, "currency", "krona")
# Encode each country as role-filler bundles
us_record = hv.bundle(
    hv.Seed128.random(so),
    hv.bind(country_code, usa),
    hv.bind(capital, dc),
    hv.bind(currency, dollar),
)
mexico_record = hv.bundle(
    hv.Seed128.random(so),
    hv.bind(country_code, mex),
    hv.bind(capital, mexico_city),
    hv.bind(currency, peso),
)
sweden_record = hv.bundle(
    hv.Seed128.random(so),
    hv.bind(country_code, swe),
    hv.bind(capital, stockholm),
    hv.bind(currency, krona),
)
# Transfer vector: Mexico / US
transfer_to_mexico = hv.release(mexico_record, us_record)
# "What's the Dollar of Mexico?"
mexican_dollar = hv.bind(dollar, transfer_to_mexico)
print(f"peso overlap: {hv.overlap(mexican_dollar, peso)}") # high!
print(f"dollar overlap: {hv.overlap(mexican_dollar, dollar)}") # ~1 (noise)
print(f"krona overlap: {hv.overlap(mexican_dollar, krona)}") # ~1 (noise)
# "What's the Washington DC of Mexico?"
mexican_dc = hv.bind(dc, transfer_to_mexico)
print(f"mexico_city overlap: {hv.overlap(mexican_dc, mexico_city)}") # high!
# Transfer to Sweden works the same way
transfer_to_sweden = hv.release(sweden_record, us_record)
swedish_dollar = hv.bind(dollar, transfer_to_sweden)
print(f"krona overlap: {hv.overlap(swedish_dollar, krona)}") # high!
Code (with AnalogicalReasoner)
When records are stored in memory (as Octopus composites), analogical_reasoner handles the transfer:
from kongming import hv, memory
model = hv.MODEL_64K_8BIT
store = memory.InMemory(model)
keys = ["capital", "currency", "country_code"]
# Store individual fillers — NNS needs them as searchable items
fillers = {}
for word in ["dc", "USD", "USA", "mexicoCity", "MXN", "MEX",
             "stockholm", "SEK", "SWE"]:
    s = hv.Sparkle.from_word(model, 0, word)
    store.put(s)
    fillers[word] = s
# Store country records as Octopus composites
store.put(hv.Octopus(
    hv.Seed128("country", "USA"), keys,
    fillers["dc"], fillers["USD"], fillers["USA"],
))
store.put(hv.Octopus(
    hv.Seed128("country", "MEX"), keys,
    fillers["mexicoCity"], fillers["MXN"], fillers["MEX"],
))
store.put(hv.Octopus(
    hv.Seed128("country", "SWE"), keys,
    fillers["stockholm"], fillers["SEK"], fillers["SWE"],
))
# Retrieve stored records
us_code = store.get("country", "USA").code
mex_code = store.get("country", "MEX").code
swe_code = store.get("country", "SWE").code
view = store.new_view()
# "What is the USD of Mexico?"
result = memory.first_picked(
    view,
    memory.nns(
        memory.analogical_reasoner(
            memory.with_code(mex_code),
            us_code,
            fillers["USD"],
        )
    ),
)
print(result.id)  # → ✨:🌱MXN
# "What is the Washington DC of Mexico?"
result = memory.first_picked(
    view,
    memory.nns(
        memory.analogical_reasoner(
            memory.with_code(mex_code),
            us_code,
            fillers["dc"],
        )
    ),
)
print(result.id)  # → ✨:🌱mexicoCity
# "What is the Dollar of Sweden?"
result = memory.first_picked(
    view,
    memory.nns(
        memory.analogical_reasoner(
            memory.with_code(swe_code),
            us_code,
            fillers["USD"],
        )
    ),
)
print(result.id)  # → ✨:🌱SEK
analogical_reasoner computes the transfer vector feature ⊗ inverse(src) internally and uses near-neighbor search to find the best match in memory — no manual algebra needed.
Why It Works
The transfer vector captures the structural mapping between the two records. When applied to any filler from the US record, it maps it to the corresponding filler in the Mexico record — because the role-filler binding structure is preserved by the algebra.
This is a form of analogical reasoning: no explicit rules, no lookup tables — just algebraic operations on high-dimensional vectors.
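The argument can be made concrete with a toy simulation. It assumes vectors are lists of per-segment offsets (256 segments of size 256), bind and release are modular addition and subtraction, and bundle copies each segment from one uniformly chosen input. These are simplifications for illustration, not the library's implementation, and all helper names are hypothetical.

```python
import random

CARDINALITY = 256    # segments per vector (assumed)
SEGMENT_SIZE = 256   # positions per segment (assumed)

def rand_vec(rng):
    return [rng.randrange(SEGMENT_SIZE) for _ in range(CARDINALITY)]

def bind(x, y):
    return [(p + q) % SEGMENT_SIZE for p, q in zip(x, y)]

def release(x, y):
    return [(p - q) % SEGMENT_SIZE for p, q in zip(x, y)]

def bundle(rng, *vectors):
    # Each segment copies its offset from one uniformly chosen input.
    return [rng.choice(vectors)[seg] for seg in range(CARDINALITY)]

def overlap(x, y):
    return sum(1 for p, q in zip(x, y) if p == q)

rng = random.Random(2023)
names = ["country_code", "capital", "currency",
         "usa", "mex", "dc", "mexico_city", "dollar", "peso"]
v = {name: rand_vec(rng) for name in names}

us_record = bundle(rng,
                   bind(v["country_code"], v["usa"]),
                   bind(v["capital"], v["dc"]),
                   bind(v["currency"], v["dollar"]))
mexico_record = bundle(rng,
                       bind(v["country_code"], v["mex"]),
                       bind(v["capital"], v["mexico_city"]),
                       bind(v["currency"], v["peso"]))

transfer = release(mexico_record, us_record)
answer = bind(v["dollar"], transfer)

peso_overlap = overlap(answer, v["peso"])      # signal
dollar_overlap = overlap(answer, v["dollar"])  # noise
print(peso_overlap, dollar_overlap)
```

In a segment where both records happened to copy their currency binding (probability about 1/3 × 1/3 = 1/9), the transfer holds (currency + peso) − (currency + dollar) = peso − dollar, so adding dollar lands exactly on peso. In this toy model the peso overlap should therefore sit near 256 / 9 ≈ 28, well above the noise level of about 1.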
See Also
- Concepts: Operators (algebraic foundations)
- Operators (bind, release, bundle)
- Octopus (key-value composite used for country records)
- Memory: Selectors (analogical_reasoner, nns, with_code)
- Near-neighbor search (how the reasoner finds answers)
Bulk Storage Benchmark
Standalone script:
bulk_storage.py
This example populates a storage substrate with a large number of random terminal chunks, then queries a few by key to verify correctness. It demonstrates how to batch-create items and measure throughput.
Note that the associative index is also built during the write phase, so near-neighbor search is available as soon as all writes complete successfully.
Motivated readers can further improve this script to test various producers or selectors.
Script
#!/usr/bin/env python3
"""Populate local storage with random terminal chunks and verify retrieval."""
import argparse
import shutil
import tempfile
import time

from kongming import hv, memory


def main():
    parser = argparse.ArgumentParser(description="Bulk storage benchmark")
    parser.add_argument(
        "-n", "--count", type=int, default=10_000,
        help="Number of terminal chunks to create (default: 10000)",
    )
    parser.add_argument(
        "--model", type=int, default=hv.MODEL_1M_10BIT,
        help="HV model (default: MODEL_1M_10BIT)",
    )
    parser.add_argument(
        "--domain", type=str, default="bench",
        help="Domain name for all chunks (default: bench)",
    )
    parser.add_argument(
        "--backend", type=str,
        choices=["inmemory", "embedded"],
        default="inmemory",
        help="Storage backend (default: inmemory)",
    )
    parser.add_argument(
        "--path", type=str, default=None,
        help="Disk path for embedded backend (default: temp directory)",
    )
    args = parser.parse_args()

    # --- Create storage ---
    tmpdir = None
    if args.backend == "embedded":
        if args.path:
            path = args.path
        else:
            tmpdir = tempfile.mkdtemp()
            path = f"{tmpdir}/bench_store"
        storage = memory.Embedded(args.model, path)
        print(f"Backend: Embedded (path={path})")
    else:
        storage = memory.InMemory(args.model)
        print("Backend: InMemory (BTreeMap, pure in-memory)")

    # --- Write phase ---
    print(f"Writing {args.count:,} terminal chunks …")
    t0 = time.perf_counter()
    for i in range(args.count):
        storage.mem_set(memory.new_terminal(args.domain, str(i)))
    elapsed = time.perf_counter() - t0
    rate = args.count / elapsed
    print(f" done in {elapsed:.2f}s ({rate:,.0f} chunks/s)")
    print(f" item_count = {storage.item_count():,}")

    # --- Read phase: spot-check a few items ---
    # Keep the step at 1 or more so counts below 100 do not give a zero step.
    spot_checks = range(0, args.count, max(1, args.count // 100))
    print(f"Spot-checking keys: {spot_checks}")
    mismatches = 0
    for idx in spot_checks:
        expected = hv.Sparkle.from_word(args.model, args.domain, str(idx))
        chunk = storage.get(args.domain, str(idx))
        if not hv.equal(chunk.id, expected):
            mismatches += 1
            print(f"mismatch at key {idx}: {chunk}")
    if mismatches == 0:
        print("All checks passed.")
    else:
        print(f"{mismatches} check(s) failed.")

    # --- Cleanup ---
    if tmpdir:
        del storage
        shutil.rmtree(tmpdir)


if __name__ == "__main__":
    main()
Usage
# Default: 10K chunks, in-memory storage substrate.
python bulk_storage.py
# Embedded (disk-backed storage substrate).
python bulk_storage.py --backend embedded
# Embedded with a specific path (tip: use a tmpfs mount for near-in-memory speed)
python bulk_storage.py --backend embedded --path /dev/shm/my_bench
# Custom count
python bulk_storage.py -n 100000
# Different model: 1 selects MODEL_64K_8BIT, and so on.
python bulk_storage.py -n 10000 --model 1
Word Indexer
Standalone script:
word_indexer.py
This example encodes ~5,000 English words as Sequences of per-letter Sparkles, then queries them by exact word or by positional suffix (“six-letter words ending in er”, “eleven-letter words ending in tion”) using multi-attractor near-neighbor search.
It demonstrates four ideas together:
- Using Sparkle as a stable per-symbol code (one Sparkle per a–z).
- Using Sequence with a Pod-derived seed so chunks are addressable both by word (exact) and by structure (positional).
- The ChunkProducer API (new_terminal, from_sequence_members, joiner) staged through a batched SubstrateMutableView via producer.produce(view).
- Multi-attractor nns over SequenceAttractor for positional conjunctive queries.
The general idea
letters domain          words domain
─────────────           ────────────
"a" → Sparkle_a         "the"      → Sequence(t, h, e)
"b" → Sparkle_b         "language" → Sequence(l, a, n, g, u, a, g, e)
"c" → Sparkle_c         ...
...                     Pod     = word (exact lookup key)
"z" → Sparkle_z         note    = word (recoverable in results)
                        members = letter Sparkles in order
- Letters as Sparkles. Pre-write 26 random-looking Sparkles, one per a–z, into a letters domain via new_terminal(letters, ch). Each letter’s Pod is the letter itself, so you can fetch it by by_item_key("letters", "e").
- Words as Sequences. Each word is a Sequence in a words domain whose ordered members are the letter-Sparkles spelling it, built by from_sequence_members(...) with a joiner(...) of per-letter by_item_key selectors. The Sequence’s Pod is the word, so exact lookup is by_item_key("words", "language").
- note carries the word string. Each word-chunk is written with note=<word>, so chunk.note recovers the word in result loops without decoding the Pod.
Batched writes via the ChunkProducer API
This example uses the producer API end-to-end. Producers compute their
chunks at produce() time against a mutable view, mirroring Go’s
producer.Produce(ctx, view) and Rust’s producer.produce(view, index).
Storage’s new_mutable_view() is a context manager with transactional
semantics:
- All writes staged by producer.produce(view) calls between __enter__ and __exit__ go into a single batch.
- Clean exit auto-commits; an exception inside the block discards everything.
- view.commit() mid-block flushes the current batch and lets you continue staging, which is useful for pacing memory pressure on large ingests.
Letters and words go into two separate views; the second commits every
BATCH_SIZE = 1000 words:
with storage.new_mutable_view() as view:
    for ch in "abcdefghijklmnopqrstuvwxyz":
        memory.new_terminal("letters", ch).produce(view)
# auto-commits on __exit__

with storage.new_mutable_view() as view:
    for i, w in enumerate(words, start=1):
        members = memory.joiner(*[memory.by_item_key("letters", ch) for ch in w])
        # semantic_indexing=True: index the Sequence's code so suffix
        # queries (sequence_attractor) can find words by structure.
        memory.from_sequence_members(
            "words", w, members, note=w, semantic_indexing=True,
        ).produce(view)
        if i % BATCH_SIZE == 0:
            view.commit()
# trailing writes auto-commit on __exit__
See Substrate & Views for the full view API.
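The commit and discard behavior described above follows the standard Python context-manager protocol. The sketch below mimics it with a hypothetical ToyView class backed by a plain list. It is not the library's view type and only illustrates the semantics: mid-block commit, auto-commit on clean exit, and discard of staged writes on exception.

```python
class ToyView:
    """Hypothetical stand-in for a batched mutable view (not the library class)."""

    def __init__(self, committed):
        self.committed = committed   # shared list standing in for the store
        self.staged = []

    def stage(self, item):
        self.staged.append(item)

    def commit(self):
        self.committed.extend(self.staged)
        self.staged.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.commit()            # clean exit: flush the trailing batch
        else:
            self.staged.clear()      # exception: drop whatever is still staged
        return False                 # never swallow the exception

store = []
with ToyView(store) as view:
    for i in range(5):
        view.stage(i)
        if (i + 1) % 2 == 0:
            view.commit()            # mid-block flush every 2 items

try:
    with ToyView(store) as view:
        view.stage("lost")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

print(store)  # [0, 1, 2, 3, 4]
```

Note that in this toy, anything flushed by an earlier commit() stays committed even if a later exception occurs; only the still-staged items are dropped.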
Multi-attractor NNS
A sequence_attractor(member_selector, pos, domain) is a positional constraint: “Sequences in domain whose member at pos overlaps with member_selector”. Position is 0-based.
nns(*attractors) evaluates all attractors and ranks Sequences by combined overlap. With multiple attractors, the result is a conjunction — a chunk must satisfy each positional constraint to score well.
For “six-letter words ending in er”:
memory.nns(
    memory.sequence_attractor(memory.by_item_key("letters", "e"), 4, WORDS_DOMAIN),
    memory.sequence_attractor(memory.by_item_key("letters", "r"), 5, WORDS_DOMAIN),
)
This returns Sequences with e at index 4 and r at index 5 — i.e., the last two characters of a six-letter word.
For “eleven-letter words ending in tion”, anchor t/i/o/n at positions 7/8/9/10.
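The position arithmetic for suffix queries is simple enough to automate. The helper below is a hypothetical convenience (suffix_positions is not part of the library) that turns a suffix and a word length into (letter, 0-based index) pairs; each pair would then feed one sequence_attractor call.

```python
def suffix_positions(suffix, word_length):
    """Return (letter, 0-based index) pairs pinning `suffix` to the end of a word."""
    if len(suffix) > word_length:
        raise ValueError("suffix is longer than the word")
    start = word_length - len(suffix)
    return [(ch, start + i) for i, ch in enumerate(suffix)]

er = suffix_positions("er", 6)
tion = suffix_positions("tion", 11)

print(er)    # [('e', 4), ('r', 5)]
print(tion)  # [('t', 7), ('i', 8), ('o', 9), ('n', 10)]
```

The first result reproduces the two attractors shown above, and the second gives the 7/8/9/10 positions for the eleven-letter query.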
Counting and ranged results
storage.mem_get(selector) returns the full ranked result list as a Python list. Two helpers shape the output:
| Call | Use |
|---|---|
| mem_get(nns(...)) | Get every match. len(...) is the count. |
| mem_get(range_sel(nns(...), start, limit)) | Materialize a window, useful for top-N. |
range_sel(inner, start, limit) consumes its inner selector, so to demonstrate both count and top 10 the example builds the NNS selector twice (cheap; the substrate work dominates).
See Working with Results for more on shaping selector output. When you need per-result SelectorExtra (e.g. NNS scores) or lazy iteration, reach for memory.lazy_selector_iter(view, selector) — mem_get returns Chunks only.
Running
pip install kongming-rs-hv
python examples/word_indexer/word_indexer.py
Expected output shape:
Ingested 4982 words in 8.4s.
by word 'the': 1 match(es) [0.3 ms]
1. the
by word 'people': 1 match(es) [0.3 ms]
1. people
****er (6 letters): N match(es) [~190 ms]
1. <some six-letter -er word>
...
*******tion (11 letters): M match(es) [~480 ms]
1. <some eleven-letter -tion word>
...
Approximate timings on an Apple Silicon laptop with the InMemory backend:
| Operation | Time |
|---|---|
| Ingest ~5,000 words via producer API | ~9 s |
| Exact lookup via by_item_key | <1 ms |
| Multi-attractor NNS (2 attractors, e.g. ****er) | ~200 ms |
| Multi-attractor NNS (4 attractors, e.g. *******tion) | ~460 ms |
A note on semantic_indexing
For NNS by composite structure (i.e. “find Sequences whose member at
position N matches X”), each word’s producer is constructed with
semantic_indexing=True. This impresses the Sequence’s code into the
associative index alongside the chunk’s id-Sparkle (which is always
indexed). Without the flag, only the id is indexed and
sequence_attractor queries return zero hits.
The letter terminals are written without the flag because their code is the id-Sparkle, so id-only indexing is sufficient.
Switching to persistent storage
InMemory is fine for a demo. For a persistent store, swap one line:
storage = memory.Embedded(MODEL, "/path/to/db")
Everything else is identical.
Data attribution
Word-frequency data in top5000.txt is sourced from www.wordfrequency.info (top-5000 English words). Please credit the source when reusing this data.
Format (tab-separated, no header):
Rank Word POS Frequency Dispersion
See Also
- Sparkle ✨: per-symbol code used for letters
- Sequence 📿: ordered composite used for words
- Attractors: sequence_attractor and friends
- Near-Neighbor Search: nns over multiple attractors
- Substrate & Views: batched mutable views
- Working with Results: range_sel, mem_get
LISP Interpreter
A LISP interpreter where every data structure — atoms, cons cells, lists, closures — is encoded as a hypervector. No traditional memory allocation, no pointers, no garbage collector. All computation happens through hypervector algebra.
Two Implementations
The LISP interpreter ships in two forms, both feature-identical:
| | Pure Python (pylisp) | Rust (kongming_rs.lisp) |
|---|---|---|
| Source | Open-sourced in examples/pylisp/ | Compiled into kongming-rs-hv |
| Readable | Yes — ~500 lines of annotated Python | No — compiled Rust binary |
| Performance | Slower (Python overhead per operation) | Faster (native code) |
| Import | from pylisp import LispEnv | from kongming.lisp import LispEnv |
| Dependencies | kongming-rs-hv (for hypervector primitives) | Included in kongming-rs-hv |
| Use case | Learning, debugging, extending | Production, notebooks |
Rust (built-in)
The kongming-rs-hv package includes a Rust-based LISP interpreter built directly on the internal Rust API and primitives. This implementation is compiled into the Python wheel and accessible via from kongming.lisp import LispEnv.
Since it operates on Rust-native hypervector types with zero Python overhead, it delivers the best performance for production use.
Python (open-source)
For research and study, we provide a pure-Python implementation of the same interpreter, built entirely on the public Python API of kongming-rs-hv. It mirrors the Rust implementation’s architecture but uses Python-level operations (hv.bind, hv.bundle, hv.release, etc.), making the underlying hypervector mechanics fully transparent and easy to modify.
This implementation is ideal for:
- Understanding how LISP primitives map to hypervector operations
- Experimenting with alternative encodings or evaluation strategies
- Teaching and prototyping
Quick Start
pip install kongming-rs-hv
# Pure Python
from pylisp import LispEnv
env = LispEnv()
env.eval("(CAR (QUOTE (A B C)))") # => "A"
env.eval("(CDR (QUOTE (A B C)))") # => "(B C)"
env.eval("(CONS (QUOTE A) (QUOTE B))") # => "(A . B)"
# Rust (same API, same results)
from kongming.lisp import LispEnv
env = LispEnv()
env.eval("(CAR (QUOTE (A B C)))") # => "A"
Supported Forms
McCarthy’s 7 Primitives (1960)
| Form | Example | Result |
|---|---|---|
| QUOTE | (QUOTE (A B C)) | (A B C) |
| CAR | (CAR (QUOTE (A B C))) | A |
| CDR | (CDR (QUOTE (A B C))) | (B C) |
| CONS | (CONS (QUOTE A) (QUOTE B)) | (A . B) |
| ATOM | (ATOM (QUOTE A)) | T |
| EQ | (EQ (QUOTE A) (QUOTE A)) | T |
| COND | (COND ((EQ (QUOTE A) (QUOTE B)) (QUOTE NO)) (T (QUOTE YES))) | YES |
Extensions
| Form | Description |
|---|---|
| LAMBDA | Anonymous functions with curried beta-reduction and variable shadowing |
| LABEL | Recursive self-reference (enables recursion without mutation) |
| DEFINE | Bind a name to a function in the environment |
Examples
# Lambda
env.eval("((LAMBDA (X) (CAR X)) (QUOTE (A B C)))") # => "A"
# Define a reusable function
env.eval("(DEFINE SECOND (LAMBDA (L) (CAR (CDR L))))")
env.eval("(SECOND (QUOTE (X Y Z)))") # => "Y"
# Recursion with LABEL
env.eval(
    "(DEFINE LAST (LAMBDA (L) "
    " ((LABEL REC (LAMBDA (X) "
    " (COND ((ATOM (CDR X)) (CAR X)) "
    " (T (REC (CDR X)))))) L)))"
)
env.eval_full("(LAST (QUOTE (A B C)))") # => "C"
How It Works
Each LISP value is a Sparkle — a sparse binary hypervector seeded by
its content. Atoms like A, B, CAR are sparkles in a symbol domain.
A cons cell (a . b) is encoded as:
cell = bundle(bind(a, LHS), bind(b, RHS))
where LHS and RHS are fixed tag sparkles. The cell is stored under a
fresh random sparkle id. To extract:
car(id) = cleanup(release(cell, LHS))
cdr(id) = cleanup(release(cell, RHS))
The release operation is noisy — it produces an approximate result.
Cleanup uses near-neighbor search (NNS) over the substrate’s
associative index to find the exact stored sparkle that best matches
the noisy probe.
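The encode, release, and cleanup cycle can be sketched on toy vectors. The model below assumes per-segment offsets (256 segments of size 256), a two-input bundle that copies each segment from one input at random, and a cleanup that scans a small dictionary instead of an associative index. All names are illustrative, and none of this is the interpreter's actual code.

```python
import random

CARDINALITY = 256    # segments per vector (assumed)
SEGMENT_SIZE = 256   # positions per segment (assumed)

def rand_vec(rng):
    return [rng.randrange(SEGMENT_SIZE) for _ in range(CARDINALITY)]

def bind(x, y):
    return [(p + q) % SEGMENT_SIZE for p, q in zip(x, y)]

def release(x, y):
    return [(p - q) % SEGMENT_SIZE for p, q in zip(x, y)]

def bundle(rng, x, y):
    # Each segment keeps the offset of one input, chosen at random.
    return [rng.choice((x[seg], y[seg])) for seg in range(CARDINALITY)]

def overlap(x, y):
    return sum(1 for p, q in zip(x, y) if p == q)

def cleanup(probe, codebook):
    """Brute-force nearest neighbor: name of the stored vector with top overlap."""
    return max(codebook, key=lambda name: overlap(probe, codebook[name]))

rng = random.Random(1960)
symbols = {name: rand_vec(rng) for name in ("A", "B", "C")}
LHS, RHS = rand_vec(rng), rand_vec(rng)

# (A . B) as a single vector
cell = bundle(rng, bind(symbols["A"], LHS), bind(symbols["B"], RHS))

car = cleanup(release(cell, LHS), symbols)
cdr = cleanup(release(cell, RHS), symbols)
print(car, cdr)
```

Releasing with LHS yields a probe that matches A in roughly half of the segments and looks random elsewhere, so the cleanup step should pick A by a wide margin; the same holds for B with RHS.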
File Structure
examples/pylisp/
__init__.py # Package entry point
types.py # HyperBinary type alias
env.py # LispEnv: domains, symbols, lexicon, substrate
cons.py # Cons cells: cons, car, cdr, cleanup via NNS
reader.py # S-expression tokenizer and parser
evaluator.py # Single-step and fixed-point evaluator
lambda_.py # Beta-reduction with currying and shadowing
printer.py # Hypervector → S-expression display
test_pylisp.py # 16 tests mirroring the Rust integration suite
Storage Backends
# In-memory (default, volatile)
env = LispEnv()
# Persistent (Embedded disk-backed)
env = LispEnv(path="/tmp/my_lisp_db")
Running Tests
pip install pytest kongming-rs-hv
pytest examples/pylisp/test_pylisp.py -v
Notebook
We provide a Colab notebook that runs both implementations side by side, demonstrating correctness parity and performance comparison:
References
- Hey, Pentti, We Did it!: A fully Vector-Symbolic Lisp — the paper that inspired this implementation
- Peter Norvig’s (How to Write a (Lisp) Interpreter (in Python)) — the original Lispy that the Python implementation builds upon