Kongming HV

Kongming is a hyperdimensional computing library implementing sparse binary hypervectors for cognitive computing applications.

The core engine is implemented in Rust for maximum efficiency, while ergonomic APIs are open-sourced in Python for better usability.

See Hypervectors for an introduction to hyperdimensional computing and the sparse binary representation.

License

The Python source code, examples, and documentation in this repository are licensed under the MIT License.

The compiled engine distributed via PyPI (kongming-rs-hv) is proprietary.

Install

pip install kongming-rs-hv

See Installation for supported platforms and verification steps.

Published notebooks

See Notebook Platforms for all available notebooks and platform details.

Guides

Guide	Description
Python Quick Start	Installation, examples, and walkthrough
Notebook Quick Start	Platform setup, interactive notebooks, cell-by-cell walkthrough

Language Support

This documentation shows code snippets in multiple languages side by side, where available.

Python: bindings to the underlying Rust implementation (the public kongming-rs-hv package on PyPI);
Go: canonical reference implementation, in a proprietary package;
Rust: parallel implementation, carefully maintained at feature parity.

The documentation on yangzh.github.io/hv is deployed from release tags (v*) and stays in lockstep with the latest kongming-rs-hv release on PyPI. Whatever you read there matches what pip install kongming-rs-hv gives you.

The main branch of this repository is the working head — it may describe APIs or examples that haven’t been released yet. If you browse the raw markdown on GitHub, expect it to occasionally be ahead of the published site.

Reference

The work was first outlined in the following arXiv paper, which builds on the work of many others:

Yang, Zhonghao (2023). Cognitive modeling and learning with sparse binary hypervectors. arXiv:2310.18316v1 [cs.AI]

Feedback

Found a bug, have a question, or want to suggest an improvement? Open an issue on GitHub.

Hypervectors

What is Hyperdimensional Computing?

Hyperdimensional computing (HDC) represents concepts as high-dimensional vectors and manipulates them with simple algebraic operations. The dimension of these vectors is typically in the thousands or higher.

The key insight is that random vectors in high-dimensional spaces are nearly orthogonal — giving each concept a unique, distributed, and robust representation that tolerates potential ambiguity and interference.

In that sense, the traditional curse of dimensionality becomes a blessing of dimensionality.

Motivated readers should perform their own background research on this topic.

Sparse Binary Representation

Kongming uses sparse binary hypervectors. Each vector has a fixed, large number of dimensions (e.g., 65,536/64K or 1,048,576/1M), but only a very small fraction of them are “on” (set to 1). This sparsity is controlled by the Model configuration.

Furthermore, we focus on a special sparse binary configuration, SparseSegmented, in which each vector is divided into equal-sized segments and exactly one bit is ON per segment.

Conceptually, you can imagine each SparseSegmented hypervector as a list of phasors, where the offset of the ON bit within its segment represents a discretized phase.

In general, this unique constraint enables:

Compact storage: only the offset of the ON bit within each segment needs to be stored.
Efficient operations: unlike neural nets, where weights are stored as floating-point numbers, binary vectors can be stored and manipulated very efficiently on modern memory and CPUs.

Identity and Inverses

The identity vector has all offsets set to 0. Binding with the identity is a no-op. As a special case, the identity vector has no storage cost.
Binding a vector with its inverse yields the identity.

Similarity and distance measure

Two vectors are compared via overlap: the count of segments where both have the same ON bit. This equals the popcount of a bitwise AND, which modern CPUs compute very efficiently.

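For a model with cardinality $M$ and segment size $s$, two random vectors $A$ and $B$ match in any given segment with probability $1/s$, so their expected overlap is:

$E[O(A, B)] = \frac{M}{s} = 1$

The last equality holds because every supported model has $M = s$. In practice, the overlap between two random vectors is therefore usually 0, 1, or 2.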
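As a worked check of the formula above, the sketch below treats each of the $M$ segments as an independent trial that matches with probability $1/s$. Under that model, the overlap of two random vectors is binomially distributed. The function name is illustrative and not part of Kongming.

```python
from math import comb


def random_overlap_distribution(cardinality: int, segment_size: int, max_k: int = 4) -> dict:
    """Return P(overlap == k) for k = 0..max_k between two independent random vectors.

    Each of the `cardinality` segments matches independently with probability
    1 / segment_size, so the overlap follows Binomial(cardinality, 1 / segment_size).
    """
    p = 1.0 / segment_size
    return {
        k: comb(cardinality, k) * p**k * (1 - p) ** (cardinality - k)
        for k in range(max_k + 1)
    }


# MODEL_1M_10BIT: M = 1024 segments, s = 1024 dimensions per segment.
M, s = 1024, 1024
print("expected overlap:", M / s)
for k, prob in random_overlap_distribution(M, s).items():
    print(f"P(O = {k}) = {prob:.3f}")
```

For MODEL_1M_10BIT the probabilities come out to roughly 0.368, 0.368 and 0.184 for overlaps 0, 1 and 2. About 92% of random pairs therefore have an overlap of at most 2.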

Semantically related vectors have significantly higher overlap. A vector's overlap with itself equals its cardinality $M$.

The commonly used distance (dissimilarity) measure for binary vectors is the Hamming distance, the popcount of a bitwise XOR. As we discussed (and proved) in the paper, overlap and Hamming distance between sparse binary hypervectors are two sides of the same coin, related by:

$2 \times O(A, B) + H(A, B) = 2M$
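The relation above can be checked with a small sketch. It stores each vector as a plain list of per-segment offsets and counts Hamming distance over bits, as the XOR definition implies. The helper names are hypothetical and are not Kongming's `hv.overlap` / `hv.hamming`.

```python
import random


def on_bits(offsets, segment_size):
    """Global indices of the ON bits: segment i contributes i * segment_size + offset."""
    return {i * segment_size + off for i, off in enumerate(offsets)}


def overlap(a, b):
    """Segments whose ON bit sits at the same offset (popcount of A AND B)."""
    return sum(x == y for x, y in zip(a, b))


def hamming(a, b, segment_size):
    """Bit-level Hamming distance (popcount of A XOR B)."""
    return len(on_bits(a, segment_size) ^ on_bits(b, segment_size))


M, s = 256, 256  # MODEL_64K_8BIT: 256 segments of 256 dimensions each
rng = random.Random(7)
a = [rng.randrange(s) for _ in range(M)]
b = [rng.randrange(s) for _ in range(M)]

o, h = overlap(a, b), hamming(a, b, s)
# Every mismatched segment contributes two differing bits (one ON in each vector).
assert 2 * o + h == 2 * M
print(f"overlap={o}, hamming={h}")
```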

Supported Models

A Model determines the total number of dimensions (width) and how those dimensions are divided into segments (cardinality and sparsity). It therefore fixes the storage and compute characteristics of every vector.

Model	Width	Sparsity Bits	Segment Size	Cardinality (ON bits)
`MODEL_64K_8BIT`	65,536	8	256	256
`MODEL_1M_10BIT`	1,048,576	10	1,024	1,024
`MODEL_16M_12BIT`	16,777,216	12	4,096	4,096
`MODEL_256M_14BIT`	268,435,456	14	16,384	16,384
`MODEL_4G_16BIT`	4,294,967,296	16	65,536	65,536

Model properties

All model functions take a Model enum value and return the derived property:

Note

For simplicity, we use function names from Python. The counterparts from Go / Rust can be found by consulting their respective references.

Function	Description
`width`	Total dimension count (`2^width_bits`)
`sparsity`	Fraction of ON bits (`1 / segment_size`)
`cardinality`	Number of ON bits (= number of segments)
`segment_size`	Dimensions per segment
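The properties in the two tables above follow from a single number, the sparsity bits. The hypothetical helper below (not Kongming's API) derives them. It assumes `width_bits = 2 * sparsity_bits`, which holds for every supported model. It also estimates the raw storage of one SparseSegmented vector when offsets are bit-packed.

```python
def model_properties(sparsity_bits: int) -> dict:
    """Hypothetical helper: derive model properties from the sparsity bits."""
    segment_size = 1 << sparsity_bits            # dimensions per segment
    cardinality = 1 << sparsity_bits             # number of segments = ON bits
    width = segment_size * cardinality           # 2^(2 * sparsity_bits)
    packed_bytes = cardinality * sparsity_bits // 8  # offsets packed at sparsity_bits each
    return {
        "width": width,
        "segment_size": segment_size,
        "cardinality": cardinality,
        "sparsity": 1 / segment_size,
        "packed_offset_bytes": packed_bytes,
    }


for bits in (8, 10, 12, 14, 16):
    p = model_properties(bits)
    print(f"{bits:2d} bits: width={p['width']:,} cardinality={p['cardinality']:,} "
          f"storage~{p['packed_offset_bytes']:,} bytes")
```

For MODEL_1M_10BIT this gives 1,024 offsets × 10 bits = 1,280 bytes per SparseSegmented vector. That figure is before any seed-based savings such as Sparkle's.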

How to Choose a Model

MODEL_64K_8BIT: Fast prototyping, tiny memory footprint. Good for tests and small-scale experiments.
MODEL_1M_10BIT: General-purpose, balances performance and storage.
MODEL_16M_12BIT: General-purpose, for the adventurous.
MODEL_256M_14BIT / MODEL_4G_16BIT: Very high capacity; not yet ready for general use.

Larger models provide more orthogonal space (lower collision probability) at the cost of more memory per vector.

Note

The storage-per-hypervector estimate applies only to SparseSegmented (and a few other types) where raw offsets must be stored. In certain scenarios, optimizations can dramatically reduce storage requirements. Sparkle, for example, stores only its random seeds, and the offsets are recovered on the fly at deserialization time. Composite types (such as Set and Sequence) typically hold references to member Sparkle instances and cost much less storage than a single SparseSegmented instance.

Operators

Kongming provides two core algebraic operations on hypervectors.

Bind

Binding ( $\otimes$ ) combines two vectors into a result that is dissimilar to both inputs. It is the multiplicative operation in the HDC algebra.

Mathematically

$A \otimes B = B \otimes A$ (commutative)

$(A \otimes B) \otimes C = A \otimes (B \otimes C)$ (associative)

$A \otimes I = A$ (where $I$ is the identity vector)

$A \otimes A^{-1} = I$ (inverse)

$O(A \otimes B, A) \approx O(A \otimes B, B) \approx \text{noise}$ (dissimilarity)

Implementation: segment-wise addition of offsets modulo the segment size; see the original paper for details.
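The segment-wise rule above can be sketched by modelling a SparseSegmented vector as a plain list of offsets. The sketch then checks the algebraic properties listed earlier. Function names are illustrative only. `power` assumes that raising to the p-th power means binding a vector with itself p times, which multiplies every offset by p.

```python
import random

S = 1024  # segment size (MODEL_1M_10BIT)
M = 1024  # cardinality: number of segments


def bind(a, b):
    """Segment-wise offset addition modulo the segment size."""
    return [(x + y) % S for x, y in zip(a, b)]


def identity():
    """All offsets 0: binding with it changes nothing."""
    return [0] * M


def power(a, p):
    """Bind `a` with itself p times; p = 0 yields the identity, p = -1 the inverse."""
    return [(p * x) % S for x in a]


def overlap(a, b):
    return sum(x == y for x, y in zip(a, b))


rng = random.Random(42)
A, B, C = ([rng.randrange(S) for _ in range(M)] for _ in range(3))

assert bind(A, B) == bind(B, A)                    # commutative
assert bind(bind(A, B), C) == bind(A, bind(B, C))  # associative
assert bind(A, identity()) == A                    # identity
assert bind(A, power(A, -1)) == identity()         # inverse
print(overlap(bind(A, B), A), overlap(bind(A, B), B))  # noise level, around 1
```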

Check out code snippets from the API reference.

Release

We sometimes use release, which is derived from bind, as the analogue of division (as opposed to multiplication).

$A \oslash B = A \otimes B^{-1}$

Note that release is anti-commutative: $(A \oslash B)^{-1} = B \oslash A$
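Release reduces to segment-wise offset subtraction when vectors are modelled as offset lists. The sketch below (illustrative names, not Kongming's API) checks two things: release undoes bind, and the anti-commutativity identity above holds.

```python
import random

S, M = 256, 256  # MODEL_64K_8BIT


def bind(a, b):
    return [(x + y) % S for x, y in zip(a, b)]


def inverse(a):
    return [(-x) % S for x in a]


def release(a, b):
    """A ⊘ B = A ⊗ B^-1: segment-wise offset subtraction modulo S."""
    return bind(a, inverse(b))


rng = random.Random(1)
A = [rng.randrange(S) for _ in range(M)]
B = [rng.randrange(S) for _ in range(M)]

assert release(bind(A, B), B) == A              # release undoes bind
assert inverse(release(A, B)) == release(B, A)  # (A ⊘ B)^-1 = B ⊘ A
assert release(A, B) != release(B, A)           # not commutative in general
```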

Check out code snippets from the API reference.

Bundle

Bundling ($\oplus$) creates a superposition of vectors: the result is similar to all inputs. It is the additive operation in the HDC algebra.

Mathematically

$S = \bigoplus_{i} A_{i}$

$O(S, A_{i}) \gg O_{random}$ (similarity to each member)

$O(S, X) \approx O_{random}$ for $X \notin \{A_{i}\}$ (dissimilarity to non-members)

Here $O_{random}$ denotes the expected overlap between random vectors, $M/s$.

See the original paper for details on the bundle operator.
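This page does not spell out Kongming's bundle algorithm, so the sketch below assumes a simple sampling scheme. In every segment it keeps the offset of one member, chosen by a seeded RNG. This preserves exactly one ON bit per segment and reproduces the two overlap properties above. The seed parameter loosely mirrors the Seed128 that composite constructors pass to the bundle operator, but the scheme itself is an assumption, not the library's method.

```python
import random

S, M = 1024, 1024  # MODEL_1M_10BIT


def random_vector(rng):
    return [rng.randrange(S) for _ in range(M)]


def bundle(members, seed):
    """Hypothetical sampling bundle (an assumption, not Kongming's algorithm):
    each segment keeps the offset of one member picked by a seeded RNG, so the
    result still has exactly one ON bit per segment."""
    rng = random.Random(seed)
    return [members[rng.randrange(len(members))][i] for i in range(M)]


def overlap(a, b):
    return sum(x == y for x, y in zip(a, b))


rng = random.Random(2023)
members = [random_vector(rng) for _ in range(3)]
outsider = random_vector(rng)

bundled = bundle(members, seed=42)
print([overlap(bundled, m) for m in members])  # each roughly M / 3, far above random
print(overlap(bundled, outsider))              # noise level, around M / S = 1
```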

Check out code snippets from the API reference.

Composites

Composites combine multiple hypervectors into higher-level structures. Each composite type uses a different combination strategy, preserving different kinds of relationships between its members.

All composites follow the same contract (an interface in Go, a trait in Rust) and can be nested: a Set can contain Sparkles, Knots, or even other Sets.

Set

An unordered collection of concepts.

$S = S_{marker} \otimes \left(\bigoplus_{i} M_{i}\right)$

where $S_{marker}$ is a special marker that distinguishes a set from its individual members.

The marker is specific to the domain, so it is shared by all sets within the same domain.

Use when: you need to represent “these things together” without order.
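The marker's role can be illustrated with offset lists. The sketch below shows that a bound set looks like noise to its own members until the marker is released. The marker is a random stand-in rather than the real domain marker. The bundle uses the same hypothetical sampling scheme assumed for the bundle sketch, not Kongming's algorithm.

```python
import random

S, M = 1024, 1024  # MODEL_1M_10BIT


def rand_vec(rng):
    return [rng.randrange(S) for _ in range(M)]


def bind(a, b):
    return [(x + y) % S for x, y in zip(a, b)]


def release(a, b):
    return [(x - y) % S for x, y in zip(a, b)]


def bundle(members, seed):
    # Hypothetical sampling bundle (assumption): one member's offset per segment.
    rng = random.Random(seed)
    return [members[rng.randrange(len(members))][i] for i in range(M)]


def overlap(a, b):
    return sum(x == y for x, y in zip(a, b))


rng = random.Random(0)
marker = rand_vec(rng)  # stand-in for the domain's set marker
cat, dog, fish = rand_vec(rng), rand_vec(rng), rand_vec(rng)

the_set = bind(marker, bundle([cat, dog, fish], seed=7))  # S = marker ⊗ (cat ⊕ dog ⊕ fish)
print(overlap(the_set, cat))                   # noise level: the marker hides the members
print(overlap(release(the_set, marker), cat))  # high: roughly M / 3
```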

Check out code snippets from the API reference.

Sequence

An ordered collection.

$S = S_{marker} \otimes \left(\bigoplus_{i} M_{i} \otimes S_{step}^{i}\right)$

where $S_{step}$ is a generic hypervector for positional encoding.

$S_{marker}$ is a special marker that distinguishes a sequence from its individual members. The marker is specific to the domain, so it is shared by all sequences within the same domain.

Use when: order matters (e.g., words in a sentence, events in time).
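The positional formula above can be decoded step by step. The sketch below releases the marker, then releases $S_{step}^{i}$; what remains overlaps strongly only with the member at position $i$. The marker and step vectors are random stand-ins. The bundle is the same hypothetical sampling scheme assumed earlier, not Kongming's implementation.

```python
import random

S, M = 1024, 1024


def rand_vec(rng):
    return [rng.randrange(S) for _ in range(M)]


def bind(a, b):
    return [(x + y) % S for x, y in zip(a, b)]


def release(a, b):
    return [(x - y) % S for x, y in zip(a, b)]


def power(a, p):
    return [(p * x) % S for x in a]


def bundle(members, seed):
    # Hypothetical sampling bundle (assumption): one member's offset per segment.
    rng = random.Random(seed)
    return [members[rng.randrange(len(members))][i] for i in range(M)]


def overlap(a, b):
    return sum(x == y for x, y in zip(a, b))


rng = random.Random(5)
marker, step = rand_vec(rng), rand_vec(rng)  # stand-ins for the prewired marker / step
words = [rand_vec(rng) for _ in range(4)]

# S = marker ⊗ ⊕_i (M_i ⊗ step^i)
seq = bind(marker, bundle([bind(w, power(step, i)) for i, w in enumerate(words)], seed=11))


def probe(seq, i):
    """Undo the marker, then the positional encoding for slot i."""
    return release(release(seq, marker), power(step, i))


print([overlap(probe(seq, 2), w) for w in words])  # only index 2 stands out
```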

Check out code snippets from the API reference.

Octopus

A key-value structure. Each key (a string) is converted to a Sparkle and bound with its corresponding value before bundling.

$S = \bigoplus_{i} K_{i} \otimes V_{i}$

Use when: you need to represent structured records with named attributes.
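Querying an Octopus by key can be sketched as releasing the key vector from the bundle, then cleaning up the noisy result against the known values. In this sketch, `vector_for` is a hypothetical stand-in for converting a string into a Sparkle: a Python RNG seeded by the string, not Kongming's hashing. The bundle is the same assumed sampling scheme used in the earlier sketches.

```python
import random

S, M = 1024, 1024


def vector_for(name):
    """Hypothetical stand-in for converting a string to a Sparkle: a seeded RNG."""
    rng = random.Random(name)
    return [rng.randrange(S) for _ in range(M)]


def bind(a, b):
    return [(x + y) % S for x, y in zip(a, b)]


def release(a, b):
    return [(x - y) % S for x, y in zip(a, b)]


def bundle(members, seed):
    # Hypothetical sampling bundle (assumption): one member's offset per segment.
    rng = random.Random(seed)
    return [members[rng.randrange(len(members))][i] for i in range(M)]


def overlap(a, b):
    return sum(x == y for x, y in zip(a, b))


record = {"name": "alice", "city": "paris", "pet": "cat"}
values = {v: vector_for("value:" + v) for v in record.values()}

# S = ⊕_i K_i ⊗ V_i
octopus = bundle([bind(vector_for("key:" + k), values[v]) for k, v in record.items()], seed=3)


def lookup(key):
    """Release the key, then clean up by picking the most-overlapping known value."""
    noisy = release(octopus, vector_for("key:" + key))
    return max(values, key=lambda v: overlap(noisy, values[v]))


print(lookup("city"))  # paris
```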

Check out code snippets from the API reference.

Knot

The result of binding (multiplicative composition) of hypervectors.

$S = \bigotimes_{i} M_{i}$

Binding is reversible: given a Knot of A and B, you can recover A by releasing B (binding with B’s inverse).

Use when: you need a reversible association between concepts.
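Because binding is segment-wise modular addition, recovering a member from a Knot is exact rather than approximate, and the release order does not matter. The sketch below shows this with offset lists. The names are illustrative, not Kongming's API.

```python
import random

S, M = 256, 256  # MODEL_64K_8BIT


def bind(a, b):
    return [(x + y) % S for x, y in zip(a, b)]


def release(a, b):
    """a ⊗ b^-1: segment-wise offset subtraction modulo S."""
    return [(x - y) % S for x, y in zip(a, b)]


rng = random.Random(9)
A, B, C = ([rng.randrange(S) for _ in range(M)] for _ in range(3))

knot = bind(bind(A, B), C)                 # S = A ⊗ B ⊗ C
assert release(release(knot, B), C) == A   # exact recovery
assert release(release(knot, C), B) == A   # in either order
```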

Check out code snippets from the API reference.

Parcel

The result of bundling (additive composition).

$S = \bigoplus_{i} M_{i}$

Unlike direct bundling, a Parcel keeps track of its members for serialization and introspection.

Use when: you need a superposition of concepts, with optional weights.
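This page does not describe how Parcel applies weights. One plausible reading of "a superposition with optional weights" is sketched below: each segment copies the offset of a member drawn with probability proportional to its weight, so each member's overlap with the result scales with its weight. Treat this as an assumption, not the library's scheme.

```python
import random

S, M = 1024, 1024


def rand_vec(rng):
    return [rng.randrange(S) for _ in range(M)]


def weighted_bundle(members, weights, seed):
    """Hypothetical weighted superposition: each segment copies the offset of a member
    drawn with probability proportional to its weight."""
    rng = random.Random(seed)
    picks = rng.choices(range(len(members)), weights=weights, k=M)
    return [members[j][i] for i, j in enumerate(picks)]


def overlap(a, b):
    return sum(x == y for x, y in zip(a, b))


rng = random.Random(17)
apple, pear = rand_vec(rng), rand_vec(rng)
parcel = weighted_bundle([apple, pear], weights=[3, 1], seed=99)
print(overlap(parcel, apple), overlap(parcel, pear))  # roughly 3/4 and 1/4 of M
```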

Check out code snippets from the API reference.

Pointer

A one-directional reference between two hypervectors.

$P = A \otimes B^{-1}$

A Pointer encodes a directed link from a source $A$ to a destination $B$. Given the pointer and either endpoint, the other endpoint can be recovered: $P \otimes B$ recovers $A$, and $A \otimes P^{-1}$ recovers $B$.

Pointer is the structured wrapper for the release operation — Release(A, B) returns a Pointer with A as source and B as destination.
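The two recovery rules above can be verified with offset lists: $P \otimes B = A \otimes B^{-1} \otimes B = A$, and $A \otimes P^{-1} = A \otimes A^{-1} \otimes B = B$. The sketch uses illustrative helpers, not Kongming's `Pointer` type.

```python
import random

S, M = 256, 256


def bind(a, b):
    return [(x + y) % S for x, y in zip(a, b)]


def inverse(a):
    return [(-x) % S for x in a]


rng = random.Random(3)
A = [rng.randrange(S) for _ in range(M)]  # source
B = [rng.randrange(S) for _ in range(M)]  # destination

P = bind(A, inverse(B))          # P = A ⊗ B^-1 (same as releasing B from A)
assert bind(P, B) == A           # P ⊗ B recovers the source
assert bind(A, inverse(P)) == B  # A ⊗ P^-1 recovers the destination
```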

Use when: you need a reversible directional link (e.g., representing edges, mappings, or “from→to” relations).

Check out code snippets from the API reference.

Summary

Type	Composition	Order?	Use Case
Set	Bundle + marker	No	Unordered groups
Sequence	Positional-bind + bundle + marker	Yes	Ordered lists
Octopus	Key-bind + bundle	Partial (by key)	Key-value records
Knot	Bind (multiply)	No	Reversible associations
Parcel	Bundle (add)	No	Superpositions, weighted or unweighted
Pointer	Bind A with Inv(B)	Directional	One-directional references

Near Neighbor Search

Near Neighbor Search (NNS) retrieves chunks from the storage substrate in increasing order of Hamming distance from a query.

Since $2 \times O(A, B) + H(A, B) = 2M$, this is equivalent to decreasing order of overlap between query and candidate. If overlap encodes semantic relevance, the result is a list of semantically similar candidates.

It leverages an underlying Associative Index for efficient recovery of candidates. The Associative Index is a semantic index that enables fast similarity-based lookup over stored hypervectors. Conceptually it turns a key-value substrate (item memory) into an associative memory — one where retrieval is by content similarity, not by exact content or key match.

With help from the associative index, this NNS module has constant time complexity: query time remains bounded, independent of the number of entries in the storage system. The secret sauce is efficient random access to the underlying associative index.

Unlike approximate nearest neighbor methods (LSH, HNSW, etc.), the NNS module computes exact overlap counts via the associative index. There is no approximation error and there are no index-specific parameters to tune.
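A toy inverted index keyed by (segment, offset) shows why exact overlap counts fall out of an associative lookup. A query touches only the $M$ postings lists addressed by its ON bits, and each hit adds one matching segment to that item's overlap. `ToyAssociativeIndex` is hypothetical and is not Kongming's Associative Index. In this toy, postings lists grow with the number of stored items, so it does not reproduce the constant-time behavior described above.

```python
import random
from collections import defaultdict

S, M = 256, 256  # MODEL_64K_8BIT


class ToyAssociativeIndex:
    """Hypothetical inverted index mapping (segment, offset) -> ids of stored vectors."""

    def __init__(self):
        self.postings = defaultdict(list)
        self.items = {}

    def add(self, item_id, offsets):
        self.items[item_id] = offsets
        for seg, off in enumerate(offsets):
            self.postings[(seg, off)].append(item_id)

    def search(self, query, top_k=3):
        """Rank stored items by exact overlap with `query`, highest first."""
        counts = defaultdict(int)
        # Only the M postings lists addressed by the query's ON bits are visited;
        # each hit adds one matching segment to that item's overlap.
        for seg, off in enumerate(query):
            for item_id in self.postings.get((seg, off), ()):
                counts[item_id] += 1
        return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


rng = random.Random(11)
index = ToyAssociativeIndex()
for item_id in range(1000):
    index.add(item_id, [rng.randrange(S) for _ in range(M)])

# Query: item 17 with every other segment replaced by a random offset.
query = list(index.items[17])
for seg in range(0, M, 2):
    query[seg] = rng.randrange(S)

print(index.search(query))  # item 17 first, with an overlap of at least 128
```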

Jump to the API reference for Near-Neighbor Search.

HV

The core hypervector API. This module provides the building blocks for hyperdimensional computing: vector types, algebraic operators, and model configuration.

Section	Description
Common Utilities	Model, SparseOperation, similarity, identity, hashing
Operators	Bind, bundle, and BindDirect
HyperBinary Types	Interface + concrete types (Sparkle, Set, Sequence, etc.)
Customizing Run-time Behavior	Environment variables
Misc	Display, serialization

Common Utilities

Functions and types used across all HyperBinary types.

Section	Description
Models	Model enum and model functions
SparseOperation	Model + seeded RNG for deterministic vector generation
Seed128	128-bit seed embedding Domain + Pod
Domain & Pod	Semantic grouping (Domain) and slot identifier (Pod)
Utilities	Similarity, identity check, hashing

Models

See Concepts: Hypervectors for the full overview.

Model Enum

model0 = hv.MODEL_64K_8BIT

model1 = hv.MODEL_1M_10BIT

model0 := api.Model_MODEL_64K_8BIT

model1 := api.Model_MODEL_1M_10BIT

#![allow(unused)]
fn main() {
let model0 = Model::Model64k8bit;

let model1 = Model::Model1m10bit;
}

Model Functions

hv.width(hv.MODEL_1M_10BIT)           # total dimensions
hv.cardinality(hv.MODEL_1M_10BIT)     # ON bit count
hv.sparsity(hv.MODEL_1M_10BIT)        # sparsity
hv.segment_size(hv.MODEL_1M_10BIT)    # dimensions per segment

hv.Width(api.Model_MODEL_1M_10BIT)           // total dimensions
hv.Cardinality(api.Model_MODEL_1M_10BIT)     // ON bit count
hv.Sparsity(api.Model_MODEL_1M_10BIT)        // sparsity
hv.SegmentSize(api.Model_MODEL_1M_10BIT)     // dimensions per segment

#![allow(unused)]
fn main() {
let model = Model::Model1m10bit;
model.width();             // total dimensions
model.cardinality();       // ON bit count
model.sparsity();          // sparsity
model.segment_size();      // dimensions per segment
}

See also: SparseOperation — Model + seeded RNG for deterministic vector generation.

Domain & Pod

A Domain models the semantic grouping for hypervectors, providing the high 64-bit half of a Seed128. A Pod is a slot within a Domain, providing the low 64-bit half. The (Domain, Pod) pair uniquely identifies a Sparkle.

Domain Constructors

# From a name string (hashed to a 64-bit id)
d = hv.Domain("animals")

# Same as above
d = hv.Domain.from_name("animals")

# From a raw 64-bit id
d = hv.Domain.from_id(0x1234567890abcdef)

# From a domain prefix enum and a name suffix
# The id is computed as xxhash(prefix_label + "." + name)
d = hv.Domain.from_prefix_and_name(hv.DOMAIN_PREFIX_NLP, "concept")

# Accessors
d.id()              # u64
d.name()            # str (empty if constructed from id)
d.domain_prefix()   # int (0 = UNKNOWN if no prefix was set)
d.is_default()      # True if id == 0

// Polymorphic — dispatches by argument type:
d := hv.NewDomain("animals")               // string → name-based
d := hv.NewDomain(uint64(0x1234567890abcdef)) // uint64 → id-based
d := hv.NewDomain(api.DomainPrefix_NLP)    // enum → prefix-based (no name)

// Explicit forms (equivalent, zero overhead, compile-time safe):
d := hv.NewDomainFromName("animals")
d := hv.NewDomainFromID(0x1234567890abcdef)
d := hv.NewDomainFromPrefix(api.DomainPrefix_NLP, "concept")

#![allow(unused)]
fn main() {
let d = Domain::from_name("animals");
let d = Domain::from_id(0x1234567890abcdef);
let d = Domain::from_prefix(DomainPrefix::Nlp, "concept");
}

Domain Prefix Constants

Constant	Label
`hv.DOMAIN_PREFIX_USER`	🎭
`hv.DOMAIN_PREFIX_NLP`	💬

Domain prefixes provide namespacing for domains. When a prefix is set, the domain id is derived from the prefix label (and optional name), ensuring consistent hashing across languages.

Pod Constructors

Pods can be seeded by a string word, a raw uint64, or a prewired enum value.

# From a word string (hashed to a 64-bit seed)
p = hv.Pod("cat")

# Same as above
p = hv.Pod.from_word("cat")

# From a raw 64-bit seed
p = hv.Pod.from_seed(42)

# From a prewired enum value
p = hv.Pod.from_prewired(hv.PREWIRED_SET_MARKER)
p = hv.Pod.from_prewired(hv.PREWIRED_STEP)

# Accessors
p.seed()       # u64
p.word()       # str (empty if constructed from seed or prewired)
p.prewired()   # int (0 if not prewired)
p.is_default() # True if seed == 0

// Polymorphic — dispatches by argument type:
p := hv.NewPod("cat")                   // string → word pod
p := hv.NewPod(uint64(42))              // uint64 → seed pod
p := hv.NewPod(api.Prewired_SET_MARKER) // enum → prewired pod

// Explicit forms (equivalent, zero overhead, compile-time safe):
p := hv.NewPodFromWord("cat")
p := hv.NewPodFromSeed(42)
p := hv.NewPodFromPrewired(api.Prewired_SET_MARKER)

#![allow(unused)]
fn main() {
let p = Pod::from_word("cat");
let p = Pod::from_seed(42);
let p = Pod::from_prewired(Prewired::SetMarker);
}

Prewired Constants

Prewired pods are infrastructure-level constants with fixed seeds:

Constant	Label
`hv.PREWIRED_NIL`	∅
`hv.PREWIRED_FALSE`	❎
`hv.PREWIRED_TRUE`	✅
`hv.PREWIRED_BEGIN`	🚀
`hv.PREWIRED_END`	🏁
`hv.PREWIRED_LEFT`	⬅️
`hv.PREWIRED_RIGHT`	➡️
`hv.PREWIRED_UP`	⬆️
`hv.PREWIRED_DOWN`	⬇️
`hv.PREWIRED_MIDDLE`	⏺️
`hv.PREWIRED_STEP`	𓊍
`hv.PREWIRED_SET_MARKER`	🫧
`hv.PREWIRED_SEQUENCE_MARKER`	📿

Polymorphic arguments (Python-only)

Most Python factories that take a Domain or Pod accept the underlying primitives directly — you rarely need to wrap them explicitly:

Parameter type	Accepted Python forms
`Domain`	`Domain` instance, `str`, `int`, `(DomainPrefix, str)` tuple
`Pod`	`Pod` instance, `Prewired` enum, `str`, `int`

# Domain — four equivalent forms in any factory expecting a Domain:
memory.by_item_key("animals", "cat")
memory.by_item_key(hv.Domain.from_name("animals"), "cat")
memory.by_item_key(0x1234, "cat")                           # from numeric id
memory.by_item_key((hv.DOMAIN_PREFIX_NLP, "concept"), "p")  # from (prefix, name)

# Pod — Prewired enum is recognized:
memory.new_terminal("internal", hv.PREWIRED_STEP)           # Pod from Prewired
memory.new_terminal("animals", "cat")                       # Pod from word
memory.new_terminal("animals", 0xCAFE_BABE)                 # Pod from raw seed

For the parallel polymorphism on Seed128, see Seed128 → Polymorphic arguments.

Seed128

A Seed128 is a 128-bit seed used to drive a random number generator.

The current random number generator expects two 64-bit seeds: the same (seed_high, seed_low) pair always produces the same sequence of random numbers, enabling reproducible and deterministic vector generation across runs and languages.
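The reproducibility property above can be illustrated with a hypothetical sketch. It packs the (high, low) halves into one 128-bit integer and uses it to seed a generator that produces per-segment offsets. Python's Mersenne Twister stands in for Kongming's RNG, so the resulting offsets will not match the library's. Only the determinism is the point.

```python
import random


def offsets_from_seed(seed_high: int, seed_low: int, cardinality: int, segment_size: int):
    """Hypothetical: derive per-segment offsets from a 128-bit (high, low) seed.

    Python's Mersenne Twister stands in for Kongming's RNG, so these offsets
    will NOT match the library's; only the determinism is illustrated.
    """
    if not (0 <= seed_high < 2**64 and 0 <= seed_low < 2**64):
        raise ValueError("each seed half must be an unsigned 64-bit value")
    seed = (seed_high << 64) | seed_low  # high half: domain id, low half: pod seed
    rng = random.Random(seed)
    return [rng.randrange(segment_size) for _ in range(cardinality)]


a = offsets_from_seed(0, 42, 256, 256)
b = offsets_from_seed(0, 42, 256, 256)
c = offsets_from_seed(1, 42, 256, 256)
assert a == b  # same (high, low) pair -> same vector
assert a != c  # different domain half -> different vector
```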

Constructors

# From Domain and Pod arguments (each accepts Domain/Pod, int, or str)
seed = hv.Seed128("animals", "cat")                # domain name + pod word
seed = hv.Seed128(0, 42)                           # default domain + raw pod seed
seed = hv.Seed128("animals", 42)                   # domain name + raw pod seed
seed = hv.Seed128(hv.Domain("animals"), hv.Pod("cat"))  # explicit Domain/Pod objects

# Zero seed
seed_zero = hv.Seed128.zero()                      # (0, 0)

# Random seed from a SparseOperation
seed_rand = hv.Seed128.random(so)                  # consumes two u64 from the RNG

# Accessors
seed.domain()                                      # Domain object
seed.pod()                                         # Pod object
seed.high()                                        # u64 (domain id)
seed.low()                                         # u64 (pod seed)

seed := hv.NewSeed128(0, 42)          // from raw uint64 values
seedZero := hv.Seed128Zero()          // zero seed
seed1 := hv.NewSeed128FromDP(hv.NewDomain("domain"), hv.NewPod("pod"))

#![allow(unused)]
fn main() {
let seed = Seed128::new(0, 42);     // from raw u64 values
let seed_zero = Seed128::zero();    // zero seed
}

Usage

All composite constructors take a Seed128, which seeds the bundle operator:

seed = hv.Seed128("fruits", "fruit_set")

s = hv.Set(seed, a, b, c)
seq = hv.Sequence(seed, a, b, c)

seed := hv.NewSeed128(0, 42)

s := hv.NewSet(seed, a, b, c)
seq := hv.NewSequence(seed, 0, a, b, c)

#![allow(unused)]
fn main() {
let seed = Seed128::new(0, 42);

let s = Set::new(seed, vec![a, b, c]);
let seq = Sequence::new(seed, 0, vec![a, b, c]);
}

Polymorphic arguments (Python-only)

Anywhere a Python factory expects a Seed128 (composite constructors like hv.Set / hv.Sequence / hv.Octopus, the hv.bundle operator, etc.) you can pass either a Seed128 instance or a (domain, pod) tuple — the binding extracts and constructs the seed for you.

Parameter type	Accepted Python forms
`Seed128`	`Seed128` instance, or a `(domain, pod)` tuple

The tuple composes with the polymorphic forms accepted by Domain and Pod (see Domain & Pod → Polymorphic arguments), so each side can itself be a string / int / Prewired enum / (prefix, name) tuple — letting you skip the hv.Seed128(...) wrap entirely:

# Equivalent to hv.Sequence(hv.Seed128("words", "hi"), m1, m2):
seq = hv.Sequence(("words", "hi"), m1, m2)

# Tuple form composes with Domain's (DomainPrefix, str) tuple:
seq = hv.Sequence(((hv.DOMAIN_PREFIX_NLP, "concept"), "myseq"), m1, m2)

# And with Pod's Prewired enum:
seq = hv.Sequence(("internal", hv.PREWIRED_STEP), m1, m2)

SparseOperation

A SparseOperation instance wraps a Model, a random number generator, and possibly other state related to sparse operations in general.

Constructor

so = hv.SparseOperation(hv.MODEL_1M_10BIT, 0, 42)
so1 = hv.SparseOperation(hv.MODEL_1M_10BIT, "domain", "pod")

so := hv.NewSparseOperation(api.Model_MODEL_1M_10BIT, 0, 42)

#![allow(unused)]
fn main() {
let mut so = SparseOp::new(Model::Model1m10bit, 0, 42);
}

Methods

so.model()        # Model enum

so.width()        # width for this model

so.cardinality()  # cardinality for this model

so.sparsity()     # sparsity for this model

so.uint64()       # next random number

so.Model()        // api.Model

so.Width()        // width for this model

so.Cardinality()  // cardinality for this model

so.Sparsity()     // sparsity for this model

so.Uint64()       // next random number

#![allow(unused)]
fn main() {
so.model();        // Model

so.width();        // width for this model

so.cardinality();  // cardinality for this model

so.sparsity();     // sparsity for this model

so.uint64();       // next random number
}

Usage: Generating Random Vectors

so = hv.SparseOperation(hv.MODEL_1M_10BIT, 0, 42)
sparkle = hv.Sparkle.random(hv.Domain("domain"), so)

so := hv.NewSparseOperation(api.Model_MODEL_1M_10BIT, 0, 42)
sparkle := hv.NewRandomSparkle(domain, so)

#![allow(unused)]
fn main() {
let mut so = SparseOp::new(Model::Model1m10bit, 0, 42);
let sparkle = Sparkle::random(&domain, &mut so);
}

Utilities

Similarity

hv.overlap(a, b)    # Overlap

hv.hamming(a, b)    # Hamming distance

hv.equal(a, b)      # Equality check

hv.Overlap(a, b)         // Overlap

hv.Hamming(a, b)         // Hamming distance

hv.Equal(a, b)           // Equality check

#![allow(unused)]
fn main() {
overlap(&a.core(), &b.core());   // Overlap

hamming(&a.core(), &b.core());   // Hamming distance

hyper_binary::equal(&a, &b);     // bool
}

Identity Check

v = hv.Sparkle.identity(model)

hv.is_identity(v)   # True if v is an identity vector

v := hv.NewSparkleIdentity(model)

hv.IsIdentity(v)    // True if v is an identity vector

#![allow(unused)]
fn main() {
v.is_identity();     // true if v is an identity vector
}

Hash Utilities

hv.hash64_from_string("hello")   # deterministic u64 hash from string
hv.hash64_from_bytes(b"\x01\x02") # deterministic u64 hash from bytes
hv.curr_time_as_seed()            # current time as a u64 seed
hv.kongming_studio_seed()         # fixed studio seed constant

hv.Hash64FromString("hello")     // uint64
hv.Hash64FromBytes(raw)          // uint64
hv.Hash64FromMessage(protoMsg)   // uint64 — hash from protobuf message
hv.CurrTimeAsSeed()              // uint64

#![allow(unused)]
fn main() {
hash64_from_string("hello");     // u64
hash64_from_bytes(&raw);         // u64
curr_time_as_seed();             // u64
KONGMING_STUDIO_SEED;            // u64
}

HyperBinary Types

All vector types conform to a common interface. In Go this is the HyperBinary interface; in Rust it is the HyperBinary trait. The two implementations are kept at feature parity.

Python has no interface or trait construct, but all HyperBinary-derived types share a common set of methods.

v.model()        # Model enum
v.width()
v.cardinality()
v.hint()
v.stable_hash()  # int
v.seed128()
v.exponent()

v.core()         # SparseSegmented
v.power(p)       # HyperBinary

type HyperBinary interface {
    Model() api.Model
    Width() uint32
    Cardinality() uint32
    Hint() api.HyperBinaryProto_Hint
    StableHash() uint64
    Seed128() Seed128
    Exponent() int32

    Core() SparseSegmented
    Power(p int32) HyperBinary
}

#![allow(unused)]
fn main() {
pub trait HyperBinary: std::fmt::Display {
    fn model(&self) -> Model;
    fn width(&self) -> u32;
    fn cardinality(&self) -> u32;
    fn hint(&self) -> HyperBinaryHint;
    fn stable_hash(&self) -> u64;
    fn seed128(&self) -> &Seed128;
    fn exponent(&self) -> i32;

    fn core(&self) -> SparseSegmented;
    fn power(&self, p: i32) -> HyperBinaryKind;
}
}

In Rust, concrete types are wrapped in HyperBinaryKind (an enum) for dynamic dispatch instead of Go’s interface boxing.

Concrete Types

Type	Description
SparseSegmented 🍡	Foundational vector — packed per-segment offsets
Sparkle ✨	Seeded, deterministic hypervector
Learner 💫	Online Hebbian learning
Set 🫧	Unordered collection
Sequence 📿	Ordered collection with positional encoding
Octopus 🐙	Key-value composite
Knot 🪢	Bound (multiplied) group
Parcel 🎁	Bundled (added) group

SparseSegmented 🍡

The most foundational vector type — a sparse binary hypervector where each segment has exactly one ON bit at the recorded offset location. All other types (Sparkle, Set, Sequence, etc.) ultimately contain a SparseSegmented in memory for processing.

Structure

Field	Description
`model`	Sparsity configuration (Model)
`offsets`	Packed bit array of per-segment ON offsets. `nil`/`None` = identity vector
`hash`	Lazy-computed stable hash for equality checks

The offsets are bit-packed according to the model’s sparsity bits — they do not align to byte boundaries. This trades a small CPU cost for compact, uniform storage that works both in memory and on disk.
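Bit-packing of this kind can be sketched as concatenating fixed-width offsets into one byte string. The least-significant-bit-first layout below is an assumption for illustration, since Kongming's exact bit order is not documented here. The sketch shows the saving for MODEL_1M_10BIT: 10-bit offsets fill 1,280 bytes instead of 2,048 with one 16-bit integer per offset.

```python
import random


def pack_offsets(offsets, bits):
    """Concatenate fixed-width offsets into bytes, least-significant bits first.

    The LSB-first layout is an assumption for illustration; Kongming's exact
    on-disk bit order is not documented here.
    """
    acc = 0
    for i, off in enumerate(offsets):
        if not 0 <= off < (1 << bits):
            raise ValueError(f"offset {off} does not fit in {bits} bits")
        acc |= off << (i * bits)
    n_bytes = (len(offsets) * bits + 7) // 8  # round up to whole bytes
    return acc.to_bytes(n_bytes, "little")


def unpack_offsets(data, bits, count):
    """Inverse of pack_offsets: read `count` offsets of `bits` bits each."""
    acc = int.from_bytes(data, "little")
    mask = (1 << bits) - 1
    return [(acc >> (i * bits)) & mask for i in range(count)]


# MODEL_1M_10BIT: 1024 offsets of 10 bits each.
rng = random.Random(0)
offsets = [rng.randrange(1024) for _ in range(1024)]
packed = pack_offsets(offsets, 10)
print(len(packed))  # 1280
assert unpack_offsets(packed, 10, 1024) == offsets
```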

Identity vector: when offsets is empty (`nil`/`None`), the vector is the identity vector, whose offsets are all 0. Binding with the identity is a no-op, and the identity requires zero storage for offsets.

Constructors

# Identity
ss = hv.SparseSegmented.identity(model)

# From per-segment offsets (generally discouraged)
ss = hv.SparseSegmented.from_offsets(model, [off0, off1, ...])

// Identity
ss := hv.NewSparseSegmentedIdentity(model)

// With explicit domain/pod and packed offsets (takes ownership of slice)
ss := hv.NewSparseSegmented(model, domain, pod, offsets)

#![allow(unused)]
fn main() {
// Identity
let ss = SparseSegmented::identity(model);

// With explicit domain/pod and packed offsets (takes ownership)
let ss = SparseSegmented::new(model, domain, pod, Some(offsets));
}

Key Methods

ss.is_identity()  # True if identity vector

ss2 = ss.power(2)
inv = ss.power(-1)

# Similarity
hv.overlap(a, b)   # Count of matching ON bits
hv.hamming(a, b)   # Hamming distance

ss.offsets()   # returns all offsets

ss.IsIdentity()   // bool

ss2 := ss.Power(2).(hv.SparseSegmented)
inv := ss.Power(-1).(hv.SparseSegmented)

// Similarity
a.Overlap(b)   // uint32
a.Hamming(b)   // uint32

// offset access.
ss.On(idx)         // Check if a specific global index is ON
ss.Offset(seg)     // Get the offset within a segment

// Iterate over all (segment, offset) pairs
for seg, offset := range ss.OffsetIter() {
    // ...
}

#![allow(unused)]
fn main() {
ss.is_identity();  // bool

let ss2 = ss.power(2);
let inv = ss.power(-1);

// Similarity
a.overlap(&b);   // u32
a.hamming(&b);   // u32

// offset access.
ss.on(idx);          // bool
ss.offset(seg);      // u32

for (seg, offset) in ss.offset_iter() {
    // ...
}
}

Serialization

SparseSegmented serializes to HyperBinaryProto with hint SPARSE_SEGMENTED. The offsets field carries the raw packed bytes. Identity vectors serialize with empty offsets.

Sparkle ✨

Sparkles are the atomic building block for higher-level constructs: essentially SparseSegmented annotated with domain and pod.

Domain is a logical namespace that groups related Sparkle instances. Pod acts as the secondary identifier for a Sparkle instance.

Sparkle is deterministic: for a given model, the same (domain, pod) pair always produces the same offsets. For this reason, the model together with the (domain, pod) pair uniquely identifies a Sparkle.

Sparkle Constructors

# From a word string
s0 = hv.Sparkle.from_word(model, "animals", "cat")

# From a numeric seed
s1 = hv.Sparkle.from_seed(model, "animals", 42)

# From a prewired enum
s2 = hv.Sparkle.from_prewired(model, "animals", hv.PREWIRED_SET_MARKER)

# Identity vector
s3 = hv.Sparkle.identity(model)

# Random (from SparseOperation)
so = hv.SparseOperation(hv.MODEL_1M_10BIT, "domain", "pod")
s4 = hv.Sparkle.random("animals", so)

// From a word string
s0 := hv.NewSparkleFromWord(model, domain, "cat")

// From a numeric seed
s1 := hv.NewSparkleFromSeed(model, domain, 42)

// From a prewired enum
s2 := hv.NewSparkleFromPrewired(model, domain, api.Prewired_SET_MARKER)

// Identity vector
s3 := hv.NewSparkleIdentity(model)

// Random (from SparseOperation)
s4 := hv.NewRandomSparkle(domain, so)

// From domain + pod directly — primary constructor
s5 := hv.NewSparkle(model, domain, pod)

#![allow(unused)]
fn main() {
// From a word string
let s0 = Sparkle::from_word(model, domain, "cat");

// From a numeric seed
let s1 = Sparkle::from_seed(model, domain, 42);

// From a prewired enum
let s2 = Sparkle::from_prewired(model, domain, Prewired::SetMarker);

// Identity vector
let s3 = Sparkle::identity(model);

// Random (from SparseOperation)
let s4 = Sparkle::from_random(domain, &mut so);

// From domain + pod directly — primary constructor
let s5 = Sparkle::new(model, domain, pod);
}

Key Methods

s0.model()         # Model enum
s0.stable_hash()   # Deterministic hash
s0.exponent()      # Current exponent (1 for base vector)

s0_square = s0.power(2)     # Returns the p-th power (a new Sparkle)
hv.equal(s0, s0_square)     # False: s0_square = s0^2 differs from s0

core0 = s0.core()           # Returns the underlying SparseSegmented
core0.offsets()             # The raw offsets for each segment

s0.Model()         // api.Model
s0.StableHash()    // uint64
s0.Exponent()      // int32

s0Square := s0.Power(2)  // HyperBinary (cast to Sparkle)
s0Square.Core()          // SparseSegmented

#![allow(unused)]
fn main() {
s0.model();         // Model
s0.stable_hash();   // u64
s0.exponent();      // i32

let s0_square = s0.power(2);   // HyperBinaryKind
s0.core();          // SparseSegmented
}

Note

power(0) always returns the identity sparkle. power(-1) returns the inverse.

Pretty-printing

# Pretty-printing, or s0.__str__()
print(s0)
# ✨:🔗animals,🌱cat

# More detailed information, or s0.__repr__()
s0
# hint: SPARKLE
# model: MODEL_1M_10BIT
# stable_hash: 9725717137035622833
# domain:
#   name: animals
# pod:
#   word: cat

During pretty-printing of Sparkle instances, you may notice special emoji for domain / pods.

emojis for domain / pod

Emoji	Variant	Example
🔗	named domain	`🔗animals`, `🔗PREFIX.name`
🌐	numeric domain	`🌐0x..c862`
🌱	named pod	`🌱cat`
🫛	numeric pod	`🫛0x..80e4`
🍀	pre-defined pod	`🍀SET_MARKER`
💪	Exponent / Power	`💪3`, `💪-1`

Identity vectors display as IDENT (e.g., ✨IDENT).

Note

The underlying offsets are generated lazily from a seeded PRNG. Serialization stores only the seeds, a significant storage saving; the offsets are recomputed on deserialization.
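A minimal sketch of the seed-only storage idea, using Python's `random.Random` (Mersenne Twister) in place of the engine's xoshiro/PCG generators. `LazySparkle` and its sizes are hypothetical; the point is only that a fixed seed always regenerates the same offsets, so the offsets never need to be stored.

```python
import random
from functools import cached_property

class LazySparkle:
    """Hypothetical stand-in: only `seed` is persisted; offsets are derived."""

    SEGMENTS = 256   # hypothetical sizes
    LENGTH = 256

    def __init__(self, seed):
        self.seed = seed

    @cached_property
    def offsets(self):
        # Generated on first access; the same seed always yields the same offsets.
        rng = random.Random(self.seed)
        return tuple(rng.randrange(self.LENGTH) for _ in range(self.SEGMENTS))

    def serialize(self):
        return {"seed": self.seed}    # no offsets stored

    @classmethod
    def deserialize(cls, data):
        return cls(data["seed"])      # offsets recomputed lazily on access

original = LazySparkle(42)
restored = LazySparkle.deserialize(original.serialize())
assert restored.offsets == original.offsets
```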

Learner 💫

Learners perform online bundling over a stream of observations, a form of Hebbian-style learning.

The total storage and processing budget is fixed; what matters is how weight is distributed among the observed vectors.

Constructors

learner = hv.Learner(model, hv.Seed128(0, 42))

# a randomly-initialized learner.
learner = hv.Learner.random(so)

learner := hv.NewLearner(model, hv.NewSeed128(0, 42), nil)

// a randomly-initialized learner.
learner := hv.NewRandomLearner(so)

#![allow(unused)]
fn main() {
let mut learner = Learner::new(model, Seed128::new(0, 42), None);

// a randomly-initialized learner.
let mut learner = Learner::random(&mut so);
}

Feeding Observations

learner.bundle(a)                 # single observation

learner.bundle_multiple(b, 3)     # with weight multiplier

learner.Bundle(a)                 // single observation

learner.BundleMultiple(b, 3)      // with weight multiplier

#![allow(unused)]
fn main() {
learner.bundle(&a)?;              // single observation

learner.bundle_multiple(&b, 3)?;  // with weight multiplier
}

Inspection

learner.age()             # number of observations seen

learner.affinity(a)   # raw overlap; returns RandomOverlap when age==0
learner.weight(a)     # implicit weight for a probe vector; 0.0 when age==0

learner.Age()             // uint32

learner.Affinity(a)   // uint32; returns RandomOverlap when age==0
learner.Weight(a)     // float64; 0.0 when age==0

#![allow(unused)]
fn main() {
learner.age();            // u32

learner.affinity(&a); // u32; returns RandomOverlap when age==0
learner.weight(&a);   // f64; 0.0 when age==0
}

Untrained learner is neutral

A fresh Learner (age == 0) has no offsets to overlap against. Affinity therefore short-circuits to a non-zero baseline (RandomOverlap), chosen so that Weight yields exactly 0.0 for any probe: the untrained learner is neutral, neither selecting nor rejecting.
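The toy learner below is a hypothetical sketch, not the engine's algorithm. It assumes the n-th observation overwrites about SEGMENTS / n randomly chosen segments (picked with `random.sample`, in the spirit of the exact-count FISHER_YATES mode), which gives every observation a roughly equal share of a fixed budget. The `weight` normalization and the `RANDOM_OVERLAP` value are also assumptions, chosen so that an untrained learner scores every probe at exactly 0.0.

```python
import random

SEGMENTS, LENGTH = 256, 256           # hypothetical sizes
RANDOM_OVERLAP = SEGMENTS // LENGTH   # assumed chance-level overlap (1 here)

class ToyLearner:
    """Hypothetical fixed-budget learner; not the engine's algorithm."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.offsets = None
        self.age = 0

    def bundle(self, v):
        self.age += 1
        if self.offsets is None:          # first observation fills every segment
            self.offsets = list(v)
            return
        # Overwrite ~SEGMENTS/age distinct segments, so each of the `age`
        # observations ends up holding roughly an equal share.
        k = round(SEGMENTS / self.age)
        for i in self.rng.sample(range(SEGMENTS), k):
            self.offsets[i] = v[i]

    def affinity(self, v):
        if self.age == 0:                 # nothing learned: report the baseline
            return RANDOM_OVERLAP
        return sum(x == y for x, y in zip(self.offsets, v))

    def weight(self, v):
        # Assumed normalization: 0.0 at the baseline, 1.0 at full overlap.
        return (self.affinity(v) - RANDOM_OVERLAP) / (SEGMENTS - RANDOM_OVERLAP)

rng = random.Random(1)
a = tuple(rng.randrange(LENGTH) for _ in range(SEGMENTS))
b = tuple(rng.randrange(LENGTH) for _ in range(SEGMENTS))

learner = ToyLearner(seed=0)
print(learner.weight(a))    # 0.0: an untrained learner is neutral
learner.bundle(a)
learner.bundle(b)
print(learner.weight(a), learner.weight(b))   # each close to 0.5
```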

Set 🫧

An unordered collection of hypervectors. See Composites: Set for the conceptual overview.

Constructor

s = hv.Set(hv.Seed128(0, 42), first, second, third)

s := hv.NewSet(hv.NewSeed128(0, 42), first, second, third)

#![allow(unused)]
fn main() {
let s = Set::new(Seed128::new(0, 42), members);
}

Notable methods


# Each overlap is approximately 1/3 of the total cardinality.
hv.overlap(s.unmasked(), first)
hv.overlap(s.unmasked(), second)
hv.overlap(s.unmasked(), third)
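Why roughly one third? A hypothetical toy model of bundling helps: if each segment of the bundle copies its offset from one member picked uniformly at random, each of three members keeps about a third of the segments. This is an assumed scheme for illustration (`toy_bundle` is not a library function), not necessarily how the engine bundles.

```python
import random

SEGMENTS, LENGTH = 1024, 1024   # hypothetical sizes

def random_vector(rng):
    return tuple(rng.randrange(LENGTH) for _ in range(SEGMENTS))

def overlap(u, v):
    """Number of segments whose active bit coincides."""
    return sum(x == y for x, y in zip(u, v))

def toy_bundle(rng, *members):
    """Assumed scheme: each segment copies one randomly chosen member's offset."""
    return tuple(rng.choice(members)[i] for i in range(SEGMENTS))

rng = random.Random(42)
first, second, third = (random_vector(rng) for _ in range(3))
s = toy_bundle(rng, first, second, third)

for member in (first, second, third):
    print(overlap(s, member))   # each roughly SEGMENTS / 3, i.e. about 341
```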

Sequence 📿

An ordered collection of hypervectors with positional encoding. See Composites: Sequence for the conceptual overview.

Constructor

# Construct a sequence whose logical index starts at 1 (the default is 0).
seq = hv.Sequence(hv.Seed128(0, 42), first, second, third, start=1)

seq := hv.NewSequence(hv.NewSeed128(0, 42), 1, first, second, third)

#![allow(unused)]
fn main() {
let seq = Sequence::new(Seed128::new(0, 42), 1, members);
}

In-place edits: Append / Prepend / Reset

Append, Prepend, and Reset all mutate the Sequence in place — clone first if you need to preserve the original.

Append(more...) — add members at the end. start is unchanged.
Prepend(more...) — add members at the front; start decrements by len(more) so existing members keep their positional binding.
Reset(start) — shift the starting index. No-op when start equals the current start.

After any of these, seq equals what you’d get by building a fresh NewSequence(seed, new_start, all_members...) — the domain/pod seed is preserved.

import copy

seq = hv.Sequence(hv.Seed128(0, 42), a, b, c)

# Append / Prepend are variadic and mutate in place.
seq.append(d, e)            # seq now [a, b, c, d, e]
seq.prepend(x, y)           # seq now [x, y, a, b, c, d, e], start -= 2
seq.reset(10)               # shift the starting index to 10

# To preserve the original, clone first:
base = hv.Sequence(hv.Seed128(0, 42), a, b, c)
s1 = copy.copy(base)
s1.append(d)                # base is untouched

seq := hv.NewSequence(hv.NewSeed128(0, 42), 0, a, b, c)

seq.Append(d, e)            // seq now [a, b, c, d, e]
seq.Prepend(x, y)           // seq now [x, y, a, b, c, d, e], start -= 2
seq.Reset(10)               // shift the starting index to 10

// To preserve the original, clone first (Clone returns HyperBinary):
base := hv.NewSequence(hv.NewSeed128(0, 42), 0, a, b, c)
s1 := base.Clone().(hv.Sequence)
s1.Append(d)                // base is untouched

#![allow(unused)]
fn main() {
let mut seq = Sequence::new(Seed128::new(0, 42), 0, vec![a, b, c]);

seq.append(vec![d, e]);      // seq now [a, b, c, d, e]
seq.prepend(vec![x, y]);     // seq now [x, y, a, b, c, d, e], start -= 2
seq.reset(10);               // shift the starting index to 10

// To preserve the original, clone first:
let base = Sequence::new(Seed128::new(0, 42), 0, vec![a, b, c]);
let mut s1 = base.clone();
s1.append(vec![d]);          // base is untouched
}
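A toy check of the Prepend rule, assuming (as a hypothetical model) that the member at logical index i is encoded as member ⊗ STEP^i with offset-addition binding, in line with the Step^pos term used by the reverse attractors. Decrementing start by the number of prepended members leaves every existing member's encoding unchanged, and the result matches a freshly built sequence. `ToySequence` and `STEP` are illustrative names only.

```python
import random

SEGMENTS, LENGTH = 256, 256   # hypothetical sizes
rng = random.Random(7)

def random_vector():
    return tuple(rng.randrange(LENGTH) for _ in range(SEGMENTS))

def bind(u, v):
    return tuple((x + y) % LENGTH for x, y in zip(u, v))

def power(v, p):
    return tuple((p * o) % LENGTH for o in v)

STEP = random_vector()   # hypothetical positional marker

class ToySequence:
    def __init__(self, start, *members):
        self.start = start
        self.members = list(members)

    def append(self, *more):
        self.members.extend(more)     # start unchanged

    def prepend(self, *more):
        self.members[:0] = more
        self.start -= len(more)       # existing members keep their logical index

    def encoded(self):
        """Member at logical index i is encoded as member ⊗ STEP^i."""
        return [bind(m, power(STEP, self.start + i))
                for i, m in enumerate(self.members)]

a, b, c, x, y = (random_vector() for _ in range(5))
seq = ToySequence(0, a, b, c)
before = seq.encoded()
seq.prepend(x, y)

assert seq.start == -2
assert seq.encoded()[2:] == before                                # a, b, c untouched
assert seq.encoded() == ToySequence(-2, x, y, a, b, c).encoded()  # same as a fresh build
```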

Octopus 🐙

A key-value composite where each value is bound with its key’s Sparkle. See Composites: Octopus for the conceptual overview.

Constructor

Keys are Pods. In Python, strings (and any other value convertible to a Pod) are also accepted and converted automatically.

oct = hv.Octopus(hv.Seed128(0, 42), ["color", "shape"], red, circle)

keys := []hv.Pod{hv.NewPod("color"), hv.NewPod("shape")}
oct := hv.NewOctopus(hv.NewSeed128(0, 42), keys, red, circle)

#![allow(unused)]
fn main() {
let keys = vec![Pod::from_word("color"), Pod::from_word("shape")];
let oct = Octopus::new(Seed128::new(0, 42), keys, values);
}

Key Methods

oct.value_by_key("color")  # accepts Pod | str | int | Prewired

oct.ValueByKey(hv.NewPod("color"))  // HyperBinary — lookup by Pod

#![allow(unused)]
fn main() {
let value = oct.value_by_key(&Pod::from_word("color"));  // Option<&HyperBinaryKind>
}
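A toy model of key lookup, with the same caveats: it assumes an Octopus is a bundle of key ⊗ value pairs, and that lookup unbinds the key (binding with its inverse, as the TentacleAttractor modifier Inverse(Sparkle(model, "", key)) suggests) and then cleans up against known values by overlap. All helpers here are hypothetical stand-ins, not the library's implementation.

```python
import random

SEGMENTS, LENGTH = 1024, 1024   # hypothetical sizes
rng = random.Random(3)

def rv():
    return tuple(rng.randrange(LENGTH) for _ in range(SEGMENTS))

def bind(u, v):
    return tuple((x + y) % LENGTH for x, y in zip(u, v))

def inverse(v):
    return tuple((-o) % LENGTH for o in v)

def overlap(u, v):
    return sum(x == y for x, y in zip(u, v))

def bundle(*vs):
    # Assumed bundling: each segment copies one randomly chosen input.
    return tuple(rng.choice(vs)[i] for i in range(SEGMENTS))

keys = {"color": rv(), "shape": rv()}
values = {"red": rv(), "blue": rv(), "circle": rv(), "square": rv()}

octopus = bundle(bind(keys["color"], values["red"]),
                 bind(keys["shape"], values["circle"]))

def value_by_key(oct_vec, key):
    noisy = bind(oct_vec, inverse(keys[key]))   # unbind the key
    # Clean-up: pick the known value with the largest overlap.
    return max(values, key=lambda name: overlap(noisy, values[name]))

print(value_by_key(octopus, "color"))   # red
print(value_by_key(octopus, "shape"))   # circle
```

The unbound vector matches the right value on roughly half its segments and looks random elsewhere, which is why a clean-up step (in the library, near-neighbor search) is needed.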

Knot 🪢

The result of binding (multiplicative composition) of hypervectors. Unlike BindDirect, Knot tracks its member parts for serialization and debugging. See Composites: Knot.

Constructor

# Not directly constructed in Python. Use hv.bind() instead.
k = hv.bind(a, b)

k := hv.NewKnot(hv.NewSeed128(0, 42), partA, partB)

// More commonly via the Bind operator:
k := hv.Bind(a, b)

#![allow(unused)]
fn main() {
let k = Knot::new(Seed128::new(0, 42), parts);
}

Extending a Knot

An existing Knot can be extended with additional parts via expand. This mutates the Knot in place — equivalent to re-binding all parts from scratch but without reconstructing the base.

k = hv.bind(a, b)
k.expand(c)       # k is now equivalent to hv.bind(a, b, c)

k := hv.Bind(a, b)
k.Expand(c)       // k is now equivalent to hv.Bind(a, b, c)

#![allow(unused)]
fn main() {
let mut k = operators::bind_hb(vec![a.clone(), b.clone()]);
k.expand(vec![c]);  // k is now equivalent to bind_hb(vec![a, b, c])
}

If you need to preserve the original, clone before expanding — see the Expand operator section for full examples including clone-first patterns.
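The equivalence between expand and re-binding follows from binding being associative and commutative. The sketch below models that with a hypothetical `ToyKnot` that records its parts and folds them in with segment-wise offset addition (an assumed binding rule). Because this toy keeps a Python list of parts, it uses `copy.deepcopy` for the clone; a shallow `copy.copy` of the toy would share that list.

```python
import copy
import random

SEGMENTS, LENGTH = 256, 256   # hypothetical sizes
rng = random.Random(11)

def rv():
    return tuple(rng.randrange(LENGTH) for _ in range(SEGMENTS))

class ToyKnot:
    """Hypothetical stand-in that records its parts and their bound value."""

    def __init__(self, *parts):
        self.parts = []
        self.value = (0,) * SEGMENTS     # start from the identity
        self.expand(*parts)

    def expand(self, *more):
        """Mutate in place. Offset addition is associative and commutative,
        so folding parts in one at a time equals binding them all at once."""
        self.parts.extend(more)
        for p in more:
            self.value = tuple((x + y) % LENGTH for x, y in zip(self.value, p))

a, b, c = rv(), rv(), rv()
k = ToyKnot(a, b)
k.expand(c)
assert k.value == ToyKnot(a, b, c).value   # same as binding a, b, c directly

base = ToyKnot(a, b)
k1 = copy.deepcopy(base)   # clone first to keep `base` intact
k1.expand(c)
assert base.parts == [a, b] and k1.parts == [a, b, c]
```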

Parcel 🎁

The result of bundling (additive composition) of hypervectors. Unlike BundleDirect, Parcel tracks its members and bundling seed for serialization and debugging. See Composites: Parcel.

Constructors

p = hv.bundle(hv.Seed128(10, 1), a, b, c)

p := hv.NewParcel(hv.NewSeed128(0, 42), partA, partB, partC)

#![allow(unused)]
fn main() {
let p = Parcel::new(Seed128::new(0, 42), parts);
}

Pointer 👉

A one-directional reference between two hypervectors. A Pointer encodes a directed link from a source to a destination via P = source ⊗ Inv(destination). Given the pointer and either endpoint, the other endpoint can be recovered. See Composites: Pointer.

Constructor

p = hv.Pointer(hv.Seed128(0, 42), source, destination)

# Or via the release operator:
p = hv.release(source, destination)

p := hv.NewPointer(hv.NewSeed128(0, 42), source, destination)

// Or via the Release operator:
p := hv.Release(source, destination)

#![allow(unused)]
fn main() {
let p = Pointer::new(Seed128::new(0, 42), source, destination);

// Or via the release operator:
let p = operators::release(&source, &destination);
}

Endpoints

A Pointer retains references to its source (A) and destination (B).

p.source()        # → A
p.destination()   # → B

p.Source()        // HyperBinary — A
p.Destination()   // HyperBinary — B

#![allow(unused)]
fn main() {
p.source()        // &HyperBinaryKind — A
p.destination()   // &HyperBinaryKind — B
}

Recovering endpoints

Given the pointer and one endpoint, the other can be recovered:

RDeref(B) = A — recover the source given the destination, via P ⊗ B.
Deref(A) = B — recover the destination given the source, via A ⊗ Inv(P).

p = hv.Pointer(seed, a, b)
recovered_a = p.rderef(b)   # ≈ a
recovered_b = p.deref(a)    # ≈ b

p := hv.NewPointer(seed, a, b)
recoveredA := p.RDeref(b)   // ≈ a
recoveredB := p.Deref(a)    // ≈ b

#![allow(unused)]
fn main() {
let recovered_a = p.rderef(&b);  // ≈ a
let recovered_b = p.deref(&a);   // ≈ b
}

Anti-commutativity

Pointer (and the release operator that constructs it) is anti-commutative:

$P(A, B) = P(B, A)^{-1}$
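These identities can be checked in a toy model that assumes offset-addition binding and offset negation as the inverse (assumptions, not the engine's documented internals). `pointer`, `bind`, and `inv` are hypothetical helpers. With no bundling noise involved, recovery in the toy is exact.

```python
import random

SEGMENTS, LENGTH = 256, 256   # hypothetical sizes
rng = random.Random(5)

def rv():
    return tuple(rng.randrange(LENGTH) for _ in range(SEGMENTS))

def bind(*vs):
    return tuple(sum(offs) % LENGTH for offs in zip(*vs))

def inv(v):
    return tuple((-o) % LENGTH for o in v)

def pointer(src, dst):
    return bind(src, inv(dst))                 # P = src ⊗ Inv(dst)

A, B = rv(), rv()
P = pointer(A, B)

assert bind(P, B) == A                         # RDeref(B) = P ⊗ B = A
assert bind(A, inv(P)) == B                    # Deref(A) = A ⊗ Inv(P) = B
assert pointer(A, B) == inv(pointer(B, A))     # anti-commutativity
```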

Operators

See Concepts: Operators for the full overview.

Bind

bound = hv.bind(a, b)
released = hv.release(bound, b)  # this will recover `a`

hv.equal(a, b)                   # hash equality

bound := hv.Bind(a, b)                       
recovered := hv.Release(bound, b)        // this will recover `a`

eq := hv.Equal(a, b)                     // bool

#![allow(unused)]
fn main() {
let bound = operators::bind_hb(vec![a.clone(), b.clone()]); // Knot
let recovered = operators::release(&bound, &b);             // this will recover `a`

let eq = hyper_binary::equal(&a, &b);                       // bool
}

Release

Extracts one component from a binding: $A \oslash B = A \otimes B^{-1}$

release returns a Pointer — a directional reference from composite to role that retains both endpoints for inspection and serialization. The bit-level value is identical to bind(composite, inverse(role)).

bound = hv.bind(role, filler)
recovered = hv.release(bound, role)  # Pointer; ≈ filler at the bit level

bound := hv.Bind(role, filler)
recovered := hv.Release(bound, role)  // Pointer; ≈ filler at the bit level

#![allow(unused)]
fn main() {
let bound = operators::bind_hb(vec![role.clone(), filler.clone()]);
let recovered = operators::release(&bound, &role);  // Pointer
}

Expand (extend a Knot)

Extends an existing Knot with additional operands without re-binding from scratch. k.expand(c) on k = bind(a, b) gives the same result as bind(a, b, c) — but mutates k in place, so clone first if you need the original.

import copy

k = hv.bind(a, b)
k.expand(c)                 # k is now equivalent to hv.bind(a, b, c)

# To preserve the original, clone first:
base = hv.bind(a, b)
k1 = copy.copy(base)
k1.expand(c)                # base is untouched

k := hv.Bind(a, b)
k.Expand(c)                 // k is now equivalent to hv.Bind(a, b, c)

// To preserve the original, clone first (Clone returns HyperBinary):
base := hv.Bind(a, b)
k1 := base.Clone().(hv.Knot)
k1.Expand(c)                // base is untouched

#![allow(unused)]
fn main() {
let mut k = operators::bind_hb(vec![a.clone(), b.clone()]);
k.expand(vec![c.clone()]);   // k is now equivalent to bind_hb(vec![a, b, c])

// To preserve the original, clone first:
let base = operators::bind_hb(vec![a.clone(), b.clone()]);
let mut k1 = base.clone();
k1.expand(vec![c]);           // base is untouched
}

BindDirect

Like Bind, but returns a raw SparseSegmented instead of a Knot — no operand tracking. Cheaper for intermediate computations where you don’t need to reverse the bind or inspect the operand list.

# domain/pod default to the zero Domain/Pod
ss = hv.bind_direct(a, b, c)

# Or supply an explicit seed (annotates the resulting SparseSegmented):
ss = hv.bind_direct(a, b, domain=d, pod=p)

ss := hv.BindDirect(domain, pod, a, b, c)  // SparseSegmented

#![allow(unused)]
fn main() {
let ss = operators::bind_direct(domain, pod, &[a, b, c]);  // SparseSegmented
}

Bundle

p = hv.bundle(hv.Seed128(10, 1), a, b, c)

p := hv.Bundle(hv.NewSeed128(10, 1), a, b, c)

#![allow(unused)]
fn main() {
let p = operators::bundle(Seed128::new(10, 1), vec![a, b]);
}

Customizing runtime behavior

Environment Variables

All environment variables are read once on first access and cannot be changed at runtime. Unset variables use the documented default.

`KONGMING_RNG`

Selects the pseudo-random number generator backend used for hypervector generation.

Value	Description
`xoshiro++` (default)	xoshiro256++ — simple, fast, cross-language deterministic
`pcg`	PCG-DXSM — classic/compat mode (matches pre-v3.7.5 behavior)

Changing this affects all generated vectors: Sparkle offsets, Learner bundling, Cyclone patterns. Vectors generated with different backends are not comparable.

`KONGMING_REPR_FORMAT`

Controls __repr__() / Repr() output format.

Value	Description
`YAML` (default)	Multi-line YAML dump
`PROTO`	Multi-line protobuf debug string

`KONGMING_LEARNER_SAMPLING`

Controls the bundling strategy used by Learner.

Value	Description
`FISHER_YATES` (default)	Fisher-Yates shuffle — selects exactly the right number of segments per round
`CLASSIC`	Per-segment probabilistic sampling — each segment is independently sampled with a fixed probability

# Example: use PCG for backward compatibility with pre-v3.7.5 vectors
export KONGMING_RNG=pcg

# Example: switch repr to protobuf debug format
export KONGMING_REPR_FORMAT=PROTO

# Example: use classic sampling in Learner
export KONGMING_LEARNER_SAMPLING=CLASSIC
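Because each variable is read once, it has to be set before its first access; changing `os.environ` afterwards has no effect. The sketch below mimics that behavior with `functools.cache`; `rng_backend` is a hypothetical helper, not part of the library.

```python
import functools
import os

@functools.cache
def rng_backend():
    """Hypothetical helper: reads KONGMING_RNG once, then caches the answer."""
    return os.environ.get("KONGMING_RNG", "xoshiro++")

os.environ.pop("KONGMING_RNG", None)
print(rng_backend())              # xoshiro++ (the default)
os.environ["KONGMING_RNG"] = "pcg"
print(rng_backend())              # still xoshiro++: the first read wins
```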

Querying the Current Environment

Use global_env() to inspect all active settings at runtime. Returns a GlobalEnv protobuf message — new fields added to the proto automatically appear.

>>> hv.global_env()
rng_hint: XOSHIRO256PP
learner_sampling: FISHER_YATES
repr_format: YAML

Misc

Display

All HyperBinary types have a compact, emoji-prefixed string representation for quick visual inspection. See HyperBinary Types for type symbols and Sparkle for field labels.

Python `str` and `repr`

__str__ (triggered by print()) returns the compact emoji form:

>>> a = hv.Sparkle.with_word(hv.MODEL_64K_8BIT, hv.d0(), "hello")
>>> print(a)
✨:🌐0x..c862,🫛0x..80e4

__repr__ (triggered by evaluating a variable in a shell or notebook) returns a detailed, developer-friendly representation: YAML by default, with the format controlled by the KONGMING_REPR_FORMAT environment variable:

>>> a
hint: SPARKLE
model: MODEL_64K_8BIT
stable_hash: 12345678
domain:
  id: ...
pod:
  seed: 12345

Set KONGMING_REPR_FORMAT=PROTO for protobuf debug output instead of the default YAML. See Environment Variables for all supported variables.

Go / Rust Display

print(sparkle)      # compact emoji form via __str__
repr(sparkle)       # detailed YAML/proto form via __repr__

// Compact emoji form
fmt.Println(sparkle)          // ✨:🌐0x..c862,🫛0x..80e4

// Detailed YAML/proto form (controlled by KONGMING_REPR_FORMAT env)
fmt.Println(sparkle.Repr())

#![allow(unused)]
fn main() {
// Compact emoji form (via Display trait)
println!("{}", sparkle);      // ✨:🌐0x..c862,🫛0x..80e4
}

Serialization

# HyperBinary → protobuf message
msg = hv.to_message(sparkle)

# protobuf message → HyperBinary
obj = hv.from_message(msg)

# raw proto bytes → HyperBinary
obj = hv.from_proto_bytes(data)

# proto bytes → YAML string (for debugging)
hv.format_to_yaml(data)

// HyperBinary → proto
pb, err := sparkle.ToProto(ctx)

// YAML formatting
yaml := hv.FormatToYaml(protoMsg)

#![allow(unused)]
fn main() {
// HyperBinary → proto
let pb = sparkle.to_proto();

// proto → HyperBinary
let sparkle = Sparkle::from_proto(&pb)?;
}

Memory

The memory package provides persistent and in-memory storage for hypervectors with semantic indexing and near-neighbor search.

The core abstraction is a Chunk — an immutable identity (Sparkle) paired with a mutable semantic code (any HyperBinary). Chunks are stored in a Substrate (pluggable storage backend), queried via ChunkSelectors, and created via ChunkProducers.

Section	Description
Chunk	The fundamental storage unit
Substrate & Views	Storage backends and transactional views
Selectors	Query builders for reading chunks
Producers	Write builders for creating chunks

Chunk

The fundamental storage unit in the memory system. A Chunk carries a semantic code (any HyperBinary type) together with its identity (a Sparkle derived from the code’s domain/pod).

The identity determines the storage key and drives compositionality: a chunk is either present or absent. The code, by contrast, may be learned and can adapt over time, much like the weights of a traditional neural network.

Structure

Field	Type	Description
`code`	HyperBinary	Semantic content (can be updated). Required — its domain/pod determines the chunk’s identity.
`id`	Sparkle	Identity vector derived from `code`’s domain/pod; determines the storage key.
`note`	string	Human-readable annotation, primarily for debugging
`extra`	protobuf Any	Extensible payload for application-specific data, primarily for debugging

Inspection

Chunks are typically created via producers (see Producers), but can be inspected after retrieval (see Selectors).

# chunk = memory.first_picked_chunk(view, memory.by_item_key("animals", "cat"))

chunk.id        # Sparkle
chunk.code      # HyperBinary
chunk.note      # str
chunk.extra     # Optional[bytes]

Substrate & Views

A Substrate is a pluggable storage backend. It provides transactional views for reading and writing chunks.

View Pattern

All storage access goes through views:

SubstrateView — read-only, supports key lookup and prefix scanning
SubstrateMutableView — extends SubstrateView with write staging and atomic commit (to underlying storage)

# Read-only view (context manager)
with storage.new_view() as view:
    # Check if chunk exists, without actually reading it back.
    exists = view.chunk_exists("animals", "cat")

    cat_chunk = view.read_chunk("animals", "cat")

# Mutable view (auto-commits on clean exit, rollback on exception).
# Stage writes by running producers against the view via
# producer.produce(view) — the recommended path for batched writes.
with storage.new_mutable_view() as view:
    memory.new_terminal("words", "hi").produce(view)
    memory.from_sequence_members("words", "greet", members,
                                  semantic_indexing=True).produce(view)
    # commits automatically

Storage Backends

InMemory

Volatile, in-process storage. All data lost on exit. Best for testing and ephemeral caches.

storage = memory.InMemory(hv.MODEL_64K_8BIT, "my_store")

Embedded

Persistent, single-machine storage backed by an embedded key-value store. Suitable for local development and moderate-scale deployments.

storage = memory.Embedded(hv.MODEL_64K_8BIT, "/path/to/store")

ScyllaDB (Distributed)

Distributed storage via Cassandra-compatible ScyllaDB. For high-scale, multi-node deployments.

# Not yet exposed in Python.

Selectors

ChunkSelectors are composable query builders for reading chunks from the substrate. Each selector defines how to locate and return matching chunks.

NNS (Near-Neighbor Search)

Wraps a single attractor to perform near-neighbor search. For multiple attractors, compose them with joiner(...) first.

result = memory.first_picked(
    view, memory.nns(
        memory.set_members(memory.by_item_key("sets", "my_set"))))

Conceptually, each attractor provides a “center of attraction” for candidates. NNS takes that attractor (or a joiner of several) and performs the actual near-neighbor search by querying the underlying associative index.
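For intuition only, here is a brute-force stand-in for near-neighbor search: it ranks every stored code by overlap with the attractor. The real NNS goes through an associative index instead of scanning; `store`, `nns`, and the sizes below are hypothetical.

```python
import random

SEGMENTS, LENGTH = 256, 256   # hypothetical sizes
rng = random.Random(9)

def rv():
    return tuple(rng.randrange(LENGTH) for _ in range(SEGMENTS))

def overlap(u, v):
    return sum(x == y for x, y in zip(u, v))

store = {f"item{i}": rv() for i in range(100)}   # name -> stored code

def nns(attractor, k=3):
    """Brute force: rank every stored code by overlap with the attractor."""
    ranked = sorted(store.items(),
                    key=lambda item: overlap(attractor, item[1]),
                    reverse=True)
    return [(name, overlap(attractor, code)) for name, code in ranked[:k]]

# Corrupt every other segment of item42; the search still finds it.
target = store["item42"]
noisy = tuple(o if i % 2 == 0 else rng.randrange(LENGTH)
              for i, o in enumerate(target))
print(nns(noisy)[0][0])   # item42
```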

Forward attractors

Roughly speaking, forward attractors find the parts of a given composite.

Attractor	Modifier	Attracts
SetMembersAttractor	depends on `selected.code.domain`	All members of the Set
SequenceMemberAttractor	depends on `selected.code.domain`	Sequence member at a specific position
TentacleAttractor(octopus, key)	`Inverse(Sparkle(model, "", key))`	Octopus value for that key

memory.set_members(memory.by_item_key("sets", "my_set"))

memory.sequence_member(memory.by_item_key("seqs", "my_seq"), pos=2)

memory.tentacle(memory.by_item_key("records", "person"), "name")

Reverse Attractors

Roughly speaking, reverse attractors locate the composites that contain a given part.

Attractor	Modifier	Attracts
SetAttractor(member, candidate)	`Sparkle(SET_MARKER @ candidate)`	All Sets in `candidate` containing `member`
SequenceAttractor(member, pos, candidate)	`Bind(SEQ_MARKER @ candidate, Step^pos)`	All Sequences in `candidate` with `member` at `pos`
OctopusAttractor(key, value)	`Sparkle(model, "", key)`	Octopuses with that key/value pair

memory.set_attractor(memory.by_item_key("animals", "cat"), "sets")

memory.sequence_attractor(memory.by_item_key("animals", "cat"), 0, "seqs")

memory.octopus_attractor("color", memory.by_item_key("colors", "red"))

Analogical Reasoning

AnalogicalReasoner(dst, src, feature) performs analogical reasoning (“A is to B as C is to ?”): for each chunk c yielded by dst, it computes Bind(c.code, feature, Inverse(src)) and forwards the result to NNS. The model is implied by src / feature.

Given the analogy “king is to queen as man is to ?”:

king   = hv.Sparkle(model, "role", "king")
queen  = hv.Sparkle(model, "role", "queen")
man    = hv.Sparkle(model, "role", "man")

# Analogy: "king is to queen as man is to ?"
#   src     = king   (the known source of the relationship)
#   feature = queen  (the known feature/attribute of src)
#   dst     = man    (the target; we want to find its corresponding feature)
#
# Modifier = queen ⊗ inverse(king); applied to man, NNS should find "woman"
# (provided a matching chunk is stored).
memory.nns(
    memory.analogical_reasoner(memory.with_code(man), king, queen))
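The analogy only resolves if the stored codes share structure. In the hypothetical toy below, each word is composed from assumed role factors (`royal`, `person`, `male`, `female`) with offset-addition binding; under that assumption the probe man ⊗ queen ⊗ inverse(king) equals woman exactly. None of these names come from the library.

```python
import random

SEGMENTS, LENGTH = 256, 256   # hypothetical sizes
rng = random.Random(1)

def rv():
    return tuple(rng.randrange(LENGTH) for _ in range(SEGMENTS))

def bind(*vs):
    return tuple(sum(offs) % LENGTH for offs in zip(*vs))

def inv(v):
    return tuple((-o) % LENGTH for o in v)

def overlap(u, v):
    return sum(x == y for x, y in zip(u, v))

royal, person, male, female = rv(), rv(), rv(), rv()
vocab = {
    "king":  bind(royal, male),
    "queen": bind(royal, female),
    "man":   bind(person, male),
    "woman": bind(person, female),
}

# AnalogicalReasoner-style probe: Bind(dst.code, feature, Inverse(src))
probe = bind(vocab["man"], vocab["queen"], inv(vocab["king"]))
best = max(vocab, key=lambda w: overlap(probe, vocab[w]))
print(best)   # woman
```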

Direct WithCodeModifier / WithIDModifier

For ad-hoc patterns that don’t fit a named attractor, use the primitives directly. They take a precomputed HyperBinary modifier and apply Bind(code, modifier) or Bind(id, modifier) to each yielded chunk:

memory.with_code_modifier(inner_selector, modifier_vec)
memory.with_id_modifier(inner_selector, modifier_vec)

Other Selectors

ByItemKey

Exact lookup by domain + pod.

sel = memory.by_item_key("animals", "cat")

ByItemDomain

All chunks in a given domain (prefix scan).

sel = memory.by_item_domain("animals")

WithCode / WithSparkle

Literal selector — returns a hypervector directly, no storage lookup.

sel = memory.with_code(some_hv)

sel = memory.with_sparkle("animals", "cat")

Joiner

Union of multiple selectors — returns results from each of the inner selectors.

sel = memory.joiner(
    memory.by_item_key("animals", "cat"),
    memory.by_item_key("animals", "dog"),
)

Range

Limits results to [start, start+limit). limit=0 (the default) means no limit: iteration continues until the results are exhausted.

sel = memory.range_sel(
    memory.by_item_domain("animals"), start=0, limit=10)

OnlyDomain

Filters the inner selector's results down to the given domain.

sel = memory.only_domain(
    "animals", inner_selector)

Working with Results

FirstPicked — get the first match

Returns the first chunk matching the selector. Returns an error if nothing is found.

# Returns the first matching Chunk (with .id, .code, .note, .extra)
chunk = memory.first_picked(view, selector)
print(chunk.id, chunk.code, chunk.note)

mem_get — eager batch read (Chunks only)

Returns every match as a list[Chunk]. No extras — any per-result SelectorExtra produced by the selector (e.g. NNS scores) is discarded. Use this when you only need the Chunks.

chunks = storage.mem_get(selector)        # list[Chunk]
for chunk in chunks:
    print(chunk.id, chunk.note)

lazy_selector_iter — stream Chunks with extras

Yields (Chunk, Optional[SelectorExtra]) tuples one at a time. This is the only way to access per-result SelectorExtra in Python; mem_get drops it.

# Streaming — useful for large result sets or early termination
for chunk, extra in memory.lazy_selector_iter(view, selector):
    print(chunk.id, extra)
    if done():
        break

# Eager with extras — wrap in list()
results = list(memory.lazy_selector_iter(view, selector))
# results: list[tuple[Chunk, Optional[SelectorExtra]]]

Producers

ChunkProducers are write builders that create and persist chunks in the substrate. Each producer encapsulates the logic for constructing a specific type of chunk.

Note

Some producers only update existing chunks (e.g., ClusterUpdater) without creating new ones. In those cases, Produce returns the updated chunk rather than a newly created one.

Producer Options

Producer options are extra arguments passed to a producer's constructor to adjust its behavior.

# `note` attaches a human-readable annotation to the new terminal chunk.
memory.new_terminal("d", "p", note="annotation")

# `semantic_indexing` also indexes the semantic code
# (on top of the id vector).
memory.from_set_members("d", "p", members, semantic_indexing=True)

Concrete Producers

NewTerminal

Creates a chunk whose code equals its identity (a bare Sparkle). Useful for registering atoms/symbols.

with storage.new_mutable_view() as view:
    memory.mem_set(view, memory.new_terminal("fruits", "apple", note="an apple"))

NewLearner

Creates a fresh Learner chunk for online learning.

with storage.new_mutable_view() as view:
    memory.mem_set(view, memory.new_learner("learners", "my_learner", note="a learner"))

FromSetMembers

Creates a Set from stored members.

with storage.new_mutable_view() as view:
    memory.mem_set(
        view, 
        memory.from_set_members(
            "sets",
            "fruit_set", 
            memory.by_item_domain("fruits"),
        ))

FromSequenceMembers

Creates a Sequence from stored members with positional encoding.

with storage.new_mutable_view() as view:
    memory.mem_set(
        view, 
        memory.from_sequence_members(
            "seqs",
            "greeting", 
            memory.joiner(
                memory.by_item_key("words", "hello"),
                memory.by_item_key("words", "world"),
            ),
            start=0),
    )

FromKeyValues

Creates an Octopus (key-value composite) from keys and value selectors.

with storage.new_mutable_view() as view:
    memory.mem_set(
        view, 
        memory.from_key_values(
            "records",
            "obj1", 
            keys=["color", "shape"], 
            values=memory.joiner(
                memory.by_item_key("colors", "red"),
                memory.by_item_key("shapes", "circle"),
            ),
        ))

FromSourceDest

Creates a Pointer 👉 chunk — a directional reference from a source chunk to a dest chunk. Both selectors must resolve to a single chunk; the produced Pointer’s bit-level value is source.id ⊗ Inv(dest.id).

with storage.new_mutable_view() as view:
    memory.mem_set(
        view,
        memory.from_source_dest(
            "edges", "earth_to_moon",
            memory.by_item_key("planets", "earth"),
            memory.by_item_key("planets", "moon"),
        ))

memory.FromSourceDest(
    hv.NewDomain("edges"), hv.NewPodFromWord("earth_to_moon"),
    memory.WithChunks(earth),
    memory.WithChunks(moon),
).Produce(ctx, view)

#![allow(unused)]
fn main() {
let producer = producers::from_source_dest(
    Domain::from_name("edges"),
    Pod::from_word("earth_to_moon"),
    Box::new(selector_impls::with_chunks(vec![earth])),
    Box::new(selector_impls::with_chunks(vec![moon])),
    args,
);
}

ClusterUpdater

Feeds an observed chunk into an existing Learner, updating its accumulated code via bundling. The bundle multiplier defaults to 1; pass an explicit override (multiple=N in Python) to fold the same observation in repeatedly.

with storage.new_mutable_view() as view:
    # With explicit multiplier:
    memory.mem_set(view,
        memory.cluster_updater(
            learner=memory.by_item_key("learners", "my_learner"),
            observed=memory.by_item_key("fruits", "apple"),
            multiple=3,
        ))

Train 🚂🚃🚃🚃

A doubly-linked, payload-carrying chunk data structure built on Octopus chunks. A train is a chain of carriages: each carriage 🚃 carries one payload chunk, and carriages couple to their neighbors at LEFT ⬅️ / RIGHT ➡️.

Shape

A train occupies its own 64-bit train_domain. The structure has two kinds of chunks:

Sentinel locomotive 🚂 — the chunk at (train_domain, P0). Anchors both ends of the train. Its LEFT slot points at the leftmost carriage, its RIGHT slot at the rightmost. Its PAYLOAD slot is identity (no payload).
Carriage 🚃 — a chunk at (train_domain, carriage_pod) for some non-zero pod. Each carriage is a specialized Octopus with four slots:
- PAYLOAD 📨 — points at the actual payload chunk in item memory (Sparkle(payload_domain, payload_pod)). Always non-identity.
- LEFT ⬅️ — id-Sparkle of the previous carriage, or identity if leftmost.
- RIGHT ➡️ — id-Sparkle of the next carriage, or identity if rightmost.
- CHILDREN 👨‍👩‍👧‍👦 — optional pointer at a child structure (typically the sentinel of a child train). Identity when no children are attached.

A 3-carriage train (payloads A, B, C in left-to-right order):

       🚂                🚃                🚃                🚃
   ┌──────────┐      ┌──────────┐      ┌──────────┐      ┌──────────┐
   │ sentinel │ <--> │    A     │ <--> │    B     │ <--> │    C     │
   │  pod=P0  │      │ pod=p_A  │      │ pod=p_B  │      │ pod=p_C  │
   └──────────┘      └──────────┘      └──────────┘      └──────────┘

Each box is one Octopus chunk. Each <--> between adjacent boxes represents one bidirectional LEFT/RIGHT pointer pair. The slot values inside each chunk are:

Chunk	`PAYLOAD`	`LEFT`	`RIGHT`	`CHILDREN`
sentinel 🚂 (`pod=P0`)	identity ⊥	leftmost (A)	rightmost (C)	identity ⊥ (or a child structure)
carriage A	`Sparkle(A*)`	identity ⊥	carriage B	identity ⊥ (or a child structure)
carriage B	`Sparkle(B*)`	carriage A	carriage C	identity ⊥ (or a child structure)
carriage C	`Sparkle(C*)`	carriage B	identity ⊥	identity ⊥ (or a child structure)

Where A*, B*, C* are the actual payload chunks living elsewhere in item memory.

The sentinel’s LEFT/RIGHT are anchor pointers (jumps directly to the leftmost / rightmost), not neighbor pointers. The carriages’ LEFT/RIGHT are conventional neighbor pointers.

Pushing

storage.mem_set(
    memory.train_appender(
        train_domain, memory.with_sparkle(payload_domain, "A")))

storage.mem_set(
    memory.train_prepender(
        train_domain, memory.with_sparkle(payload_domain, "Z")))

p.Memory.Set(ctx, memory.TrainAppender(trainDomain, payloadSelector))
p.Memory.Set(ctx, memory.TrainPrepender(trainDomain, payloadSelector))

#![allow(unused)]
fn main() {
storage.set(producers::train_appender(train_domain, payload_selector, args))?;
storage.set(producers::train_prepender(train_domain, payload_selector, args))?;
}

Each push rewrites up to three chunks atomically through one mutable view:

The new carriage at (train_domain, fresh_pod).
The previous tail/head carriage, with its tail-side neighbor link updated to the new carriage. (Skipped on the first push, when the train is empty.)
The sentinel, with its tail-side anchor pointer updated to the new carriage. The head-side pointer is updated only when the train transitions from empty to one element.

The sentinel is auto-created on the first push — no separate “init train” step is needed.

⚠️ Single-writer per train. Concurrent pushes from independent mutable views on the same train WILL race on the sentinel and on the previous tail/head — last-write wins, with dropped pushes possible. Callers expecting concurrent appenders must synchronize externally (e.g., serialize through a single goroutine/task, or take an external lock keyed on train_domain). Reads through a View are always snapshot-consistent per chunk.

Mid-train insertion: `TrainInsertAfter` / `TrainInsertBefore`

TrainAppender and TrainPrepender are thin wrappers over the more general TrainInsertAfter / TrainInsertBefore, which take a reference carriage selector instead of a domain. The new carriage is wedged on the chosen side of the reference. The train domain is taken from the resolved reference carriage.

Reference carriage	`TrainInsertAfter`	`TrainInsertBefore`
sentinel 🚂 (`TrainCarriage(domain)`)	append (new rightmost)	prepend (new leftmost)
member carriage X	wedge between X and X.right	wedge between X.left and X

Sentinel-as-reference is the wraparound bookend case: since the sentinel sits on both ends of the train, “after the sentinel” means “right of the rightmost,” and “before the sentinel” means “left of the leftmost.” On an empty train, either becomes the first and only member.

Each insert rewrites:

The new carriage.
The carriage on each side whose neighbor link points at the new one (one rewrite for end-inserts, two for mid-train inserts).
The sentinel, only when its anchor pointer changes (i.e., the new carriage becomes the leftmost or rightmost).
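The pointer rewrites above can be made concrete with a minimal pure-Python sketch of one train's link structure. It is not the library's storage. The dict-of-records layout, the "P0" sentinel key, integer pods, and the use of None for the identity vector are all assumptions made for illustration. Append and prepend are modeled as inserts relative to the sentinel, and each call returns the set of records it rewrote.

```python
import itertools

SENTINEL = "P0"                       # hypothetical pod of the sentinel
OPPOSITE = {"LEFT": "RIGHT", "RIGHT": "LEFT"}


class ToyTrain:
    """Toy link structure of one train (a sketch, not the library's storage).

    None stands in for the identity vector. The sentinel's LEFT/RIGHT are
    anchors to the leftmost/rightmost carriage; a carriage's LEFT/RIGHT are
    links to its immediate neighbors.
    """

    def __init__(self):
        self.records = {}                    # pod -> {"LEFT", "RIGHT", "PAYLOAD"}
        self._fresh = itertools.count(1)     # source of fresh carriage pods

    def _sentinel(self):
        # Created lazily on first use: no separate "init train" step.
        return self.records.setdefault(
            SENTINEL, {"LEFT": None, "RIGHT": None, "PAYLOAD": None})

    def insert(self, ref, payload, side):
        """Wedge a new carriage on `side` of `ref` ("RIGHT" = after, "LEFT" = before).

        Returns the set of pods whose records were rewritten.
        """
        other = OPPOSITE[side]
        sentinel = self._sentinel()
        # Sentinel as reference: "after" = right of the rightmost carriage,
        # "before" = left of the leftmost, so start from that anchor.
        near = sentinel[side] if ref == SENTINEL else ref
        far = None if near is None else self.records[near][side]  # KeyError if ref unknown
        pod = next(self._fresh)
        new = {"LEFT": None, "RIGHT": None, "PAYLOAD": payload}
        self.records[pod] = new
        if near is None:                     # empty train: first and only member
            sentinel["LEFT"] = sentinel["RIGHT"] = pod
            return {pod, SENTINEL}
        new[other], new[side] = near, far
        self.records[near][side] = pod
        if far is None:                      # new carriage is now an end: move the anchor
            sentinel[side] = pod
            return {pod, near, SENTINEL}
        self.records[far][other] = pod       # mid-train: relink the far neighbor too
        return {pod, near, far}

    def append(self, payload):
        return self.insert(SENTINEL, payload, "RIGHT")

    def prepend(self, payload):
        return self.insert(SENTINEL, payload, "LEFT")

    def payloads(self):
        """Payloads from leftmost to rightmost, following neighbor links."""
        out, cur = [], self._sentinel()["LEFT"]
        while cur is not None:
            out.append(self.records[cur]["PAYLOAD"])
            cur = self.records[cur]["RIGHT"]
        return out


train = ToyTrain()
train.append("A")                  # pod 1
train.prepend("Z")                 # pod 2
train.append("C")                  # pod 3
train.insert(1, "M", "RIGHT")      # wedge M between A and C; the sentinel is untouched
print(train.payloads())            # ['Z', 'A', 'M', 'C']
```

In this model an append or prepend rewrites at most three records (new carriage, old end, sentinel). A mid-train insert rewrites the new carriage and both neighbors and never touches the sentinel.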

# Mid-train: wedge M between B and C.
storage.mem_set(memory.train_insert_after(
    memory.train_carriage(train_domain, b_pod),
    memory.with_sparkle(payload_domain, "M"),
))

# Sentinel-as-ref: equivalent to train_appender.
storage.mem_set(memory.train_insert_after(
    memory.train_carriage(train_domain),
    memory.with_sparkle(payload_domain, "tail"),
))

p.Memory.Set(ctx, memory.TrainInsertAfter(
    memory.TrainCarriage(trainDomain, &bPod), payloadSelector))

p.Memory.Set(ctx, memory.TrainInsertBefore(
    memory.TrainCarriage(trainDomain, nil), payloadSelector))

storage.set(producers::train_insert_after(
    Box::new(train_carriage(train_domain.clone(), Some(b_pod))),
    payload_selector,
    args,
))?;

Empty check: `TrainIsEmpty`

A direct query that reads the sentinel and reports whether both LEFT and RIGHT anchors are identity. A train whose sentinel hasn’t been written yet (never touched) is also reported empty — no auto-init.

with storage.new_view() as view:
    if memory.train_is_empty(view, train_domain):
        print("nothing to process")

view := p.Memory.Substrate().NewView(nil)
defer view.Discard()
empty, err := memory.TrainIsEmpty(ctx, view, trainDomain)

let view = substrate.new_view(None);
let empty = memory::train::train_is_empty(&*view, &train_domain)?;

Iterating — cursor-straddle semantics

TrainForward / TrainBackward walk the train, yielding each carriage’s payload chunk. The starting point is itself a Selector — pass TrainCarriage(domain) (with no pod) to start from the sentinel for a full-train iteration, or TrainCarriage(domain, pod) to start from a specific carriage.

The cursor straddles between elements, much like Java’s ListIterator, whose cursor sits between the element returned by previous() and the one returned by next():

start is the cursor; iteration is exclusive of start’s own payload.
When start resolves to the sentinel, iteration covers the whole train (sentinel acts as both before-leftmost and after-rightmost).
When start resolves to a member carriage, iteration yields the elements strictly after it in the chosen direction; the start carriage’s own payload is NOT yielded.

`start`	`TrainForward` yields	`TrainBackward` yields
sentinel	leftmost, …, rightmost	rightmost, …, leftmost
carriage X	X.right, X.right.right, …	X.left, X.left.left, …

# Whole-train iteration from the sentinel.
for chunk in storage.mem_get(
    memory.train_forward(memory.train_carriage(train_domain))):
    print(chunk.note)

# Backward from a specific carriage (exclusive of that carriage).
for chunk in storage.mem_get(
    memory.train_backward(memory.train_carriage(train_domain, carriage_pod))):
    print(chunk.note)

// Whole-train iteration.
for c := range memory.SelectorIter(ctx, view,
    memory.TrainForward(memory.TrainCarriage(trainDomain, nil))) {
    fmt.Println(c.Note)
}

// Backward from a specific carriage (exclusive).
for c := range memory.SelectorIter(ctx, view,
    memory.TrainBackward(memory.TrainCarriage(trainDomain, &carriagePod))) {
    fmt.Println(c.Note)
}

let sel = train_forward(Box::new(train_carriage(train_domain.clone(), None)));
let sel = train_backward(Box::new(train_carriage(
    train_domain.clone(),
    Some(carriage_pod),
)));

Single-step shorthands: `TrainNext` / `TrainPrev`

TrainNext(start) is shorthand for Range(0, 1, TrainForward(start)) — the next single payload from the cursor’s position. TrainPrev(start) is the backward mirror. They follow the same exclusive-start rule.

Cursor	`TrainNext` yields	`TrainPrev` yields
sentinel	leftmost carriage’s payload	rightmost carriage’s payload
carriage X	X’s right neighbor’s payload (or empty if X is rightmost)	X’s left neighbor’s payload (or empty if X is leftmost)

# Peek at the front / back of the train.
front = next(iter(storage.mem_get(
    memory.train_next(memory.train_carriage(train_domain))))).note
back  = next(iter(storage.mem_get(
    memory.train_prev(memory.train_carriage(train_domain))))).note

Iteration rule (implementation)

Each step:

Integrity — sentinel is identified by pod == P0. The sentinel MUST have an identity PAYLOAD; a carriage MUST have a non-identity PAYLOAD. Either violation returns FailedPrecondition (the train is malformed).
Advance — on a carriage, follow the matching-direction key (RIGHT for forward, LEFT for backward) to the next neighbor. On the sentinel, follow the opposite-direction key — sentinel’s LEFT/RIGHT are anchor pointers, so the opposite key takes you to the appropriate end.
Yield — yield the PAYLOAD chunk of the carriage you advanced to. Stop when the pointer to follow is identity.

The advance-then-yield order is what makes the iteration exclusive of start.
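The three-step rule can be sketched in pure Python to show why the start's own payload is never yielded. This is an illustration under assumptions, not the library's code. Records are plain dicts keyed by pod, the sentinel's pod is "P0", None stands in for the identity vector, and FailedPrecondition is a local stand-in exception. Treating an unwritten sentinel as an empty train is also an assumption, mirroring TrainIsEmpty.

```python
SENTINEL = "P0"                       # hypothetical sentinel pod
OPPOSITE = {"LEFT": "RIGHT", "RIGHT": "LEFT"}
EMPTY_SENTINEL = {"LEFT": None, "RIGHT": None, "PAYLOAD": None}


class FailedPrecondition(Exception):
    """Stand-in for the library's FailedPrecondition error (malformed train)."""


def checked(records, pod):
    """Integrity: the sentinel has an identity PAYLOAD, a carriage a non-identity one."""
    # Assumption: an unwritten sentinel reads as an empty train.
    rec = records.get(pod, EMPTY_SENTINEL if pod == SENTINEL else None)
    if rec is None or (pod == SENTINEL) != (rec["PAYLOAD"] is None):
        raise FailedPrecondition(f"malformed train at pod {pod!r}")
    return rec


def walk(records, start, direction):
    """Yield payloads strictly after `start`; "RIGHT" = forward, "LEFT" = backward."""
    rec = checked(records, start)
    # The sentinel's LEFT/RIGHT are anchors, so from the sentinel follow the
    # opposite key: forward begins at the leftmost carriage, backward at the rightmost.
    cur = rec[OPPOSITE[direction] if start == SENTINEL else direction]
    while cur is not None:               # identity pointer: end of the train
        rec = checked(records, cur)
        yield rec["PAYLOAD"]             # advance first, then yield: start is excluded
        cur = rec[direction]             # carriages follow the matching-direction key


def train_next(records, start):
    return next(walk(records, start, "RIGHT"), None)   # None models "empty"


def train_prev(records, start):
    return next(walk(records, start, "LEFT"), None)


# A <-> B <-> C, with the sentinel anchoring both ends.
records = {
    SENTINEL: {"LEFT": 1, "RIGHT": 3, "PAYLOAD": None},
    1: {"LEFT": None, "RIGHT": 2, "PAYLOAD": "A"},
    2: {"LEFT": 1, "RIGHT": 3, "PAYLOAD": "B"},
    3: {"LEFT": 2, "RIGHT": None, "PAYLOAD": "C"},
}
print(list(walk(records, SENTINEL, "RIGHT")))   # ['A', 'B', 'C']
print(list(walk(records, 2, "LEFT")))           # ['A']
```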

Hierarchy: the `CHILDREN` slot

Each carriage’s CHILDREN 👨‍👩‍👧‍👦 slot can point at the sentinel of another train (or any other chunk). Attach one at push time via the children argument; query it via the TrainChildren(parent) selector.

# Push a parent carriage whose CHILDREN points at a child train's sentinel.
storage.mem_set(memory.train_appender(
    parent_domain,
    memory.with_sparkle(payload_domain, "B"),
    children=memory.train_carriage(child_domain),
))

# Walk the child train from the parent.
for chunk in storage.mem_get(
    memory.train_forward(
        memory.train_children(
            memory.train_carriage(parent_domain, parent_carriage_pod)))):
    print(chunk.note)

// Push parent carriage with CHILDREN attached.
p.Memory.Set(ctx, memory.TrainAppender(
    parentDomain, payloadSelector,
    memory.PChildren(memory.TrainCarriage(childDomain, nil)),
))

// Walk the child train.
for c := range memory.SelectorIter(ctx, view,
    memory.TrainForward(memory.TrainChildren(
        memory.TrainCarriage(parentDomain, &parentCarriagePod)))) {
    fmt.Println(c.Note)
}

// Build args with children attached.
let mut args = chunk_producer_proto::Args::default();
args.children = Some(Box::new(train_carriage(child_domain, None).to_proto()?));
storage.set(producers::train_appender(parent_domain.clone(), payload_sel, args))?;

// Walk the child train.
let sel = train_forward(Box::new(train_children(Box::new(
    train_carriage(parent_domain, Some(parent_carriage_pod)),
))));

TrainChildren(parent) yields zero chunks (no error) when the parent has identity CHILDREN or is missing entirely. Hierarchical traversal is the caller’s job — walk a parent train, recurse into TrainChildren on each carriage.
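The caller-side recursion can be sketched as a depth-first walk. This is a toy model, not library code. It assumes each train has already been resolved into a left-to-right list of (payload, children_domain_or_None) pairs, and the names walk_tree and trains are hypothetical. The cycle guard is also an assumption, included because CHILDREN may point at any chunk.

```python
def walk_tree(trains, domain, depth=0, _active=frozenset()):
    """Depth-first walk: yield (depth, payload), recursing into each CHILDREN train.

    `trains` maps a train domain to its carriages in left-to-right order, each a
    (payload, children_domain_or_None) pair (a hypothetical, already-resolved view).
    """
    if domain in _active:                # assumption: skip a CHILDREN cycle
        return
    # A missing train yields nothing, like TrainChildren on identity CHILDREN.
    for payload, children in trains.get(domain, []):
        yield depth, payload
        if children is not None:
            yield from walk_tree(trains, children, depth + 1, _active | {domain})


trains = {
    "parent": [("A", None), ("B", "child"), ("C", None)],
    "child": [("B1", None), ("B2", "missing")],
}
for depth, payload in walk_tree(trains, "parent"):
    print("  " * depth + payload)        # A, B, then indented B1 and B2, then C
```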

Notebook Quick Start

This guide walks through using Kongming HV in a Jupyter notebook, cell by cell.

Section	Description
Notebook Platforms	Setup differences between Jupyter, Colab, and Binder
Interactive Notebooks	Links to existing notebooks
Walkthrough	Step-by-step: vocabulary, similarity, learning, binding

Tips

Reproducibility: Use fixed seeds in SparseOperation for deterministic results across reruns.
Visualization: Use pandas DataFrames for overlap matrices — they render nicely in Jupyter.
Performance: The Rust backend is fast. Building 10,000 vectors takes under a second on MODEL_64K_8BIT.
Model choice: Start with MODEL_64K_8BIT for exploration. Switch to MODEL_1M_10BIT or larger for production workloads.

Notebook Platforms

Setup and behavior differ across Jupyter, Google Colab, and Binder. This page covers the key differences.

Try Online

Notebook	Platform	Link
`first.ipynb`	Google Colab
`first.ipynb`	Binder
`memory.ipynb`	Google Colab
`lisp.ipynb`	Google Colab

Comparison

	Jupyter (local)	Google Colab	Binder
Account	None	Google account required	None
Install	`pip install` in terminal beforehand	`!pip install` in first cell	Pre-installed via `requirements.txt`
Restart needed	No	Yes — after first install	No
Startup time	Instant	Fast (~5s)	Slow (2–5 min cold start)
Persistence	Local filesystem	Google Drive (optional mount)	Ephemeral — lost on timeout
GPU	If available locally	Free tier available	Not available
Custom packages	Full control	`!pip install` per session	Via `requirements.txt` only

Jupyter (Local)

Install once in your terminal, then use in any notebook:

pip install kongming-rs-hv

# Cell 1 — no restart needed
from kongming import hv

For development workflows with frequent code changes, use autoreload:

%load_ext autoreload
%autoreload 2

Google Colab

Colab runs in the cloud with a fresh environment each session. Install in the first cell:

# Cell 1 — install
!pip install kongming-rs-hv

After the first install, Colab requires a runtime restart:

Go to Runtime → Restart runtime (or use the button Colab shows after install)
Then run the remaining cells

# Cell 2 — after restart
from kongming import hv
model = hv.MODEL_64K_8BIT

Subsequent sessions on the same notebook will need the install cell again — Colab does not persist pip packages across sessions.

Saving work: Use google.colab.drive to mount Google Drive for persistent storage:

from google.colab import drive
drive.mount('/content/drive')
# Then use paths like /content/drive/MyDrive/...

Binder

Binder builds a Docker image from this repository’s requirements.txt and launches a Jupyter server. No account needed.

First launch: Takes 2–5 minutes to build the environment
Subsequent launches: Faster if the image is cached
No install needed: kongming-rs-hv is pre-installed from requirements.txt
Ephemeral: All work is lost when the session times out (~10 min idle)

# Cell 1 — works immediately, no install
from kongming import hv

Limitation

You cannot install additional packages not in requirements.txt (the environment is read-only).

Choosing a Platform

Use case	Recommended
Daily development	Jupyter (local)
Quick demo / sharing	Google Colab
Zero-setup exploration	Binder
Teaching / workshops	Google Colab (students have accounts)
Persistent storage needed	Jupyter (local) or Colab + Drive

Interactive Notebooks

For deeper walkthroughs, open these notebooks directly:

Notebook	Description	Colab
`first.ipynb`	Introduction to hypervectors, bind/bundle operations, and composites
`memory.ipynb`	In-memory and persistent storage, near-neighbor search with attractors, and export to disk
`lisp.ipynb`	VSA-based LISP interpreter where every data structure is a hypervector

See also: LISP Interpreter — a full example built on the core API.

Walkthrough

A step-by-step introduction to Kongming HV in a notebook, cell by cell.

Setup

# Cell 1: Install and import
# !pip install kongming-rs-hv pandas

from kongming import hv
import pandas as pd

model = hv.MODEL_64K_8BIT
so = hv.SparseOperation(model, 0, 1)

Building a Vocabulary

# Cell 2: Create vectors for a set of words
words = ["cat", "dog", "fish", "bird", "tree", "rock"]
vectors = {w: hv.Sparkle.from_word(model, "vocab", w) for w in words}

print(f"Created {len(vectors)} vectors")
print(f"Model: {model}, Cardinality: {hv.cardinality(model)}")

Output:

Created 6 vectors
Model: 1, Cardinality: 256

Similarity Matrix

# Cell 3: Compute pairwise overlap
data = {}
for w1 in words:
    data[w1] = {w2: hv.overlap(vectors[w1], vectors[w2]) for w2 in words}

pd.DataFrame(data, index=words)

Output:

	cat	dog	fish	bird	tree	rock
cat	256	1	0	2	1	1
dog	1	256	1	0	1	2
fish	0	1	256	1	0	1
bird	2	0	1	256	1	0
tree	1	1	0	1	256	1
rock	1	2	1	0	1	256

The diagonal is 256 (the cardinality, i.e. perfect self-overlap). Off-diagonal values are 0 to 2: random noise around an expected overlap of about 1, confirming the vectors are near-orthogonal.
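Why is the noise level about 1? Here is a back-of-the-envelope simulation in pure Python, without the library. It assumes the segment layout used in Operators from Scratch, where each vector has one active offset per segment. It also assumes MODEL_64K_8BIT means 256 segments of 256 positions. Two independent random vectors then agree in a given segment with probability 1/256, so their expected overlap is 256 × 1/256 = 1.

```python
import random

CARDINALITY = 256      # segments per vector (matches the Cardinality printed above)
SEGMENT_SIZE = 256     # positions per segment (assumed: 65,536 bits / 256 segments)


def random_vector(rng):
    """One active offset per segment, chosen uniformly at random."""
    return [rng.randrange(SEGMENT_SIZE) for _ in range(CARDINALITY)]


def overlap(a, b):
    """Number of segments whose active offsets coincide."""
    return sum(x == y for x, y in zip(a, b))


rng = random.Random(0)
pairs = [(random_vector(rng), random_vector(rng)) for _ in range(500)]
mean_noise = sum(overlap(a, b) for a, b in pairs) / len(pairs)

v = random_vector(rng)
print(overlap(v, v))          # 256: every segment matches itself
print(round(mean_noise, 2))   # close to CARDINALITY / SEGMENT_SIZE = 1
```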

Learning from Observations

# Cell 4: Create a learner and feed it observations
learner = hv.Learner(model, hv.Seed128(0, so.uint64()))

# "cat" seen 3 times, "dog" once, "bird" once
for _ in range(3):
    learner.bundle(vectors["cat"])
learner.bundle(vectors["dog"])
learner.bundle(vectors["bird"])

print(f"Learner age: {learner.age()}")

Output:

Learner age: 5

Probing the Learner

# Cell 5: Check what the learner remembers
results = []
for w in words:
    ov = hv.overlap(learner, vectors[w])
    results.append({"word": w, "overlap": ov})

df = pd.DataFrame(results).sort_values("overlap", ascending=False)
df

Output:

word	overlap
cat	~75
dog	~30
bird	~30
fish	~1
tree	~1
rock	~1

“cat” has the highest overlap (seen 3x). “dog” and “bird” (seen 1x each) have moderate overlap. Unseen words are at noise level (~1).

Binding: Role-Filler Pairs

# Cell 6: Create a structured representation
#   "a cat that is red"
color_role = hv.Sparkle.from_word(model, "role", "color")
animal_role = hv.Sparkle.from_word(model, "role", "animal")

red = hv.Sparkle.from_word(model, "color", "red")
blue = hv.Sparkle.from_word(model, "color", "blue")
cat = vectors["cat"]

# Bind role with filler, then bundle the pairs
learner2 = hv.Learner(model, hv.Seed128(0, so.uint64()))
learner2.bundle(hv.Sparkle.bind(color_role, red))
learner2.bundle(hv.Sparkle.bind(animal_role, cat))

# Probe: "what color?"
query = hv.Sparkle.bind(learner2, color_role.power(-1))
print(f"red overlap:  {hv.overlap(query, red)}")   # high
print(f"blue overlap: {hv.overlap(query, blue)}")   # ~1
print(f"cat overlap:  {hv.overlap(query, cat)}")    # ~1

Python Quick Start

Section	Description
Installation	PyPI install, supported platforms, import paths
Quick Example	Minimal code showing bind, bundle, and overlap
Walkthrough	Vectors, similarity, random generation, power, learning

See also: Notebook Quick Start for interactive Jupyter walkthroughs.

Installation

PyPI

pip install kongming-rs-hv

Supported Platforms

Platform	Architectures	Python Versions
Linux	x86_64	3.10–3.14
macOS	Apple Silicon & Intel	3.10–3.14
Windows	x86_64	3.10–3.14

Verifying Installation

import kongming
print(kongming.__version__)  # e.g. "3.6.5" as of Apr. 2026; yours may be newer.

from kongming import hv
print(hv.MODEL_64K_8BIT)  # should print 1

Import Paths

The package exposes two main modules:

from kongming import hv       # hypervector operations
from kongming import memory   # storage and selectors

Model constants are available directly on hv:

hv.MODEL_64K_8BIT      # 1
hv.MODEL_1M_10BIT      # 2
hv.MODEL_16M_12BIT     # 3
hv.MODEL_256M_14BIT    # 4
hv.MODEL_4G_16BIT      # 5

Docker

If you’d rather not install anything on your host, you can run kongming-rs-hv inside a container. This works on any system with Docker — no Python, no virtualenv, no wheel compatibility to worry about.

One-liner: throwaway Python REPL

Drop straight into a Python shell with the package preinstalled:

docker run --rm -it python:3.12-slim sh -c "\
    pip install --quiet --root-user-action=ignore \
        --disable-pip-version-check kongming-rs-hv && python"

--rm removes the container on exit. Nothing is persisted. Re-running reinstalls from PyPI, which takes a few seconds. The --root-user-action=ignore and --disable-pip-version-check flags silence pip’s root-user and upgrade notices, which are harmless inside a throwaway container.

Reusable image

For repeat use, build a small image once:

# Dockerfile
FROM python:3.12-slim
RUN pip install --no-cache-dir --disable-pip-version-check kongming-rs-hv
CMD ["python"]

docker build -t kongming-hv .
docker run --rm -it kongming-hv

To run a script from the host instead of an interactive REPL, mount the current directory:

docker run --rm -v "$PWD":/work -w /work kongming-hv python my_script.py

JupyterLab in a container

For interactive exploration with notebooks:

# Dockerfile.jupyter
FROM python:3.12-slim
RUN pip install --no-cache-dir --disable-pip-version-check \
    kongming-rs-hv jupyterlab
WORKDIR /notebooks
EXPOSE 8888
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--no-browser", \
     "--ServerApp.token=''", "--ServerApp.password=''"]

docker build -f Dockerfile.jupyter -t kongming-hv-jupyter .
docker run --rm -p 8888:8888 -v "$PWD":/notebooks kongming-hv-jupyter

Open http://localhost:8888 in your browser. Notebooks saved under /notebooks are persisted to the mounted host directory.

The disabled token/password above is fine for local use. Do not expose this container on a public network without adding authentication.

Quick Example

A minimal example showing the core operations:

from kongming import hv

# Create hypervectors
a = hv.Sparkle.from_word(hv.MODEL_64K_8BIT, hv.d0(), "hello")
b = hv.Sparkle.from_word(hv.MODEL_64K_8BIT, hv.d0(), "world")
print(f'Overlap: {hv.overlap(a, b)}')  # Near orthogonal (~1)

# Bind: result is dissimilar to both inputs
bound = hv.bind(a, b)
print(f'{hv.overlap(bound, a)=}, {hv.overlap(bound, b)=}')  # ~1, ~1

# Bundle: result is similar to both inputs
bundled = hv.bundle(hv.Seed128(10, 1), a, b)
print(f'{hv.overlap(bundled, a)=}, {hv.overlap(bundled, b)=}')  # high, high

What’s Happening

Sparkle.from_word generates a deterministic hypervector from a word. Same word always produces the same vector.
Two unrelated vectors have near-zero overlap (~1) — random high-dimensional vectors are nearly orthogonal.
hv.bind(a, b) produces a vector dissimilar to both (low overlap). Binding is reversible: hv.release(bound, b) recovers a.
hv.bundle(seed, a, b) produces a vector similar to both (high overlap). Different seeds produce different but equally valid results.

Walkthrough

A deeper exploration of the Python API, covering vector creation, similarity, random generation, power/permutation, and online learning.

Creating Vectors

from kongming import hv

model = hv.MODEL_64K_8BIT

# Create sparkles (atomic vectors) from words
cat = hv.Sparkle.from_word(model, "animals", "cat")
dog = hv.Sparkle.from_word(model, "animals", "dog")

# Same inputs always produce the same vector
cat2 = hv.Sparkle.from_word(model, "animals", "cat")
assert cat.stable_hash() == cat2.stable_hash()

Measuring Similarity

# Random vectors have ~1 overlap
print(hv.overlap(cat, dog))   # ≈ 1 (near-orthogonal)

# A vector is maximally similar to itself
print(hv.overlap(cat, cat))   # = 256 (= cardinality)

Using SparseOperation for Random Generation

so = hv.SparseOperation(model, 123, 456)

# Generate random sparkles
a = hv.Sparkle.random("my_domain", so)
b = hv.Sparkle.random("my_domain", so)

# Each call to so produces a new random seed
print(hv.overlap(a, b))  # ≈ 1

Power and Permutation

# Power creates a permuted vector
s = hv.Sparkle.from_word(model, "pos", "step")
s2 = s.power(2)
s3 = s.power(3)

# Different powers are near-orthogonal
print(hv.overlap(s, s2))   # ≈ 1
print(hv.overlap(s, s3))   # ≈ 1

# Inverse: power(-1) undoes power(1)
s_inv = s.power(-1)
# bind(s, s_inv) ≈ identity

Online Learning with Learner

learner = hv.Learner(model, hv.Seed128(0, 42))

# Feed observations one at a time
learner.bundle(cat)
learner.bundle(cat)   # seen twice — stronger signal
learner.bundle(dog)

# The learned vector is more similar to cat (seen 2x)
print(hv.overlap(learner, cat))  # higher
print(hv.overlap(learner, dog))  # lower but above random

Examples

Standalone runnable scripts under examples/ — each demonstrates a different facet of hypervector computing. Click through for the walkthrough.

Example	What it shows
Mexican Dollar	Analogical reasoning of “What’s the Dollar of Mexico?”: bind/bundle as the math behind analogy.
Word Indexer	Encoding and novel queries for 5,000 English words.
Bulk Storage Benchmark	Populate various substrates with thousands of chunks and measure retrieval performance.
Operators from Scratch	Reimplement `bind` and `bundle` in pure Python — the core math underneath the library.
LISP Interpreter	A full LISP where every atom, cons cell, and environment is a hypervector. For the VSA-curious.

Operators from Scratch

Standalone script: operators.py

This example implements the bind, release, and bundle operators in pure Python using only the low-level offset API, then verifies correctness against the library’s built-in implementations.

The script does not call hv.bind(), hv.release(), or hv.bundle() for computation — it reimplements them to show how they work at the offset level.

Bind

Per-segment offset addition modulo segment size:

for seg in range(cardinality):
    result[seg] = (core_a.offset(seg) + core_b.offset(seg)) % segment_size

Properties:

Result is nearly orthogonal to both inputs (overlap ≈ 1)
Commutative: bind(a, b) == bind(b, a)
Associative: bind(a, b, c) == bind(bind(a, b), c)

Release (Unbind)

Per-segment offset subtraction modulo segment size:

for seg in range(cardinality):
    result[seg] = (core_c.offset(seg) - core_k.offset(seg)) % segment_size

Properties:

release(bind(a, b), b) = a (exact recovery)
Multi-release: release(release(bind(a, b, c), c), b) = a
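The bind and release loops above can be turned into a runnable check of the listed properties. In this sketch a vector is a plain list of per-segment offsets instead of a library core, and the 256 × 256 sizes are an assumed MODEL_64K_8BIT-like layout. The all-zero vector acts as bind's neutral element, so release(identity, a) gives a's inverse.

```python
import random

SEGMENT_SIZE = 256    # assumed toy sizes (MODEL_64K_8BIT-like: 256 x 256)
CARDINALITY = 256


def bind(*vectors):
    """Per-segment offset addition modulo the segment size."""
    return [sum(offsets) % SEGMENT_SIZE for offsets in zip(*vectors)]


def release(composite, key):
    """Per-segment offset subtraction modulo the segment size (inverse of bind)."""
    return [(c - k) % SEGMENT_SIZE for c, k in zip(composite, key)]


def overlap(a, b):
    return sum(x == y for x, y in zip(a, b))


rng = random.Random(42)
a, b, c = ([rng.randrange(SEGMENT_SIZE) for _ in range(CARDINALITY)] for _ in range(3))
identity = [0] * CARDINALITY                 # all-zero offsets: bind's neutral element

assert bind(a, b) == bind(b, a)                          # commutative
assert bind(a, b, c) == bind(bind(a, b), c)              # associative
assert release(bind(a, b), b) == a                       # exact recovery
assert release(release(bind(a, b, c), c), b) == a        # multi-release
assert bind(a, release(identity, a)) == identity         # a's inverse cancels a
print(overlap(bind(a, b), a), overlap(bind(a, b), b))    # both near 1 (noise)
```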

Bundle

PRNG-based random selection among inputs. For each segment, a seeded PRNG picks which input vector contributes its offset. The selection probability is proportional to each input’s weight.

# Compute cumulative anchors from weights (weights sum to 1.0).
# For equal weights [1/3, 1/3, 1/3]: anchors ≈ [21845, 43690, 65535]
# For weighted [0.6, 0.2, 0.2]:      anchors ≈ [39321, 52428, 65535]
cumulative = 0.0
anchors = []
for w in weights:
    cumulative += w
    anchors.append(int(cumulative * 65535))

for seg in range(0, cardinality, 4):
    r = so.uint64()                          # one PRNG call → 4 × 16-bit values
    for j in range(4):
        dial = (r >> (48 - 16 * j)) & 0xFFFF # extract 16-bit random value
        chosen = next(i for i, a in enumerate(anchors) if a >= dial)  # first input whose anchor >= dial
        result[seg + j] = cores[chosen].offset(seg + j)

Properties:

Result is similar to all inputs (overlap ≈ weight × cardinality)
Not reversible — information is lost

Note: The library supports two bundling strategies: classic (shown above) and fisher_yates (default). To verify exact match with the pure Python implementation, set:

KONGMING_LEARNER_SAMPLING=classic python operators.py

Running

pip install kongming-rs-hv

# Classic sampling — all operators match exactly
KONGMING_LEARNER_SAMPLING=classic python operators.py

Mexican Dollar

Standalone scripts: mexican_dollar.py | mexican_dollar_memory.py

The “What’s the Dollar of Mexico?” problem is a classic demonstration of analogical reasoning with hypervectors. It shows how structured knowledge about countries can be encoded, and how algebraic operations can answer analogy questions without explicit programming.

The Problem

Given knowledge about three countries:

Country	Code	Capital	Currency
USA	USA	Washington DC	Dollar
Mexico	MEX	Mexico City	Peso
Sweden	SWE	Stockholm	Krona

We want to answer questions like:

“What is the Dollar of Mexico?” → Peso
“What is the Washington DC of Mexico?” → Mexico City
“What is the Dollar of Sweden?” → Krona

How It Works

Each country is encoded as a bundled set of role-filler bindings:

$US = \bigoplus (code \otimes usa,\ capital \otimes dc,\ currency \otimes dollar)$

$Mexico = \bigoplus (code \otimes mex,\ capital \otimes mexico\_city,\ currency \otimes peso)$

$Sweden = \bigoplus (code \otimes swe,\ capital \otimes stockholm,\ currency \otimes krona)$

To find “the Dollar of Mexico”, we compute a transfer vector from US to Mexico:

$T_{US \to Mexico} = Mexico \oslash US$

Then apply it to Dollar:

$result = dollar \otimes T_{US \to Mexico}$

The result will have high overlap with Peso, the analogical answer.

The same transfer works for Sweden:

$T_{US \to Sweden} = Sweden \oslash US$

$result = dollar \otimes T_{US \to Sweden} \approx krona$

Code (Manual)

The algebraic approach — compute the transfer vector directly:

from kongming import hv

model = hv.MODEL_64K_8BIT
so = hv.SparseOperation(model, "knowledge", 0)

# Create role markers
country_code = hv.Sparkle.from_word(model, "role", "country_code")
capital      = hv.Sparkle.from_word(model, "role", "capital")
currency     = hv.Sparkle.from_word(model, "role", "currency")

# Create fillers
usa         = hv.Sparkle.from_word(model, "country", "usa")
mex         = hv.Sparkle.from_word(model, "country", "mex")
swe         = hv.Sparkle.from_word(model, "country", "swe")
dc          = hv.Sparkle.from_word(model, "capital", "dc")
mexico_city = hv.Sparkle.from_word(model, "capital", "mexico_city")
stockholm   = hv.Sparkle.from_word(model, "capital", "stockholm")
dollar      = hv.Sparkle.from_word(model, "currency", "dollar")
peso        = hv.Sparkle.from_word(model, "currency", "peso")
krona       = hv.Sparkle.from_word(model, "currency", "krona")

# Encode each country as role-filler bundles
us_record = hv.bundle(hv.Seed128.random(so),
    hv.bind(country_code, usa),
    hv.bind(capital, dc),
    hv.bind(currency, dollar),
)
mexico_record = hv.bundle(hv.Seed128.random(so),
    hv.bind(country_code, mex),
    hv.bind(capital, mexico_city),
    hv.bind(currency, peso),
)
sweden_record = hv.bundle(hv.Seed128.random(so),
    hv.bind(country_code, swe),
    hv.bind(capital, stockholm),
    hv.bind(currency, krona),
)

# Transfer vector: Mexico / US
transfer_to_mexico = hv.release(mexico_record, us_record)

# "What's the Dollar of Mexico?"
mexican_dollar = hv.bind(dollar, transfer_to_mexico)
print(f"peso overlap:  {hv.overlap(mexican_dollar, peso)}")    # high!
print(f"dollar overlap: {hv.overlap(mexican_dollar, dollar)}")  # ~1 (noise)
print(f"krona overlap:  {hv.overlap(mexican_dollar, krona)}")   # ~1 (noise)

# "What's the Washington DC of Mexico?"
mexican_dc = hv.bind(dc, transfer_to_mexico)
print(f"mexico_city overlap: {hv.overlap(mexican_dc, mexico_city)}")  # high!

# Transfer to Sweden works the same way
transfer_to_sweden = hv.release(sweden_record, us_record)
swedish_dollar = hv.bind(dollar, transfer_to_sweden)
print(f"krona overlap: {hv.overlap(swedish_dollar, krona)}")  # high!

Code (with AnalogicalReasoner)

When records are stored in memory (as Octopus composites), analogical_reasoner handles the transfer:

from kongming import hv, memory

model = hv.MODEL_64K_8BIT
store = memory.InMemory(model)

keys = ["capital", "currency", "country_code"]

# Store individual fillers — NNS needs them as searchable items
fillers = {}
for word in ["dc", "USD", "USA", "mexicoCity", "MXN", "MEX",
             "stockholm", "SEK", "SWE"]:
    s = hv.Sparkle.from_word(model, 0, word)
    store.put(s)
    fillers[word] = s

# Store country records as Octopus composites
store.put(hv.Octopus(
    hv.Seed128("country", "USA"), keys,
    fillers["dc"], fillers["USD"], fillers["USA"],
))
store.put(hv.Octopus(
    hv.Seed128("country", "MEX"), keys,
    fillers["mexicoCity"], fillers["MXN"], fillers["MEX"],
))
store.put(hv.Octopus(
    hv.Seed128("country", "SWE"), keys,
    fillers["stockholm"], fillers["SEK"], fillers["SWE"],
))

# Retrieve stored records
us_code  = store.get("country", "USA").code
mex_code = store.get("country", "MEX").code
swe_code = store.get("country", "SWE").code

view = store.new_view()

# "What is the USD of Mexico?"
result = memory.first_picked(view,
    memory.nns(
        memory.analogical_reasoner(
            memory.with_code(mex_code),
            us_code,
            fillers["USD"],
        )
    )
)
print(result.id)  # → ✨:🌱MXN

# "What is the Washington DC of Mexico?"
result = memory.first_picked(view,
    memory.nns(
        memory.analogical_reasoner(
            memory.with_code(mex_code),
            us_code,
            fillers["dc"],
        )
    )
)
print(result.id)  # → ✨:🌱mexicoCity

# "What is the Dollar of Sweden?"
result = memory.first_picked(view,
    memory.nns(
        memory.analogical_reasoner(
            memory.with_code(swe_code),
            us_code,
            fillers["USD"],
        )
    )
)
print(result.id)  # → ✨:🌱SEK

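analogical_reasoner does this algebra internally. It combines the feature with the inverse of the source record (feature ⊗ inverse(src)) and with the target record selected by with_code, which amounts to applying the transfer vector target ⊘ src to the feature. It then uses near-neighbor search to find the best match in memory, so no manual algebra is needed.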

Why It Works

The transfer vector $T = Mexico ⊘ US$ captures the structural mapping between the two records. When applied to any filler from the US record, it maps it to the corresponding filler in the Mexico record — because the role-filler binding structure is preserved by the algebra.

This is a form of analogical reasoning: no explicit rules, no lookup tables — just algebraic operations on high-dimensional vectors.
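A self-contained simulation makes the structural argument concrete at the segment level. It reuses the from-scratch definitions: per-segment modular addition for bind, subtraction for release, and an equal-weight classic bundle. It substitutes Python's random module and toy sizes of 1024 segments × 1024 positions, so it illustrates the idea under those assumptions and is not the library's code.

Consider one segment where both records happened to sample their currency binding. There the transfer holds (currency + peso) - (currency + dollar) = peso - dollar, so binding dollar back in leaves exactly peso. With three equally weighted bindings sampled independently, this happens in about 1/9 of the segments, and elsewhere the result is random. That is why the answer stands well above the noise floor of about 1 without reaching full cardinality.

```python
import random

N = 1024                                  # toy sizes: 1024 segments x 1024 positions
rng = random.Random(7)


def rand_vec():
    return [rng.randrange(N) for _ in range(N)]


def bind(*vs):                            # per-segment offset addition mod N
    return [sum(o) % N for o in zip(*vs)]


def release(c, k):                        # per-segment offset subtraction mod N
    return [(x - y) % N for x, y in zip(c, k)]


def bundle(*vs):                          # equal-weight classic bundle, fresh randomness
    return [rng.choice(vs)[i] for i in range(N)]


def overlap(a, b):
    return sum(x == y for x, y in zip(a, b))


v = {name: rand_vec() for name in [
    "code", "capital", "currency", "usa", "mex", "dc", "mexico_city",
    "dollar", "peso", "krona"]}

us = bundle(bind(v["code"], v["usa"]), bind(v["capital"], v["dc"]),
            bind(v["currency"], v["dollar"]))
mexico = bundle(bind(v["code"], v["mex"]), bind(v["capital"], v["mexico_city"]),
                bind(v["currency"], v["peso"]))

transfer = release(mexico, us)            # T = Mexico ⊘ US
mexican_dollar = bind(v["dollar"], transfer)
mexican_dc = bind(v["dc"], transfer)

print(overlap(mexican_dollar, v["peso"]))         # well above noise, about N/9
print(overlap(mexican_dollar, v["krona"]))        # noise, about 1
print(overlap(mexican_dc, v["mexico_city"]))      # well above noise, about N/9
```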

Bulk Storage Benchmark

Standalone script: bulk_storage.py

This example populates a storage substrate with a large number of random terminal chunks, then queries a few by key to verify correctness. It demonstrates how to batch-create items and measure throughput.

Note that the associative index is built during the same writes, so near-neighbor search is available as soon as all writes complete.

Motivated readers can further improve this script to test various producers or selectors.

Script

#!/usr/bin/env python3
"""Populate local storage with random terminal chunks and verify retrieval."""

import argparse
import shutil
import tempfile
import time
from kongming import hv, memory


def main():
    parser = argparse.ArgumentParser(description="Bulk storage benchmark")
    parser.add_argument(
        "-n", "--count", type=int, default=10_000,
        help="Number of terminal chunks to create (default: 10000)",
    )
    parser.add_argument(
        "--model", type=int, default=hv.MODEL_1M_10BIT,
        help="HV model (default: MODEL_1M_10BIT)",
    )
    parser.add_argument(
        "--domain", type=str, default="bench",
        help="Domain name for all chunks (default: bench)",
    )
    parser.add_argument(
        "--backend", type=str,
        choices=["inmemory", "embedded"],
        default="inmemory",
        help="Storage backend (default: inmemory)",
    )
    parser.add_argument(
        "--path", type=str, default=None,
        help="Disk path for embedded backend (default: temp directory)",
    )
    args = parser.parse_args()

    # --- Create storage ---
    tmpdir = None
    if args.backend == "embedded":
        if args.path:
            path = args.path
        else:
            tmpdir = tempfile.mkdtemp()
            path = f"{tmpdir}/bench_store"
        storage = memory.Embedded(args.model, path)
        print(f"Backend: Embedded (path={path})")
    else:
        storage = memory.InMemory(args.model)
        print("Backend: InMemory (BTreeMap, pure in-memory)")

    # --- Write phase ---
    print(f"Writing {args.count:,} terminal chunks …")
    t0 = time.perf_counter()
    for i in range(args.count):
        storage.mem_set(memory.new_terminal(args.domain, str(i)))

    elapsed = time.perf_counter() - t0
    rate = args.count / elapsed
    print(f"  done in {elapsed:.2f}s  ({rate:,.0f} chunks/s)")
    print(f"  item_count = {storage.item_count():,}")

    # --- Read phase: spot-check a few items ---
    step = max(1, args.count // 100)   # avoid a zero step when count < 100
    spot_checks = range(0, args.count, step)
    print(f"Spot-checking keys: {spot_checks}")
    mismatches = 0
    for idx in spot_checks:
        expected = hv.Sparkle.from_word(args.model, args.domain, str(idx))
        chunk = storage.get(args.domain, str(idx))
        if not hv.equal(chunk.id, expected):
            mismatches += 1
            print(f"mismatch at key {idx}: {chunk}")

    print("All checks passed." if mismatches == 0 else f"{mismatches} mismatches.")

    # --- Cleanup ---
    if tmpdir:
        del storage
        shutil.rmtree(tmpdir)


if __name__ == "__main__":
    main()

Usage

# Default: 10K chunks, in-memory storage substrate.
python bulk_storage.py

# Embedded (disk-backed storage substrate).
python bulk_storage.py --backend embedded

# Embedded with a specific path (tip: use a tmpfs mount for near-in-memory speed)
python bulk_storage.py --backend embedded --path /dev/shm/my_bench

# Custom count
python bulk_storage.py -n 100000

# Different model: 1 = MODEL_64K_8BIT, 2 = MODEL_1M_10BIT, and so on.
python bulk_storage.py -n 10000 --model 1

Word Indexer

Standalone script: word_indexer.py

This example encodes ~5,000 English words as Sequences of per-letter Sparkles, then queries them by exact word or by positional suffix (“six-letter words ending in er”, “eleven-letter words ending in tion”) using multi-attractor near-neighbor search.

It demonstrates four ideas together:

Using Sparkle as a stable per-symbol code (one Sparkle per a–z).
Using Sequence with a Pod-derived seed so chunks are addressable both by word (exact) and by structure (positional).
The ChunkProducer API (new_terminal, from_sequence_members, joiner), with writes staged through a batched SubstrateMutableView via producer.produce(view).
Multi-attractor nns over SequenceAttractor for positional conjunctive queries.

The general idea

letters domain                       words domain
─────────────                        ────────────
"a" → Sparkle_a                      "the"      → Sequence(t, h, e)
"b" → Sparkle_b                      "language" → Sequence(l, a, n, g, u, a, g, e)
"c" → Sparkle_c                      ...
...                                  Pod   = word         (exact lookup key)
"z" → Sparkle_z                      note  = word         (recoverable in results)
                                     members = letter Sparkles in order

Letters as Sparkles. Pre-write 26 random-looking Sparkles, one per a–z, into a letters domain via new_terminal(letters, ch). Each letter’s Pod is the letter itself, so you can fetch it with by_item_key("letters", "e").
Words as Sequences. Each word is a Sequence in a words domain whose ordered members are the letter-Sparkles spelling it, built by from_sequence_members(...) with a joiner(...) of per-letter by_item_key selectors. The Sequence’s Pod is the word, so exact lookup is by_item_key("words", "language").
note carries the word string. Each word-chunk is written with note=<word>, so chunk.note recovers the word in result loops without decoding the Pod.

Batched writes via the ChunkProducer API

This example uses the producer API end-to-end. Producers compute their chunks at produce() time against a mutable view, mirroring Go’s producer.Produce(ctx, view) and Rust’s producer.produce(view, index). Storage’s new_mutable_view() is a context manager with transactional semantics:

All writes staged by producer.produce(view) calls between __enter__ and __exit__ go into a single batch.
Clean exit auto-commits; an exception inside the block discards everything.
view.commit() mid-block flushes the current batch and lets you continue staging, which is useful for bounding memory use on large ingests.

Letters and words go into two separate views; the second commits every BATCH_SIZE = 1000 words:

with storage.new_mutable_view() as view:
    for ch in "abcdefghijklmnopqrstuvwxyz":
        memory.new_terminal("letters", ch).produce(view)
    # auto-commits on __exit__

with storage.new_mutable_view() as view:
    for i, w in enumerate(words, start=1):
        members = memory.joiner(*[memory.by_item_key("letters", ch) for ch in w])
        # semantic_indexing=True: index the Sequence's code so suffix
        # queries (sequence_attractor) can find words by structure.
        memory.from_sequence_members(
            "words", w, members, note=w, semantic_indexing=True,
        ).produce(view)
        if i % BATCH_SIZE == 0:
            view.commit()
    # trailing writes auto-commit on __exit__

See Substrate & Views for the full view API.
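To make the commit rules above concrete, here is a plain-Python stand-in. ToyView is hypothetical and is not part of kongming-rs-hv. It stages writes, commits on clean exit, discards on exception, and supports a mid-block commit(). One assumption is worth flagging: the toy keeps batches already flushed by commit() and drops only the uncommitted batch when an exception occurs. The text above does not spell out how those two rules interact.

```python
class ToyView:
    """Hypothetical stand-in for a mutable view (not part of kongming-rs-hv)."""

    def __init__(self, store):
        self.store = store        # committed state: a plain dict here
        self.staged = {}          # writes staged since the last commit

    def put(self, key, value):
        self.staged[key] = value

    def commit(self):
        # Flush the current batch; staging can continue afterwards.
        self.store.update(self.staged)
        self.staged.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.commit()         # clean exit: auto-commit the trailing batch
        else:
            self.staged.clear()   # exception: drop the uncommitted batch (toy assumption)
        return False              # never swallow the exception


store = {}
with ToyView(store) as view:
    for i in range(1, 2501):
        view.put(f"w{i}", i)
        if i % 1000 == 0:
            view.commit()         # mirrors the BATCH_SIZE pacing above

try:
    with ToyView(store) as view:
        view.put("lost", 0)
        raise RuntimeError("abort ingest")
except RuntimeError:
    pass

print(len(store), "lost" in store)  # 2500 False
```

The first block commits twice mid-loop. The last 500 writes are committed on exit. The second block never commits "lost".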

Multi-attractor NNS

A sequence_attractor(member_selector, pos, domain) is a positional constraint: “Sequences in domain whose member at pos overlaps with member_selector”. Position is 0-based.

nns(*attractors) evaluates all attractors and ranks Sequences by combined overlap. With multiple attractors, the result is a conjunction — a chunk must satisfy each positional constraint to score well.

For “six-letter words ending in er”:

memory.nns(
    memory.sequence_attractor(memory.by_item_key("letters", "e"), 4, WORDS_DOMAIN),
    memory.sequence_attractor(memory.by_item_key("letters", "r"), 5, WORDS_DOMAIN),
)

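This returns Sequences with e at index 4 and r at index 5, which are the last two characters of a six-letter word. The attractors constrain positions only, so a longer word with the same letters at those indices (e.g. “different”) satisfies both constraints as well. If you need exactly six letters, check the length of chunk.note on the results.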
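The phrase "ranks Sequences by combined overlap" can be illustrated with a toy model in plain Python. This is not Kongming's encoding. It makes the following assumptions:

- Letters are random sets of active indices.
- A word's code is the union of its letter codes, each cyclically shifted by its 0-based position.
- An attractor scores a word by the overlap between the letter and the word's code shifted back by pos.

Summing the attractor scores gives a conjunctive ranking. The toy scans every word, so it says nothing about how the library searches its index.

```python
import random

DIM, ONES = 10_000, 50                # toy dimension and active bits per letter
rng = random.Random(7)
LETTERS = {ch: frozenset(rng.sample(range(DIM), ONES))
           for ch in "abcdefghijklmnopqrstuvwxyz"}


def shift(code, k):
    # Cyclic shift of every active index by k (k may be negative).
    return frozenset((i + k) % DIM for i in code)


def encode(word):
    # Toy Sequence code: union of letter codes, each shifted by its position.
    return frozenset().union(*(shift(LETTERS[ch], pos) for pos, ch in enumerate(word)))


def attractor_score(code, ch, pos):
    # Overlap between the letter and the word's code shifted back by pos.
    return len(shift(code, -pos) & LETTERS[ch])


def toy_nns(words, *attractors):
    # Rank words by the sum of attractor scores, highest first.
    codes = {w: encode(w) for w in words}
    scored = [(sum(attractor_score(codes[w], ch, pos) for ch, pos in attractors), w)
              for w in words]
    return sorted(scored, reverse=True)


words = ["letter", "winter", "butter", "little", "seller", "remote"]
ranking = toy_nns(words, ("e", 4), ("r", 5))
for score, word in ranking:
    print(f"{score:4d}  {word}")
```

Each satisfied attractor contributes at least 50 points, because the shifted-back code contains the whole letter. The four -er words therefore score at least 100. "little" and "remote" collect only a few points of random overlap.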

For “eleven-letter words ending in tion”, anchor t/i/o/n at positions 7/8/9/10.
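The positional constraints correspond to exact letter matches. A brute-force string filter over the same word list can therefore serve as a ground truth to compare NNS results against. The helper below, positional_matches, is hypothetical and not part of the library. The optional length argument keeps only words of exactly that length.

```python
def positional_matches(words, pattern, length=None):
    """Return words whose letters match `pattern` ({index: letter}) exactly.

    Hypothetical helper for sanity-checking NNS results; not part of the library.
    """
    hits = []
    for w in words:
        if length is not None and len(w) != length:
            continue
        if all(pos < len(w) and w[pos] == ch for pos, ch in pattern.items()):
            hits.append(w)
    return hits


words = ["letter", "different", "station", "information", "attention", "question"]
er_any = positional_matches(words, {4: "e", 5: "r"})
er6 = positional_matches(words, {4: "e", 5: "r"}, length=6)
tion11 = positional_matches(words, {7: "t", 8: "i", 9: "o", 10: "n"}, length=11)
print(er_any, er6, tion11)  # ['letter', 'different'] ['letter'] ['information']
```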

Counting and ranged results

storage.mem_get(selector) returns the full ranked result list as a Python list. Two call patterns cover counting and top-N:

Call	Use
`mem_get(nns(...))`	Get every match. `len(...)` is the count.
`mem_get(range_sel(nns(...), start, limit))`	Materialize a window — useful for top-N.

range_sel(inner, start, limit) consumes its inner selector. To show both the count and the top 10, the example therefore builds the NNS selector twice. Building a selector is cheap; evaluating it against the substrate is where the time goes.

See Working with Results for more on shaping selector output. When you need per-result SelectorExtra (e.g. NNS scores) or lazy iteration, reach for memory.lazy_selector_iter(view, selector) — mem_get returns Chunks only.

Running

pip install kongming-rs-hv
python examples/word_indexer/word_indexer.py

Expected output shape:

Ingested 4982 words in 8.4s.

by word 'the': 1 match(es) [0.3 ms]
   1. the

by word 'people': 1 match(es) [0.3 ms]
   1. people

****er  (6 letters): N match(es) [~190 ms]
   1. <some six-letter -er word>
   ...

*******tion (11 letters): M match(es) [~480 ms]
   1. <some eleven-letter -tion word>
   ...

Approximate timings on an Apple Silicon laptop with the InMemory backend:

Operation	Time
Ingest ~5,000 words via producer API	~9 s
Exact lookup via `by_item_key`	<1 ms
Multi-attractor NNS (2 attractors, e.g. `****er`)	~200 ms
Multi-attractor NNS (4 attractors, e.g. `*******tion`)	~460 ms

A note on `semantic_indexing`

For NNS by composite structure (i.e. “find Sequences whose member at position N matches X”), each word’s producer is constructed with semantic_indexing=True. This impresses the Sequence’s code into the associative index alongside the chunk’s id-Sparkle (which is always indexed). Without the flag, only the id is indexed and sequence_attractor queries return zero hits.

The letter terminals are written without the flag because their code is the id-Sparkle, so id-only indexing is sufficient.

Switching to persistent storage

InMemory is fine for a demo. For a persistent store, swap one line:

storage = memory.Embedded(MODEL, "/path/to/db")

Everything else is identical.

Data attribution

Word-frequency data in top5000.txt is sourced from www.wordfrequency.info (top-5000 English words). Please credit the source when reusing this data.

Format (tab-separated, no header):

Rank    Word    POS    Frequency    Dispersion
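Here is a minimal sketch for reading this format with the standard csv module. The filtering choices are assumptions and may not match what word_indexer.py does:

- Words are lowercased.
- Only words made of the letters a to z are kept, since those are the only letters with Sparkles in the letters domain.
- Only the first occurrence of a word is kept, in case it appears on more than one row (for example, under different POS tags).

The sample rows are made up for illustration and are not real frequency data.

```python
import csv
import io


def load_words(lines):
    """Yield unique lowercase a-z words from Rank/Word/POS/Frequency/Dispersion rows."""
    seen = set()
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2:
            continue                      # skip blank or malformed lines
        word = row[1].strip().lower()
        if not word or not all("a" <= c <= "z" for c in word):
            continue                      # no letter Sparkle for other characters
        if word not in seen:
            seen.add(word)
            yield word


# Made-up rows in the documented format (not real frequency data).
sample = io.StringIO(
    "1\tthe\ta\t100\t0.9\n"
    "2\tbe\tv\t90\t0.9\n"
    "3\tthat\tc\t80\t0.9\n"
    "4\tthat\td\t70\t0.9\n"
    "5\tn't\tx\t60\t0.9\n"
)
words = list(load_words(sample))
print(words)  # ['the', 'be', 'that']
```

For the real file, pass an open file handle instead, e.g. `with open("top5000.txt", encoding="utf-8") as f: words = list(load_words(f))`. The encoding is an assumption.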

LISP Interpreter

A LISP interpreter where every data structure — atoms, cons cells, lists, closures — is encoded as a hypervector. No traditional memory allocation, no pointers, no garbage collector. All computation happens through hypervector algebra.

Two Implementations

The LISP interpreter ships in two forms, both feature-identical:

	Pure Python (`pylisp`)	Rust (`kongming.lisp`)
Source	Open-sourced in `examples/pylisp/`	Compiled into `kongming-rs-hv`
Readable	Yes — ~500 lines of annotated Python	No — compiled Rust binary
Performance	Slower (Python overhead per operation)	Faster (native code)
Import	`from pylisp import LispEnv`	`from kongming.lisp import LispEnv`
Dependencies	`kongming-rs-hv` (for hypervector primitives)	Included in `kongming-rs-hv`
Use case	Learning, debugging, extending	Production, notebooks

Rust (built-in)

The kongming-rs-hv package includes a Rust-based LISP interpreter built directly on the internal Rust API and primitives. This implementation is compiled into the Python wheel and accessible via from kongming.lisp import LispEnv.

Because it operates on Rust-native hypervector types, with no Python overhead per hypervector operation, it is the faster choice for production use.

Python (open-source)

For research and study, we provide a pure-Python implementation of the same interpreter, built entirely on the public Python API of kongming-rs-hv. It mirrors the Rust implementation’s architecture but uses Python-level operations (hv.bind, hv.bundle, hv.release, etc.), making the underlying hypervector mechanics fully transparent and easy to modify.

This implementation is ideal for:

Understanding how LISP primitives map to hypervector operations
Experimenting with alternative encodings or evaluation strategies
Teaching and prototyping

Quick Start

pip install kongming-rs-hv

# Pure Python (examples/ must be on the import path, e.g. run from that directory)
from pylisp import LispEnv

env = LispEnv()
env.eval("(CAR (QUOTE (A B C)))")       # => "A"
env.eval("(CDR (QUOTE (A B C)))")       # => "(B C)"
env.eval("(CONS (QUOTE A) (QUOTE B))")  # => "(A . B)"

# Rust (same API, same results)
from kongming.lisp import LispEnv

env = LispEnv()
env.eval("(CAR (QUOTE (A B C)))")       # => "A"

Supported Forms

McCarthy’s 7 Primitives (1960)

Form	Example	Result
`QUOTE`	`(QUOTE (A B C))`	`(A B C)`
`CAR`	`(CAR (QUOTE (A B C)))`	`A`
`CDR`	`(CDR (QUOTE (A B C)))`	`(B C)`
`CONS`	`(CONS (QUOTE A) (QUOTE B))`	`(A . B)`
`ATOM`	`(ATOM (QUOTE A))`	`T`
`EQ`	`(EQ (QUOTE A) (QUOTE A))`	`T`
`COND`	`(COND ((EQ (QUOTE A) (QUOTE B)) (QUOTE NO)) (T (QUOTE YES)))`	`YES`

Extensions

Form	Description
`LAMBDA`	Anonymous functions with curried beta-reduction and variable shadowing
`LABEL`	Recursive self-reference (enables recursion without mutation)
`DEFINE`	Bind a name to a function in the environment

Examples

# Lambda
env.eval("((LAMBDA (X) (CAR X)) (QUOTE (A B C)))")  # => "A"

# Define a reusable function
env.eval("(DEFINE SECOND (LAMBDA (L) (CAR (CDR L))))")
env.eval("(SECOND (QUOTE (X Y Z)))")                 # => "Y"

# Recursion with LABEL
env.eval(
    "(DEFINE LAST (LAMBDA (L) "
    "  ((LABEL REC (LAMBDA (X) "
    "    (COND ((ATOM (CDR X)) (CAR X)) "
    "          (T (REC (CDR X)))))) L)))"
)
env.eval_full("(LAST (QUOTE (A B C)))")              # => "C"

How It Works

Each LISP value is represented by a Sparkle, a sparse binary hypervector. Atoms such as A, B, and CAR are Sparkles seeded by their names in a symbol domain. Cons cells get fresh random ids, as described below.

A cons cell (a . b) is encoded as:

cell = bundle(bind(a, LHS), bind(b, RHS))

where LHS and RHS are fixed tag sparkles. The cell is stored under a fresh random sparkle id. To extract:

car(id) = cleanup(release(cell, LHS))   # cell: the bundle stored under id
cdr(id) = cleanup(release(cell, RHS))

The release operation is noisy — it produces an approximate result. Cleanup uses near-neighbor search (NNS) over the substrate’s associative index to find the exact stored sparkle that best matches the noisy probe.
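The following toy model, in plain Python, shows why a noisy release followed by cleanup recovers the exact atom. None of these operators are Kongming's:

- Cyclic shifts by fixed offsets stand in for binding with the LHS/RHS tags.
- Set union stands in for bundle.
- Cleanup is a brute-force best-overlap search over a small item memory, in place of the substrate's NNS.

```python
import random

DIM, ONES = 10_000, 100          # toy dimension and active bits per vector
rng = random.Random(1)


def sparkle():
    # Toy stand-in for a Sparkle: a random set of active indices.
    return frozenset(rng.sample(range(DIM), ONES))


def bind(x, tag):
    # Toy binding: cyclic shift by the tag's offset (invertible).
    return frozenset((i + tag) % DIM for i in x)


def release(x, tag):
    # Inverse of bind; applied to a bundle it also leaves the other half as noise.
    return frozenset((i - tag) % DIM for i in x)


def bundle(*xs):
    # Toy bundling: union of active indices.
    return frozenset().union(*xs)


def cleanup(probe, item_memory):
    # Brute-force "NNS": the stored symbol with the largest overlap wins.
    return max(item_memory, key=lambda name: len(probe & item_memory[name]))


LHS, RHS = 17, 4_243             # toy tags: fixed shift offsets
symbols = {name: sparkle() for name in ["A", "B", "C"]}

# (A . B): the cell is stored under a fresh random id.
cells = {}
cell_id = sparkle()
cells[cell_id] = bundle(bind(symbols["A"], LHS), bind(symbols["B"], RHS))


def car(cid):
    return cleanup(release(cells[cid], LHS), symbols)


def cdr(cid):
    return cleanup(release(cells[cid], RHS), symbols)


print(car(cell_id), cdr(cell_id))  # A B
```

Releasing with LHS yields all of A plus a shifted copy of B. That copy overlaps the other symbols only by chance, so A wins the cleanup by a wide margin.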

File Structure

examples/pylisp/
  __init__.py      # Package entry point
  types.py         # HyperBinary type alias
  env.py           # LispEnv: domains, symbols, lexicon, substrate
  cons.py          # Cons cells: cons, car, cdr, cleanup via NNS
  reader.py        # S-expression tokenizer and parser
  evaluator.py     # Single-step and fixed-point evaluator
  lambda_.py       # Beta-reduction with currying and shadowing
  printer.py       # Hypervector → S-expression display
  test_pylisp.py   # 16 tests mirroring the Rust integration suite
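For orientation on the reader stage, here is a sketch of the classic approach from Peter Norvig's Lispy: pad parentheses with spaces, split, and then parse recursively into nested lists. It illustrates that technique only and is not necessarily what reader.py does. For example, it treats the dot in a dotted pair as an ordinary symbol.

```python
def tokenize(src):
    # Pad parentheses with spaces so str.split() separates them.
    return src.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    # Consume tokens from the front and build nested Python lists.
    if not tokens:
        raise SyntaxError("unexpected end of input")
    tok = tokens.pop(0)
    if tok == "(":
        lst = []
        while tokens and tokens[0] != ")":
            lst.append(parse(tokens))
        if not tokens:
            raise SyntaxError("missing )")
        tokens.pop(0)          # drop the closing ")"
        return lst
    if tok == ")":
        raise SyntaxError("unexpected )")
    return tok                 # atoms stay as symbol strings


expr = parse(tokenize("(CAR (QUOTE (A B C)))"))
print(expr)  # ['CAR', ['QUOTE', ['A', 'B', 'C']]]
```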

Storage Backends

# In-memory (default, volatile)
env = LispEnv()

# Persistent (Embedded disk-backed)
env = LispEnv(path="/tmp/my_lisp_db")

Running Tests

pip install pytest kongming-rs-hv
pytest examples/pylisp/test_pylisp.py -v

Notebook

We provide a Colab notebook that runs both implementations side by side, checking that they produce the same results and comparing their performance.

References

Hey, Pentti, We Did it!: A fully Vector-Symbolic Lisp — the paper that inspired this implementation
Peter Norvig’s (How to Write a (Lisp) Interpreter (in Python)) — the original Lispy that the Python implementation builds upon
