Kongming HV

Kongming is a hyperdimensional computing library implementing sparse binary hypervectors for cognitive computing applications.

The core engine is implemented in Rust for maximum efficiency, while ergonomic APIs are open-sourced in Python for better usability.

See Hypervectors for an introduction to hyperdimensional computing and the sparse binary representation.

License

The Python source code, examples, and documentation in this repository are licensed under the MIT License.

The compiled engine distributed via PyPI (kongming-rs-hv) is proprietary.

Install

pip install kongming-rs-hv

See Installation for supported platforms and verification steps.

Published notebooks

See Notebook Platforms for all available notebooks and platform details.

Guides

| Guide | Description |
| --- | --- |
| Python Quick Start | Installation, examples, and walkthrough |
| Notebook Quick Start | Platform setup, interactive notebooks, cell-by-cell walkthrough |

Language Support

Where available, this documentation shows code snippets in multiple languages side by side.

  • Python: bindings to the underlying Rust implementation (public kongming-rs-hv on PyPI);
  • Go: canonical reference implementation, in a proprietary package;
  • Rust: parallel implementation, carefully maintained at feature parity.

Docs versioning

The documentation on yangzh.github.io/hv is deployed from release tags (v*) and stays in lockstep with the latest kongming-rs-hv release on PyPI. Whatever you read there matches what pip install kongming-rs-hv gives you.

The main branch of this repository is the working head — it may describe APIs or examples that haven’t been released yet. If you browse the raw markdown on GitHub, expect it to occasionally be ahead of the published site.

Reference

The work was initially outlined in the following arXiv paper, which builds on the work of many others. The citation is:

Yang, Zhonghao (2023). Cognitive modeling and learning with sparse binary hypervectors. arXiv:2310.18316v1 [cs.AI]

Feedback

Found a bug, have a question, or want to suggest an improvement? Open an issue on GitHub.

Last change: , commit: bd2be44

Hypervectors

What is Hyperdimensional Computing?

Hyperdimensional computing (HDC) represents concepts as high-dimensional vectors and manipulates them with simple algebraic operations. The dimension of each vector is typically in the thousands or higher.

The key insight is that random vectors in high-dimensional spaces are nearly orthogonal — giving each concept a unique, distributed, and robust representation that tolerates potential ambiguity and interference.

In that sense, the traditional curse of dimensionality becomes the blessing of dimensionality.

Motivated readers should perform their own background research on this topic.

Sparse Binary Representation

Kongming uses sparse binary hypervectors. Each vector has a fixed, large number of dimensions (e.g., 65,536/64K or 1,048,576/1M), but only a very small fraction of them are “on” (set to 1). This sparsity is controlled by the Model configuration.

Furthermore, we focus on a special sparse binary configuration, SparseSegmented, where each vector is divided into equal-sized segments and exactly one bit is ON per segment.

Conceptually, you can imagine each SparseSegmented hypervector as a list of phasors, where the offset of the ON bit (within its host segment) represents the discretized phase.

In general, this unique constraint enables:

  • Compact storage: only the offset of the ON bit within its host segment needs to be stored.
  • Efficient operations: unlike neural nets, where weights are stored as floating-point numbers, binary vectors can be stored and manipulated very efficiently with modern memory and CPUs.
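
To make the compact-storage point concrete, here is a minimal sketch in plain Python. It does not use the Kongming API: `random_offsets`, `on_bit_positions`, and the constants are hypothetical names chosen for illustration, with sizes taken from a 64K-style model (256 segments of 256 dimensions each).

```python
import random

SEGMENT_BITS = 8                    # sparsity bits, as in a 64K / 8-bit model
SEGMENT_SIZE = 1 << SEGMENT_BITS    # 256 dimensions per segment
CARDINALITY = 256                   # number of segments (= number of ON bits)


def random_offsets(rng):
    """One offset per segment: where the single ON bit sits inside that segment."""
    return [rng.randrange(SEGMENT_SIZE) for _ in range(CARDINALITY)]


def on_bit_positions(offsets):
    """Expand per-segment offsets to absolute ON-bit positions in the full vector."""
    return [seg * SEGMENT_SIZE + off for seg, off in enumerate(offsets)]


rng = random.Random(42)
offsets = random_offsets(rng)
positions = on_bit_positions(offsets)

# Storage comparison: one bit per dimension versus one small offset per segment.
dense_bytes = (SEGMENT_SIZE * CARDINALITY) // 8
packed_bytes = (CARDINALITY * SEGMENT_BITS) // 8
print(dense_bytes, packed_bytes)
```

Under these assumptions the dense bitmap needs 8,192 bytes while the offsets need 256 bytes, which is the storage saving the segmented constraint buys.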

Identity and Inverses

  • The identity vector has all offsets set to 0. Binding with identity is a no-op. As a special case, the identity vector incurs no storage cost;
  • Binding a vector with its inverse yields the identity.

Similarity and distance measure

Two vectors are compared via overlap: the count of segments where both have the same ON bit. This is equivalent to counting the set bits of a bitwise AND, which modern CPUs perform very efficiently.

For a model with cardinality $M$ and segment size $S$, the expected overlap between two random vectors $a$ and $b$ is:

$$\mathbb{E}[\mathrm{overlap}(a, b)] = \frac{M}{S}$$

In the supported models $M = S$, so the expectation is 1 and an observed overlap is typically 0, 1 or 2.

Semantically-related vectors have significantly higher overlap. A vector’s overlap with itself equals its cardinality $M$.

The commonly used distance (dissimilarity) measure for binary vectors is the Hamming distance, equivalent to a bitwise XOR operation. As discussed (and proved) in the paper, overlap and Hamming distance between sparse binary hypervectors are two sides of the same coin. For two vectors $a$ and $b$ with the same cardinality $M$, counting the bit positions where they differ gives:

$$\mathrm{hamming}(a, b) = 2\,\bigl(M - \mathrm{overlap}(a, b)\bigr)$$
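
The link between the two measures can be checked numerically. The sketch below is plain Python, not the Kongming API; `overlap` and `hamming` here are hypothetical helpers that use the bitwise (XOR) definition of Hamming distance on vectors stored as per-segment offsets.

```python
import random


def overlap(a, b):
    """Number of segments whose ON bit sits at the same offset in both vectors."""
    return sum(1 for x, y in zip(a, b) if x == y)


def hamming(a, b, segment_size):
    """Bitwise Hamming distance: positions where exactly one vector is ON."""
    bits_a = {i * segment_size + x for i, x in enumerate(a)}
    bits_b = {i * segment_size + y for i, y in enumerate(b)}
    return len(bits_a ^ bits_b)


M, S = 256, 256  # cardinality and segment size of a 64K-style model
rng = random.Random(7)
a = [rng.randrange(S) for _ in range(M)]
b = [rng.randrange(S) for _ in range(M)]

# A vector overlaps with itself in every segment.
assert overlap(a, a) == M
# Every mismatched segment contributes two differing bits.
assert hamming(a, b, S) == 2 * (M - overlap(a, b))
print(overlap(a, b), hamming(a, b, S))
```

Because one measure is a fixed function of the other, ranking candidates by increasing Hamming distance and by decreasing overlap gives the same order.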

Supported Models

A Model determines the total number of dimensions (width), how those dimensions are divided into segments (cardinality and sparsity), and therefore implies critical storage and compute characteristics.

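| Model | Width | Sparsity Bits | Segment Size | Cardinality (ON bits) |
| --- | --- | --- | --- | --- |
| MODEL_64K_8BIT | 65,536 | 8 | 256 | 256 |
| MODEL_1M_10BIT | 1,048,576 | 10 | 1,024 | 1,024 |
| MODEL_16M_12BIT | 16,777,216 | 12 | 4,096 | 4,096 |
| MODEL_256M_14BIT | 268,435,456 | 14 | 16,384 | 16,384 |
| MODEL_4G_16BIT | 4,294,967,296 | 16 | 65,536 | 65,536 |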

Model properties

All model functions take a Model enum value and return the derived property:

Note

For simplicity, we use function names from Python. The counterparts from Go / Rust can be found by consulting their respective references.

| Function | Description |
| --- | --- |
| width | Total dimension count (2^width_bits) |
| sparsity | Fraction of ON bits (1 / segment_size) |
| cardinality | Number of ON bits (= number of segments) |
| segment_size | Dimensions per segment |
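
As a cross-check of the table of supported models, the derived properties can be recomputed from the sparsity bits alone. This is a plain-Python sketch, not the library's implementation: `model_properties` is a hypothetical helper, and it assumes (as every listed model does) that cardinality equals segment size.

```python
from fractions import Fraction


def model_properties(sparsity_bits):
    """Derive width, sparsity, cardinality and segment size from sparsity bits."""
    segment_size = 1 << sparsity_bits
    cardinality = 1 << sparsity_bits      # assumption: equals segment size
    return {
        "width": segment_size * cardinality,
        "sparsity": Fraction(1, segment_size),
        "cardinality": cardinality,
        "segment_size": segment_size,
    }


for bits in (8, 10, 12, 14, 16):
    props = model_properties(bits)
    print(bits, props["width"], props["segment_size"], props["cardinality"])
```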

How to Choose a Model

  • MODEL_64K_8BIT: Fast prototyping, tiny memory footprint. Good for tests and small-scale experiments.
  • MODEL_1M_10BIT: General-purpose, balances performance and storage.
  • MODEL_16M_12BIT: General-purpose, for the adventurous.
  • MODEL_256M_14BIT / MODEL_4G_16BIT: Very high capacity, not there yet.

Larger models provide more orthogonal space (lower collision probability) at the cost of more memory per vector.

Note

The storage per hypervector estimation only applies to SparseSegmented (and a few other types) where raw offsets are needed. For certain scenarios, optimization can be employed to dramatically reduce storage requirements. Sparkle, for example, only stores the random seeds, so that the offsets can be recomputed on the fly at deserialization time. Composite types (such as Set, Sequence) typically contain references to member Sparkle instances, and typically cost much less storage than a single SparseSegmented instance.

Last change: , commit: 63ad966

Operators

Kongming provides two core algebraic operations on hypervectors.

Bind

Binding (⊗) combines two vectors into a result that is dissimilar to both inputs. It is the multiplicative operation in the HDC algebra.

Mathematically

Implementation: segment-wise offset addition modulo the segment size. See the original paper for details.
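
The segment-wise rule is short enough to sketch directly. The following plain-Python example is not the Kongming API: vectors are bare lists of offsets, and `bind`, `inverse`, and `overlap` are hypothetical helpers for illustration.

```python
import random

M, S = 1024, 1024  # cardinality and segment size of a 1M-style model


def bind(a, b):
    """Segment-wise offset addition modulo the segment size."""
    return [(x + y) % S for x, y in zip(a, b)]


def inverse(a):
    """Segment-wise negation, so that bind(a, inverse(a)) is the identity."""
    return [(-x) % S for x in a]


def overlap(a, b):
    return sum(1 for x, y in zip(a, b) if x == y)


identity = [0] * M
rng = random.Random(1)
a = [rng.randrange(S) for _ in range(M)]
b = [rng.randrange(S) for _ in range(M)]
ab = bind(a, b)

assert bind(a, b) == bind(b, a)          # binding is commutative
assert bind(a, identity) == a            # identity is a no-op
assert bind(a, inverse(a)) == identity   # inverse undoes the vector
print(overlap(ab, a), overlap(ab, b))    # both stay near chance level
```

The printed overlaps stay close to the chance level of random vectors, which is what "dissimilar to both inputs" means in practice.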

Check out code snippets from the API reference.

Release

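Occasionally we use release, which is derived from bind, as the counterpart of division (where bind is multiplication): Release(A, B) binds A with the inverse of B.

Note that release is anti-commutative: swapping the operands yields the inverse of the result, that is, Release(B, A) = Inv(Release(A, B)).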
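
A small plain-Python sketch illustrates release as "bind with the inverse" and checks the anti-commutative property. The helpers are hypothetical and operate on bare offset lists rather than Kongming types.

```python
import random

M, S = 1024, 1024  # cardinality and segment size


def bind(a, b):
    return [(x + y) % S for x, y in zip(a, b)]


def inverse(a):
    return [(-x) % S for x in a]


def release(a, b):
    """Release b from a: bind a with the inverse of b."""
    return bind(a, inverse(b))


rng = random.Random(2)
a = [rng.randrange(S) for _ in range(M)]
b = [rng.randrange(S) for _ in range(M)]

# Releasing one operand from a binding recovers the other exactly.
assert release(bind(a, b), b) == a
# Swapping the operands of release gives the inverse result.
assert release(b, a) == inverse(release(a, b))
```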

Check out code snippets from the API reference.

Bundle

Bundling (⊕) creates a superposition of vectors: the result is similar to all inputs. It is the additive operation in the VSA (vector symbolic architecture) algebra.

Mathematically

Check out original paper for details on bundle operator.
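
For intuition only, one simple way to bundle segmented vectors while keeping exactly one ON bit per segment is to pick, for each segment, the offset of one randomly chosen member. This is an assumption made for illustration, not a statement of Kongming's actual bundling algorithm (the paper is the authoritative source); `bundle` and `overlap` below are hypothetical plain-Python helpers.

```python
import random

M, S = 1024, 1024  # cardinality and segment size


def bundle(seed, *members):
    """Per segment, keep the offset of one member chosen by a seeded RNG."""
    rng = random.Random(seed)
    return [rng.choice(column) for column in zip(*members)]


def overlap(a, b):
    return sum(1 for x, y in zip(a, b) if x == y)


rng = random.Random(3)
a, b, c, unrelated = ([rng.randrange(S) for _ in range(M)] for _ in range(4))

bundled = bundle(42, a, b, c)
# Each member keeps roughly a third of its segments in the bundle,
# while an unrelated vector stays near chance level.
print(overlap(bundled, a), overlap(bundled, b), overlap(bundled, c))
print(overlap(bundled, unrelated))
```

The seed makes the result reproducible: bundling the same members with the same seed always selects the same segments.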

Check out code snippets from the API reference.

Last change: , commit: 63ad966

Composites

Composites combine multiple hypervectors into higher-level structures. Each composite type uses a different combination strategy, preserving different kinds of relationships between its members.

All composites follow the same contract (an interface in Go, a trait in Rust) and can be nested: a Set can contain Sparkles, Knots, or even other Sets.

Set

An unordered collection of concepts.

where is a special marker to distinguish a set from its individual members.

This mark is tuned for the domain, so that it will be shared among all sets within the same domain.

Use when: you need to represent “these things together” without order.

Check out code snippets from the API reference.

Sequence

An ordered collection.

where is a generic hypervector for positional encoding.

is a special marker to distinguish a sequence from its individual members. This mark is tuned for the domain, so that it will be shared among all sequences within the same domain.

Use when: order matters (e.g., words in a sentence, events in time).

Check out code snippets from the API reference.

Octopus

A key-value structure. Each key (a string) is converted to a Sparkle and bound with its corresponding value before bundling.
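
The key-value idea can be sketched end to end in plain Python. Everything here is a hypothetical illustration, not the Kongming API: keys and values are turned into offset lists by seeding Python's `random.Random` with a string, and bundling uses the simple per-segment random pick as an assumed stand-in for the real operator.

```python
import random

M, S = 1024, 1024  # cardinality and segment size


def vector_from_key(key):
    """Deterministic pseudo-random offsets derived from a string."""
    rng = random.Random(key)
    return [rng.randrange(S) for _ in range(M)]


def bind(a, b):
    return [(x + y) % S for x, y in zip(a, b)]


def inverse(a):
    return [(-x) % S for x in a]


def bundle(seed, *members):
    rng = random.Random(seed)
    return [rng.choice(column) for column in zip(*members)]


def overlap(a, b):
    return sum(1 for x, y in zip(a, b) if x == y)


red = vector_from_key("value:red")
circle = vector_from_key("value:circle")

# Bind each value with its key, then bundle the bound pairs into one record.
record = bundle(
    42,
    bind(vector_from_key("color"), red),
    bind(vector_from_key("shape"), circle),
)

# Query by key: release the key and compare against candidate values.
probe = bind(record, inverse(vector_from_key("color")))
print(overlap(probe, red), overlap(probe, circle))
```

The probe overlaps strongly with the value stored under "color" and only at chance level with the other value, so the lookup can be completed by a similarity search over known values.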

Use when: you need to represent structured records with named attributes.

Check out code snippets from the API reference.

Knot

The result of binding (multiplicative composition) of hypervectors.

Binding is reversible: given a Knot of A and B, you can recover A by releasing B (binding with B’s inverse).

Use when: you need a reversible association between concepts.

Check out code snippets from the API reference.

Parcel

The result of bundling (additive composition).

Unlike direct bundling, Parcel keeps track of its members for serialization and introspection.

Use when: you need a superposition of concepts, with optional weights.

Check out code snippets from the API reference.

Pointer

A one-directional reference between two hypervectors.

A Pointer encodes a directed link from a source A to a destination B. Given the pointer and either endpoint, the other endpoint can be recovered: P ⊗ B recovers A, and A ⊗ P^{-1} recovers B.

Pointer is the structured wrapper for the release operation — Release(A, B) returns a Pointer with A as source and B as destination.

Use when: you need a reversible directional link (e.g., representing edges, mappings, or “from→to” relations).

Check out code snippets from the API reference.

Summary

| Type | Composition | Order? | Use Case |
| --- | --- | --- | --- |
| Set | Bundle + marker | No | Unordered groups |
| Sequence | Positional-bind + bundle + marker | Yes | Ordered lists |
| Octopus | Key-bind + bundle | Partial (by key) | Key-value records |
| Knot | Bind (multiply) | No | Reversible associations |
| Parcel | Bundle (add) | No | Superpositions, weighted or unweighted |
| Pointer | Bind A with Inv(B) | Directional | One-directional references |
Last change: , commit: a781c58

Near Neighbor Search

Near Neighbor Search (NNS) generally retrieves chunks from the storage substrate in the increasing order of Hamming distance (from a query).

As mentioned earlier, this is equivalent to a decreasing order of overlap (between query and candidate). If overlap encodes semantic relevance, this translates to a list of semantically similar candidates.

It leverages an underlying Associative Index for efficient recovery of candidates. The Associative Index is a semantic index that enables fast similarity-based lookup over stored hypervectors. Conceptually it turns a key-value substrate (item memory) into an associative memory — one where retrieval is by content similarity, not by exact content or key match.

This NNS module has constant time complexity, with help from the associative index. This implies that the query time remains bounded, independent of the number of entries in the storage system. The secret sauce is efficient random access to the underlying associative index.

Unlike approximate nearest neighbor methods (LSH, HNSW, etc.), the NNS module computes exact overlap counts via the associative index. There is no approximation error and there are no index-specific parameters to tune.
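
This page does not describe how the associative index is built. As one plausible mental model only, the sketch below uses an inverted index keyed by (segment, offset), so that a query touches only the postings for its own ON bits and still obtains exact overlap counts. `AssociativeIndex` and its methods are hypothetical names, not the Kongming API.

```python
from collections import Counter, defaultdict


class AssociativeIndex:
    """Toy inverted index: (segment, offset) -> ids of vectors with that ON bit."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, item_id, offsets):
        for segment, offset in enumerate(offsets):
            self._postings[(segment, offset)].add(item_id)

    def near_neighbors(self, query, limit=10):
        """Return (item_id, exact overlap) pairs, highest overlap first."""
        counts = Counter()
        for segment, offset in enumerate(query):
            for item_id in self._postings.get((segment, offset), ()):
                counts[item_id] += 1
        return counts.most_common(limit)


index = AssociativeIndex()
index.add("cat", [1, 2, 3, 4])
index.add("dog", [1, 2, 9, 9])
index.add("car", [7, 7, 7, 7])

hits = index.near_neighbors([1, 2, 3, 0])
print(hits)
```

In this toy model the work per query is driven by the number of segments in the query and the postings it hits, not by a scan over every stored entry; entries with zero overlap are never visited.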

Jump to the API reference for Near-Neighbor Search.

Last change: , commit: 63ad966

HV

The core hypervector API. This module provides the building blocks for hyperdimensional computing: vector types, algebraic operators, and model configuration.

| Section | Description |
| --- | --- |
| Common Utilities | Model, SparseOperation, similarity, identity, hashing |
| Operators | Bind, bundle, and BindDirect |
| HyperBinary Types | Interface + concrete types (Sparkle, Set, Sequence, etc.) |
| Customizing Run-time Behavior | Environment variables |
| Misc | Display, serialization |
Last change: , commit: edabb74

Common Utilities

Functions and types used across all HyperBinary types.

| Section | Description |
| --- | --- |
| Models | Model enum and model functions |
| SparseOperation | Model + seeded RNG for deterministic vector generation |
| Seed128 | 128-bit seed embedding Domain + Pod |
| Domain & Pod | Semantic grouping (Domain) and slot identifier (Pod) |
| Utilities | Similarity, identity check, hashing |
Last change: , commit: fff9e78

Models

See Concepts: Hypervectors for the full overview.

Model Enum

model0 = hv.MODEL_64K_8BIT

model1 = hv.MODEL_1M_10BIT

Model Functions

hv.width(hv.MODEL_1M_10BIT)           # total dimensions
hv.cardinality(hv.MODEL_1M_10BIT)     # ON bit count
hv.sparsity(hv.MODEL_1M_10BIT)        # sparsity
hv.segment_size(hv.MODEL_1M_10BIT)    # dimensions per segment

See also: SparseOperation — Model + seeded RNG for deterministic vector generation.

Last change: , commit: 34eaa34

Domain & Pod

A Domain models the semantic grouping for hypervectors, providing the high 64-bit half of a Seed128. A Pod is a slot within a Domain, providing the low 64-bit half. The (Domain, Pod) pair uniquely identifies a Sparkle.

Domain Constructors

# From a name string (hashed to a 64-bit id)
d = hv.Domain("animals")

# Same as above
d = hv.Domain.from_name("animals")

# From a raw 64-bit id
d = hv.Domain.from_id(0x1234567890abcdef)

# From a domain prefix enum and a name suffix
# The id is computed as xxhash(prefix_label + "." + name)
d = hv.Domain.from_prefix_and_name(hv.DOMAIN_PREFIX_NLP, "concept")

# Accessors
d.id()              # u64
d.name()            # str (empty if constructed from id)
d.domain_prefix()   # int (0 = UNKNOWN if no prefix was set)
d.is_default()      # True if id == 0

Domain Prefix Constants

| Constant | Label |
| --- | --- |
| hv.DOMAIN_PREFIX_USER | 🎭 |
| hv.DOMAIN_PREFIX_NLP | 💬 |

Domain prefixes provide namespacing for domains. When a prefix is set, the domain id is derived from the prefix label (and optional name), ensuring consistent hashing across languages.

Pod Constructors

Pods can be seeded by a string word, a raw uint64, or a prewired enum value.

# From a word string (hashed to a 64-bit seed)
p = hv.Pod("cat")

# Same as above
p = hv.Pod.from_word("cat")

# From a raw 64-bit seed
p = hv.Pod.from_seed(42)

# From a prewired enum value
p = hv.Pod.from_prewired(hv.PREWIRED_SET_MARKER)
p = hv.Pod.from_prewired(hv.PREWIRED_STEP)

# Accessors
p.seed()       # u64
p.word()       # str (empty if constructed from seed or prewired)
p.prewired()   # int (0 if not prewired)
p.is_default() # True if seed == 0

Prewired Constants

Prewired pods are infrastructure-level constants with fixed seeds:

| Constant | Label |
| --- | --- |
| hv.PREWIRED_NIL | |
| hv.PREWIRED_FALSE | |
| hv.PREWIRED_TRUE | |
| hv.PREWIRED_BEGIN | 🚀 |
| hv.PREWIRED_END | 🏁 |
| hv.PREWIRED_LEFT | ⬅️ |
| hv.PREWIRED_RIGHT | ➡️ |
| hv.PREWIRED_UP | ⬆️ |
| hv.PREWIRED_DOWN | ⬇️ |
| hv.PREWIRED_MIDDLE | ⏺️ |
| hv.PREWIRED_STEP | 𓊍 |
| hv.PREWIRED_SET_MARKER | 🫧 |
| hv.PREWIRED_SEQUENCE_MARKER | 📿 |

Polymorphic arguments (Python-only)

Most Python factories that take a Domain or Pod accept the underlying primitives directly — you rarely need to wrap them explicitly:

| Parameter type | Accepted Python forms |
| --- | --- |
| Domain | Domain instance, str, int, (DomainPrefix, str) tuple |
| Pod | Pod instance, Prewired enum, str, int |

# Domain — four equivalent forms in any factory expecting a Domain:
memory.by_item_key("animals", "cat")
memory.by_item_key(hv.Domain.from_name("animals"), "cat")
memory.by_item_key(0x1234, "cat")                           # from numeric id
memory.by_item_key((hv.DOMAIN_PREFIX_NLP, "concept"), "p")  # from (prefix, name)

# Pod — Prewired enum is recognized:
memory.new_terminal("internal", hv.PREWIRED_STEP)           # Pod from Prewired
memory.new_terminal("animals", "cat")                       # Pod from word
memory.new_terminal("animals", 0xCAFE_BABE)                 # Pod from raw seed

For the parallel polymorphism on Seed128, see Seed128 → Polymorphic arguments.

Last change: , commit: 616651d

Seed128

A Seed128 is a 128-bit seed to drive a random number generator.

The current random number generator expects two 64-bit seeds: the same (seed_high, seed_low) pair always produces the same sequence of random numbers, enabling reproducible and deterministic vector generation across runs and languages.
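
The determinism property can be illustrated without the library. In the plain-Python sketch below, `hash64` uses BLAKE2b as a stand-in 64-bit string hash and Python's built-in generator stands in for the library's PRNG; both are assumptions for illustration, so the offsets produced here will not match Kongming's.

```python
import hashlib
import random


def hash64(text):
    """Stand-in 64-bit string hash (not the hash function used by the library)."""
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def offsets_from_seed(seed_high, seed_low, cardinality, segment_size):
    """Derive per-segment offsets from a 128-bit seed split into two halves."""
    rng = random.Random((seed_high << 64) | seed_low)
    return [rng.randrange(segment_size) for _ in range(cardinality)]


high, low = hash64("animals"), hash64("cat")   # domain half, pod half
first = offsets_from_seed(high, low, 256, 256)
again = offsets_from_seed(high, low, 256, 256)
other = offsets_from_seed(high, hash64("dog"), 256, 256)

assert first == again   # same seed pair, same vector
assert first != other   # different pod, different vector
```

Because the vector is a pure function of the seed pair, storing the two 64-bit halves is enough to regenerate it later.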

Constructors

# From Domain and Pod arguments (each accepts Domain/Pod, int, or str)
seed = hv.Seed128("animals", "cat")                # domain name + pod word
seed = hv.Seed128(0, 42)                           # default domain + raw pod seed
seed = hv.Seed128("animals", 42)                   # domain name + raw pod seed
seed = hv.Seed128(hv.Domain("animals"), hv.Pod("cat"))  # explicit Domain/Pod objects

# Zero seed
seed_zero = hv.Seed128.zero()                      # (0, 0)

# Random seed from a SparseOperation
seed_rand = hv.Seed128.random(so)                  # consumes two u64 from the RNG

# Accessors
seed.domain()                                      # Domain object
seed.pod()                                         # Pod object
seed.high()                                        # u64 (domain id)
seed.low()                                         # u64 (pod seed)

Usage

All composite constructors take a Seed128, as seed for the bundle operator:

seed = hv.Seed128("fruits", "fruit_set")

s = hv.Set(seed, a, b, c)
seq = hv.Sequence(seed, a, b, c)

Polymorphic arguments (Python-only)

Anywhere a Python factory expects a Seed128 (composite constructors like hv.Set / hv.Sequence / hv.Octopus, the hv.bundle operator, etc.) you can pass either a Seed128 instance or a (domain, pod) tuple — the binding extracts and constructs the seed for you.

| Parameter type | Accepted Python forms |
| --- | --- |
| Seed128 | Seed128 instance, or a (domain, pod) tuple |

The tuple composes with the polymorphic forms accepted by Domain and Pod (see Domain & Pod → Polymorphic arguments), so each side can itself be a string / int / Prewired enum / (prefix, name) tuple — letting you skip the hv.Seed128(...) wrap entirely:

# Equivalent to hv.Sequence(hv.Seed128("words", "hi"), m1, m2):
seq = hv.Sequence(("words", "hi"), m1, m2)

# Tuple form composes with Domain's (DomainPrefix, str) tuple:
seq = hv.Sequence(((hv.DOMAIN_PREFIX_NLP, "concept"), "myseq"), m1, m2)

# And with Pod's Prewired enum:
seq = hv.Sequence(("internal", hv.PREWIRED_STEP), m1, m2)
Last change: , commit: 616651d

SparseOperation

A SparseOperation instance wraps a Model, a random number generator, and potentially other information related to the sparse operation in general.

Constructor

so = hv.SparseOperation(hv.MODEL_1M_10BIT, 0, 42)
so1 = hv.SparseOperation(hv.MODEL_1M_10BIT, "domain", "pod")

Methods

so.model()        # Model enum

so.width()        # width for this model

so.cardinality()  # cardinality for this model

so.sparsity()     # sparsity for this model

so.uint64()       # next random number

Usage: Generating Random Vectors

so = hv.SparseOperation(hv.MODEL_1M_10BIT, 0, 42)
sparkle = hv.Sparkle.random(hv.Domain("domain"), so)
Last change: , commit: 34eaa34

Utilities

Similarity

hv.overlap(a, b)    # Overlap

hv.hamming(a, b)    # Hamming distance

hv.equal(a, b)      # Equality check

Identity Check

v = hv.Sparkle.identity(model)

hv.is_identity(v)   # True if v is an identity vector

Hash Utilities

hv.hash64_from_string("hello")   # deterministic u64 hash from string
hv.hash64_from_bytes(b"\x01\x02") # deterministic u64 hash from bytes
hv.curr_time_as_seed()            # current time as a u64 seed
hv.kongming_studio_seed()         # fixed studio seed constant
Last change: , commit: 34eaa34

HyperBinary Types

All vector types conform to a common interface. In Go this is the HyperBinary interface; in Rust it is the HyperBinary trait. The two implementations are kept at feature parity.

Python doesn’t have the concept of interface/trait, but all HyperBinary derived types share a common set of methods.

v.model()        # Model enum
v.width()
v.cardinality()
v.hint()
v.stable_hash()  # int
v.seed128()
v.exponent()

v.core()         # SparseSegmented
v.power(p)       # HyperBinary

Concrete Types

| Type | Description |
| --- | --- |
| SparseSegmented 🍡 | Foundational vector: packed per-segment offsets |
| Sparkle ✨ | Seeded, deterministic hypervector |
| Learner 💫 | Online Hebbian learning |
| Set 🫧 | Unordered collection |
| Sequence 📿 | Ordered collection with positional encoding |
| Octopus 🐙 | Key-value composite |
| Knot 🪢 | Bound (multiplied) group |
| Parcel 🎁 | Bundled (added) group |
Last change: , commit: 03128b0

SparseSegmented 🍡

The most foundational vector type — a sparse binary hypervector where each segment has exactly one ON bit at the recorded offset location. All other types (Sparkle, Set, Sequence, etc.) ultimately contain a SparseSegmented in memory for processing.

Structure

| Field | Description |
| --- | --- |
| model | Sparsity configuration (Model) |
| offsets | Packed bit array of per-segment ON offsets. nil/None = identity vector |
| hash | Lazy-computed stable hash for equality checks |

The offsets are bit-packed according to the model’s sparsity bits — they do not align to byte boundaries. This trades a small CPU cost for compact, uniform storage that works both in memory and on disk.
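
To show what non-byte-aligned packing looks like, here is a plain-Python sketch that packs 10-bit offsets back to back and unpacks them again. `pack_offsets` and `unpack_offsets` are hypothetical helpers, and the little-endian bit order is an arbitrary choice for illustration; the actual on-disk layout used by the library is not specified here.

```python
def pack_offsets(offsets, bits):
    """Pack offsets back to back, `bits` bits each, ignoring byte boundaries."""
    acc = 0
    for i, offset in enumerate(offsets):
        if not 0 <= offset < (1 << bits):
            raise ValueError(f"offset {offset} does not fit in {bits} bits")
        acc |= offset << (i * bits)
    nbytes = (len(offsets) * bits + 7) // 8
    return acc.to_bytes(nbytes, "little")


def unpack_offsets(data, bits, count):
    """Inverse of pack_offsets: read `count` offsets of `bits` bits each."""
    acc = int.from_bytes(data, "little")
    mask = (1 << bits) - 1
    return [(acc >> (i * bits)) & mask for i in range(count)]


# 1,024 segments with 10-bit offsets, as in a 1M-style model.
offsets = [(i * 37) % 1024 for i in range(1024)]
packed = pack_offsets(offsets, bits=10)

assert len(packed) == 1280                      # 1,024 * 10 bits = 1,280 bytes
assert unpack_offsets(packed, 10, 1024) == offsets
```

Storing each 10-bit offset in a 16-bit word would take 2,048 bytes instead, which is the space the packing recovers at the cost of some shifting and masking.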

Identity vector: when offsets is blank, the vector is the identity vector where all offsets are 0. Binding with identity is a no-op, and identity requires zero storage for offsets.

Constructors

# Identity
ss = hv.SparseSegmented.identity(model)

# From per-segment offsets, typically discouraged...
ss = hv.SparseSegmented.from_offsets(model, [off0, off1, ...])

Key Methods

ss.is_identity()  # True if identity vector

ss2 = ss.power(2)
inv = ss.power(-1)

# Similarity
hv.overlap(a, b)   # Count of matching ON bits
hv.hamming(a, b)   # Count of differing segments

ss.offsets()   # returns all offsets

Serialization

SparseSegmented serializes to HyperBinaryProto with hint SPARSE_SEGMENTED. The offsets field carries the raw packed bytes. Identity vectors serialize with empty offsets.

Last change: , commit: 7e54b1c

Sparkle ✨

Sparkles are the atomic building block for higher-level constructs: essentially SparseSegmented annotated with domain and pod.

Domain is a logical namespace that groups related Sparkle instances. Pod acts as the secondary identifier for a Sparkle instance.

Sparkle is deterministic: the same (domain, pod) pair always produces the same offsets. For this reason, within a given model, the (domain, pod) pair uniquely identifies a Sparkle.

Sparkle Constructors

# From a word string
s0 = hv.Sparkle.from_word(model, "animals", "cat")

# From a numeric seed
s1 = hv.Sparkle.from_seed(model, "animals", 42)

# From a prewired enum
s2 = hv.Sparkle.from_prewired(model, "animals", hv.PREWIRED_SET_MARKER)

# Identity vector
s3 = hv.Sparkle.identity(model)

# Random (from SparseOperation)
so = hv.SparseOperation(hv.MODEL_1M_10BIT, "domain", "pod")
s4 = hv.Sparkle.random("animals", so)

Key Methods

s0.model()         # Model enum
s0.stable_hash()   # Deterministic hash
s0.exponent()      # Current exponent (1 for base vector)

s0_square = s0.power(2)   # Returns p-th power (new Sparkle)
hv.equal(s0, s0_square)   # s0_square = s0^2, different from original s0.

core0 = s0.core()         # Returns underlying SparseSegmented
core0.offsets()           # The raw offsets for each segment.
Note

power(0) always returns the identity sparkle. power(-1) returns the inverse.
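
If binding is segment-wise offset addition, then raising a vector to an integer power can be pictured as repeated binding, which amounts to multiplying each offset by the exponent modulo the segment size. The plain-Python sketch below works under that assumption; `power` and `bind` are hypothetical helpers on bare offset lists, not the library's implementation.

```python
import random

M, S = 1024, 1024  # cardinality and segment size


def bind(a, b):
    return [(x + y) % S for x, y in zip(a, b)]


def power(a, p):
    """p-th power as repeated binding: multiply each offset by p, modulo S."""
    return [(x * p) % S for x in a]


identity = [0] * M
rng = random.Random(5)
a = [rng.randrange(S) for _ in range(M)]

assert power(a, 0) == identity             # power(0) is the identity
assert bind(a, power(a, -1)) == identity   # power(-1) is the inverse
assert power(a, 2) == bind(a, a)           # squaring equals binding with itself
assert power(power(a, 2), 3) == power(a, 6)
```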

Pretty-printing

# Pretty-printing, or s0.__str__()
print(s0)
# ✨:🔗animals,🌱cat

# More detailed information, or s0.__repr__()
s0
# hint: SPARKLE
# model: MODEL_1M_10BIT
# stable_hash: 9725717137035622833
# domain:
#   name: animals
# pod:
#   word: cat

During pretty-printing of Sparkle instances, you may notice special emoji for domain / pods.

Emojis for domain / pod

| Emoji | Variant | Example |
| --- | --- | --- |
| 🔗 | named domain | 🔗animals, 🔗PREFIX.name |
| 🌐 | numeric domain | 🌐0x..c862 |
| 🌱 | named pod | 🌱cat |
| 🫛 | numeric pod | 🫛0x..80e4 |
| 🍀 | pre-defined pod | 🍀SET_MARKER |
| 💪 | Exponent / Power | 💪3, 💪-1 |

Identity vectors display as IDENT (e.g., ✨IDENT).

Note

The underlying offsets are lazily generated from a seeded PRNG. Only the seeds are stored in serialization, which is a significant storage saving; offsets are recomputed during de-serialization.

Last change: , commit: 2427493

Learner 💫

Learners are designed to perform online bundling for a stream of observations, in the form of Hebbian-style learning.

The total storage / processing budget is fixed — what matters is the distribution of weights among observed vectors.

Constructors

learner = hv.Learner(model, hv.Seed128(0, 42))

# a randomly-initialized learner.
learner = hv.Learner.random(so)

Feeding Observations

learner.bundle(a)                 # single observation

learner.bundle_multiple(b, 3)     # with weight multiplier

Inspection

learner.age()             # number of observations seen

learner.affinity(a)   # raw overlap; returns RandomOverlap when age==0
learner.weight(a)     # implicit weight for a probe vector; 0.0 when age==0
Untrained learner is neutral

A fresh Learner (age == 0) has no offsets to overlap against. Affinity short-circuits to a non-zero baseline so that Weight yields exactly 0.0 for any probe — neutral, non-selecting, also non-rejecting.

Last change: , commit: 767063e

Set 🫧

An unordered collection of hypervectors. See Composites: Set for the conceptual overview.

Constructor

s = hv.Set(hv.Seed128(0, 42), first, second, third)

Notable methods


# All these will be approximately 1/3 of the total cardinality.
hv.overlap(s.unmasked(), first)
hv.overlap(s.unmasked(), second)
hv.overlap(s.unmasked(), third)
Last change: , commit: 63ad966

Sequence 📿

An ordered collection of hypervectors with positional encoding. See Composites: Sequence for the conceptual overview.

Constructor

# Constructing a sequence, with logical index start at 1 (default to 0).
seq = hv.Sequence(hv.Seed128(0, 42), first, second, third, start=1)

In-place edits: Append / Prepend / Reset

Append, Prepend, and Reset all mutate the Sequence in place — clone first if you need to preserve the original.

  • Append(more...) — add members at the end. start is unchanged.
  • Prepend(more...) — add members at the front; start decrements by len(more) so existing members keep their positional binding.
  • Reset(start) — shift the starting index. No-op when start equals the current start.

After any of these, seq equals what you’d get by building a fresh NewSequence(seed, new_start, all_members...) — the domain/pod seed is preserved.

import copy

seq = hv.Sequence(hv.Seed128(0, 42), a, b, c)

# Append / Prepend are variadic and mutate in place.
seq.append(d, e)            # seq now [a, b, c, d, e]
seq.prepend(x, y)           # seq now [x, y, a, b, c, d, e], start -= 2
seq.reset(10)               # shift the starting index to 10

# To preserve the original, clone first:
base = hv.Sequence(hv.Seed128(0, 42), a, b, c)
s1 = copy.copy(base)
s1.append(d)                # base is untouched
Last change: , commit: 0642da6

Octopus 🐙

A key-value composite where each value is bound with its key’s Sparkle. See Composites: Octopus for the conceptual overview.

Constructor

Keys are Pods. In Python, strings (and any value polymorphically convertible to Pod) are accepted and auto-converted.

octo = hv.Octopus(hv.Seed128(0, 42), ["color", "shape"], red, circle)   # named octo to avoid shadowing Python's built-in oct()

Key Methods

octo.value_by_key("color")  # accepts Pod | str | int | Prewired
Last change: , commit: 616651d

Knot 🪢

The result of binding (multiplicative composition) of hypervectors. Unlike BindDirect, Knot tracks its member parts for serialization and debugging. See Composites: Knot.

Constructor

# Not directly constructed in Python. Use hv.bind() instead.
k = hv.bind(a, b)

Extending a Knot

An existing Knot can be extended with additional parts via expand. This mutates the Knot in place — equivalent to re-binding all parts from scratch but without reconstructing the base.

k = hv.bind(a, b)
k.expand(c)       # k is now equivalent to hv.bind(a, b, c)

If you need to preserve the original, clone before expanding — see the Expand operator section for full examples including clone-first patterns.

Last change: , commit: 0c01fa5

Parcel 🎁

The result of bundling (additive composition) of hypervectors. Unlike BundleDirect, Parcel tracks its members and bundling seed for serialization and debugging. See Composites: Parcel.

Constructors

p = hv.bundle(hv.Seed128(10, 1), a, b, c)
Last change: , commit: 49d0013

Pointer 👉

A one-directional reference between two hypervectors. A Pointer encodes a directed link from a source to a destination via P = source ⊗ Inv(destination). Given the pointer and either endpoint, the other endpoint can be recovered. See Composites: Pointer.

Constructor

p = hv.Pointer(hv.Seed128(0, 42), source, destination)

# Or via the release operator:
p = hv.release(source, destination)

Endpoints

A Pointer retains references to its source (A) and destination (B).

p.source()        # → A
p.destination()   # → B

Recovering endpoints

Given the pointer and one endpoint, the other can be recovered:

  • RDeref(B) = A — recover the source given the destination, via P ⊗ B.
  • Deref(A) = B — recover the destination given the source, via A ⊗ Inv(P).
p = hv.Pointer(seed, a, b)
recovered_a = p.rderef(b)   # ≈ a
recovered_b = p.deref(a)    # ≈ b

Anti-commutativity

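Pointer (and the release operator that constructs it) is anti-commutative: swapping source and destination yields the inverse pointer, that is, Release(B, A) = Inv(Release(A, B)).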

Last change: , commit: a781c58

Operators

See Concepts: Operators for the full overview.

Bind

bound = hv.bind(a, b)
released = hv.release(bound, b)  # this will recover `a`

hv.equal(a, b)                   # hash equality

Release

Extracts one component from a binding:

release returns a Pointer — a directional reference from composite to role that retains both endpoints for inspection and serialization. The bit-level value is identical to bind(composite, inverse(role)).

bound = hv.bind(role, filler)
recovered = hv.release(bound, role)  # Pointer; ≈ filler at the bit level

Expand (extend a Knot)

Extends an existing Knot with additional operands without re-binding from scratch. k.expand(c) on k = bind(a, b) gives the same result as bind(a, b, c) — but mutates k in place, so clone first if you need the original.

import copy

k = hv.bind(a, b)
k.expand(c)                 # k is now equivalent to hv.bind(a, b, c)

# To preserve the original, clone first:
base = hv.bind(a, b)
k1 = copy.copy(base)
k1.expand(c)                # base is untouched

BindDirect

Like Bind, but returns a raw SparseSegmented instead of a Knot — no operand tracking. Cheaper for intermediate computations where you don’t need to reverse the bind or inspect the operand list.

# domain/pod default to the zero Domain/Pod
ss = hv.bind_direct(a, b, c)

# Or supply an explicit seed (annotates the resulting SparseSegmented):
ss = hv.bind_direct(a, b, domain=d, pod=p)

Bundle

p = hv.bundle(hv.Seed128(10, 1), a, b, c)

Customizing runtime behavior

Environment Variables

All environment variables are read once on first access and cannot be changed at runtime. Unset variables use the documented default.

KONGMING_RNG

Selects the pseudo-random number generator backend used for hypervector generation.

| Value | Description |
| --- | --- |
| xoshiro++ (default) | xoshiro256++: simple, fast, cross-language deterministic |
| pcg | PCG-DXSM: classic/compat mode (matches pre-v3.7.5 behavior) |

Changing this affects all generated vectors: Sparkle offsets, Learner bundling, Cyclone patterns. Vectors generated with different backends are not comparable.

KONGMING_REPR_FORMAT

Controls __repr__() / Repr() output format.

| Value | Description |
| --- | --- |
| YAML (default) | Multi-line YAML dump |
| PROTO | Multi-line protobuf debug string |

KONGMING_LEARNER_SAMPLING

Controls the bundling strategy used by Learner.

| Value | Description |
| --- | --- |
| FISHER_YATES (default) | Fisher-Yates shuffle: selects exactly the right number of segments per round |
| CLASSIC | Per-segment probabilistic sampling: each segment is independently sampled with a fixed probability |

# Example: use PCG for backward compatibility with pre-v3.7.5 vectors
export KONGMING_RNG=pcg

# Example: switch repr to protobuf debug format
export KONGMING_REPR_FORMAT=PROTO

# Example: use classic sampling in Learner
export KONGMING_LEARNER_SAMPLING=CLASSIC
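
Because the variables are read once, a Python program that wants to set them itself has to do so before the library first reads them. A minimal sketch, assuming the engine reads the process environment as described above (whether importing the package already counts as "first access" is not stated here, so the safest order is to set the variables before the import):

```python
import os

# Set the variables before the library is imported or used:
# they are read once and cannot be changed afterwards.
os.environ["KONGMING_RNG"] = "pcg"
os.environ["KONGMING_REPR_FORMAT"] = "PROTO"

try:
    from kongming import hv  # imported only after the environment is set
except ImportError:
    hv = None  # package not installed; the environment is still configured

print(os.environ["KONGMING_RNG"])  # pcg
```

Setting the variables in the shell, as in the `export` lines above, avoids any ordering concern inside the program.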

Querying the Current Environment

Use global_env() to inspect all active settings at runtime. It returns a GlobalEnv protobuf message, so fields added to the proto later appear in the output automatically.

>>> hv.global_env()
rng_hint: XOSHIRO256PP
learner_sampling: FISHER_YATES
repr_format: YAML

Misc

Display

All HyperBinary types have a compact, emoji-prefixed string representation for quick visual inspection. See HyperBinary Types for type symbols and Sparkle for field labels.

Python __str__ and __repr__

__str__ (triggered by print()) returns the compact emoji form:

>>> a = hv.Sparkle.from_word(hv.MODEL_64K_8BIT, hv.d0(), "hello")
>>> print(a)
✨:🌐0x..c862,🫛0x..80e4

__repr__ (triggered by evaluating a variable in the shell or notebook) returns a detailed, developer-friendly YAML representation, controlled by the KONGMING_REPR_FORMAT environment variable:

>>> a
hint: SPARKLE
model: MODEL_64K_8BIT
stable_hash: 12345678
domain:
  id: ...
pod:
  seed: 12345

Set KONGMING_REPR_FORMAT=PROTO for protobuf debug output instead of the default YAML. See Environment Variables for all supported variables.

Go / Rust Display

print(sparkle)      # compact emoji form via __str__
repr(sparkle)       # detailed YAML/proto form via __repr__

Serialization

# HyperBinary → protobuf bytes
msg = hv.to_message(sparkle)

# protobuf bytes → HyperBinary
obj = hv.from_message(msg)

# raw proto bytes → HyperBinary
obj = hv.from_proto_bytes(data)

# proto bytes → YAML string (for debugging)
hv.format_to_yaml(data)

Memory

The memory package provides persistent and in-memory storage for hypervectors with semantic indexing and near-neighbor search.

The core abstraction is a Chunk — an immutable identity (Sparkle) paired with a mutable semantic code (any HyperBinary). Chunks are stored in a Substrate (pluggable storage backend), queried via ChunkSelectors, and created via ChunkProducers.

| Section | Description |
| --- | --- |
| Chunk | The fundamental storage unit |
| Substrate & Views | Storage backends and transactional views |
| Selectors | Query builders for reading chunks |
| Producers | Write builders for creating chunks |

Chunk

The fundamental storage unit in the memory system. A Chunk carries a semantic code (any HyperBinary type) along with its derived identity (a Sparkle implied by the code’s domain/pod).

The identity determines the storage key and drives compositionality: a chunk is either present or absent. The code is potentially learnable and can adapt over time, much like the weights of a traditional neural network.

Structure

| Field | Type | Description |
| --- | --- | --- |
| code | HyperBinary | Semantic content (can be updated). Required: its domain/pod determines the chunk’s identity. |
| id | Sparkle | Identity vector, derived from the code’s domain/pod; determines the storage key. |
| note | string | Human-readable annotation, primarily for debugging |
| extra | protobuf Any | Extensible payload for application-specific data, primarily for debugging |

Inspection

Chunks are typically created via producers (see Producers), but can be inspected after retrieval (see Selectors).

# chunk = memory.first_picked(view, memory.by_item_key("animals", "cat"))

chunk.id        # Sparkle
chunk.code      # HyperBinary
chunk.note      # str
chunk.extra     # Optional[bytes]

Substrate & Views

A Substrate is a pluggable storage backend. It provides transactional views for reading and writing chunks.

View Pattern

All storage access goes through views:

  • SubstrateView — read-only, supports key lookup and prefix scanning
  • SubstrateMutableView — extends SubstrateView with write staging and atomic commit (to underlying storage)
# Read-only view (context manager)
with storage.new_view() as view:
    # Check if chunk exists, without actually reading it back.
    exists = view.chunk_exists("animals", "cat")

    cat_chunk = view.read_chunk("animals", "cat")

# Mutable view (auto-commits on clean exit, rollback on exception).
# Stage writes by running producers against the view via
# producer.produce(view) — the recommended path for batched writes.
with storage.new_mutable_view() as view:
    memory.new_terminal("words", "hi").produce(view)
    memory.from_sequence_members("words", "greet", members,
                                  semantic_indexing=True).produce(view)
    # commits automatically
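
The commit-on-clean-exit, rollback-on-exception behavior follows the usual Python context-manager pattern. The sketch below is a toy stand-in built on a plain dict; `ToyMutableView` and its `write` method are hypothetical names for illustration, not the library's classes.

```python
class ToyMutableView:
    """Hypothetical stand-in for a mutable view: stage writes, commit on clean exit."""

    def __init__(self, store):
        self._store = store  # committed data (a plain dict)
        self._staged = {}    # writes staged inside the `with` block

    def write(self, key, value):
        self._staged[key] = value

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self._store.update(self._staged)  # clean exit: commit staged writes
        self._staged.clear()                  # staged writes are dropped either way
        return False                          # never swallow the exception


store = {}

with ToyMutableView(store) as view:
    view.write(("words", "hi"), "terminal")

try:
    with ToyMutableView(store) as view:
        view.write(("words", "bye"), "terminal")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

print(sorted(store))  # [('words', 'hi')]
```

Only the write from the block that exited cleanly reaches the store; the write staged before the exception is discarded.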

Storage Backends

InMemory

Volatile, in-process storage. All data lost on exit. Best for testing and ephemeral caches.

storage = memory.InMemory(hv.MODEL_64K_8BIT, "my_store")

Embedded

Persistent, single-machine storage backed by an embedded key-value store. Suitable for local development and moderate-scale deployments.

storage = memory.Embedded(hv.MODEL_64K_8BIT, "/path/to/store")

ScyllaDB (Distributed)

Distributed storage via Cassandra-compatible ScyllaDB. For high-scale, multi-node deployments.

# Not exposed yet...

Selectors

ChunkSelectors are composable query builders for reading chunks from the substrate. Each selector defines how to locate and return matching chunks.


NNS (Near-Neighbor Search)

Wraps a single attractor to perform near-neighbor search. For multiple attractors, compose them with joiner(...) first.

result = memory.first_picked(
    view, memory.nns(
        memory.set_members(memory.by_item_key("sets", "my_set"))))

Attractors

Each attractor provides the “center of attraction” for candidates. NNS wraps an attractor (or several, composed with joiner(...)) and performs the actual near-neighbor search against the underlying associative index.

Forward Attractors

Roughly, forward attractors find the parts of a given composite.

| Attractor | Modifier | Attracts |
| --- | --- | --- |
| SetMembersAttractor | depends on selected.code.domain | All members of the Set |
| SequenceMemberAttractor | depends on selected.code.domain | Sequence member at a specific position |
| TentacleAttractor(octopus, key) | Inverse(Sparkle(model, "", key)) | Octopus value for that key |

memory.set_members(memory.by_item_key("sets", "my_set"))

memory.sequence_member(memory.by_item_key("seqs", "my_seq"), pos=2)

memory.tentacle(memory.by_item_key("records", "person"), "name")

Reverse Attractors

Roughly, reverse attractors locate composites given a part.

| Attractor | Modifier | Attracts |
| --- | --- | --- |
| SetAttractor(member, candidate) | Sparkle(SET_MARKER @ candidate) | All Sets in candidate containing member |
| SequenceAttractor(member, pos, candidate) | Bind(SEQ_MARKER @ candidate, Step^pos) | All Sequences in candidate with member at pos |
| OctopusAttractor(key, value) | Sparkle(model, "", key) | Octopuses with that key/value pair |

memory.set_attractor(memory.by_item_key("animals", "cat"), "sets")

memory.sequence_attractor(memory.by_item_key("animals", "cat"), 0, "seqs")

memory.octopus_attractor("color", memory.by_item_key("colors", "red"))

Analogical Reasoning

AnalogicalReasoner(dst, src, feature) performs analogical reasoning (“A is to B as C is to ?”): for each chunk c yielded by dst, it computes Bind(c.code, feature, Inverse(src)) and forwards to NNS. Model is implicit in src / feature.

Given the analogy “king is to queen as man is to ?”:

king   = hv.Sparkle(model, "role", "king")
queen  = hv.Sparkle(model, "role", "queen")
man    = hv.Sparkle(model, "role", "man")

# Analogy: "king is to queen as man is to ?"
#   src     = king   (the known source of the relationship)
#   feature = queen  (the known feature/attribute of src)
#   dst     = man    (the target; we want to find its corresponding feature)
#
# The modifier is queen ⊗ inverse(king); applied to man, it should attract "woman".
memory.nns(
    memory.analogical_reasoner(memory.with_code(man), king, queen))
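
The algebra behind Bind(c.code, feature, Inverse(src)) can be traced in a toy model. The sketch assumes, for illustration only, that vectors are per-segment offsets and that bind is segment-wise modular addition. The factors `royal`, `person`, `male`, and `female` and all helper functions are hypothetical. The analogy resolves exactly here only because the toy vectors are built from shared factors.

```python
import random

SEGMENTS, SEGMENT_SIZE = 256, 256  # toy dimensions, assumed for illustration


def random_vector(seed):
    rng = random.Random(seed)
    return [rng.randrange(SEGMENT_SIZE) for _ in range(SEGMENTS)]


def bind(*vectors):
    # Toy bind: segment-wise sum of offsets, modulo the segment size.
    return [sum(offsets) % SEGMENT_SIZE for offsets in zip(*vectors)]


def inverse(x):
    return [(-a) % SEGMENT_SIZE for a in x]


def overlap(x, y):
    # Number of segments on which the two vectors agree.
    return sum(1 for a, b in zip(x, y) if a == b)


# Hypothetical shared factors that give the four concepts common structure.
royal, person, male, female = (random_vector(s) for s in range(4))
king, queen = bind(royal, male), bind(royal, female)
man, woman = bind(person, male), bind(person, female)

# dst = man, src = king, feature = queen
query = bind(man, queen, inverse(king))

print(overlap(query, woman))  # 256: the query lands exactly on "woman"
print(overlap(query, king))   # noise level
```

In the memory system the query would instead be handed to NNS, which returns the stored chunks closest to it.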

Direct WithCodeModifier / WithIDModifier

For ad-hoc patterns that don’t fit a named attractor, use the primitives directly. They take a precomputed HyperBinary modifier and apply Bind(code, modifier) or Bind(id, modifier) to each yielded chunk:

memory.with_code_modifier(inner_selector, modifier_vec)
memory.with_id_modifier(inner_selector, modifier_vec)

Other Selectors

ByItemKey

Exact lookup by domain + pod.

sel = memory.by_item_key("animals", "cat")

ByItemDomain

All chunks in a given domain (prefix scan).

sel = memory.by_item_domain("animals")

WithCode / WithSparkle

Literal selector — returns a hypervector directly, no storage lookup.

sel = memory.with_code(some_hv)

sel = memory.with_sparkle("animals", "cat")

Joiner

Union of multiple selectors — returns results from each of the inner selectors.

sel = memory.joiner(
    memory.by_item_key("animals", "cat"),
    memory.by_item_key("animals", "dog"),
)

Range

Limits results to [start, start+limit). limit=0 (default) means no limit: iteration continues until there are no more results.

sel = memory.range_sel(
    memory.by_item_domain("animals"), start=0, limit=10)
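
The half-open window and the "limit=0 means unlimited" rule can be illustrated on a plain Python list. `toy_range` is a hypothetical helper that mirrors the semantics described above; it is not part of the library.

```python
from itertools import islice


def toy_range(results, start=0, limit=0):
    """Yield results in [start, start + limit); limit=0 means no upper bound."""
    stop = None if limit == 0 else start + limit
    return islice(results, start, stop)


animals = ["cat", "dog", "fish", "bird", "tree"]

print(list(toy_range(animals, start=1, limit=2)))  # ['dog', 'fish']
print(list(toy_range(animals, start=3)))           # ['bird', 'tree']
```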

OnlyDomain

Filters inner selector results by given domain.

sel = memory.only_domain(
    "animals", inner_selector)

Working with Results

FirstPicked — get the first match

Returns the first chunk matching the selector. Returns an error if nothing is found.

# Returns the first matching Chunk (with .id, .code, .note, .extra)
chunk = memory.first_picked(view, selector)
print(chunk.id, chunk.code, chunk.note)

mem_get — eager batch read (Chunks only)

Returns every match as a list[Chunk]. No extras — any per-result SelectorExtra produced by the selector (e.g. NNS scores) is discarded. Use this when you only need the Chunks.

chunks = storage.mem_get(selector)        # list[Chunk]
for chunk in chunks:
    print(chunk.id, chunk.note)

lazy_selector_iter — stream Chunks with extras

Yields (Chunk, Optional[SelectorExtra]) tuples one at a time. This is the only way to access per-result SelectorExtra in Python; mem_get drops it.

# Streaming — useful for large result sets or early termination
for chunk, extra in memory.lazy_selector_iter(view, selector):
    print(chunk.id, extra)
    if done():
        break

# Eager with extras — wrap in list()
results = list(memory.lazy_selector_iter(view, selector))
# results: list[tuple[Chunk, Optional[SelectorExtra]]]

Producers

ChunkProducers are write builders that create and persist chunks in the substrate. Each producer encapsulates the logic for constructing a specific type of chunk.

Note
Some producers only update existing chunks (e.g., ClusterUpdater) without creating new ones. In those cases, Produce returns the updated chunk rather than a newly created one.

Producer Options

Producer options are extra arguments passed to a producer constructor to adjust its behavior.

# `note` attaches an annotation to the new terminal chunk.
memory.new_terminal("d", "p", note="annotation")

# `semantic_indexing` requests indexing of the semantic code
# (in addition to the id vector).
memory.from_set_members("d", "p", members, semantic_indexing=True)

Concrete Producers

NewTerminal

Creates a chunk whose code equals its identity (a bare Sparkle). Useful for registering atoms/symbols.

with storage.new_mutable_view() as view:
    memory.mem_set(view, memory.new_terminal("fruits", "apple", note="an apple"))

NewLearner

Creates a fresh Learner chunk for online learning.

with storage.new_mutable_view() as view:
    memory.mem_set(view, memory.new_learner("learners", "my_learner", note="a learner"))

FromSetMembers

Creates a Set from stored members.

with storage.new_mutable_view() as view:
    memory.mem_set(
        view, 
        memory.from_set_members(
            "sets",
            "fruit_set", 
            memory.by_item_domain("fruits"),
        ))

FromSequenceMembers

Creates a Sequence from stored members with positional encoding.

with storage.new_mutable_view() as view:
    memory.mem_set(
        view, 
        memory.from_sequence_members(
            "seqs",
            "greeting", 
            memory.joiner(
                memory.by_item_key("words", "hello"),
                memory.by_item_key("words", "world"),
            ),
            start=0),
    )

FromKeyValues

Creates an Octopus (key-value composite) from keys and value selectors.

with storage.new_mutable_view() as view:
    memory.mem_set(
        view, 
        memory.from_key_values(
            "records",
            "obj1", 
            keys=["color", "shape"], 
            values=memory.joiner(
                memory.by_item_key("colors", "red"),
                memory.by_item_key("shapes", "circle"),
            ),
        ))

FromSourceDest

Creates a Pointer 👉 chunk — a directional reference from a source chunk to a dest chunk. Both selectors must resolve to a single chunk; the produced Pointer’s bit-level value is source.id ⊗ Inv(dest.id).

with storage.new_mutable_view() as view:
    memory.mem_set(
        view,
        memory.from_source_dest(
            "edges", "earth_to_moon",
            memory.by_item_key("planets", "earth"),
            memory.by_item_key("planets", "moon"),
        ))

ClusterUpdater

Feeds an observed chunk into an existing Learner, updating its accumulated code via bundling. The bundle multiplier defaults to 1; pass an explicit override (multiple=N in Python) to fold the same observation in repeatedly.

with storage.new_mutable_view() as view:
    # With explicit multiplier:
    memory.mem_set(view,
        memory.cluster_updater(
            learner=memory.by_item_key("learners", "my_learner"),
            observed=memory.by_item_key("fruits", "apple"),
            multiple=3,
        ))

Train 🚂🚃🚃🚃

A doubly-linked, payload-carrying chunk data structure built on Octopus chunks. A train is a chain of carriages: each carriage 🚃 carries one payload chunk, and carriages couple to their neighbors at LEFT ⬅️ / RIGHT ➡️.

Shape

A train occupies its own 64-bit train_domain. The structure has two kinds of chunks:

  • Sentinel locomotive 🚂 — the chunk at (train_domain, P0). Anchors both ends of the train. Its LEFT slot points at the leftmost carriage, its RIGHT slot at the rightmost. Its PAYLOAD slot is identity (no payload).
  • Carriage 🚃 — a chunk at (train_domain, carriage_pod) for some non-zero pod. Each carriage is a specialized Octopus with four slots:
    • PAYLOAD 📨 — points at the actual payload chunk in item memory (Sparkle(payload_domain, payload_pod)). Always non-identity.
    • LEFT ⬅️ — id-Sparkle of the previous carriage, or identity if leftmost.
    • RIGHT ➡️ — id-Sparkle of the next carriage, or identity if rightmost.
    • CHILDREN 👨‍👩‍👧‍👦 — optional pointer at a child structure (typically the sentinel of a child train). Identity when no children are attached.

A 3-carriage train (payloads A, B, C in left-to-right order):

       🚂                🚃                🚃                🚃
   ┌──────────┐      ┌──────────┐      ┌──────────┐      ┌──────────┐
   │ sentinel │ <--> │    A     │ <--> │    B     │ <--> │    C     │
   │  pod=P0  │      │ pod=p_A  │      │ pod=p_B  │      │ pod=p_C  │
   └──────────┘      └──────────┘      └──────────┘      └──────────┘

Each box is one Octopus chunk. Each <--> between adjacent boxes represents one bidirectional LEFT/RIGHT pointer pair. The slot values inside each chunk are:

| Chunk | PAYLOAD | LEFT | RIGHT | CHILDREN |
| --- | --- | --- | --- | --- |
| sentinel 🚂 (pod=P0) | identity ⊥ | leftmost (A) | rightmost (C) | identity ⊥ (or a child structure) |
| carriage A | Sparkle(A*) | identity ⊥ | carriage B | identity ⊥ (or a child structure) |
| carriage B | Sparkle(B*) | carriage A | carriage C | identity ⊥ (or a child structure) |
| carriage C | Sparkle(C*) | carriage B | identity ⊥ | identity ⊥ (or a child structure) |

Where A*, B*, C* are the actual payload chunks living elsewhere in item memory.

The sentinel’s LEFT/RIGHT are anchor pointers (jumps directly to the leftmost / rightmost), not neighbor pointers. The carriages’ LEFT/RIGHT are conventional neighbor pointers.

Pushing

storage.mem_set(
    memory.train_appender(
        train_domain, memory.with_sparkle(payload_domain, "A")))

storage.mem_set(
    memory.train_prepender(
        train_domain, memory.with_sparkle(payload_domain, "Z")))

Each push rewrites up to three chunks atomically through one mutable view:

  1. The new carriage at (train_domain, fresh_pod).
  2. The previous end carriage (the old tail for an append, the old head for a prepend), with its neighbor link on the push side updated to the new carriage. (Skipped on the first push, when the train is empty.)
  3. The sentinel, with its anchor pointer on the push side updated to the new carriage. The opposite anchor is updated only when the train transitions from empty to one element.

The sentinel is auto-created on the first push — no separate “init train” step is needed.
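
The three rewrites can be traced in a toy model of an append. The sketch represents the train as a dict from pod to slots, uses None for identity, and picks fresh pods by counting; all of these are assumptions for illustration, and `append` is a hypothetical helper rather than the library's producer.

```python
P0 = 0  # sentinel pod


def append(train, payload):
    """Append a carriage; return the pods of the chunks that were rewritten."""
    # The sentinel is created on the first push.
    sentinel = train.setdefault(P0, {"PAYLOAD": None, "LEFT": None, "RIGHT": None})
    pod = len(train)  # hypothetical fresh pod: 1, 2, 3, ...
    old_tail = sentinel["RIGHT"]

    # 1. The new carriage.
    train[pod] = {"PAYLOAD": payload, "LEFT": old_tail, "RIGHT": None}
    rewritten = {pod, P0}

    if old_tail is None:
        sentinel["LEFT"] = pod          # empty -> one element: head anchor moves too
    else:
        train[old_tail]["RIGHT"] = pod  # 2. previous tail links to the new carriage
        rewritten.add(old_tail)

    sentinel["RIGHT"] = pod             # 3. tail anchor always moves
    return rewritten


train = {}
print(sorted(append(train, "A")))  # [0, 1]     sentinel + new carriage
print(sorted(append(train, "B")))  # [0, 1, 2]  sentinel + old tail + new carriage
print(sorted(append(train, "C")))  # [0, 2, 3]
```

A prepend is the mirror image, with LEFT and RIGHT swapped.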

⚠️ Single-writer per train. Concurrent pushes from independent mutable views on the same train WILL race on the sentinel and on the previous tail/head — last-write wins, with dropped pushes possible. Callers expecting concurrent appenders must synchronize externally (e.g., serialize through a single goroutine/task, or take an external lock keyed on train_domain). Reads through a View are always snapshot-consistent per chunk.

Mid-train insertion: TrainInsertAfter / TrainInsertBefore

TrainAppender and TrainPrepender are thin wrappers over the more general TrainInsertAfter / TrainInsertBefore, which take a reference carriage selector instead of a domain. The new carriage is wedged on the chosen side of the reference. The train domain is taken from the resolved reference carriage.

| Reference carriage | TrainInsertAfter | TrainInsertBefore |
| --- | --- | --- |
| sentinel 🚂 (TrainCarriage(domain)) | append (new rightmost) | prepend (new leftmost) |
| member carriage X | wedge between X and X.right | wedge between X.left and X |

Sentinel-as-reference is the wraparound bookend case: since the sentinel sits on both ends of the train, “after the sentinel” means “right of the rightmost,” and “before the sentinel” means “left of the leftmost.” On an empty train, either becomes the first and only member.

Each insert rewrites:

  1. The new carriage.
  2. The carriage on each side whose neighbor link points at the new one (one rewrite for end-inserts, two for mid-train inserts).
  3. The sentinel, only when its anchor pointer changes (i.e., the new carriage becomes the leftmost or rightmost).
# Mid-train: wedge M between B and C.
storage.mem_set(memory.train_insert_after(
    memory.train_carriage(train_domain, b_pod),
    memory.with_sparkle(payload_domain, "M"),
))

# Sentinel-as-ref: equivalent to train_appender.
storage.mem_set(memory.train_insert_after(
    memory.train_carriage(train_domain),
    memory.with_sparkle(payload_domain, "tail"),
))

Empty check: TrainIsEmpty

A direct query that reads the sentinel and reports whether both LEFT and RIGHT anchors are identity. A train whose sentinel hasn’t been written yet (never touched) is also reported empty — no auto-init.

with storage.new_view() as view:
    if memory.train_is_empty(view, train_domain):
        print("nothing to process")

Iterating — cursor-straddle semantics

TrainForward / TrainBackward walk the train, yielding each carriage’s payload chunk. The starting point is itself a Selector — pass TrainCarriage(domain) (with no pod) to start from the sentinel for a full-train iteration, or TrainCarriage(domain, pod) to start from a specific carriage.

The cursor straddles between elements, similar to Java’s ListIterator or C++’s std::list::iterator:

  • start is the cursor; iteration is exclusive of start’s own payload.
  • When start resolves to the sentinel, iteration covers the whole train (sentinel acts as both before-leftmost and after-rightmost).
  • When start resolves to a member carriage, iteration yields the elements strictly after it in the chosen direction; the start carriage’s own payload is NOT yielded.

| start | TrainForward yields | TrainBackward yields |
| --- | --- | --- |
| sentinel | leftmost, …, rightmost | rightmost, …, leftmost |
| carriage X | X.right, X.right.right, … | X.left, X.left.left, … |

# Whole-train iteration from the sentinel.
for chunk in storage.mem_get(
    memory.train_forward(memory.train_carriage(train_domain))):
    print(chunk.note)

# Backward from a specific carriage (exclusive of that carriage).
for chunk in storage.mem_get(
    memory.train_backward(memory.train_carriage(train_domain, carriage_pod))):
    print(chunk.note)

Single-step shorthands: TrainNext / TrainPrev

TrainNext(start) is shorthand for Range(0, 1, TrainForward(start)) — the next single payload from the cursor’s position. TrainPrev(start) is the backward mirror. They follow the same exclusive-start rule.

| Cursor | TrainNext yields | TrainPrev yields |
| --- | --- | --- |
| sentinel | leftmost carriage’s payload | rightmost carriage’s payload |
| carriage X | X’s right neighbor’s payload (or empty if X is rightmost) | X’s left neighbor’s payload (or empty if X is leftmost) |

# Peek at the front / back of the train.
front = next(iter(storage.mem_get(
    memory.train_next(memory.train_carriage(train_domain))))).note
back  = next(iter(storage.mem_get(
    memory.train_prev(memory.train_carriage(train_domain))))).note

Iteration rule (implementation)

Each step:

  1. Integrity — sentinel is identified by pod == P0. The sentinel MUST have an identity PAYLOAD; a carriage MUST have a non-identity PAYLOAD. Either violation returns FailedPrecondition (the train is malformed).
  2. Advance — on a carriage, follow the matching-direction key (RIGHT for forward, LEFT for backward) to the next neighbor. On the sentinel, follow the opposite-direction key — sentinel’s LEFT/RIGHT are anchor pointers, so the opposite key takes you to the appropriate end.
  3. Yield — yield the chunk you advanced to. Stop when the next pointer is identity.

The advance-then-yield order is what makes the iteration exclusive of start.
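
The three steps can be sketched as a generator over a toy train. The dict layout, the use of None for identity, and the ValueError standing in for FailedPrecondition are all assumptions for illustration; `walk` is a hypothetical helper, not the library's selector.

```python
P0 = 0  # sentinel pod

# Toy 3-carriage train: pod -> slots; None stands for identity.
train = {
    P0: {"PAYLOAD": None, "LEFT": 1, "RIGHT": 3},  # anchors: leftmost, rightmost
    1: {"PAYLOAD": "A", "LEFT": None, "RIGHT": 2},
    2: {"PAYLOAD": "B", "LEFT": 1, "RIGHT": 3},
    3: {"PAYLOAD": "C", "LEFT": 2, "RIGHT": None},
}


def walk(train, start, forward=True):
    key = "RIGHT" if forward else "LEFT"
    opposite = "LEFT" if forward else "RIGHT"
    pod = start
    while True:
        chunk = train[pod]
        # 1. Integrity: the sentinel, and only the sentinel, has an identity PAYLOAD.
        if (pod == P0) != (chunk["PAYLOAD"] is None):
            raise ValueError("malformed train")
        # 2. Advance: carriages follow the matching key, the sentinel the opposite one.
        pod = chunk[opposite if pod == P0 else key]
        if pod is None:
            return  # the next pointer is identity: stop
        # 3. Yield the payload of the carriage just reached.
        yield train[pod]["PAYLOAD"]


print(list(walk(train, P0)))                 # ['A', 'B', 'C']
print(list(walk(train, P0, forward=False)))  # ['C', 'B', 'A']
print(list(walk(train, 2)))                  # ['C']
```

Starting from carriage 2 yields only C, which shows the exclusive-start rule: B itself is never yielded.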

Hierarchy: the CHILDREN slot

Each carriage’s CHILDREN 👨‍👩‍👧‍👦 slot can point at the sentinel of another train (or any other chunk). Attach one at push time via the children argument; query it via the TrainChildren(parent) selector.

# Push a parent carriage whose CHILDREN points at a child train's sentinel.
storage.mem_set(memory.train_appender(
    parent_domain,
    memory.with_sparkle(payload_domain, "B"),
    children=memory.train_carriage(child_domain),
))

# Walk the child train from the parent.
for chunk in storage.mem_get(
    memory.train_forward(
        memory.train_children(
            memory.train_carriage(parent_domain, parent_carriage_pod)))):
    print(chunk.note)

TrainChildren(parent) yields zero chunks (no error) when the parent has identity CHILDREN or is missing entirely. Hierarchical traversal is the caller’s job — walk a parent train, recurse into TrainChildren on each carriage.
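
A caller-side recursive walk might look like the sketch below. It uses a deliberately simplified toy model in which each train is a list of (payload, child train domain) pairs. The `trains` dict and `walk_tree` are hypothetical, standing in for a TrainForward walk plus a TrainChildren lookup on each carriage.

```python
# Toy model: train domain -> list of (payload, child train domain or None).
trains = {
    "book": [("chapter 1", "ch1"), ("chapter 2", None)],
    "ch1": [("section 1.1", None), ("section 1.2", None)],
}


def walk_tree(trains, domain, depth=0):
    """Depth-first walk: yield each payload, then recurse into its child train."""
    # A missing train yields nothing, mirroring the "zero chunks, no error" rule.
    for payload, child in trains.get(domain, []):
        yield depth, payload
        if child is not None:
            yield from walk_tree(trains, child, depth + 1)


for depth, payload in walk_tree(trains, "book"):
    print("  " * depth + payload)
```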


Notebook Quick Start

This guide walks through using Kongming HV in a Jupyter notebook, cell by cell.

| Section | Description |
| --- | --- |
| Notebook Platforms | Setup differences between Jupyter, Colab, and Binder |
| Interactive Notebooks | Links to existing notebooks |
| Walkthrough | Step-by-step: vocabulary, similarity, learning, binding |

Tips

  • Reproducibility: Use fixed seeds in SparseOperation for deterministic results across reruns.
  • Visualization: Use pandas DataFrames for overlap matrices — they render nicely in Jupyter.
  • Performance: The Rust backend is fast. Building 10,000 vectors takes under a second on MODEL_64K_8BIT.
  • Model choice: Start with MODEL_64K_8BIT for exploration. Switch to MODEL_1M_10BIT or larger for production workloads.

Notebook Platforms

Setup and behavior differ across Jupyter, Google Colab, and Binder. This page covers the key differences.

Try Online

| Notebook | Platform | Link |
| --- | --- | --- |
| first.ipynb | Google Colab | Open In Colab |
| first.ipynb | Binder | Binder |
| memory.ipynb | Google Colab | Open In Colab |
| lisp.ipynb | Google Colab | Open In Colab |

Comparison

|  | Jupyter (local) | Google Colab | Binder |
| --- | --- | --- | --- |
| Account | None | Google account required | None |
| Install | pip install in terminal beforehand | !pip install in first cell | Pre-installed via requirements.txt |
| Restart needed | No | Yes, after first install | No |
| Startup time | Instant | Fast (~5s) | Slow (2–5 min cold start) |
| Persistence | Local filesystem | Google Drive (optional mount) | Ephemeral, lost on timeout |
| GPU | If available locally | Free tier available | Not available |
| Custom packages | Full control | !pip install per session | Via requirements.txt only |

Jupyter (Local)

Install once in your terminal, then use in any notebook:

pip install kongming-rs-hv

# Cell 1: no restart needed
from kongming import hv

For development workflows with frequent code changes, use autoreload:

%load_ext autoreload
%autoreload 2

Google Colab

Colab runs in the cloud with a fresh environment each session. Install in the first cell:

# Cell 1 — install
!pip install kongming-rs-hv

After the first install, Colab requires a runtime restart:

  1. Go to Runtime → Restart runtime (or use the button Colab shows after install)
  2. Then run the remaining cells
# Cell 2 — after restart
from kongming import hv
model = hv.MODEL_64K_8BIT

Subsequent sessions on the same notebook will need the install cell again — Colab does not persist pip packages across sessions.

Saving work: Use google.colab.drive to mount Google Drive for persistent storage:

from google.colab import drive
drive.mount('/content/drive')
# Then use paths like /content/drive/MyDrive/...

Binder

Binder builds a Docker image from your repo’s requirements.txt and launches a Jupyter server. No account needed.


  • First launch: Takes 2–5 minutes to build the environment
  • Subsequent launches: Faster if the image is cached
  • No install needed: kongming-rs-hv is pre-installed from requirements.txt
  • Ephemeral: All work is lost when the session times out (~10 min idle)
# Cell 1 — works immediately, no install
from kongming import hv
Limitation
You cannot install additional packages not in requirements.txt (the environment is read-only).

Choosing a Platform

| Use case | Recommended |
| --- | --- |
| Daily development | Jupyter (local) |
| Quick demo / sharing | Google Colab |
| Zero-setup exploration | Binder |
| Teaching / workshops | Google Colab (students have accounts) |
| Persistent storage needed | Jupyter (local) or Colab + Drive |

Interactive Notebooks

For deeper walkthroughs, open these notebooks directly:

| Notebook | Description | Colab |
| --- | --- | --- |
| first.ipynb | Introduction to hypervectors, bind/bundle operations, and composites | Open In Colab |
| memory.ipynb | In-memory and persistent storage, near-neighbor search with attractors, and export to disk | Open In Colab |
| lisp.ipynb | VSA-based LISP interpreter where every data structure is a hypervector | Open In Colab |

See also: LISP Interpreter — a full example built on the core API.


Walkthrough

A step-by-step introduction to Kongming HV in a notebook, cell by cell.

Setup

# Cell 1: Install and import
# !pip install kongming-rs-hv pandas

from kongming import hv
import pandas as pd

model = hv.MODEL_64K_8BIT
so = hv.SparseOperation(model, 0, 1)

Building a Vocabulary

# Cell 2: Create vectors for a set of words
words = ["cat", "dog", "fish", "bird", "tree", "rock"]
vectors = {w: hv.Sparkle.from_word(model, "vocab", w) for w in words}

print(f"Created {len(vectors)} vectors")
print(f"Model: {model}, Cardinality: {hv.cardinality(model)}")

Output:

Created 6 vectors
Model: 1, Cardinality: 256

Similarity Matrix

# Cell 3: Compute pairwise overlap
data = {}
for w1 in words:
    data[w1] = {w2: hv.overlap(vectors[w1], vectors[w2]) for w2 in words}

pd.DataFrame(data, index=words)

Output:

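|      | cat | dog | fish | bird | tree | rock |
| --- | --- | --- | --- | --- | --- | --- |
| cat  | 256 | 1   | 0   | 2   | 1   | 1   |
| dog  | 1   | 256 | 1   | 0   | 1   | 2   |
| fish | 0   | 1   | 256 | 1   | 0   | 1   |
| bird | 2   | 0   | 1   | 256 | 1   | 0   |
| tree | 1   | 1   | 0   | 1   | 256 | 1   |
| rock | 1   | 2   | 1   | 0   | 1   | 256 |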

The diagonal is 256 (the cardinality, i.e. perfect self-overlap). Off-diagonal values range from 0 to 2 (random noise), confirming that the vectors are near-orthogonal.
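
The noise level can be estimated with a short simulation. The sketch assumes a layout of 256 segments with 256 possible offsets each (an assumption inferred from the model name and the cardinality of 256, not a statement about the engine's internals). Under that assumption two independent random vectors agree on a given segment with probability 1/256, so the expected overlap is 256 × 1/256 = 1. The helpers are hypothetical and do not use the library.

```python
import random

SEGMENTS, SEGMENT_SIZE = 256, 256  # assumed layout: 256 segments x 256 offsets


def random_vector(rng):
    # One randomly chosen offset per segment.
    return [rng.randrange(SEGMENT_SIZE) for _ in range(SEGMENTS)]


def overlap(x, y):
    # Number of segments on which the two vectors pick the same offset.
    return sum(1 for a, b in zip(x, y) if a == b)


rng = random.Random(0)
vectors = [random_vector(rng) for _ in range(6)]

self_overlaps = [overlap(v, v) for v in vectors]
cross_overlaps = [overlap(vectors[i], vectors[j])
                  for i in range(6) for j in range(i + 1, 6)]

print(self_overlaps)                              # six values of 256
print(sum(cross_overlaps) / len(cross_overlaps))  # sample mean; the expected value is 1
```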

Learning from Observations

# Cell 4: Create a learner and feed it observations
learner = hv.Learner(model, hv.Seed128(0, so.uint64()))

# "cat" seen 3 times, "dog" once, "bird" once
for _ in range(3):
    learner.bundle(vectors["cat"])
learner.bundle(vectors["dog"])
learner.bundle(vectors["bird"])

print(f"Learner age: {learner.age()}")

Output:

Learner age: 5

Probing the Learner

# Cell 5: Check what the learner remembers
results = []
for w in words:
    ov = hv.overlap(learner, vectors[w])
    results.append({"word": w, "overlap": ov})

df = pd.DataFrame(results).sort_values("overlap", ascending=False)
df

Output:

| word | overlap |
| --- | --- |
| cat | ~75 |
| dog | ~30 |
| bird | ~30 |
| fish | ~1 |
| tree | ~1 |
| rock | ~1 |

“cat” has the highest overlap (seen 3x). “dog” and “bird” (seen 1x each) have moderate overlap. Unseen words are at noise level (~1).

Binding: Role-Filler Pairs

# Cell 6: Create a structured representation
#   "a cat that is red"
color_role = hv.Sparkle.from_word(model, "role", "color")
animal_role = hv.Sparkle.from_word(model, "role", "animal")

red = hv.Sparkle.from_word(model, "color", "red")
blue = hv.Sparkle.from_word(model, "color", "blue")
cat = vectors["cat"]

# Bind role with filler, then bundle the pairs
learner2 = hv.Learner(model, hv.Seed128(0, so.uint64()))
learner2.bundle(hv.Sparkle.bind(color_role, red))
learner2.bundle(hv.Sparkle.bind(animal_role, cat))

# Probe: "what color?"
query = hv.Sparkle.bind(learner2, color_role.power(-1))
print(f"red overlap:  {hv.overlap(query, red)}")   # high
print(f"blue overlap: {hv.overlap(query, blue)}")   # ~1
print(f"cat overlap:  {hv.overlap(query, cat)}")    # ~1

Python Quick Start

| Section | Description |
| --- | --- |
| Installation | PyPI install, supported platforms, import paths |
| Quick Example | Minimal code showing bind, bundle, and overlap |
| Walkthrough | Vectors, similarity, random generation, power, learning |

See also: Notebook Quick Start for interactive Jupyter walkthroughs.


Installation

PyPI

pip install kongming-rs-hv

Supported Platforms

| Platform | Architectures | Python Versions |
| --- | --- | --- |
| Linux | x86_64 | 3.10–3.14 |
| macOS | Apple Silicon & Intel | 3.10–3.14 |
| Windows | x86_64 | 3.10–3.14 |

Verifying Installation

import kongming
print(kongming.__version__)  # e.g. "3.6.5" as of Apr. 2026; yours may be newer

from kongming import hv
print(hv.MODEL_64K_8BIT)  # should print 1

Import Paths

The package exposes two main modules:

from kongming import hv       # hypervector operations
from kongming import memory   # storage and selectors

Model constants are available directly on hv:

hv.MODEL_64K_8BIT      # 1
hv.MODEL_1M_10BIT      # 2
hv.MODEL_16M_12BIT     # 3
hv.MODEL_256M_14BIT    # 4
hv.MODEL_4G_16BIT      # 5

Docker

If you’d rather not install anything on your host, you can run kongming-rs-hv inside a container. This works on any system with Docker — no Python, no virtualenv, no wheel compatibility to worry about.

One-liner: throwaway Python REPL

Drop straight into a Python shell with the package preinstalled:

docker run --rm -it python:3.12-slim sh -c "\
    pip install --quiet --root-user-action=ignore \
        --disable-pip-version-check kongming-rs-hv && python"

--rm removes the container on exit. Nothing is persisted. Re-running reinstalls from PyPI, which takes a few seconds. The --root-user-action=ignore and --disable-pip-version-check flags silence pip’s root-user and upgrade notices, which are harmless inside a throwaway container.

Reusable image

For repeat use, build a small image once:

# Dockerfile
FROM python:3.12-slim
RUN pip install --no-cache-dir --disable-pip-version-check kongming-rs-hv
CMD ["python"]

docker build -t kongming-hv .
docker run --rm -it kongming-hv

To run a script from the host instead of an interactive REPL, mount the current directory:

docker run --rm -v "$PWD":/work -w /work kongming-hv python my_script.py

JupyterLab in a container

For interactive exploration with notebooks:

# Dockerfile.jupyter
FROM python:3.12-slim
RUN pip install --no-cache-dir --disable-pip-version-check \
    kongming-rs-hv jupyterlab
WORKDIR /notebooks
EXPOSE 8888
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--no-browser", \
     "--ServerApp.token=''", "--ServerApp.password=''"]

docker build -f Dockerfile.jupyter -t kongming-hv-jupyter .
docker run --rm -p 8888:8888 -v "$PWD":/notebooks kongming-hv-jupyter

Open http://localhost:8888 in your browser. Notebooks saved under /notebooks are persisted to the mounted host directory.

The disabled token/password above is fine for local use. Do not expose this container on a public network without adding authentication.


Quick Example

A minimal example showing the core operations:

from kongming import hv

# Create hypervectors
a = hv.Sparkle.from_word(hv.MODEL_64K_8BIT, hv.d0(), "hello")
b = hv.Sparkle.from_word(hv.MODEL_64K_8BIT, hv.d0(), "world")
print(f'Overlap: {hv.overlap(a, b)}')  # Near orthogonal (~1)

# Bind: result is dissimilar to both inputs
bound = hv.bind(a, b)
print(f'{hv.overlap(bound, a)=}, {hv.overlap(bound, b)=}')  # ~1, ~1

# Bundle: result is similar to both inputs
bundled = hv.bundle(hv.Seed128(10, 1), a, b)
print(f'{hv.overlap(bundled, a)=}, {hv.overlap(bundled, b)=}')  # high, high

What’s Happening

  1. Sparkle.from_word generates a deterministic hypervector from a word. Same word always produces the same vector.
  2. Two unrelated vectors have near-zero overlap (~1) — random high-dimensional vectors are nearly orthogonal.
  3. hv.bind(a, b) produces a vector dissimilar to both (low overlap). Binding is reversible.
  4. hv.bundle(seed, a, b) produces a vector similar to both (high overlap). Different seeds produce different but equally valid results.
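
The four points above can be reproduced with a toy model in plain Python. This is a sketch, not the library's implementation: it assumes a vector is a list of 256 offsets, one per segment, each in the range 0 to 255 (consistent with the cardinality of 256 quoted for MODEL_64K_8BIT). The helper names `random_vector`, `toy_overlap`, `toy_bind`, and `toy_bundle` are hypothetical.

```python
import random

CARDINALITY = 256    # number of segments (assumed for MODEL_64K_8BIT)
SEGMENT_SIZE = 256   # possible offsets per segment (assumed)

def random_vector(rng):
    """One random offset per segment."""
    return [rng.randrange(SEGMENT_SIZE) for _ in range(CARDINALITY)]

def toy_overlap(x, y):
    """Number of segments where both vectors hold the same offset."""
    return sum(1 for p, q in zip(x, y) if p == q)

def toy_bind(x, y):
    """Per-segment offset addition modulo the segment size."""
    return [(p + q) % SEGMENT_SIZE for p, q in zip(x, y)]

def toy_bundle(rng, x, y):
    """Per segment, copy the offset of one randomly chosen input."""
    return [rng.choice(pair) for pair in zip(x, y)]

rng = random.Random(0)
a, b = random_vector(rng), random_vector(rng)
bound = toy_bind(a, b)
bundled = toy_bundle(rng, a, b)

print(toy_overlap(a, b))         # small, around 1
print(toy_overlap(bound, a))     # small, around 1
print(toy_overlap(bundled, a))   # large, around half of 256
print(toy_overlap(bundled, b))   # large, around half of 256
```

In this toy, `bound` matches `a` only in segments where `b` happens to hold offset 0, which is why binding looks like noise to both inputs, while the bundle keeps roughly half of each input's segments intact.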

Walkthrough

A deeper exploration of the Python API, covering vector creation, similarity, random generation, power/permutation, and online learning.

Creating Vectors

from kongming import hv

model = hv.MODEL_64K_8BIT

# Create sparkles (atomic vectors) from words
cat = hv.Sparkle.from_word(model, "animals", "cat")
dog = hv.Sparkle.from_word(model, "animals", "dog")

# Same inputs always produce the same vector
cat2 = hv.Sparkle.from_word(model, "animals", "cat")
assert cat.stable_hash() == cat2.stable_hash()
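
Determinism of this kind is commonly obtained by hashing the inputs into a PRNG seed. The sketch below illustrates that general idea in plain Python. It is an assumption about the technique, not the library's actual derivation, and `toy_from_word` is a hypothetical helper.

```python
import hashlib
import random

CARDINALITY = 256    # segments per vector (assumed for MODEL_64K_8BIT)
SEGMENT_SIZE = 256   # offsets per segment (assumed)

def toy_from_word(domain, word):
    """Derive a reproducible vector from the (domain, word) pair."""
    digest = hashlib.sha256(f"{domain}\x00{word}".encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest, "big"))  # seed from the hash
    return [rng.randrange(SEGMENT_SIZE) for _ in range(CARDINALITY)]

cat = toy_from_word("animals", "cat")
cat_again = toy_from_word("animals", "cat")
dog = toy_from_word("animals", "dog")

print(cat == cat_again)   # True
print(cat == dog)         # False
```

Because the domain is part of the hashed input, the same word in a different domain yields an unrelated vector in this sketch.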

Measuring Similarity

# Random vectors have ~1 overlap
print(hv.overlap(cat, dog))   # ≈ 1 (near-orthogonal)

# A vector is maximally similar to itself
print(hv.overlap(cat, cat))   # = 256 (= cardinality)
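
Both figures follow from a short calculation. It assumes, as the cardinality of 256 suggests but this page does not state outright, that MODEL_64K_8BIT splits 65,536 dimensions into C = 256 segments of S = 256 positions each, with exactly one active position per segment. Two independent random vectors agree in a given segment with probability 1/S, so:

$$
\mathbb{E}[\operatorname{overlap}(a, b)] = C \cdot \frac{1}{S} = 256 \cdot \frac{1}{256} = 1,
\qquad
\operatorname{overlap}(a, a) = C = 256 .
$$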

Using SparseOperation for Random Generation

so = hv.SparseOperation(model, 123, 456)

# Generate random sparkles
a = hv.Sparkle.random("my_domain", so)
b = hv.Sparkle.random("my_domain", so)

# Each call to so produces a new random seed
print(hv.overlap(a, b))  # ≈ 1

Power and Permutation

# Power creates a permuted vector
s = hv.Sparkle.from_word(model, "pos", "step")
s2 = s.power(2)
s3 = s.power(3)

# Different powers are near-orthogonal
print(hv.overlap(s, s2))   # ≈ 1
print(hv.overlap(s, s3))   # ≈ 1

# Inverse: power(-1) undoes power(1)
s_inv = s.power(-1)
# bind(s, s_inv) ≈ identity

Online Learning with Learner

learner = hv.Learner(model, hv.Seed128(0, 42))

# Feed observations one at a time
learner.bundle(cat)
learner.bundle(cat)   # seen twice — stronger signal
learner.bundle(dog)

# The learned vector is more similar to cat (seen 2x)
print(hv.overlap(learner, cat))  # higher
print(hv.overlap(learner, dog))  # lower but above random
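
One way to picture why repeated observations produce a stronger signal is per-segment reservoir sampling: when the n-th observation arrives, each segment adopts its offset with probability 1/n, so every observation ends up owning an equal share of segments. This is an illustrative assumption only, not a description of how hv.Learner is implemented. `ToyLearner` and the other helpers are hypothetical.

```python
import random

CARDINALITY = 256    # segments per vector (assumed)
SEGMENT_SIZE = 256   # offsets per segment (assumed)

def random_vector(rng):
    return [rng.randrange(SEGMENT_SIZE) for _ in range(CARDINALITY)]

def toy_overlap(x, y):
    return sum(1 for p, q in zip(x, y) if p == q)

class ToyLearner:
    """Online bundler: every observation keeps an equal share of segments."""

    def __init__(self, rng):
        self.rng = rng
        self.count = 0
        self.offsets = [0] * CARDINALITY

    def bundle(self, vector):
        self.count += 1
        for seg in range(CARDINALITY):
            # Replace with probability 1/count (always for the first one).
            if self.rng.randrange(self.count) == 0:
                self.offsets[seg] = vector[seg]

rng = random.Random(42)
cat, dog = random_vector(rng), random_vector(rng)

learner = ToyLearner(rng)
learner.bundle(cat)
learner.bundle(cat)   # seen twice
learner.bundle(dog)

print(toy_overlap(learner.offsets, cat))  # roughly 2/3 of 256
print(toy_overlap(learner.offsets, dog))  # roughly 1/3 of 256
```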

Examples

Standalone runnable scripts under examples/ — each demonstrates a different facet of hypervector computing. Click through for the walkthrough.

| Example | What it shows |
| --- | --- |
| Mexican Dollar | Analogical reasoning of “What’s the Dollar of Mexico?”: bind/bundle as the math behind analogy. |
| Word Indexer | Encoding and novel queries for 5,000 English words. |
| Bulk Storage Benchmark | Populate various substrates with thousands of chunks and measure retrieval performance. |
| Operators from Scratch | Reimplement bind and bundle in pure Python: the core math underneath the library. |
| LISP Interpreter | A full LISP where every atom, cons cell, and environment is a hypervector. For the VSA-curious. |

Operators from Scratch

Standalone script: operators.py

This example implements the bind, release, and bundle operators in pure Python using only the low-level offset API, then verifies correctness against the library’s built-in implementations.

The script does not call hv.bind(), hv.release(), or hv.bundle() for computation — it reimplements them to show how they work at the offset level.

Bind

Per-segment offset addition modulo segment size:

for seg in range(cardinality):
    result[seg] = (core_a.offset(seg) + core_b.offset(seg)) % segment_size

Properties:

  • Result is nearly orthogonal to both inputs (overlap ≈ 1)
  • Commutative: bind(a, b) == bind(b, a)
  • Associative: bind(a, b, c) == bind(bind(a, b), c)
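
These properties can be checked on plain Python lists standing in for the offset arrays. The sketch assumes 256 segments of size 256. `toy_bind` is a hypothetical stand-in for the per-segment loop shown above and does not use the library's offset API.

```python
import random

CARDINALITY = 256    # segments per vector (assumed)
SEGMENT_SIZE = 256   # offsets per segment (assumed)

def random_vector(rng):
    return [rng.randrange(SEGMENT_SIZE) for _ in range(CARDINALITY)]

def toy_overlap(x, y):
    return sum(1 for p, q in zip(x, y) if p == q)

def toy_bind(*vectors):
    """Per-segment sum of offsets, modulo the segment size."""
    return [sum(offsets) % SEGMENT_SIZE for offsets in zip(*vectors)]

rng = random.Random(1)
a, b, c = (random_vector(rng) for _ in range(3))

print(toy_bind(a, b) == toy_bind(b, a))                  # True (commutative)
print(toy_bind(a, b, c) == toy_bind(toy_bind(a, b), c))  # True (associative)
print(toy_overlap(toy_bind(a, b), a))                    # small, around 1
```

Commutativity and associativity hold exactly because modular addition has both properties in every segment.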

Release (Unbind)

Per-segment offset subtraction modulo segment size:

for seg in range(cardinality):
    result[seg] = (core_c.offset(seg) - core_k.offset(seg)) % segment_size

Properties:

  • release(bind(a, b), b) = a (exact recovery)
  • Multi-release: release(release(bind(a, b, c), c), b) = a
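
Exact recovery can be confirmed the same way on plain lists. As before, this is a sketch under the assumption of 256 segments of size 256, with hypothetical helpers `toy_bind` and `toy_release` in place of the library calls.

```python
import random

CARDINALITY = 256    # segments per vector (assumed)
SEGMENT_SIZE = 256   # offsets per segment (assumed)

def random_vector(rng):
    return [rng.randrange(SEGMENT_SIZE) for _ in range(CARDINALITY)]

def toy_bind(*vectors):
    """Per-segment sum of offsets, modulo the segment size."""
    return [sum(offsets) % SEGMENT_SIZE for offsets in zip(*vectors)]

def toy_release(bound, key):
    """Per-segment subtraction of the key's offsets, modulo the segment size."""
    return [(p - q) % SEGMENT_SIZE for p, q in zip(bound, key)]

rng = random.Random(2)
a, b, c = (random_vector(rng) for _ in range(3))

# Single release recovers the other operand exactly.
print(toy_release(toy_bind(a, b), b) == a)        # True

# Releasing the keys one at a time peels a three-way binding apart.
abc = toy_bind(a, b, c)
print(toy_release(toy_release(abc, c), b) == a)   # True
```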

Bundle

PRNG-based random selection among inputs. For each segment, a seeded PRNG picks which input vector contributes its offset. The selection probability is proportional to each input’s weight.

# Compute cumulative anchors from weights (weights sum to 1.0).
# For equal weights [1/3, 1/3, 1/3]:   anchors ≈ [21845, 43690, 65535]
# For weighted [0.6, 0.2, 0.2]:        anchors ≈ [39321, 52428, 65535]
cumulative = 0.0
anchors = []
for w in weights:
    cumulative += w
    anchors.append(int(cumulative * 65535))

for seg in range(0, cardinality, 4):
    r = so.uint64()                          # one PRNG call → 4 × 16-bit values
    for j in range(4):
        dial = (r >> (48 - 16 * j)) & 0xFFFF # extract 16-bit random value
        # index of the first input whose anchor >= dial
        chosen = next(i for i, anchor in enumerate(anchors) if anchor >= dial)
        result[seg + j] = cores[chosen].offset(seg + j)

Properties:

  • Result is similar to all inputs (overlap ≈ weight × cardinality)
  • Not reversible — information is lost
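
The weighted selection can be exercised end to end on plain lists. In this sketch Python's `random.Random.getrandbits(64)` stands in for `so.uint64()`, so the output will not match the library bit for bit. `toy_bundle` and the other helpers are hypothetical, and 256 segments of size 256 are assumed.

```python
import random

CARDINALITY = 256    # segments per vector (assumed)
SEGMENT_SIZE = 256   # offsets per segment (assumed)

def random_vector(rng):
    return [rng.randrange(SEGMENT_SIZE) for _ in range(CARDINALITY)]

def toy_overlap(x, y):
    return sum(1 for p, q in zip(x, y) if p == q)

def toy_bundle(rng, vectors, weights):
    """Weighted per-segment selection using 16-bit dials and cumulative anchors."""
    cumulative, anchors = 0.0, []
    for w in weights:
        cumulative += w
        anchors.append(int(cumulative * 65535))
    anchors[-1] = 65535  # guard against floating-point shortfall

    result = [0] * CARDINALITY
    for seg in range(0, CARDINALITY, 4):
        r = rng.getrandbits(64)  # stand-in for so.uint64()
        for j in range(4):
            dial = (r >> (48 - 16 * j)) & 0xFFFF
            chosen = next(i for i, anchor in enumerate(anchors) if anchor >= dial)
            result[seg + j] = vectors[chosen][seg + j]
    return result

rng = random.Random(3)
a, b, c = (random_vector(rng) for _ in range(3))
bundled = toy_bundle(rng, [a, b, c], [0.6, 0.2, 0.2])

for name, vec, weight in (("a", a, 0.6), ("b", b, 0.2), ("c", c, 0.2)):
    print(name, toy_overlap(bundled, vec), "expected about", round(weight * CARDINALITY))
```

Each measured overlap should land near weight × cardinality, with a spread of a few segments from the random selection.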

Note: The library supports two bundling strategies: classic (shown above) and fisher_yates (default). To verify exact match with the pure Python implementation, set:

KONGMING_LEARNER_SAMPLING=classic python operators.py

Running

pip install kongming-rs-hv

# Classic sampling — all operators match exactly
KONGMING_LEARNER_SAMPLING=classic python operators.py

Mexican Dollar

Standalone scripts: mexican_dollar.py | mexican_dollar_memory.py

The “What’s the Dollar of Mexico?” problem is a classic demonstration of analogical reasoning with hypervectors. It shows how structured knowledge about countries can be encoded, and how algebraic operations can answer analogy questions without explicit programming.

The Problem

Given knowledge about three countries:

| Country | Code | Capital | Currency |
| --- | --- | --- | --- |
| USA | USA | Washington DC | Dollar |
| Mexico | MEX | Mexico City | Peso |
| Sweden | SWE | Stockholm | Krona |

We want to answer questions like:

  • “What is the Dollar of Mexico?” → Peso
  • “What is the Washington DC of Mexico?” → Mexico City
  • “What is the Dollar of Sweden?” → Krona

How It Works

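Each country is encoded as a bundled set of role-filler bindings, where ⊗ denotes bind:

$$
\text{US} = \operatorname{bundle}(\text{country\_code} \otimes \text{USA},\ \text{capital} \otimes \text{DC},\ \text{currency} \otimes \text{Dollar})
$$

To find “the Dollar of Mexico”, we compute a transfer vector from US to Mexico:

$$
T_{\text{US} \to \text{Mexico}} = \operatorname{release}(\text{Mexico}, \text{US})
$$

Then apply it to Dollar:

$$
\text{Dollar} \otimes T_{\text{US} \to \text{Mexico}} \approx \text{Peso}
$$

The result will have high overlap with Peso, the analogical answer.

The same transfer works for Sweden:

$$
\text{Dollar} \otimes \operatorname{release}(\text{Sweden}, \text{US}) \approx \text{Krona}
$$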

Code (Manual)

The algebraic approach — compute the transfer vector directly:

from kongming import hv

model = hv.MODEL_64K_8BIT
so = hv.SparseOperation(model, "knowledge", 0)

# Create role markers
country_code = hv.Sparkle.from_word(model, "role", "country_code")
capital      = hv.Sparkle.from_word(model, "role", "capital")
currency     = hv.Sparkle.from_word(model, "role", "currency")

# Create fillers
usa         = hv.Sparkle.from_word(model, "country", "usa")
mex         = hv.Sparkle.from_word(model, "country", "mex")
swe         = hv.Sparkle.from_word(model, "country", "swe")
dc          = hv.Sparkle.from_word(model, "capital", "dc")
mexico_city = hv.Sparkle.from_word(model, "capital", "mexico_city")
stockholm   = hv.Sparkle.from_word(model, "capital", "stockholm")
dollar      = hv.Sparkle.from_word(model, "currency", "dollar")
peso        = hv.Sparkle.from_word(model, "currency", "peso")
krona       = hv.Sparkle.from_word(model, "currency", "krona")

# Encode each country as role-filler bundles
us_record = hv.bundle(hv.Seed128.random(so),
    hv.bind(country_code, usa),
    hv.bind(capital, dc),
    hv.bind(currency, dollar),
)
mexico_record = hv.bundle(hv.Seed128.random(so),
    hv.bind(country_code, mex),
    hv.bind(capital, mexico_city),
    hv.bind(currency, peso),
)
sweden_record = hv.bundle(hv.Seed128.random(so),
    hv.bind(country_code, swe),
    hv.bind(capital, stockholm),
    hv.bind(currency, krona),
)

# Transfer vector: Mexico / US
transfer_to_mexico = hv.release(mexico_record, us_record)

# "What's the Dollar of Mexico?"
mexican_dollar = hv.bind(dollar, transfer_to_mexico)
print(f"peso overlap:  {hv.overlap(mexican_dollar, peso)}")    # high!
print(f"dollar overlap: {hv.overlap(mexican_dollar, dollar)}")  # ~1 (noise)
print(f"krona overlap:  {hv.overlap(mexican_dollar, krona)}")   # ~1 (noise)

# "What's the Washington DC of Mexico?"
mexican_dc = hv.bind(dc, transfer_to_mexico)
print(f"mexico_city overlap: {hv.overlap(mexican_dc, mexico_city)}")  # high!

# Transfer to Sweden works the same way
transfer_to_sweden = hv.release(sweden_record, us_record)
swedish_dollar = hv.bind(dollar, transfer_to_sweden)
print(f"krona overlap: {hv.overlap(swedish_dollar, krona)}")  # high!

Code (with AnalogicalReasoner)

When records are stored in memory (as Octopus composites), analogical_reasoner handles the transfer:

from kongming import hv, memory

model = hv.MODEL_64K_8BIT
store = memory.InMemory(model)

keys = ["capital", "currency", "country_code"]

# Store individual fillers — NNS needs them as searchable items
fillers = {}
for word in ["dc", "USD", "USA", "mexicoCity", "MXN", "MEX",
             "stockholm", "SEK", "SWE"]:
    s = hv.Sparkle.from_word(model, 0, word)
    store.put(s)
    fillers[word] = s

# Store country records as Octopus composites
store.put(hv.Octopus(
    hv.Seed128("country", "USA"), keys,
    fillers["dc"], fillers["USD"], fillers["USA"],
))
store.put(hv.Octopus(
    hv.Seed128("country", "MEX"), keys,
    fillers["mexicoCity"], fillers["MXN"], fillers["MEX"],
))
store.put(hv.Octopus(
    hv.Seed128("country", "SWE"), keys,
    fillers["stockholm"], fillers["SEK"], fillers["SWE"],
))

# Retrieve stored records
us_code  = store.get("country", "USA").code
mex_code = store.get("country", "MEX").code
swe_code = store.get("country", "SWE").code

view = store.new_view()

# "What is the USD of Mexico?"
result = memory.first_picked(view,
    memory.nns(
        memory.analogical_reasoner(
            memory.with_code(mex_code),
            us_code,
            fillers["USD"],
        )
    )
)
print(result.id)  # → ✨:🌱MXN

# "What is the Washington DC of Mexico?"
result = memory.first_picked(view,
    memory.nns(
        memory.analogical_reasoner(
            memory.with_code(mex_code),
            us_code,
            fillers["dc"],
        )
    )
)
print(result.id)  # → ✨:🌱mexicoCity

# "What is the Dollar of Sweden?"
result = memory.first_picked(view,
    memory.nns(
        memory.analogical_reasoner(
            memory.with_code(swe_code),
            us_code,
            fillers["USD"],
        )
    )
)
print(result.id)  # → ✨:🌱SEK

analogical_reasoner computes the transfer vector feature ⊗ inverse(src) internally and uses near-neighbor search to find the best match in memory — no manual algebra needed.

Why It Works

The transfer vector captures the structural mapping between the two records. When applied to any filler from the US record, it maps it to the corresponding filler in the Mexico record — because the role-filler binding structure is preserved by the algebra.

This is a form of analogical reasoning: no explicit rules, no lookup tables — just algebraic operations on high-dimensional vectors.
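
The argument can be checked numerically with a toy model in plain Python. The sketch assumes 256 segments of size 256, per-segment modular arithmetic for bind and release, and an equal-weight bundle that copies each segment from one randomly chosen input. All helper names are hypothetical and nothing here calls the library.

```python
import random

CARDINALITY = 256    # segments per vector (assumed)
SEGMENT_SIZE = 256   # offsets per segment (assumed)

def random_vector(rng):
    return [rng.randrange(SEGMENT_SIZE) for _ in range(CARDINALITY)]

def toy_overlap(x, y):
    return sum(1 for p, q in zip(x, y) if p == q)

def toy_bind(x, y):
    return [(p + q) % SEGMENT_SIZE for p, q in zip(x, y)]

def toy_release(x, y):
    return [(p - q) % SEGMENT_SIZE for p, q in zip(x, y)]

def toy_bundle(rng, *vectors):
    """Per segment, copy the offset of one randomly chosen input."""
    return [rng.choice(offsets) for offsets in zip(*vectors)]

rng = random.Random(7)
names = ("country_code", "capital", "currency")
roles = {n: random_vector(rng) for n in names}
us_fillers = {n: random_vector(rng) for n in names}   # usa, dc, dollar
mx_fillers = {n: random_vector(rng) for n in names}   # mex, mexico_city, peso
krona = random_vector(rng)                            # unrelated control

def record(fillers):
    return toy_bundle(rng, *(toy_bind(roles[n], fillers[n]) for n in names))

us_record, mx_record = record(us_fillers), record(mx_fillers)
transfer = toy_release(mx_record, us_record)
mexican_dollar = toy_bind(us_fillers["currency"], transfer)

print(toy_overlap(mexican_dollar, mx_fillers["currency"]))  # well above chance
print(toy_overlap(mexican_dollar, krona))                   # chance level
```

In this toy, a segment lands exactly on Peso whenever both records happened to keep their currency binding there: the role offset and the Dollar offset cancel, leaving Peso. With three equally weighted bindings per record that occurs in about 1/3 × 1/3 = 1/9 of the segments, roughly 28 of 256, against a chance level of about 1.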


Bulk Storage Benchmark

Standalone script: bulk_storage.py

This example populates a storage substrate with a large number of random terminal chunks, then queries a few by key to verify correctness. It demonstrates how to batch-create items and measure throughput.

Note that the associative index is built during the same process, so near-neighbor search is available as soon as all writes have completed successfully.

Motivated readers can extend this script to test various producers or selectors.

Script

#!/usr/bin/env python3
"""Populate local storage with random terminal chunks and verify retrieval."""

import argparse
import shutil
import tempfile
import time
from kongming import hv, memory


def main():
    parser = argparse.ArgumentParser(description="Bulk storage benchmark")
    parser.add_argument(
        "-n", "--count", type=int, default=10_000,
        help="Number of terminal chunks to create (default: 10000)",
    )
    parser.add_argument(
        "--model", type=int, default=hv.MODEL_1M_10BIT,
        help="HV model (default: MODEL_1M_10BIT)",
    )
    parser.add_argument(
        "--domain", type=str, default="bench",
        help="Domain name for all chunks (default: bench)",
    )
    parser.add_argument(
        "--backend", type=str,
        choices=["inmemory", "embedded"],
        default="inmemory",
        help="Storage backend (default: inmemory)",
    )
    parser.add_argument(
        "--path", type=str, default=None,
        help="Disk path for embedded backend (default: temp directory)",
    )
    args = parser.parse_args()

    # --- Create storage ---
    tmpdir = None
    if args.backend == "embedded":
        if args.path:
            path = args.path
        else:
            tmpdir = tempfile.mkdtemp()
            path = f"{tmpdir}/bench_store"
        storage = memory.Embedded(args.model, path)
        print(f"Backend: Embedded (path={path})")
    else:
        storage = memory.InMemory(args.model)
        print("Backend: InMemory (BTreeMap, pure in-memory)")

    # --- Write phase ---
    print(f"Writing {args.count:,} terminal chunks …")
    t0 = time.perf_counter()
    for i in range(args.count):
        storage.mem_set(memory.new_terminal(args.domain, str(i)))

    elapsed = time.perf_counter() - t0
    rate = args.count / elapsed
    print(f"  done in {elapsed:.2f}s  ({rate:,.0f} chunks/s)")
    print(f"  item_count = {storage.item_count():,}")

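    # --- Read phase: spot-check a few items ---
    step = max(1, args.count // 100)  # avoid a zero step when count < 100
    spot_checks = range(0, args.count, step)
    print(f"Spot-checking keys: {spot_checks}")
    mismatches = 0
    for idx in spot_checks:
        expected = hv.Sparkle.from_word(args.model, args.domain, str(idx))
        chunk = storage.get(args.domain, str(idx))
        if not hv.equal(chunk.id, expected):
            mismatches += 1
            print(f"mismatch at key {idx}: {chunk}")

    # Report success only when every spot check matched.
    if mismatches:
        print(f"{mismatches} check(s) failed.")
    else:
        print("All checks passed.")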

    # --- Cleanup ---
    if tmpdir:
        del storage
        shutil.rmtree(tmpdir)


if __name__ == "__main__":
    main()

Usage

# Default: 10K chunks, in-memory storage substrate.
python bulk_storage.py

# Embedded (disk-backed storage substrate).
python bulk_storage.py --backend embedded

# Embedded with a specific path (tip: use a tmpfs mount for near-in-memory speed)
python bulk_storage.py --backend embedded --path /dev/shm/my_bench

# Custom count
python bulk_storage.py -n 100000

# Different model: 1 selects MODEL_64K_8BIT, 2 selects MODEL_1M_10BIT, etc.
python bulk_storage.py -n 10000 --model 1

Word Indexer

Standalone script: word_indexer.py

This example encodes ~5,000 English words as Sequences of per-letter Sparkles, then queries them by exact word or by positional suffix (“six-letter words ending in er”, “eleven-letter words ending in tion”) using multi-attractor near-neighbor search.

It demonstrates four ideas together:

  • Using Sparkle as a stable per-symbol code (one Sparkle per a–z).
  • Using Sequence with a Pod-derived seed so chunks are addressable both by word (exact) and by structure (positional).
  • The ChunkProducer API (new_terminal, from_sequence_members, joiner) staged through a batched SubstrateMutableView via producer.produce(view).
  • Multi-attractor nns over SequenceAttractor for positional conjunctive queries.

The general idea

letters domain                       words domain
─────────────                        ────────────
"a" → Sparkle_a                      "the"      → Sequence(t, h, e)
"b" → Sparkle_b                      "language" → Sequence(l, a, n, g, u, a, g, e)
"c" → Sparkle_c                      ...
...                                  Pod   = word         (exact lookup key)
"z" → Sparkle_z                      note  = word         (recoverable in results)
                                     members = letter Sparkles in order
  • Letters as Sparkles. Pre-write 26 random-looking Sparkles, one per a–z, into a letters domain via new_terminal(letters, ch). Each letter’s Pod is the letter itself, so you can fetch it by by_item_key("letters", "e").
  • Words as Sequences. Each word is a Sequence in a words domain whose ordered members are the letter-Sparkles spelling it, built by from_sequence_members(...) with a joiner(...) of per-letter by_item_key selectors. The Sequence’s Pod is the word, so exact lookup is by_item_key("words", "language").
  • note carries the word string. Each word-chunk is written with note=<word>, so chunk.note recovers the word in result loops without decoding the Pod.

Batched writes via the ChunkProducer API

This example uses the producer API end-to-end. Producers compute their chunks at produce() time against a mutable view, mirroring Go’s producer.Produce(ctx, view) and Rust’s producer.produce(view, index). Storage’s new_mutable_view() is a context manager with transactional semantics:

  • All writes staged by producer.produce(view) calls between __enter__ and __exit__ go into a single batch.
  • Clean exit auto-commits; an exception inside the block discards everything.
  • view.commit() mid-block flushes the current batch and lets you continue staging — useful for pacing memory pressure on large ingests.
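
The three rules above match the usual Python context-manager pattern. The sketch below is a stand-alone toy, not the library's view class: `ToyView`, `stage`, and the list used as storage are hypothetical. It also assumes that a batch already flushed by `commit()` stays committed when a later exception discards the rest, which this page does not confirm for the real substrate.

```python
class ToyView:
    """Stages writes and commits them as a batch (illustration only)."""

    def __init__(self, committed):
        self.committed = committed   # shared list standing in for storage
        self.staged = []

    def stage(self, item):
        self.staged.append(item)

    def commit(self):
        """Flush the current batch and keep the view open."""
        self.committed.extend(self.staged)
        self.staged.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.commit()          # clean exit: auto-commit
        else:
            self.staged.clear()    # exception: discard the open batch
        return False               # never swallow the exception

store = []
with ToyView(store) as view:
    view.stage("a")
    view.stage("b")
print(store)  # ['a', 'b']

try:
    with ToyView(store) as view:
        view.stage("c")
        view.commit()      # "c" is flushed mid-block
        view.stage("d")    # staged but never committed
        raise RuntimeError("ingest failed")
except RuntimeError:
    pass
print(store)  # ['a', 'b', 'c']
```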

Letters and words go into two separate views; the second commits every BATCH_SIZE = 1000 words:

with storage.new_mutable_view() as view:
    for ch in "abcdefghijklmnopqrstuvwxyz":
        memory.new_terminal("letters", ch).produce(view)
    # auto-commits on __exit__

with storage.new_mutable_view() as view:
    for i, w in enumerate(words, start=1):
        members = memory.joiner(*[memory.by_item_key("letters", ch) for ch in w])
        # semantic_indexing=True: index the Sequence's code so suffix
        # queries (sequence_attractor) can find words by structure.
        memory.from_sequence_members(
            "words", w, members, note=w, semantic_indexing=True,
        ).produce(view)
        if i % BATCH_SIZE == 0:
            view.commit()
    # trailing writes auto-commit on __exit__

See Substrate & Views for the full view API.

Multi-attractor NNS

A sequence_attractor(member_selector, pos, domain) is a positional constraint: “Sequences in domain whose member at pos overlaps with member_selector”. Position is 0-based.

nns(*attractors) evaluates all attractors and ranks Sequences by combined overlap. With multiple attractors, the result is a conjunction — a chunk must satisfy each positional constraint to score well.

For “six-letter words ending in er”:

memory.nns(
    memory.sequence_attractor(memory.by_item_key("letters", "e"), 4, WORDS_DOMAIN),
    memory.sequence_attractor(memory.by_item_key("letters", "r"), 5, WORDS_DOMAIN),
)

This returns Sequences with e at index 4 and r at index 5 — i.e., the last two characters of a six-letter word.

For “eleven-letter words ending in tion”, anchor t/i/o/n at positions 7/8/9/10.
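
The ranking logic can be imitated on ordinary strings. The sketch below scores each word by the number of positional constraints it satisfies, an exact-match simplification of the overlap-based ranking. It does not use the library, and `positional_score` and `toy_nns` are hypothetical names.

```python
def positional_score(word, constraints):
    """Count how many (position, letter) constraints the word satisfies."""
    return sum(1 for pos, letter in constraints
               if pos < len(word) and word[pos] == letter)

def toy_nns(words, constraints):
    """Rank words by score, best first; drop words that match nothing."""
    scored = [(positional_score(w, constraints), w) for w in words]
    ranked = sorted(scored, key=lambda sw: -sw[0])  # stable: ties keep input order
    return [w for score, w in ranked if score > 0]

words = ["better", "letter", "butter", "listen",
         "people", "the", "nation", "information"]

ends_in_er = [(4, "e"), (5, "r")]
print(toy_nns(words, ends_in_er))
# ['better', 'letter', 'butter', 'listen']

ends_in_tion = [(7, "t"), (8, "i"), (9, "o"), (10, "n")]
print(toy_nns(words, ends_in_tion))
# ['information']
```

Here "listen" satisfies only the constraint at position 4, so it ranks below the words that satisfy both, mirroring how a chunk must meet every positional constraint to score well.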

Counting and ranged results

storage.mem_get(selector) returns the full ranked result list as a Python list. Two helpers shape the output:

| Call | Use |
| --- | --- |
| mem_get(nns(...)) | Get every match. len(...) is the count. |
| mem_get(range_sel(nns(...), start, limit)) | Materialize a window; useful for top-N. |

range_sel(inner, start, limit) consumes its inner selector, so to demonstrate both count and top 10 the example builds the NNS selector twice (cheap; the substrate work dominates).
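
A loose analogy, assuming "consumes" means the inner selector is single-use: Python iterators behave the same way. The sketch uses only the standard library, and `build_matches` is a hypothetical stand-in for constructing an NNS selector.

```python
from itertools import islice

def build_matches():
    """Stand-in for building an NNS selector: a fresh single-use iterator."""
    return (f"match-{i}" for i in range(25))

matches = build_matches()
top_10 = list(islice(matches, 0, 10))   # window: start=0, limit=10
leftover = list(matches)                # the same iterator is now partly consumed

print(len(top_10), len(leftover))       # 10 15

# Build again to get an independent full count.
print(len(list(build_matches())))       # 25
```

Taking the window from `matches` leaves it unusable for a full count, which is why the count needs a second, freshly built iterator.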

See Working with Results for more on shaping selector output. When you need per-result SelectorExtra (e.g. NNS scores) or lazy iteration, reach for memory.lazy_selector_iter(view, selector); mem_get returns Chunks only.

Running

pip install kongming-rs-hv
python examples/word_indexer/word_indexer.py

Expected output shape:

Ingested 4982 words in 8.4s.

by word 'the': 1 match(es) [0.3 ms]
   1. the

by word 'people': 1 match(es) [0.3 ms]
   1. people

****er  (6 letters): N match(es) [~190 ms]
   1. <some six-letter -er word>
   ...

*******tion (11 letters): M match(es) [~480 ms]
   1. <some eleven-letter -tion word>
   ...

Approximate timings on an Apple Silicon laptop with the InMemory backend:

| Operation | Time |
| --- | --- |
| Ingest ~5,000 words via producer API | ~9 s |
| Exact lookup via by_item_key | <1 ms |
| Multi-attractor NNS (2 attractors, e.g. ****er) | ~200 ms |
| Multi-attractor NNS (4 attractors, e.g. *******tion) | ~460 ms |

A note on semantic_indexing

For NNS by composite structure (i.e. “find Sequences whose member at position N matches X”), each word’s producer is constructed with semantic_indexing=True. This impresses the Sequence’s code into the associative index alongside the chunk’s id-Sparkle (which is always indexed). Without the flag, only the id is indexed and sequence_attractor queries return zero hits.

The letter terminals are written without the flag because their code is the id-Sparkle, so id-only indexing is sufficient.

Switching to persistent storage

InMemory is fine for a demo. For a persistent store, swap one line:

storage = memory.Embedded(MODEL, "/path/to/db")

Everything else is identical.

Data attribution

Word-frequency data in top5000.txt is sourced from www.wordfrequency.info (top-5000 English words). Please credit the source when reusing this data.

Format (tab-separated, no header):

Rank    Word    POS    Frequency    Dispersion


LISP Interpreter

A LISP interpreter where every data structure — atoms, cons cells, lists, closures — is encoded as a hypervector. No traditional memory allocation, no pointers, no garbage collector. All computation happens through hypervector algebra.

Two Implementations

The LISP interpreter ships in two forms, both feature-identical:

| | Pure Python (pylisp) | Rust (kongming.lisp) |
| --- | --- | --- |
| Source | Open-sourced in examples/pylisp/ | Compiled into kongming-rs-hv |
| Readable | Yes (~500 lines of annotated Python) | No (compiled Rust binary) |
| Performance | Slower (Python overhead per operation) | Faster (native code) |
| Import | from pylisp import LispEnv | from kongming.lisp import LispEnv |
| Dependencies | kongming-rs-hv (for hypervector primitives) | Included in kongming-rs-hv |
| Use case | Learning, debugging, extending | Production, notebooks |

Rust (built-in)

The kongming-rs-hv package includes a Rust-based LISP interpreter built directly on the internal Rust API and primitives. This implementation is compiled into the Python wheel and accessible via from kongming.lisp import LispEnv.

Since it operates on Rust-native hypervector types with zero Python overhead, it delivers the best performance for production use.

Python (open-source)

For research and study, we provide a pure-Python implementation of the same interpreter, built entirely on the public Python API of kongming-rs-hv. It mirrors the Rust implementation’s architecture but uses Python-level operations (hv.bind, hv.bundle, hv.release, etc.), making the underlying hypervector mechanics fully transparent and easy to modify.

This implementation is ideal for:

  • Understanding how LISP primitives map to hypervector operations
  • Experimenting with alternative encodings or evaluation strategies
  • Teaching and prototyping

Quick Start

pip install kongming-rs-hv

# Pure Python
from pylisp import LispEnv

env = LispEnv()
env.eval("(CAR (QUOTE (A B C)))")       # => "A"
env.eval("(CDR (QUOTE (A B C)))")       # => "(B C)"
env.eval("(CONS (QUOTE A) (QUOTE B))")  # => "(A . B)"

# Rust (same API, same results)
from kongming.lisp import LispEnv

env = LispEnv()
env.eval("(CAR (QUOTE (A B C)))")       # => "A"

Supported Forms

McCarthy’s 7 Primitives (1960)

| Form | Example | Result |
| --- | --- | --- |
| QUOTE | (QUOTE (A B C)) | (A B C) |
| CAR | (CAR (QUOTE (A B C))) | A |
| CDR | (CDR (QUOTE (A B C))) | (B C) |
| CONS | (CONS (QUOTE A) (QUOTE B)) | (A . B) |
| ATOM | (ATOM (QUOTE A)) | T |
| EQ | (EQ (QUOTE A) (QUOTE A)) | T |
| COND | (COND ((EQ (QUOTE A) (QUOTE B)) (QUOTE NO)) (T (QUOTE YES))) | YES |

Extensions

| Form | Description |
| --- | --- |
| LAMBDA | Anonymous functions with curried beta-reduction and variable shadowing |
| LABEL | Recursive self-reference (enables recursion without mutation) |
| DEFINE | Bind a name to a function in the environment |

Examples

# Lambda
env.eval("((LAMBDA (X) (CAR X)) (QUOTE (A B C)))")  # => "A"

# Define a reusable function
env.eval("(DEFINE SECOND (LAMBDA (L) (CAR (CDR L))))")
env.eval("(SECOND (QUOTE (X Y Z)))")                 # => "Y"

# Recursion with LABEL
env.eval(
    "(DEFINE LAST (LAMBDA (L) "
    "  ((LABEL REC (LAMBDA (X) "
    "    (COND ((ATOM (CDR X)) (CAR X)) "
    "          (T (REC (CDR X)))))) L)))"
)
env.eval_full("(LAST (QUOTE (A B C)))")              # => "C"

How It Works

Each LISP value is a Sparkle — a sparse binary hypervector seeded by its content. Atoms like A, B, CAR are sparkles in a symbol domain.

A cons cell (a . b) is encoded as:

cell = bundle(bind(a, LHS), bind(b, RHS))

where LHS and RHS are fixed tag sparkles. The cell is stored under a fresh random sparkle id. To extract:

car(id) = cleanup(release(cell, LHS))
cdr(id) = cleanup(release(cell, RHS))

The release operation is noisy — it produces an approximate result. Cleanup uses near-neighbor search (NNS) over the substrate’s associative index to find the exact stored sparkle that best matches the noisy probe.
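
The encode, release, and cleanup cycle can be sketched in plain Python. This is a toy under stated assumptions (256 segments of size 256, an equal-weight bundle, and a brute-force scan in place of the associative index); it is not the interpreter's code. Every helper name is hypothetical.

```python
import random

CARDINALITY = 256    # segments per vector (assumed)
SEGMENT_SIZE = 256   # offsets per segment (assumed)

def random_vector(rng):
    return [rng.randrange(SEGMENT_SIZE) for _ in range(CARDINALITY)]

def toy_overlap(x, y):
    return sum(1 for p, q in zip(x, y) if p == q)

def toy_bind(x, y):
    return [(p + q) % SEGMENT_SIZE for p, q in zip(x, y)]

def toy_release(x, y):
    return [(p - q) % SEGMENT_SIZE for p, q in zip(x, y)]

def toy_bundle(rng, x, y):
    """Per segment, copy the offset of one randomly chosen input."""
    return [rng.choice(pair) for pair in zip(x, y)]

rng = random.Random(11)
symbols = {name: random_vector(rng) for name in ("A", "B", "C")}
LHS, RHS = random_vector(rng), random_vector(rng)   # fixed tag vectors

def cleanup(probe):
    """Nearest stored symbol by overlap (brute-force stand-in for NNS)."""
    return max(symbols, key=lambda name: toy_overlap(symbols[name], probe))

# Encode the cons cell (A . B).
cell = toy_bundle(rng, toy_bind(symbols["A"], LHS), toy_bind(symbols["B"], RHS))

print(cleanup(toy_release(cell, LHS)))  # A
print(cleanup(toy_release(cell, RHS)))  # B
```

Releasing with LHS restores A only in the segments the bundle took from the A binding, about half of them, and leaves noise elsewhere. That is enough for cleanup to pick A over every other stored symbol.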

File Structure

examples/pylisp/
  __init__.py      # Package entry point
  types.py         # HyperBinary type alias
  env.py           # LispEnv: domains, symbols, lexicon, substrate
  cons.py          # Cons cells: cons, car, cdr, cleanup via NNS
  reader.py        # S-expression tokenizer and parser
  evaluator.py     # Single-step and fixed-point evaluator
  lambda_.py       # Beta-reduction with currying and shadowing
  printer.py       # Hypervector → S-expression display
  test_pylisp.py   # 16 tests mirroring the Rust integration suite

Storage Backends

# In-memory (default, volatile)
env = LispEnv()

# Persistent (Embedded disk-backed)
env = LispEnv(path="/tmp/my_lisp_db")

Running Tests

pip install pytest kongming-rs-hv
pytest examples/pylisp/test_pylisp.py -v

Notebook

We provide a Colab notebook that runs both implementations side by side, demonstrating correctness parity and performance comparison:

Open In Colab
