Graph Builder¶
EvenRegularGraphBuilder¶
The class EvenRegularGraphBuilder is used to construct search graphs.
- class deglib.builder.EvenRegularGraphBuilder(graph: MutableGraph, rng: Mt19937 | None = None, optimization_target: OptimizationTarget = OptimizationTarget.LowLID, extend_k: int = 0, extend_eps: float = 0.1, improve_k: int = 0, improve_eps: float = 0.001, max_path_length: int = 5, swap_tries: int = 0, additional_swap_tries: int = 0)¶
Bases:
objectConstructs an EvenRegularGraphBuilder for building and optimizing a regular graph.
This class provides functionality to incrementally build and optimize a graph structure by adding/removing vertices and improving edge connections through various strategies.
- add_entry(external_label: int | Iterable[int] | ndarray, feature: ndarray)¶
Provide the builder with new entries to be added to the graph during the build process.
Can add a single entry or a batch of entries. For batch processing, the feature array should have shape [N, D] where N is the number of entries and D is the dimensionality.
- Parameters:
external_label (int | Iterable[int] | np.ndarray) – The label(s) that name the added vertex/vertices. For batch processing, should be a list or array with length N matching the number of features
feature (np.ndarray) – The feature vector(s) to be added. Shape [D] for single entry or [N, D] for batch
- Raises:
InvalidShapeException – If feature array has invalid shape (not 1D or 2D)
AssertionError – If number of features doesn’t match number of labels in batch processing
- remove_entry(external_label: int)¶
Command the builder to remove a vertex from the graph as quickly as possible.
- Parameters:
external_label (int) – The external label that identifies the graph vertex to remove
- get_num_new_entries() int¶
Get the number of entries that will be added to the graph.
- Returns:
Number of entries queued for addition
- Return type:
int
- get_num_remove_entries() int¶
Get the number of entries that will be removed from the graph.
- Returns:
Number of entries queued for removal
- Return type:
int
- set_thread_count(thread_count: int)¶
Set the number of threads used to extend the graph during building.
When the thread count is greater than 1 and the optimization target is not StreamingData, the builder will utilize multiple threads to add elements to the graph in parallel. By default, all available CPU cores/threads are used unless specified. Note: The order in which elements are added is not guaranteed when using multiple threads.
- Parameters:
thread_count (int) – Number of threads to use for graph extension
- set_batch_size(tasks_per_batch: int, task_size: int)¶
Set the batch size parameters for parallel graph construction.
The builder processes elements in batches to minimize synchronization between threads. The total batch size is calculated as:
batch_size = thread_count * tasks_per_batch * task_size
- Effects of thread count and batch size:
thread_count = 1 and batch_size = 1: low throughput, medium latency, order of elements is guaranteed
thread_count > 1 and batch_size = 1: high throughput, low latency, order of elements is not guaranteed
thread_count > 1 and batch_size > 1: highest throughput, highest latency, order of elements is not guaranteed
Note: The optimization target StreamingData always uses a thread count of 1.
- Parameters:
tasks_per_batch (int) – Number of tasks for each thread in one batch (default: 32).
task_size (int) – Number of elements each thread processes in one task (default: 10).
- get_batch_size() int¶
Get the current batch size used for parallel graph construction.
- Returns:
The current batch size
- Return type:
int
- build(callback: Callable[[BuilderStatus], None] | str | None = None, infinite: bool = False)¶
Build the graph. This could be run on a separate thread in an infinite loop. Call stop() to end this process.
- Parameters:
callback – The callback that is called after each step of the build process. A BuilderStatus is the only argument to the function. If None nothing is printed. If callback is the string “progress”, a simple progress bar is printed to stdout.
infinite – If set to True, blocks indefinitely, until the stop() function is called. Can be used, if build() is run in a separate thread.
- stop()¶
Stop the build process.
The build process can only be stopped between steps, not during step execution.
build_from_data()¶
- deglib.builder.build_from_data(data: ndarray, labels: Iterable[int] | None = None, edges_per_vertex: int = 32, capacity: int = -1, metric: Metric = Metric.L2, rng: Mt19937 | None = None, optimization_target: OptimizationTarget = OptimizationTarget.LowLID, extend_k: int = 0, extend_eps: float = 0.2, improve_k: int = 0, improve_eps: float = 0.001, max_path_length: int = 5, swap_tries: int = 0, additional_swap_tries: int = 0, callback: Callable[[BuilderStatus], None] | str | None = None) SizeBoundedGraph¶
Create a new graph built from the given data using an EvenRegularGraphBuilder.
This is a convenience function that creates a graph, builder, adds all data entries, and builds the complete graph in one call. Infinite building is not supported.
- Parameters:
data (np.ndarray) – Feature data with shape [N, D] where N is number of samples, D is dimensionality
labels (Iterable[int] | None) – Labels for each data entry with shape [N]. If None, labels 0 to N-1 are created. If numpy array, should have dtype uint32
edges_per_vertex (int) – Number of edges per vertex in the graph
capacity (int) – Maximum number of vertices the graph can hold. If <= 0, defaults to data.shape[0]
metric (Metric) – Distance metric for measuring feature similarity
rng (Mt19937 | None) – Random number generator. If None, a new Mt19937 will be created
optimization_target (OptimizationTarget) – Optimization strategy based on data distribution characteristics
extend_k (int) – Number of neighbors to consider when extending the graph
extend_eps (float) – Epsilon value for neighbor search during graph extension
improve_k (int) – Number of neighbors to consider when improving the graph
improve_eps (float) – Epsilon value for neighbor search during graph improvement
max_path_length (int) – Maximum number of edge swaps in a single improvement attempt
swap_tries (int) – Number of improvement attempts per build step
additional_swap_tries (int) – Additional improvement attempts after a successful improvement
callback (Callable[[deglib_cpp.BuilderStatus], None] | str | None) – Callback function for build progress reporting. If “progress”, shows progress bar
- Returns:
The constructed and optimized graph
- Return type: