Graph Builder

EvenRegularGraphBuilder

The class EvenRegularGraphBuilder is used to construct search graphs.

class deglib.builder.EvenRegularGraphBuilder(graph: MutableGraph, rng: Mt19937 | None = None, optimization_target: OptimizationTarget = OptimizationTarget.LowLID, extend_k: int = 0, extend_eps: float = 0.1, improve_k: int = 0, improve_eps: float = 0.001, max_path_length: int = 5, swap_tries: int = 0, additional_swap_tries: int = 0)

Bases: object

Constructs an EvenRegularGraphBuilder for building and optimizing a regular graph.

This class provides functionality to incrementally build and optimize a graph structure by adding/removing vertices and improving edge connections through various strategies.

add_entry(external_label: int | Iterable[int] | ndarray, feature: ndarray)

Provide the builder with new entries to be added to the graph during the build process.

Can add a single entry or a batch of entries. For batch processing, the feature array should have shape [N, D] where N is the number of entries and D is the dimensionality.

Parameters:
  • external_label – The label(s) that name the added vertex/vertices. For batch processing, should be a list or array with length N matching the number of features

  • feature – The feature vector(s) to be added. Shape [D] for single entry or [N, D] for batch

Raises:
  • InvalidShapeException – If feature array has invalid shape (not 1D or 2D)

  • AssertionError – If number of features doesn’t match number of labels in batch processing

remove_entry(external_label: int)

Command the builder to remove a vertex from the graph as quickly as possible.

Parameters:

external_label (int) – The external label that identifies the graph vertex to remove

get_num_new_entries() int

Get the number of entries that will be added to the graph.

Returns:

Number of entries queued for addition

Return type:

int

get_num_remove_entries() int

Get the number of entries that will be removed from the graph.

Returns:

Number of entries queued for removal

Return type:

int

set_thread_count(thread_count: int)

Set the number of threads used to extend the graph during building.

When the thread count is greater than 1 and the optimization target is not StreamingData, the builder will utilize multiple threads to add elements to the graph in parallel. By default, all available CPU cores/threads are used unless specified. Note: The order in which elements are added is not guaranteed when using multiple threads.

Parameters:

thread_count (int) – Number of threads to use for graph extension

set_batch_size(tasks_per_batch: int, task_size: int)

Set the batch size parameters for parallel graph construction.

The builder processes elements in batches to minimize synchronization between threads. The total batch size is calculated as: batch_size = thread_count * tasks_per_batch * task_size

Effects of thread count and batch size:
  • thread_count = 1 and batch_size = 1: low throughput, medium latency, order of elements is guaranteed

  • thread_count > 1 and batch_size = 1: high throughput, low latency, order of elements is not guaranteed

  • thread_count > 1 and batch_size > 1: highest throughput, highest latency, order of elements is not guaranteed

Note: The optimization target StreamingData always uses a thread count of 1.

Parameters:
  • tasks_per_batch (int) – Number of tasks for each thread in one batch (default: 32).

  • task_size (int) – Number of elements each thread processes in one task (default: 10).

get_batch_size() int

Get the current batch size used for parallel graph construction.

Returns:

The current batch size

Return type:

int

build(callback: Callable[[BuilderStatus], None] | str | None = None, infinite: bool = False)

Build the graph. This could be run on a separate thread in an infinite loop. Call stop() to end this process.

Parameters:
  • callback – The callback that is called after each step of the build process. A BuilderStatus is the only argument to the function. If None nothing is printed. If callback is the string “progress”, a simple progress bar is printed to stdout.

  • infinite – If set to True, blocks indefinitely, until the stop() function is called. Can be used, if build() is run in a separate thread.

stop()

Stop the build process.

The build process can only be stopped between steps, not during step execution.

build_from_data()

deglib.builder.build_from_data(data: ndarray, labels: Iterable[int] | None = None, edges_per_vertex: int = 32, capacity: int = -1, metric: Metric = Metric.L2, rng: Mt19937 | None = None, optimization_target: OptimizationTarget = OptimizationTarget.LowLID, extend_k: int = 0, extend_eps: float = 0.2, improve_k: int = 0, improve_eps: float = 0.001, max_path_length: int = 5, swap_tries: int = 0, additional_swap_tries: int = 0, callback: Callable[[BuilderStatus], None] | str | None = None) SizeBoundedGraph

Create a new graph built from the given data using an EvenRegularGraphBuilder.

This is a convenience function that creates a graph, builder, adds all data entries, and builds the complete graph in one call. Infinite building is not supported.

Parameters:
  • data (np.ndarray) – Feature data with shape [N, D] where N is number of samples, D is dimensionality

  • labels (Iterable[int] | None) – Labels for each data entry with shape [N]. If None, labels 0 to N-1 are created. If numpy array, should have dtype uint32

  • edges_per_vertex (int) – Number of edges per vertex in the graph

  • capacity (int) – Maximum number of vertices the graph can hold. If <= 0, defaults to data.shape[0]

  • metric (Metric) – Distance metric for measuring feature similarity

  • rng (Mt19937 | None) – Random number generator. If None, a new Mt19937 will be created

  • optimization_target (OptimizationTarget) – Optimization strategy based on data distribution characteristics

  • extend_k (int) – Number of neighbors to consider when extending the graph

  • extend_eps (float) – Epsilon value for neighbor search during graph extension

  • improve_k (int) – Number of neighbors to consider when improving the graph

  • improve_eps (float) – Epsilon value for neighbor search during graph improvement

  • max_path_length (int) – Maximum number of edge swaps in a single improvement attempt

  • swap_tries (int) – Number of improvement attempts per build step

  • additional_swap_tries (int) – Additional improvement attempts after a successful improvement

  • callback (Callable[[deglib_cpp.BuilderStatus], None] | str | None) – Callback function for build progress reporting. If “progress”, shows progress bar

Returns:

The constructed and optimized graph

Return type:

SizeBoundedGraph