Search Modes
ARIEL’s search system is built around search modules — leaf-level functions that each implement a single retrieval strategy against the logbook database. The framework ships two modules out of the box: keyword full-text search and embedding-based semantic similarity. At query time, the ARIELSearchService routes each request to the requested module. All modes share the same underlying database and produce a common ARIELSearchResult. Higher-level reasoning over results — multi-step retrieval, answer synthesis, custom prompting — lives in the Osprey agent layer, which calls these search modules through ARIEL’s MCP tools. A raw sql_query MCP tool is also available for direct database access by power users and the agent.
Search modules are discovered through Osprey’s central registry, so you can add your own without modifying any framework code. A custom search module only needs to export a get_tool_descriptor() function. Once registered, it is automatically available to the Osprey agent through the ARIEL MCP server and in the web interface.
Search Architecture
User Query
    ↓
ARIELSearchService.search(mode=...)
    ├── KEYWORD (default) → keyword_search()  → ranked entries
    └── SEMANTIC          → semantic_search() → ranked entries
    ↓
ARIELSearchResult (entries, search_modes_used)
The service validates that the requested mode is enabled in configuration before routing. Both keyword and semantic are direct function calls and return an ARIELSearchResult with entries and the search mode that was invoked. (A separate sql_query MCP tool exposes raw read-only SQL against the same database; it is not routed through search(mode=...).)
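The routing described above can be pictured with a small, self-contained sketch. It is not the real ARIELSearchService: the SearchService class, the SearchMode enum, the SearchResult dataclass, and the placeholder search functions are all hypothetical stand-ins, and the real service may well be asynchronous. The sketch only shows the shape of the flow: check that the mode is enabled, call one module directly, and wrap the entries together with the mode that was invoked.

```python
from dataclasses import dataclass, field
from enum import Enum


class SearchMode(Enum):
    KEYWORD = "keyword"
    SEMANTIC = "semantic"


@dataclass
class SearchResult:
    """Hypothetical stand-in for ARIELSearchResult."""
    entries: list
    search_modes_used: list = field(default_factory=list)


def keyword_search(query):
    # Placeholder: a real module would run PostgreSQL full-text search.
    return [f"keyword hit for {query!r}"]


def semantic_search(query):
    # Placeholder: a real module would run a pgvector similarity search.
    return [f"semantic hit for {query!r}"]


class SearchService:
    """Hypothetical router, for illustration only."""

    _modules = {
        SearchMode.KEYWORD: keyword_search,
        SearchMode.SEMANTIC: semantic_search,
    }

    def __init__(self, enabled_modes):
        self.enabled_modes = set(enabled_modes)

    def search(self, query, mode=SearchMode.KEYWORD):
        # Validate the requested mode against configuration before routing.
        if mode not in self.enabled_modes:
            raise ValueError(f"search mode {mode.value!r} is not enabled")
        entries = self._modules[mode](query)
        return SearchResult(entries=entries, search_modes_used=[mode])


service = SearchService(enabled_modes=[SearchMode.KEYWORD])
result = service.search("RF cavity fault")
print(result.search_modes_used)
```

In this sketch, requesting SearchMode.SEMANTIC raises a ValueError because only the keyword mode was enabled when the service was constructed.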
CLI usage:
osprey ariel search "RF cavity fault" # default: keyword
osprey ariel search "RF cavity fault" --mode keyword
osprey ariel search "RF cavity fault" --mode semantic
The --mode option accepts keyword (default) or semantic.
Search Modules
Search modules are leaf-level functions that execute a single search strategy against the database. Each module exports a get_tool_descriptor() function that describes its capabilities, input schema, and execution function so the rest of the system — the ARIEL MCP server and the web interface — can discover and use it automatically. The framework ships with the following built-in search modules:
Module: search/keyword.py
PostgreSQL full-text search with optional fuzzy matching fallback. Best for specific terms, equipment names, PV names, and exact phrases.
Query syntax:
# Simple terms (implicit AND)
RF cavity fault
# Boolean operators
RF AND cavity
vacuum OR pressure
beam NOT injection
# Quoted phrases
"RF cavity trip"
# Field prefixes
author:smith
date:2024-06
# Combined
author:jones "beam loss" date:2024-01
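As a rough illustration of how the syntax above decomposes, the sketch below splits a query string into field filters, quoted phrases, and remaining terms. The function name parse_query and both regular expressions are assumptions made for this example, not the module's actual parser, and the sketch assumes the quotes in the query are already balanced.

```python
import re

# Hypothetical patterns: only the author: and date: prefixes shown above.
FIELD_RE = re.compile(r"\b(author|date):(\S+)")
PHRASE_RE = re.compile(r'"([^"]*)"')


def parse_query(query):
    """Split a query into field filters, quoted phrases, and plain terms."""
    # Pull out phrases first so prefix-like text inside quotes is left alone.
    phrases = PHRASE_RE.findall(query)
    rest = PHRASE_RE.sub(" ", query)
    # Then extract prefix:value pairs from what remains.
    filters = dict(FIELD_RE.findall(rest))
    rest = FIELD_RE.sub(" ", rest)
    # Whatever is left is treated as search terms (operators included).
    return {"filters": filters, "phrases": phrases, "terms": rest.split()}


print(parse_query('author:jones "beam loss" date:2024-01'))
```

Boolean operators such as AND, OR, and NOT are left in the terms list untouched, since interpreting them is a separate step.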
How it works:
1. Validates and preprocesses the query: empty queries return immediately, queries longer than 1,000 characters are truncated, and unbalanced quotes are auto-balanced by removing the last unmatched quote
2. Parses the query to extract field filters (author:, date:), quoted phrases, and remaining search terms
3. Builds a PostgreSQL tsquery using the function appropriate for the query shape:
   - plainto_tsquery: for simple terms (implicit AND)
   - websearch_to_tsquery: for queries with Boolean operators (AND, OR, NOT)
   - phraseto_tsquery: for quoted phrases
   When multiple components are present (e.g. terms and phrases), they are combined with && (tsquery AND).
4. Executes full-text search against the raw_text column with ts_rank scoring, applying any field filters (author ILIKE, date range) and time range constraints
5. If no results and fuzzy fallback is enabled, falls back to pg_trgm trigram similarity (default threshold: 0.3)
6. Returns results as (entry, score, highlights) tuples; highlights are generated via ts_headline
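The preprocessing and tsquery-selection steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's code: the helper names preprocess and build_tsquery are hypothetical, the SQL fragments use %s placeholders in the style of common PostgreSQL drivers, and the tsquery functions are called without an explicit text search configuration. The sketch only picks the function for each component; it does not show how operators are translated into the syntax that function accepts.

```python
import re

MAX_QUERY_LENGTH = 1000  # the 1,000-character limit described above
BOOLEAN_RE = re.compile(r"\b(AND|OR|NOT)\b")


def preprocess(query):
    """Trim, truncate, and balance quotes. None means 'return no results'."""
    query = query.strip()[:MAX_QUERY_LENGTH]
    if not query:
        return None
    if query.count('"') % 2 == 1:
        idx = query.rfind('"')  # drop the last unmatched quote
        query = query[:idx] + query[idx + 1:]
    return query


def build_tsquery(terms, phrases):
    """Return (sql_fragment, params) for the combined tsquery expression."""
    parts, params = [], []
    if terms:
        text = " ".join(terms)
        # Boolean operators select websearch_to_tsquery; otherwise plain terms.
        if BOOLEAN_RE.search(text):
            parts.append("websearch_to_tsquery(%s)")
        else:
            parts.append("plainto_tsquery(%s)")
        params.append(text)
    for phrase in phrases:
        parts.append("phraseto_tsquery(%s)")
        params.append(phrase)
    # && is PostgreSQL's tsquery AND operator.
    return " && ".join(parts), params


print(preprocess('"RF cavity" trip"'))
print(build_tsquery(["RF", "cavity"], ["beam loss"]))
```

Keeping the user text in a separate parameter list, rather than formatting it into the SQL string, lets the database driver handle quoting.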
Configuration:
search_modules:
  keyword:
    enabled: true
Module: search/semantic.py
Embedding-based similarity search using pgvector. Best for conceptual queries where exact keywords may not appear in the text.
How it works:
1. Resolves the similarity threshold using a 3-tier priority:
   - Per-query similarity_threshold parameter (highest)
   - Config value (search_modules.semantic.settings.similarity_threshold)
   - Hardcoded default: 0.5 (lowest)
2. Determines the embedding model from config (search_modules.semantic.model) and resolves provider credentials via Osprey’s centralized api.providers configuration
3. Generates a query embedding using the configured provider, with a dimension-mismatch warning if the returned embedding size does not match the configured embedding_dimension
4. Searches the per-model embedding table using cosine distance (<=> operator)
5. Filters results by similarity threshold and optional time range
6. Returns results as (entry, similarity_score) tuples
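The threshold resolution and similarity filtering above can be sketched in plain Python. The function names are hypothetical, the config argument stands for the search_modules.semantic section as a dictionary, and the sketch assumes that similarity is computed as 1 minus the cosine distance and compared with >=. The real module performs the distance computation in the database through pgvector rather than in Python.

```python
import math

DEFAULT_SIMILARITY_THRESHOLD = 0.5  # hardcoded default named above


def resolve_threshold(per_query=None, config=None):
    """Per-query value wins, then the config value, then the default."""
    if per_query is not None:
        return per_query
    settings = (config or {}).get("settings", {})
    if "similarity_threshold" in settings:
        return settings["similarity_threshold"]
    return DEFAULT_SIMILARITY_THRESHOLD


def cosine_distance(a, b):
    """1 - cosine similarity, the quantity pgvector's <=> operator returns.

    Assumes neither vector has zero length.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def filter_by_similarity(query_vec, rows, threshold):
    """rows is an iterable of (entry, embedding) pairs.

    Returns (entry, similarity_score) tuples, most similar first.
    """
    scored = [(entry, 1.0 - cosine_distance(query_vec, emb)) for entry, emb in rows]
    kept = [(entry, score) for entry, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


rows = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
threshold = resolve_threshold(config={"settings": {"similarity_threshold": 0.5}})
print(filter_by_similarity([1.0, 0.0], rows, threshold))
```

Here entry "b" is orthogonal to the query vector, so its similarity of 0 falls below the threshold and it is dropped.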
Configuration:
search_modules:
  semantic:
    enabled: true
    provider: ollama
    model: nomic-embed-text
    settings:
      similarity_threshold: 0.5
      embedding_dimension: 768
Requirements: Ollama (or another embedding provider) running with the configured model, embedding table populated via the text_embedding enhancement module, and the pgvector extension installed in PostgreSQL.
Registering a custom search module:
To add your own search module, create a Python module that exports get_tool_descriptor() (and optionally get_parameter_descriptors()), then register it through your application’s registry configuration:
from osprey.registry.helpers import extend_framework_registry
from osprey.registry.base import ArielSearchModuleRegistration

app_config = extend_framework_registry(
    ariel_search_modules=[
        ArielSearchModuleRegistration(
            name="my_search",
            module_path="my_app.search.my_module",
            description="Custom search module for my facility",
        ),
    ],
)
Once registered and enabled in config.yml (search_modules.my_search.enabled: true), the module is automatically available as an ARIEL MCP tool that the Osprey agent can call, and as a search option in the web interface.
The get_tool_descriptor() function must return a SearchToolDescriptor, a frozen dataclass whose key fields are execute (the async search function), format_result (formats results for agent consumption), and args_schema (a Pydantic model for input validation). See the class definition in the source for the full field list.
Modules may also export get_parameter_descriptors() to declare tunable parameters for the frontend capabilities API. Each ParameterDescriptor describes a single knob — its name, type, default, range, and UI grouping — so the web interface can render controls dynamically.
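To make the shape of a custom module concrete, here is a minimal, self-contained sketch. Everything in it is an assumption for illustration: the SearchToolDescriptor defined here is a local stand-in carrying only the three key fields named above (the real class has more), MySearchArgs is a plain dataclass where the real args_schema field expects a Pydantic model, and the signature of the execute function is hypothetical. In a real module you would import the framework's own classes instead of defining these.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass(frozen=True)
class SearchToolDescriptor:
    """Local stand-in; the real class lives in the framework source."""
    execute: Callable[..., Awaitable[list]]
    format_result: Callable[[list], str]
    args_schema: type


@dataclass
class MySearchArgs:
    """Stand-in input schema (the real field expects a Pydantic model)."""
    query: str
    max_results: int = 10


async def my_search(query: str, max_results: int = 10) -> list:
    # Placeholder data: a real module would query the logbook database.
    corpus = ["RF cavity trip at 03:12", "vacuum pressure spike", "RF phase drift"]
    hits = [text for text in corpus if query.lower() in text.lower()]
    return hits[:max_results]


def format_result(entries: list) -> str:
    # Produce a compact text summary for agent consumption.
    if not entries:
        return "No matching entries."
    return "\n".join(f"- {entry}" for entry in entries)


def get_tool_descriptor() -> SearchToolDescriptor:
    return SearchToolDescriptor(
        execute=my_search,
        format_result=format_result,
        args_schema=MySearchArgs,
    )


descriptor = get_tool_descriptor()
args = descriptor.args_schema(query="rf")
hits = asyncio.run(descriptor.execute(query=args.query, max_results=args.max_results))
print(descriptor.format_result(hits))
```

The last four lines play the role of the framework: they validate input through the schema, await the search function, and format the hits.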
Collaboration Welcome
If you implement a search module that could benefit other facilities — for example, a structured-metadata search, a time-series correlation search, or a cross-entry linking search — we encourage you to open a pull request so it becomes natively available in Osprey.
Need behavior beyond these search modules — multi-step reasoning, answer synthesis, custom prompting? That lives in the Osprey agent layer; see Osprey Integration under “Extending the integration.”
See Also
- Data Ingestion
How data gets into the system — facility adapters, enhancement modules, and database schema
- Osprey Integration
MCP tools, service factory, and search result structure
- Web Interface
Web interface architecture and capabilities API