Release Notes

v1.3.0 (2024-07-13)

Added ad-hoc custom stopwords support for the full-text search query.
- The full-text search query now supports ad-hoc custom stopwords that are not defined in the analyzer configuration.
- The custom stopwords can be specified in the query using the custom_stop_words attribute in the $search operator.
Added ad-hoc custom synonyms support for the full-text search query.
- The full-text search query now supports ad-hoc custom synonyms that are not defined in the analyzer configuration.
- The custom synonyms can be specified in the query using the custom_synonyms attribute in the $search operator.
Added system metrics APIs.
- The APIs are intended to report system resource usage, such as CPU, memory, disk, and network, in the future. The current implementation returns only version information.
Added export command to the CLI.
- The export command allows users to export the data from the database collection to a file in JSON Lines format.
Added array of vectors support for the dense vector index.
- The dense vector index now supports an array of vectors as the field value. Embedding models usually work on short text, so a single document may need several embeddings, and an array of vectors can store all of them in that document. Without this feature, users had to store each embedding in a separate document, which was inefficient. Note that this feature is experimental and may change in the future.
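As an illustration of the array-of-vectors feature, the sketch below splits a long text into short chunks and stores one embedding per chunk in a single document. The embed function is a toy stand-in for a real embedding model, and the field names content and embeddings are assumptions chosen for illustration, not names required by Cognica.

```python
from typing import Dict, List


def split_into_chunks(text: str, max_words: int) -> List[str]:
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def embed(chunk: str, dim: int = 4) -> List[float]:
    """Toy stand-in for an embedding model (hypothetical, not a real model)."""
    vector = [0.0] * dim
    for i, ch in enumerate(chunk):
        vector[i % dim] += ord(ch) / 1000.0
    return vector


def build_document(text: str, max_words: int = 8) -> Dict[str, object]:
    """Build one document that holds an array of vectors, one per chunk."""
    chunks = split_into_chunks(text, max_words)
    return {
        "content": text,  # hypothetical field name
        "embeddings": [embed(chunk) for chunk in chunks],  # hypothetical field name
    }
```

With this shape, one document carries every chunk embedding, instead of one document per embedding.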

Improved the performance of the full-text search index that contains dense vector fields.

Bug fixes for query execution
- Fixed an issue where the $join operator with the left or right type dropped some documents from the result when the join key was duplicated in the right-side collection.
- Fixed an issue where the $limit operator with a negative value returned an empty result instead of all documents.
Bug fixes for full-text search
- Fixed an issue where the highlighter was not working correctly in some cases when the query contained multiple terms.
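To make the expected behavior of the $join fix concrete, here is a minimal sketch of left-join semantics when the join key is duplicated on the right side. It is plain Python over lists of dictionaries, not Cognica's implementation. The function name left_join and the merge rule (left-side fields win on conflicts) are assumptions.

```python
from collections import defaultdict
from typing import Any, Dict, List

Document = Dict[str, Any]


def left_join(left: List[Document], right: List[Document], key: str) -> List[Document]:
    """Left join that emits one row per matching right-side document."""
    # Group right-side documents by join key so duplicates are all kept.
    right_by_key: Dict[Any, List[Document]] = defaultdict(list)
    for doc in right:
        right_by_key[doc.get(key)].append(doc)

    result: List[Document] = []
    for doc in left:
        matches = right_by_key.get(doc.get(key), [])
        if not matches:
            # No match: a left join still keeps the left-side document.
            result.append(dict(doc))
            continue
        for match in matches:
            # Assumption: left-side fields win when both sides share a name.
            result.append({**match, **doc})
    return result
```

A left-side document whose key appears twice on the right yields two rows, so no document goes missing from the result.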

Improved the HNSW-IVF index to use the underlying key-value storage directly, instead of loading the index data into memory.
- This change allows the HNSW-IVF index to handle larger indexes with a smaller memory footprint.
Improved the index creation process to be resumable.
- If the index creation process is interrupted, it can be resumed from the last checkpoint.
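The general idea of resuming from a checkpoint can be sketched as follows. This is a generic illustration, not Cognica's mechanism: the JSON checkpoint file, the next_offset key, and the build_index_resumable function are all hypothetical.

```python
import json
import os
from typing import Callable, Sequence


def build_index_resumable(
    items: Sequence[str],
    index_item: Callable[[str], None],
    checkpoint_path: str,
    batch_size: int = 2,
) -> int:
    """Index items in batches, resuming from the last saved checkpoint.

    Returns the number of items indexed during this call.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, "r", encoding="utf-8") as f:
            start = json.load(f)["next_offset"]

    indexed = 0
    for offset in range(start, len(items), batch_size):
        batch = items[offset:offset + batch_size]
        for item in batch:
            index_item(item)
        indexed += len(batch)
        # Record progress only after the whole batch succeeded.
        with open(checkpoint_path, "w", encoding="utf-8") as f:
            json.dump({"next_offset": offset + len(batch)}, f)
    return indexed
```

Because progress is saved per batch, an interrupted batch is replayed on resume, so index_item must be safe to apply twice to the same item in this sketch.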

Reduced the memory usage of the block cache by evicting the cached blocks more aggressively.
Improved the performance of the full-text search by implementing a query cache.
Improved the performance of the text analysis pipelines by optimizing the tokenization process.
Improved the performance of the index creation by optimizing the data loading process.

Updated KeywordAnalyzer to support array-typed fields. Each array element is indexed as a separate token.
Updated DenseVectorAnalyzer to support null-valued fields. Null-valued fields are not indexed in the HNSW index.
Added validation of the index type when creating an index.

Bug fixes for Full-Text Search
- Fixed a bug where the query parser was not working correctly with non-ASCII field names in the full-text search query.
- Fixed a bug where the query planner was incorrectly optimizing the full-text search query in some cases.
Bug fixes for storage engine
- Fixed a rare crash in the compaction filter during the shutdown process.

Improvements for Full-Text Search
- Added support for multi-field terms in the full-text search query.
  - {!fields=[field1, field2]}:(term1 AND term2) will be parsed as field1:(term1 AND term2) field2:(term1 AND term2).
- Added support for out-of-order proximity search.
  - "term1 term2"~~5 will match documents where term1 and term2 are within 5 words of each other, regardless of the order.
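The two query features above can be sketched in plain Python. This is an illustration of the described behavior, not the actual query parser. The function names are hypothetical, and "within 5 words" is assumed to mean a token-position difference of at most 5.

```python
import re
from typing import List

_MULTI_FIELD = re.compile(r"^\{!fields=\[(?P<fields>[^\]]*)\]\}:(?P<clause>.+)$")


def expand_multi_field(query: str) -> str:
    """Rewrite {!fields=[f1, f2]}:(clause) as f1:(clause) f2:(clause)."""
    match = _MULTI_FIELD.match(query.strip())
    if match is None:
        return query  # Not a multi-field term; leave unchanged.
    fields = [f.strip() for f in match.group("fields").split(",") if f.strip()]
    clause = match.group("clause")
    return " ".join(f"{field}:{clause}" for field in fields)


def within_proximity(tokens: List[str], term1: str, term2: str, distance: int) -> bool:
    """Return True if term1 and term2 occur within `distance` positions, in any order."""
    positions1 = [i for i, t in enumerate(tokens) if t == term1]
    positions2 = [i for i, t in enumerate(tokens) if t == term2]
    # abs() makes the check order-independent; i != j avoids matching a token with itself.
    return any(
        i != j and abs(i - j) <= distance
        for i in positions1
        for j in positions2
    )
```

For example, expand_multi_field applied to the multi-field query shown above produces the two per-field clauses separated by a space, and within_proximity accepts term2 appearing before term1.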
Updated $project operator to support unnesting objects and arrays.
- The ^ prefix can be used to unnest an object or array.
  - $project: ["^field1.field2", "field3"] will unnest field1.field2 and keep field3 as is.
The query parser now supports a "did you mean?" feature.
- If the query parser cannot recognize an operator in the query, it suggests a similar operator.
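One common way to build a "did you mean?" suggestion is fuzzy string matching. The sketch below uses Python's standard difflib module. It is an assumption about how such a feature could work, not a description of the query parser's algorithm, and the operator list is assembled from names mentioned in these notes.

```python
import difflib
from typing import List, Optional

# Operator names taken from these release notes; the real list may differ.
KNOWN_OPERATORS: List[str] = [
    "$search", "$project", "$limit", "$skip",
    "$sort", "$join", "$recursive_unnest",
]


def suggest_operator(name: str, known: Optional[List[str]] = None) -> Optional[str]:
    """Return the closest known operator name, or None if nothing is similar enough."""
    candidates = KNOWN_OPERATORS if known is None else known
    matches = difflib.get_close_matches(name, candidates, n=1, cutoff=0.6)
    return matches[0] if matches else None
```

A misspelling such as $serach would be matched to $search, while a name with no close candidate yields no suggestion.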

Improved the indexing performance of the full-text search index by relaxing the concurrency control during the indexing process.

The default value of the $search.timeout option has been changed from 5,000 to 10,000 milliseconds, so a search query now times out after 10 seconds by default.
The $limit operator can be set to -1 to raise the limit to the maximum number of documents that a single query can return. This maximum depends on the index configuration, query complexity, and system resources. For instance, a full-text search index can return at most 10,000,000 documents in a single query, while other indexes have no limit as long as system resources are sufficient.
The $recursive_unnest operator, when applied to the _highlights field generated by the $search.highlight option, now unnests the _score field as well.
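The $limit rule described above can be summarized in a few lines. This is a sketch under stated assumptions: the function effective_limit is hypothetical, None stands for "no fixed cap", and non-negative limits are passed through unchanged because the notes do not say whether they are capped.

```python
from typing import Optional

# Maximum stated in these release notes for full-text search indexes.
FULL_TEXT_SEARCH_MAX_DOCUMENTS = 10_000_000


def effective_limit(limit: int, is_full_text_search: bool) -> Optional[int]:
    """Resolve a $limit value to a concrete cap; None means no fixed cap."""
    if limit == -1:
        if is_full_text_search:
            return FULL_TEXT_SEARCH_MAX_DOCUMENTS
        # Other indexes are bounded only by available system resources.
        return None
    if limit < -1:
        raise ValueError("this sketch only models -1 and non-negative limits")
    return limit
```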

Bug fixes for Full-Text Search
- Fixed a bug where the highlighter was not working correctly with array-typed fields.
Bug fixes for DocumentDB
- Fixed a bug where the $project operator was not working correctly with nested fields.
- Fixed a bug where the $sort operator was not working correctly with floating-point numbers.
- Fixed a bug where the $skip and $limit operators were not working correctly with multi-stage pipeline queries.

This is the first stable release of Cognica Server. It provides the following features:

Key-Value Database
- Basic CRUD operations
- Transactions
- Key-level Time-to-Live (TTL) support
Document-oriented Database
- Basic CRUD operations
- Transactions
- Secondary indexes
  - Unique and non-unique indexes
  - Clustered and non-clustered indexes
  - Partial indexes
  - Full-featured text search indexes
  - Dense vector similarity search indexes
  - Index-level Time-to-Live (TTL) support
- Python scripting support in the query layer
Language model serving
- Deployment of custom Torch-based language models