Spice v1.5.0 (July 21, 2025)
Announcing the release of Spice v1.5.0! 🔍
Spice v1.5.0 brings major upgrades to search and retrieval. It introduces native support for Amazon S3 Vectors, enabling petabyte-scale vector search directly from S3 vector buckets, alongside SQL-integrated vector and Tantivy-powered full-text search, partitioning for DuckDB acceleration, and automated refreshes for search indexes and views. The release also adds the AWS Bedrock Embeddings Model Provider and the Oracle Database connector, graduates the Spice.ai Cloud Data Connector to stable, and upgrades DuckDB to v1.3.2.
What's New in v1.5.0
Amazon S3 Vectors Support: Spice.ai now integrates with Amazon S3 Vectors, launched in public preview on July 15, 2025, enabling vector-native object storage with built-in indexing and querying. This integration supports semantic search, recommendation systems, and retrieval-augmented generation (RAG) at petabyte scale with S3’s durability and elasticity. Spice.ai manages the vector lifecycle: ingesting data, creating embeddings with models such as Amazon Titan or Cohere via AWS Bedrock (or others available on HuggingFace), and storing the resulting vectors in S3 vector buckets.
Example Spicepod.yml configuration for S3 Vectors:
datasets:
  - from: s3://my_data_bucket/data/
    name: my_vectors
    params:
      file_format: parquet
    acceleration:
      enabled: true
    vectors:
      engine: s3_vectors
      params:
        s3_vectors_aws_region: us-east-2
        s3_vectors_bucket: my-s3-vectors-bucket
    columns:
      - name: content
        embeddings:
          - from: bedrock_titan
            row_id:
              - id
Example SQL query using S3 Vectors:
SELECT *
FROM vector_search(my_vectors, 'Cricket bats', 10)
WHERE price < 100
ORDER BY score
For more details, refer to the S3 Vectors Documentation.
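To make the idea of ranking rows by a similarity score concrete, here is a minimal Python sketch of brute-force top-k search over toy embeddings. It is an illustration only: the cosine metric, the 3-dimensional vectors, and the helper names cosine_similarity and top_k are assumptions, and S3 Vectors uses its own built-in indexing rather than a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, rows, k):
    """Return the k most similar rows as (score, row_id) pairs, best first."""
    scored = [(cosine_similarity(query_vec, vec), row_id) for row_id, vec in rows.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" (hypothetical); real models emit far more dimensions.
rows = {
    "bat": [0.9, 0.1, 0.0],
    "ball": [0.7, 0.3, 0.1],
    "teapot": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]  # stand-in for the embedding of the search text

results = top_k(query, rows, 2)
for score, row_id in results:
    print(f"{row_id}: {score:.3f}")
```

In Spice, the embedding step and the index lookup are handled by the runtime; the sketch only shows what a similarity score expresses.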
SQL-integrated Search: Vector and BM25-scored full-text search capabilities are now natively available in SQL queries, extending the power of the POST v1/search endpoint to all SQL workflows.
Example Vector-Similarity-Search (VSS) using the vector_search UDTF on the table reviews for the search term "Cricket bats":
SELECT review_id, review_text, review_date, score
FROM vector_search(reviews, 'Cricket bats')
WHERE country_code = 'AUS'
LIMIT 3
Example Full-Text-Search (FTS) using the text_search UDTF on the table reviews for the search term "Cricket bats":
SELECT review_id, review_text, review_date, score
FROM text_search(reviews, 'Cricket bats')
LIMIT 3
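For readers unfamiliar with BM25, the following Python sketch computes BM25 scores for a handful of documents. It is a textbook version under stated assumptions (whitespace tokenization, k1 = 1.2, b = 0.75, and the function name bm25_scores are all illustrative), not the Tantivy implementation that Spice uses.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score every document against the query using the BM25 formula."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs
    # Document frequency: how many documents contain each term at least once.
    doc_freq = Counter(term for doc in tokenized for term in set(doc))

    scores = []
    for doc in tokenized:
        term_freq = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates via k1 and is normalized by document length via b.
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avg_len))
        scores.append(score)
    return scores


reviews = [
    "Great cricket bats for juniors",
    "Tennis rackets arrived late",
    "Cricket bats and a cricket ball",
]
scores = bm25_scores("Cricket bats", reviews)
for text, score in zip(reviews, scores):
    print(f"{score:.3f}  {text}")
```

Documents that contain none of the query terms score zero, which is why a full-text query returns only matching rows.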
DuckDB v1.3.2 Upgrade: Upgraded DuckDB engine from v1.1.3 to v1.3.2. Key improvements include support for adding primary keys to existing tables, resolution of over-eager unique constraint checking for smoother inserts, and 13% reduced runtime on TPC-H SF100 queries through extensive optimizer refinements. The v1.2.x release of DuckDB was skipped due to a regression in indexes.
- Read the DuckDB v1.2.0 announcement.
- Read the DuckDB v1.3.0 announcement.
Partitioned Acceleration: DuckDB file-based accelerations now support partition_by expressions, enabling queries to scale to large datasets through automatic data partitioning and query predicate pruning. New UDFs, bucket and truncate, simplify partition logic.
New UDFs useful for partition_by expressions:
- bucket(num_buckets, col): Partitions a column into a specified number of buckets based on a hash of the column value.
- truncate(width, col): Truncates a column to a specified width, aligning values to the nearest lower multiple (e.g., truncate(10, 101) = 100).
Example Spicepod.yml configuration:
datasets:
  - from: s3://my_bucket/some_large_table/
    name: my_table
    params:
      file_format: parquet
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      partition_by: bucket(100, account_id) # Partition account_id into 100 buckets
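The semantics of the two UDFs, and why partitioning lets a query skip most of the data, can be sketched in Python. This is a rough model, not the Spice implementation: the SHA-256-based hash is an assumption chosen for determinism (the hash Spice actually uses is not stated here), and the partitions dictionary and lookup helper are hypothetical.

```python
import hashlib


def bucket(num_buckets, value):
    """Map a value to one of num_buckets buckets via a hash (illustrative hash only)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def truncate(width, value):
    """Align an integer to the nearest lower multiple of width."""
    return value - (value % width)


# Write path: each row lands in the partition chosen by the partition_by expression.
rows = [{"account_id": i} for i in range(1000)]
partitions = {}
for row in rows:
    partitions.setdefault(bucket(100, row["account_id"]), []).append(row)


def lookup(account_id):
    """Read path: an equality predicate only needs the single matching partition."""
    candidates = partitions.get(bucket(100, account_id), [])
    return [row for row in candidates if row["account_id"] == account_id]


print(truncate(10, 101))
print(lookup(42))
```

Because the same expression is evaluated on both the write path and the read path, a predicate such as account_id = 42 can be answered from one partition instead of all of them.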
Full-Text-Search (FTS) Index Refresh: Accelerated datasets with search indexes maintain up-to-date results with configurable refresh intervals.
Example refreshing search indexes on body every 10 seconds:
datasets:
  - from: github:github.com/spiceai/docs/pulls
    name: spiceai.doc.pulls
    params:
      github_token: ${secrets:GITHUB_TOKEN}
    acceleration:
      enabled: true
      refresh_mode: full
      refresh_check_interval: 10s
    columns:
      - name: body
        full_text_search:
          enabled: true
          row_id:
            - id
Scheduled View Refresh: Accelerated Views now support cron-based refresh schedules using refresh_cron, automating updates for accelerated data.
Example Spicepod.yml configuration:
views:
  - name: my_view
    sql: SELECT 1
    acceleration:
      enabled: true
      refresh_cron: '0 * * * *' # Every hour
For more details, refer to Scheduled Refreshes.
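As a reading aid for the cron expression above, this Python sketch computes when a five-field expression next fires. It is deliberately simplified and is an assumption-laden illustration: it supports only * and comma-separated integers, ignores time zones, and the helper names field_matches, cron_matches, and next_run are hypothetical. The cron parser in Spice may accept richer syntax.

```python
from datetime import datetime, timedelta


def field_matches(field, value):
    """Match one cron field; supports only '*' and comma-separated integers."""
    if field == "*":
        return True
    return value in {int(part) for part in field.split(",")}


def cron_matches(expr, moment):
    """Check a datetime against 'minute hour day-of-month month day-of-week'."""
    minute, hour, day, month, weekday = expr.split()
    return (
        field_matches(minute, moment.minute)
        and field_matches(hour, moment.hour)
        and field_matches(day, moment.day)
        and field_matches(month, moment.month)
        and field_matches(weekday, moment.isoweekday() % 7)  # cron uses 0 for Sunday
    )


def next_run(expr, after, max_minutes=366 * 24 * 60):
    """Step forward one minute at a time until the expression matches."""
    candidate = after.replace(second=0, microsecond=0)
    for _ in range(max_minutes):
        candidate += timedelta(minutes=1)
        if cron_matches(expr, candidate):
            return candidate
    raise ValueError(f"no matching time within {max_minutes} minutes for {expr!r}")


print(next_run("0 * * * *", datetime(2025, 7, 21, 10, 15)))
```

With '0 * * * *', the minute field is fixed at 0 and every other field is a wildcard, so the schedule fires at the top of each hour.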
Multi-column Vector Search: For datasets configured with embeddings on more than one column, POST v1/search and similarity_search perform parallel vector search on each column, aggregating results using reciprocal rank fusion.
Example Spicepod.yml for multi-column search:
datasets:
  - from: github:github.com/apache/datafusion/issues
    name: datafusion.issues
    params:
      github_token: ${secrets:GITHUB_TOKEN}
    columns:
      - name: title
        embeddings:
          - from: hf_minilm
      - name: body
        embeddings:
          - from: openai_embeddings
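Reciprocal rank fusion combines several ranked lists by summing 1 / (k + rank) for each item across the lists. The Python sketch below shows the general technique; the constant k = 60 is the value commonly used in the literature and is an assumption here, as are the function name and the sample issue IDs. The constant and tie-breaking used by Spice are not specified in these notes.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of IDs into one list of (id, score), best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties by ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical per-column results, e.g. one list per embedded column.
title_hits = ["issue-7", "issue-2", "issue-9"]
body_hits = ["issue-2", "issue-4", "issue-7"]

fused = reciprocal_rank_fusion([title_hits, body_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Items that rank well in more than one column rise to the top, even when the underlying similarity scores come from different embedding models and are not directly comparable.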
AWS Bedrock Embeddings Model Provider: Added support for AWS Bedrock embedding models, including Amazon Titan Text Embeddings and Cohere Text Embeddings.
Example Spicepod.yml:
embeddings:
  - from: bedrock:cohere.embed-english-v3
    name: cohere-embeddings
    params:
      aws_region: us-east-1
      input_type: search_document
      truncate: END
  - from: bedrock:amazon.titan-embed-text-v2:0
    name: titan-embeddings
    params:
      aws_region: us-east-1
      dimensions: '256'
For more details, refer to the AWS Bedrock Embedding Models Documentation.
Oracle Data Connector: Use from: oracle: to access and accelerate data stored in Oracle databases, deployed on-premises or in the cloud.
Example Spicepod.yml:
datasets:
  - from: oracle:"SH"."PRODUCTS"
    name: products
    params:
      oracle_host: 127.0.0.1
      oracle_username: scott
      oracle_password: tiger
See the Oracle Data Connector documentation.
GitHub Data Connector: The GitHub data connector now supports querying and accelerating members, the users of an organization.
Example Spicepod.yml configuration:
datasets:
  - from: github:github.com/spiceai/members # General format: github.com/[org-name]/members
    name: spiceai.members
    params:
      # With GitHub Apps (recommended)
      github_client_id: ${secrets:GITHUB_SPICEHQ_CLIENT_ID}
      github_private_key: ${secrets:GITHUB_SPICEHQ_PRIVATE_KEY}
      github_installation_id: ${secrets:GITHUB_SPICEHQ_INSTALLATION_ID}
      # With GitHub Tokens
      # github_token: ${secrets:GITHUB_TOKEN}
See the GitHub Data Connector Documentation.
Spice.ai Cloud Data Connector: Graduated to Stable.
spice-rs SDK Release: The Spice Rust SDK has been updated to v3.0.0. This release optimizes the Spice client API and adds robust query retries and custom metadata configuration for Spice queries.
Contributors
- @Jeadie
- @peasee
- @sgrebnov
- @Sevenannn
- @kczimm
- @phillipleblanc
- @Advayp
- @lukekim
- @ewgenius
- @mach-kernel
- @suhuruli
Breaking Changes
- Search HTTP API Response: The POST v1/search response payload has changed. See the new API documentation for details.
- Model Provider Parameter Prefixes: Model Provider parameters use provider-specific prefixes instead of openai_ prefixes (e.g., hf_temperature for HuggingFace, anthropic_max_completion_tokens for Anthropic, perplexity_tool_choice for Perplexity). The openai_ prefix remains supported for backward compatibility but is deprecated and will be removed in a future release.
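When updating existing Spicepods for the prefix change, the rename amounts to swapping the openai_ prefix for the provider's own. The Python sketch below shows that transformation on a plain dictionary; migrate_params is a hypothetical helper rather than part of Spice, and it assumes every openai_-prefixed parameter has a direct provider-specific equivalent, which should be checked against the provider's documentation.

```python
def migrate_params(params, provider_prefix, legacy_prefix="openai_"):
    """Return a copy of params with the legacy prefix replaced by provider_prefix."""
    migrated = {}
    for key, value in params.items():
        if key.startswith(legacy_prefix):
            key = provider_prefix + key[len(legacy_prefix):]
        migrated[key] = value
    return migrated


# Parameter values are placeholders; only the key names matter here.
huggingface_params = migrate_params({"openai_temperature": "0.1"}, "hf_")
anthropic_params = migrate_params({"openai_max_completion_tokens": "1024"}, "anthropic_")

print(huggingface_params)
print(anthropic_params)
```

Keys that already carry a provider-specific prefix pass through unchanged, so the helper can be run repeatedly over the same configuration.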
Cookbook Updates
- Added Oracle Data Connector cookbook: Connect to tables in Oracle databases.
- Added Hashed Partitioning with DuckDB cookbook: Accelerate data on large datasets by partitioning data into a fixed number of buckets.
The Spice Cookbook now includes 72 recipes to help you get started with Spice quickly and easily.
Upgrading
To upgrade to v1.5.0, download and install the specific binary from github.com/spiceai/spiceai/releases/tag/v1.5.0 or pull the v1.5.0 Docker image (spiceai/spiceai:1.5.0).
What's Changed
Dependencies
- delta_kernel: Upgraded to v0.12.1
- DuckDB: Upgraded from v1.1.3 to v1.3.2
- iceberg-rust: Upgraded from v0.4.0 to v0.5.1
Changelog
- fix: openai model endpoint (#6394) by @Sevenannn in #6394
- Enable configuring otel endpoint from spice run (#6360) by @Advayp in #6360
- Enable Oracle connector in default build configuration (#6395) by @sgrebnov in #6395
- fix llm integraion test (#6398) by @Sevenannn in #6398
- Promote spice cloud connector to stable quality (#6221) by @Sevenannn in #6221
- v1.5.0-rc.1 release notes (#6397) by @lukekim in #6397
- Fix model nsql integration tests (#6365) by @Sevenannn in #6365
- Fix incorrect UDTF name and SQL query (#6404) by @lukekim in #6404
- Update v1.5.0-rc.1.md (#6407) by @sgrebnov in #6407
- Improve error messages (#6405) by @lukekim in #6405
- build(deps): bump Jimver/cuda-toolkit from 0.2.25 to 0.2.26 (#6388) by @app/dependabot in #6388
- Upgrade dependabot dependencies (#6411) by @phillipleblanc in #6411
- Fix projection pushdown issues for document based file connector (#6362) by @Advayp in #6362
- Add a PartitionedDuckDB Accelerator (#6338) by @kczimm in #6338
- Use vector_search() UDTF in HTTP APIs (#6417) by @Jeadie in #6417
- add supported types (#6409) by @kczimm in #6409
- Enable session time zone override for MySQL (#6426) by @sgrebnov in #6426
- Acceleration-like indexing for full text search indexes. (#6382) by @Jeadie in #6382
- Provide error message when partition by expression changes (#6415) by @kczimm in #6415
- Add support for Oracle Autonomous Database connections (Oracle Cloud) (#6421) by @sgrebnov in #6421
- prune partitions for exact and in list with and without UDFs (#6423) by @kczimm in #6423
- Fixes and reenable FTS tests (#6431) by @Jeadie in #6431
- Upgrade DuckDB to 1.3.2 (#6434) by @phillipleblanc in #6434
- Fix issue in limit clause for the Github Data connector (#6443) by @Advayp in #6443
- Upgrade iceberg-rust to 0.5.1 (#6446) by @phillipleblanc in #6446
- v1.5.0-rc.2 release notes (#6440) by @lukekim in #6440
- Oracle: add automated TPC-H SF1 benchmark tests (#6449) by @sgrebnov in #6449
- fix: Update benchmark snapshots (#6455) by @app/github-actions in #6455
- Preserve ArrowError in arrow_tools::record_batch (#6454) by @mach-kernel in #6454
- fix: Update benchmark snapshots (#6465) by @app/github-actions in #6465
- Add option to preinstall Oracle ODPI-C library in Docker image (#6466) by @sgrebnov in #6466
- Include Oracle connector (federated mode) in automated benchmarks (#6467) by @sgrebnov in #6467
- Update crates/llms/src/bedrock/embed/mod.rs by @lukekim in #6468
- v1.5.0-rc.3 release notes (#6474) by @lukekim in #6474
- Add integration tests for S3 Vectors filters pushdown (#6469) by @sgrebnov in #6469
- check for indexedtableprovider when finding tables to search on (#6478) by @Jeadie in #6478
- Parse fully qualified table names in UDTFs (#6461) by @Jeadie in #6461
- Add integration test for S3 Vectors to cover data update (overwrite) (#6480) by @sgrebnov in #6480
- Add 'Run all tests' option for models tests and enable Bedrock tests (#6481) by @sgrebnov in #6481
- Add support for a members table type for the GitHub Data Connector (#6464) by @Advayp in #6464
- S3 vector data cannot be null (#6483) by @Jeadie in #6483
- Don't infer FixedSizeList size during indexing vectors. (#6487) by @Jeadie in #6487
- Add support for retention_sql acceleration param (#6488) by @sgrebnov in #6488
- Make dataset refresh progress tracing less verbose (#6489) by @sgrebnov in #6489
- Use RwLock on tantivy index in FullTextDatabaseIndex for update concurrency (#6490) by @Jeadie in #6490
- Add tests for dataset retention logic and refactor retention code (#6495) by @sgrebnov in #6495
- Upgade dependabot dependencies (#6497) by @phillipleblanc in #6497
- Add periodic tracing of data loading progress during dataset refresh (#6499) by @sgrebnov in #6499
- Promote Oracle Data Connector to Alpha (#6503) by @sgrebnov in #6503
- Use AWS SDK to provide credentials for Iceberg connectors (#6498) by @phillipleblanc in #6498
- Add integration tests for partitioning (#6463) by @kczimm in #6463
- Use top-level table in full-text search JOIN ON (#6491) by @Jeadie in #6491
- Use accelerated table in vector_search JOIN operations when appropriate (#6516) by @Jeadie in #6516
- Fix 'additional_column' for quoted columns (fix for qualified columns broke it) (#6512) by @Jeadie in #6512
- Also use AWS SDK for inferring credentials for S3/Delta/Databricks Delta data connectors (#6504) by @phillipleblanc in #6504
- Add per-dataset availability monitor configuration (#6482) by @phillipleblanc in #6482
- Suppress the warning from the AWS SDK if it can't load credentials (#6533) by @phillipleblanc in #6533
- Change default value of check_availability from default to auto (#6534) by @lukekim in #6534
- README.md improvements for v1.5.0 (#6539) by @lukekim in #6539
- Temporary disable s3_vectors_basic (#6537) by @sgrebnov in #6537
- Ensure binder errors show before query and other (#6374) by @suhuruli in #6374
- Update spiceai/duckdb-rs -> DuckDB 1.3.2 + index fix (#6496) by @mach-kernel in #6496
- Update table-providers to latest version with DuckDB fixes (#6535) by @phillipleblanc in #6535
- S3: default to public access if no auth is provided (#6532) by @sgrebnov in #6532




