Cassandra Documentation

Version:

CREATE CUSTOM INDEX

Cassandra 5.0 is the only supported database currently.

Creates a Storage-Attached Indexing (SAI) index. You can create multiple secondary indexes on the same database table, with each SAI index based on any column in the table. All column date types except the following are supported for SAI indexes:

  • counter

  • non-frozen user-defined type (UDT)

One exception

You cannot define an SAI index based on the partition key when it’s comprised of only one column. If you attempt to create an SAI index in this case, SAI issues an error message.

However,you can define an SAI index on one of the columns in a table’s composite partition key, i.e., a partition key comprised of multiple columns. If you need to query based on one of those columns, an SAI index is a helpful option. In fact, you can define an SAI index on each column in a composite partition key, if needed.

Defining one or more SAI indexes based on any column in a database table (with the rules noted above) subsequently gives you the ability to run performant queries that use the indexed column to filter results.

See the SAI section.

Syntax

BNF definition:

index_name::= re('[a-zA-Z_0-9]+')
CREATE CUSTOM INDEX [ IF NOT EXISTS ] [ <index_name> ]
  ON [ <keyspace_name>.]<table_name> (<column_name>)
    | [ (KEYS(<map_name>)) ]
    | [ (VALUES(<map_name>)) ]
    | [ (ENTRIES(<map_name>)) ]
  USING 'StorageAttachedIndex'
  [ WITH OPTIONS = { <option_map> } ] ;
Syntax legend
Legend
Syntax conventions Description

UPPERCASE

Literal keyword.

Lowercase

Not literal.

< >

Variable value. Replace with a user-defined value.

[]

Optional. Square brackets ([]) surround optional command arguments. Do not type the square brackets.

( )

Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

|

Or. A vertical bar (|) separates alternative elements. Type any one of the elements. Do not type the vertical bar.

...

Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as required.

'<Literal string>'

Single quotation (') marks must surround literal strings in CQL statements. Use single quotation marks to preserve upper case.

{ <key> : <value> }

Map collection. Braces ({ }) enclose map collections or key value pairs. A colon separates the key and the value.

<datatype2

Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple. Separate the data types with a comma.

<cql_statement>;

End CQL statement. A semicolon (;) terminates all CQL statements.

[--]

Separate the command line options from the command arguments with two hyphens ( -- ). This syntax is useful when arguments might be mistaken for command line options.

' <<schema\> ... </schema\>> '

Search CQL only: Single quotation marks (') surround an entire XML schema declaration.

@<xml_entity>='<xml_entity_type>'

Search CQL only: Identify the entity and literal value to overwrite the XML element in the schema and solrConfig files.

index_name

Optional identifier for index. If no name is specified, the default used is <table_name>\_<column_name>\_idx. Enclose in quotes to use special characters or to preserve capitalization.

column_name

The name of the table column on which the SAI index is being defined. SAI allows only alphanumeric characters and underscores in names. SAI returns InvalidRequestException if you try to define an index on a column name that contains other characters, and does not create the index.

map_name

Used with collections, identifier of the map_name specified in CREATE TABLE …​ map(<map_name>). The regular column syntax applies for collection types list and set.

option_map

Define options in JSON simple format.

Option Description

case_sensitive

Ignore case in matching string values. Default: true.

normalize

When set to true, perform Unicode normalization on indexed strings. SAI supports Normalization Form C (NFC) Unicode. When set to true, SAI normalizes the different versions of a given Unicode character to a single version, retaining all the marks and symbols in the index. For example, SAI would change the character Å (U+212B) to Å (U+00C5).

When implementations keep strings in a normalized form, equivalent strings have a unique binary representation. See Unicode Standard Annex #15, Unicode Normalization Forms.

Default: false.

ascii

When set to true, SAI converts alphabetic, numeric, and symbolic characters that are not in the Basic Latin Unicode block (the first 127 ASCII characters) to the ASCII equivalent, if one exists. For example, this option changes à to a. Default: false.

similarity_function

Vector search relies on computing the similarity or distance between vectors to identify relevant matches. The similarity function is used to compute the similarity between two vectors. Valid options are: EUCLIDEAN, DOT_PRODUCT, COSINE Default: COSINE

Query operators

SAI supports the following query operators for tables with SAI indexes:

  • Numerics: =, <, >, , >=, AND, OR, IN

  • Strings: =, CONTAINS, CONTAINS KEY, AND, OR, IN

SAI does not supports the following query operators for tables with SAI indexes:

  • Strings or Numerics: LIKE

Examples

These examples define SAI indexes for the cycling.cyclist_semi_pro table, which is demonstrated in the SAI quickstart.

CREATE INDEX lastname_sai_idx ON cycling.cyclist_semi_pro (lastname)
USING 'sai'
WITH OPTIONS = {'case_sensitive': 'false', 'normalize': 'true', 'ascii': 'true'};

CREATE INDEX age_sai_idx ON cycling.cyclist_semi_pro (age)
USING 'sai';

CREATE INDEX country_sai_idx ON cycling.cyclist_semi_pro (country)
USING 'sai'
WITH OPTIONS = {'case_sensitive': 'false', 'normalize': 'true', 'ascii': 'true'};

CREATE INDEX registration_sai_idx ON cycling.cyclist_semi_pro (registration)
USING 'sai';

For sample queries that find data in cycling.cyclist_semi_pro via these sample SAI indexes, see Submit CQL queries. Also refer Querying with SAI.

SAI collection map examples with keys, values, and entries

The following examples demonstrate using collection maps of multiple types (keys, values, entries) in SAI indexes. For related information, see Creating collections and Using map type.

Also refer to the SAI collection examples of type list and set in this topic.

First, create the keyspace:

CREATE KEYSPACE demo3 WITH REPLICATION =
       {'class': 'SimpleStrategy', 'replication_factor': '1'};

Next, use the keyspace:

USE demo3;

Create an audit table, with a collection map named text_map:

CREATE TABLE audit ( id int PRIMARY KEY , text_map map<text, text>);

Create multiple SAI indexes on the same map column, each using KEYS, VALUES, and ENTRIES.

CREATE CUSTOM INDEX ON audit (KEYS(text_map)) USING 'StorageAttachedIndex';
CREATE CUSTOM INDEX ON audit (VALUES(text_map)) USING 'StorageAttachedIndex';
CREATE CUSTOM INDEX ON audit (ENTRIES(text_map)) USING 'StorageAttachedIndex';

Insert some data:

INSERT INTO audit (id, text_map) values (1, {'Carlos':'Perotti', 'Marcel':'Silva'});
INSERT INTO audit (id, text_map) values (2, {'Giovani':'Pasi', 'Frances':'Giardello'});
INSERT INTO audit (id, text_map) values (3, {'Mark':'Pastore', 'Irene':'Cantona'});

Query all data:

  • Query

  • Result

SELECT * FROM audit;
 id | text_map
----+---------------------------------------------
  1 | {'Carlos': 'Perotti', 'Marcel': 'Silva'}
  2 | {'Frances': 'Giardello', 'Giovani': 'Pasi'}
  3 | {'Irene': 'Cantona', 'Mark': 'Pastore'}

(3 rows)

Query using the SAI index for specific entries in the map column:

  • Query

  • Result

SELECT * FROM audit WHERE text_map['Irene'] = 'Cantona' AND text_map['Mark'] = 'Pastore';
 id | text_map
----+-----------------------------------------
  3 | {'Irene': 'Cantona', 'Mark': 'Pastore'}

(1 rows)

Query using the SAI index for specific keys in the map column using CONTAINS KEY:

  • Query

  • Result

SELECT * FROM audit WHERE text_map CONTAINS KEY 'Giovani';
 id | text_map
----+---------------------------------------------
  2 | {'Frances': 'Giardello', 'Giovani': 'Pasi'}

(1 rows)

Query using the SAI index for specific values in the map column with CONTAINS:

  • Query

  • Result

SELECT * FROM audit WHERE text_map CONTAINS 'Silva';
 id | text_map
----+------------------------------------------
  1 | {'Carlos': 'Perotti', 'Marcel': 'Silva'}

(1 rows)

Remember that in CQL queries using SAI indexes, the CONTAINS clauses are supported with, and specific to:

  • SAI collection maps with keys, values, and entries

  • SAI collections with list and set types

SAI collection examples with list and set types

These examples demonstrate using collections with the list and set types in SAI indexes. For related information, see:

If you have not already, create the keyspace.

CREATE KEYSPACE IF NOT EXISTS demo3 WITH REPLICATION =
       {'class': 'SimpleStrategy', 'replication_factor': '1'};
USE demo3;

Using the list type

Create a calendar table with a collection of type list.

CREATE TABLE calendar (key int PRIMARY KEY, years list<int>);

Create an SAI index using the collection’s years column.

CREATE CUSTOM INDEX ON calendar(years) USING 'StorageAttachedIndex';

Insert some random int list data for years, just for demo purposes.

Notice the INSERT command’s square brackets syntax for list values.

INSERT INTO calendar (key, years) VALUES (0, [1990,1996]);
INSERT INTO calendar (key, years) VALUES (1, [2000,2010]);
INSERT INTO calendar (key, years) VALUES (2, [2001,1990]);

Query with CONTAINS example:

  • Query

  • Result

SELECT * FROM calendar WHERE years CONTAINS 1990;
 key | years
-----+--------------
   0 | [1990, 1996]
   2 | [2001, 1990]
(2 rows)

This example created the calendar table with years list<int>. Of course, you could have created the table with years list<text>, for example, inserted 'string' values, and queried on the strings.

Using the set type

Now create a calendar2 table with a collection of type set.

CREATE TABLE calendar2 (key int PRIMARY KEY, years set<int>);

Create an SAI index using the collection’s years column — this time for the calendar2 table.

CREATE CUSTOM INDEX ON calendar2(years) USING 'StorageAttachedIndex';

Insert some random int set data for years, again just for demo purposes.

Notice the INSERT command’s curly braces syntax for set values.

INSERT INTO calendar2 (key, years) VALUES (0, {1990,1996});
INSERT INTO calendar2 (key, years) VALUES (1, {2000,2010});
INSERT INTO calendar2 (key, years) VALUES (2, {2001,1990,2020});

Query with CONTAINS example from the list:

  • Query

  • Result

SELECT * FROM calendar2  WHERE years CONTAINS 1990;
 key | years
-----+--------------------
   0 |       {1990, 1996}
   2 | {1990, 2001, 2020}

(2 rows)