Class DictionaryColumn
java.lang.Object
org.apache.lucene.document.column.Column
org.apache.lucene.document.column.DictionaryColumn
A Column that provides string or binary values via a pre-defined term dictionary plus
per-doc ordinals into that dictionary. Used for SORTED and SORTED_SET doc values, for stored binary or
string fields, and for term inversion (tokenized or untokenized).
Iteration is performed via cursors. tuples() is always available and yields
(docID, ordinal) pairs. values() is a bulk cursor over consecutive doc-ids; it must be
overridden when Column.density() is DENSE and is only consulted in
that case.
The caller supplies a fixed List<BytesRef> dictionary at construction. Per-doc
ordinals returned by cursors index into this dictionary.
Duplicate dictionary entries are permitted; two slots with the same bytes will both resolve to the same Lucene-level ordinal. The dictionary may be in any order.
The dictionary list and the backing byte arrays of its entries must not be mutated after the column is constructed.
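The relationship between dictionary slots, per-doc ordinals, and duplicate entries can be illustrated with a small standalone sketch. It does not use the Lucene API: the class DictionaryEncodingSketch and its methods are hypothetical, byte[] stands in for BytesRef, and the sketch assumes that final ordinals follow unsigned byte order over the distinct entries, as they do for SORTED doc values. How DictionaryColumn actually remaps slots internally is not described here.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only; not part of Lucene.
final class DictionaryEncodingSketch {

  // Unsigned byte-wise ordering, the order assumed for final ordinals.
  private static final Comparator<byte[]> UNSIGNED_ORDER = Arrays::compareUnsigned;

  /** Resolves each per-doc ordinal to the (UTF-8) value stored in that dictionary slot. */
  static List<String> resolve(List<byte[]> dictionary, int[] perDocOrdinals) {
    List<String> out = new ArrayList<>();
    for (int ord : perDocOrdinals) {
      // Ordinals must fall in [0, dictionary.size()).
      if (ord < 0 || ord >= dictionary.size()) {
        throw new IllegalArgumentException("ordinal out of range: " + ord);
      }
      out.add(new String(dictionary.get(ord), StandardCharsets.UTF_8));
    }
    return out;
  }

  /**
   * Maps each caller-side slot to an ordinal over the distinct, sorted entries.
   * Slots holding identical bytes receive the same ordinal, and the input
   * dictionary may be in any order.
   */
  static int[] canonicalOrdinals(List<byte[]> dictionary) {
    TreeSet<byte[]> distinct = new TreeSet<>(UNSIGNED_ORDER);
    distinct.addAll(dictionary);
    List<byte[]> sorted = new ArrayList<>(distinct);

    int[] slotToOrdinal = new int[dictionary.size()];
    for (int slot = 0; slot < slotToOrdinal.length; slot++) {
      // Every entry is present in 'sorted', so the search result is non-negative.
      slotToOrdinal[slot] =
          Collections.binarySearch(sorted, dictionary.get(slot), UNSIGNED_ORDER);
    }
    return slotToOrdinal;
  }
}
```

With the unsorted dictionary ["b", "a", "b"], canonicalOrdinals returns [1, 0, 1]: both "b" slots share ordinal 1, and "a" sorts first.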
WARNING: This API is experimental and might change in incompatible ways in the next release.
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.document.column.Column
Column.Density
Constructor Summary
Constructors
protected DictionaryColumn(String name, IndexableFieldType fieldType, Column.Density density, List<BytesRef> dictionary): Creates a DictionaryColumn.
Method Summary
dictionary(): Returns the term dictionary.
storedType(): The stored-field type emitted for this column.
abstract OrdinalsTupleCursor tuples(): Returns a fresh tuple cursor starting at the beginning of the batch.
values(): Returns a fresh dense cursor for doc-ids [0, numDocs), producing exactly one ordinal per doc.
Constructor Details
DictionaryColumn
protected DictionaryColumn(String name, IndexableFieldType fieldType, Column.Density density, List<BytesRef> dictionary)
Creates a DictionaryColumn.
Parameters:
name - the field name
fieldType - describes how this field should be indexed
density - whether every batch-local doc-id has a value
dictionary - the term universe; entries must be non-null and no longer than ByteBlockPool.BYTE_BLOCK_SIZE - 2. Must contain at least one entry. Duplicate entries are allowed but incur a minor per-batch cost.
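The dictionary constraints listed above can be checked before constructing a column. The following is a minimal sketch, not the constructor's actual validation logic: DictionaryValidationSketch is a hypothetical helper, byte[] stands in for BytesRef, and the block size is hard-coded on the assumption that ByteBlockPool.BYTE_BLOCK_SIZE is 1 << 15, as in current Lucene releases.

```java
import java.util.List;

// Hypothetical pre-flight check; not part of Lucene.
final class DictionaryValidationSketch {

  // Assumption: mirrors ByteBlockPool.BYTE_BLOCK_SIZE (1 << 15).
  static final int ASSUMED_BYTE_BLOCK_SIZE = 1 << 15;

  // Longest entry permitted by the documented limit (BYTE_BLOCK_SIZE - 2).
  static final int MAX_ENTRY_LENGTH = ASSUMED_BYTE_BLOCK_SIZE - 2;

  /** Throws IllegalArgumentException if the dictionary breaks a documented constraint. */
  static void validate(List<byte[]> dictionary) {
    // The dictionary must contain at least one entry.
    if (dictionary == null || dictionary.isEmpty()) {
      throw new IllegalArgumentException("dictionary must contain at least one entry");
    }
    for (int slot = 0; slot < dictionary.size(); slot++) {
      byte[] entry = dictionary.get(slot);
      // Entries must be non-null.
      if (entry == null) {
        throw new IllegalArgumentException("dictionary entry " + slot + " is null");
      }
      // Entries must be no longer than BYTE_BLOCK_SIZE - 2 bytes.
      if (entry.length > MAX_ENTRY_LENGTH) {
        throw new IllegalArgumentException(
            "dictionary entry " + slot + " has " + entry.length
                + " bytes; the maximum is " + MAX_ENTRY_LENGTH);
      }
    }
  }
}
```

Duplicates are deliberately not rejected, since the parameter description allows them at a minor per-batch cost.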
Method Details
dictionary
Returns the term dictionary. The list is indexed by ordinal; cursors must produce values in [0, dictionary().size()).
tuples
Returns a fresh tuple cursor starting at the beginning of the batch. Always available, regardless of Column.density().
values
Returns a fresh dense cursor for doc-ids [0, numDocs), producing exactly one ordinal per doc. Must be overridden when Column.density() is DENSE; the default implementation throws UnsupportedOperationException and is never called for SPARSE columns.
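The contract between tuples(), values(), and the column density can be sketched with stand-in types. Everything in the block below is hypothetical: TupleCursor, DenseCursor, SketchColumn, and their method names are invented for illustration and are not the signatures of OrdinalsTupleCursor or of the real dense cursor, which this page does not show. The sketch only mirrors the documented behaviour: tuples() always works, values() throws by default, and a consumer calls values() only for DENSE columns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; the real Lucene cursor interfaces may differ.
final class CursorSketch {
  enum Density { DENSE, SPARSE }

  /** Yields (docID, ordinal) pairs; call next() before reading a pair. */
  interface TupleCursor {
    boolean next();
    int docID();
    int ordinal();
  }

  /** Yields exactly one ordinal per doc for doc-ids [0, numDocs). */
  interface DenseCursor {
    int nextOrdinal();
  }

  abstract static class SketchColumn {
    final Density density;
    final int numDocs;

    SketchColumn(Density density, int numDocs) {
      this.density = density;
      this.numDocs = numDocs;
    }

    /** Always available, regardless of density. */
    abstract TupleCursor tuples();

    /** Default mirrors the documented behaviour: unsupported unless overridden. */
    DenseCursor values() {
      throw new UnsupportedOperationException("values() requires a DENSE column");
    }
  }

  /** Consumes a column, consulting values() only when the column is DENSE. */
  static List<String> consume(SketchColumn column) {
    List<String> seen = new ArrayList<>();
    if (column.density == Density.DENSE) {
      DenseCursor cursor = column.values();
      for (int doc = 0; doc < column.numDocs; doc++) {
        seen.add(doc + ":" + cursor.nextOrdinal());
      }
    } else {
      TupleCursor cursor = column.tuples();
      while (cursor.next()) {
        seen.add(cursor.docID() + ":" + cursor.ordinal());
      }
    }
    return seen;
  }

  /** Sparse column: only the listed doc-ids carry an ordinal. */
  static SketchColumn sparse(int numDocs, int[] docIds, int[] ordinals) {
    return new SketchColumn(Density.SPARSE, numDocs) {
      @Override
      TupleCursor tuples() {
        return new TupleCursor() {
          private int pos = -1;
          public boolean next() { return ++pos < docIds.length; }
          public int docID() { return docIds[pos]; }
          public int ordinal() { return ordinals[pos]; }
        };
      }
    };
  }

  /** Dense column: one ordinal per doc, so values() is overridden as well. */
  static SketchColumn dense(int[] ordinals) {
    return new SketchColumn(Density.DENSE, ordinals.length) {
      @Override
      TupleCursor tuples() {
        return new TupleCursor() {
          private int pos = -1;
          public boolean next() { return ++pos < ordinals.length; }
          public int docID() { return pos; }
          public int ordinal() { return ordinals[pos]; }
        };
      }

      @Override
      DenseCursor values() {
        return new DenseCursor() {
          private int pos = 0;
          public int nextOrdinal() { return ordinals[pos++]; }
        };
      }
    };
  }
}
```

In this sketch a sparse column never has values() called on it by consume, so leaving the throwing default in place is safe, which matches the note that the default is never called for SPARSE columns.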
storedType
The stored-field type emitted for this column. The default is StoredValue.Type.BINARY. Only StoredValue.Type.BINARY and StoredValue.Type.STRING are supported; subclasses may override to emit string stored values.