Package org.apache.lucene.document.column


Column-oriented batch indexing API.

IndexWriter.addBatch(ColumnBatch) accepts a ColumnBatch: a fixed number of documents presented field-by-field rather than document-by-document. Each field is a Column that exposes its values through cursor iterators instead of materializing a concrete IndexableField instance for every document.
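To make the field-by-field layout concrete, the sketch below pivots a small list of row-oriented documents into per-field columns. It does not use the Lucene API: PivotSketch, SketchColumn, and pivot are hypothetical names introduced only for illustration, and the real ColumnBatch and Column types may be organized quite differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PivotSketch {

  /** Hypothetical column (not a Lucene type): which docs have a value, and the values. */
  static final class SketchColumn {
    final List<Integer> docs = new ArrayList<>();  // batch-relative document indexes
    final List<Long> values = new ArrayList<>();   // values.get(i) belongs to docs.get(i)
  }

  /** Turns document-by-document rows into one column per field name. */
  static Map<String, SketchColumn> pivot(List<Map<String, Long>> rows) {
    Map<String, SketchColumn> columns = new LinkedHashMap<>();
    for (int doc = 0; doc < rows.size(); doc++) {
      for (Map.Entry<String, Long> field : rows.get(doc).entrySet()) {
        SketchColumn col = columns.computeIfAbsent(field.getKey(), k -> new SketchColumn());
        col.docs.add(doc);
        col.values.add(field.getValue());
      }
    }
    return columns;
  }

  public static void main(String[] args) {
    List<Map<String, Long>> rows = List.of(
        Map.of("price", 10L, "stock", 5L),
        Map.of("price", 12L),
        Map.of("price", 9L, "stock", 0L));
    Map<String, SketchColumn> columns = pivot(rows);
    System.out.println(columns.get("price").values); // [10, 12, 9]
    System.out.println(columns.get("stock").docs);   // [0, 2]
  }
}
```

In this sketch "price" has a value for every document in the batch, while "stock" has values for only some of them, so its column must also record which documents it covers.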

Column subtypes

  • LongColumn — single- or multi-valued long values for NUMERIC / SORTED_NUMERIC doc values, 1‑D numeric points (int / long / float / double), and stored numeric fields.
  • BinaryColumn — variable-length binary values for BINARY, SORTED, and SORTED_SET doc values, term inversion, multi-dimensional or arbitrary-width points, and stored binary or string fields.
  • DictionaryColumn — pre-defined term dictionary plus per-doc ordinals for SORTED and SORTED_SET doc values, term inversion, and stored binary or string fields.
  • VectorColumn — KNN vectors (FLOAT32 or BYTE encoding); vector-only field type.
  • TokenStreamColumn — caller-supplied TokenStreams for term inversion (the columnar analogue of a custom token stream on a Field); inverted-index-only field type.
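The "term dictionary plus per-doc ordinals" shape described for DictionaryColumn is ordinary dictionary encoding. The sketch below shows that idea in isolation. DictionaryEncodingSketch, Encoded, and encode are hypothetical names, not Lucene types. Two simplifications are assumed here: every document has exactly one non-null value, and the dictionary is kept sorted. Whether DictionaryColumn requires a sorted dictionary is not stated above.

```java
import java.util.Arrays;
import java.util.TreeSet;

public class DictionaryEncodingSketch {

  /** Hypothetical holder (not a Lucene type): unique terms plus one ordinal per document. */
  static final class Encoded {
    final String[] dictionary;  // unique terms, sorted
    final int[] ordinals;       // ordinals[doc] indexes into dictionary

    Encoded(String[] dictionary, int[] ordinals) {
      this.dictionary = dictionary;
      this.ordinals = ordinals;
    }
  }

  /** Dictionary-encodes one value per document; assumes no null values. */
  static Encoded encode(String[] perDocValues) {
    // TreeSet de-duplicates and sorts by String natural order (UTF-16 code units).
    // Lucene orders terms by their UTF-8 bytes, which can differ for some characters.
    String[] dictionary = new TreeSet<>(Arrays.asList(perDocValues)).toArray(new String[0]);
    int[] ordinals = new int[perDocValues.length];
    for (int doc = 0; doc < perDocValues.length; doc++) {
      // Every value is present in the dictionary, so the result is always >= 0.
      ordinals[doc] = Arrays.binarySearch(dictionary, perDocValues[doc]);
    }
    return new Encoded(dictionary, ordinals);
  }

  public static void main(String[] args) {
    Encoded e = encode(new String[] {"red", "blue", "red", "green"});
    System.out.println(Arrays.toString(e.dictionary)); // [blue, green, red]
    System.out.println(Arrays.toString(e.ordinals));   // [2, 0, 2, 1]
  }
}
```

Each distinct term is stored once, and repeated values cost only an integer per document, which is why this shape suits low-cardinality fields.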

Cursors

A Column declares its Column.Density (DENSE or SPARSE) and exposes its values through cursors.

Each call that requests a cursor returns a fresh cursor positioned at the first value, so columns can be consumed multiple times — once in the row-oriented pass for stored fields and term inversion, and again in the column-oriented pass for doc values, points, and vectors.
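The multiple-pass behaviour can be illustrated with a minimal sketch in which every cursor request creates an independent iterator over the same backing arrays. None of the names below (CursorSketch, LongCursor, SketchLongColumn, cursor) come from the Lucene API; they are hypothetical. The next()-before-first-value iteration style is also an assumption made for brevity.

```java
import java.util.ArrayList;
import java.util.List;

public class CursorSketch {

  /** Hypothetical forward-only cursor; not a Lucene type. */
  interface LongCursor {
    boolean next();  // advance to the next value; false once exhausted
    int docId();     // batch-relative document index of the current value
    long value();    // the current value
  }

  /** Hypothetical sparse, single-valued long column backed by parallel arrays. */
  static final class SketchLongColumn {
    private final int[] docs;
    private final long[] values;

    SketchLongColumn(int[] docs, long[] values) {
      if (docs.length != values.length) {
        throw new IllegalArgumentException("docs and values must have the same length");
      }
      this.docs = docs;
      this.values = values;
    }

    /** Each call returns a new cursor with its own position, so passes do not interfere. */
    LongCursor cursor() {
      return new LongCursor() {
        private int pos = -1;
        @Override public boolean next() { return ++pos < docs.length; }
        @Override public int docId() { return docs[pos]; }
        @Override public long value() { return values[pos]; }
      };
    }
  }

  /** One pass: sums every value in the column. */
  static long sum(LongCursor cursor) {
    long total = 0;
    while (cursor.next()) {
      total += cursor.value();
    }
    return total;
  }

  /** Another pass: collects the documents that have a value. */
  static List<Integer> docs(LongCursor cursor) {
    List<Integer> result = new ArrayList<>();
    while (cursor.next()) {
      result.add(cursor.docId());
    }
    return result;
  }

  public static void main(String[] args) {
    SketchLongColumn column =
        new SketchLongColumn(new int[] {0, 2, 3}, new long[] {10L, 20L, 30L});
    System.out.println(sum(column.cursor()));   // 60
    System.out.println(docs(column.cursor()));  // [0, 2, 3]
    System.out.println(sum(column.cursor()));   // 60 again: the column itself holds no position
  }
}
```

Keeping the position in the cursor rather than in the column is what lets two independent passes read the same column without a reset step.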

WARNING: This API is experimental and might change in incompatible ways in the next release.