Class TemporalMergePolicy

java.lang.Object
org.apache.lucene.index.MergePolicy
org.apache.lucene.index.TemporalMergePolicy

public class TemporalMergePolicy extends MergePolicy
A merge policy that groups segments by time windows and merges segments within the same window. This policy is designed for time-series data where documents contain a timestamp field indexed as a LongPoint.

This policy organizes segments into time buckets based on the maximum timestamp in each segment. Recent data goes into small time windows (e.g., 1 hour), while older data is grouped into exponentially larger windows (e.g., 4 hours, 16 hours, etc.). Segments within the same time window are merged together when they meet the configured thresholds, but segments from different time windows are never merged together, preserving temporal locality.

When to use this policy:

  • Time-series data where queries typically filter by time ranges
  • Data with a timestamp field that can be used for bucketing
  • Workloads where older data is queried less frequently than recent data
  • Use cases where you want to avoid mixing old and new data in the same segment

Configuration:

 TemporalMergePolicy policy = new TemporalMergePolicy()
     .setTemporalField("timestamp")           // Required: name of the timestamp field
     .setBaseTimeInSeconds(3600)              // Base window size: 1 hour
     .setMinThreshold(4)                      // Merge when 4+ segments in a window
     .setMaxThreshold(8)                      // Merge at most 8 segments at once
     .setCompactionRatio(1.2);                // Size ratio threshold for merging
     // By default, exponential buckets are enabled. Use .disableExponentialBuckets() to disable.

 IndexWriterConfig config = new IndexWriterConfig(analyzer);
 config.setMergePolicy(policy);
 

Time bucketing: By default, window sizes grow exponentially: baseTime, baseTime * minThreshold, baseTime * minThreshold^2, etc. This ensures that recent data is in small, frequently-merged windows while older data is in larger, less-frequently-merged windows. Call disableExponentialBuckets() to use fixed-size windows instead, where all windows have the same size (baseTime).
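The window-size progression described above can be worked through with plain arithmetic. The following is a minimal sketch, not the policy's implementation: the class WindowSizeSketch, the helper windowSizeSeconds, and the notion of a numbered "level" (0 for the most recent window) are hypothetical names introduced only for illustration.

```java
final class WindowSizeSketch {
    /**
     * Hypothetical helper: size in seconds of the window at the given level,
     * where level 0 is the smallest (most recent) window.
     */
    static long windowSizeSeconds(long baseTimeSeconds, int minThreshold, int level, boolean exponential) {
        if (!exponential) {
            return baseTimeSeconds; // fixed-size windows: every level uses baseTime
        }
        long size = baseTimeSeconds;
        for (int i = 0; i < level; i++) {
            // baseTime * minThreshold^level, failing loudly on overflow
            size = Math.multiplyExact(size, (long) minThreshold);
        }
        return size;
    }

    public static void main(String[] args) {
        // With baseTime = 3600 and minThreshold = 4: 3600, 14400, 57600, 230400
        for (int level = 0; level < 4; level++) {
            System.out.println("level " + level + ": " + windowSizeSeconds(3600, 4, level, true) + " s");
        }
    }
}
```

With the defaults (baseTime of 3600 seconds and minThreshold of 4), levels 0 through 2 come out as 1 hour, 4 hours, and 16 hours, matching the example in the class description.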

Compaction ratio: The setCompactionRatio(double) parameter controls when merges are triggered. A merge is considered when the total document count across candidate segments exceeds largestSegment * compactionRatio. Lower values (e.g., 1.2) trigger merges more aggressively, while higher values (e.g., 2.0) allow more segments to accumulate before merging. Set it to 1.0 for the most aggressive merging.

NOTE: This policy requires a timestamp field indexed as a LongPoint. The timestamp can be in seconds, milliseconds, or microseconds (auto-detected based on value magnitude).
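Magnitude-based unit detection can be illustrated as follows. This is only a sketch: the documentation does not state the cutoffs the policy uses, so the thresholds below (1e11 and 1e14) are assumptions chosen so that present-day epoch values in seconds, milliseconds, and microseconds fall into separate ranges. The class TimestampUnitSketch and its methods are hypothetical.

```java
import java.util.concurrent.TimeUnit;

final class TimestampUnitSketch {
    // Assumed cutoffs, not taken from the policy's source.
    private static final long MAX_SECONDS = 100_000_000_000L;      // 1e11
    private static final long MAX_MILLIS = 100_000_000_000_000L;   // 1e14

    /** Guesses the unit of an epoch timestamp from its magnitude. */
    static TimeUnit guessUnit(long timestamp) {
        long magnitude = Math.abs(timestamp);
        if (magnitude < MAX_SECONDS) {
            return TimeUnit.SECONDS;
        }
        if (magnitude < MAX_MILLIS) {
            return TimeUnit.MILLISECONDS;
        }
        return TimeUnit.MICROSECONDS;
    }

    /** Normalizes a timestamp of unknown unit to epoch seconds. */
    static long toEpochSeconds(long timestamp) {
        return guessUnit(timestamp).toSeconds(timestamp);
    }

    public static void main(String[] args) {
        System.out.println(guessUnit(1_700_000_000L));             // SECONDS
        System.out.println(guessUnit(1_700_000_000_000L));         // MILLISECONDS
        System.out.println(guessUnit(1_700_000_000_000_000L));     // MICROSECONDS
    }
}
```

Any heuristic of this kind misclassifies values far from the present (for example, millisecond timestamps from early 1970), so it is safest to index all documents with one consistent unit.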

NOTE: Segments from different time windows are never merged together, even during IndexWriter.forceMerge(int). If you call forceMerge(1) but have segments in multiple time windows, you will end up with one segment per time window.
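To estimate how many segments remain after forceMerge(1), count the distinct time windows that the segments occupy. The sketch below assumes fixed-size windows aligned to the epoch and assigns each segment by its maximum timestamp; the real policy's window alignment is not specified here, and the class ForceMergeWindowSketch and its method are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

final class ForceMergeWindowSketch {
    /**
     * Counts the distinct windows occupied by the given segments, which is the
     * number of segments left if each window is merged down to one segment.
     * Assumes fixed-size, epoch-aligned windows (an illustrative simplification).
     */
    static int segmentsAfterForceMerge(long[] segmentMaxTimestampsSeconds, long windowSizeSeconds) {
        Set<Long> occupiedWindows = new TreeSet<>();
        for (long maxTimestamp : segmentMaxTimestampsSeconds) {
            occupiedWindows.add(Math.floorDiv(maxTimestamp, windowSizeSeconds));
        }
        return occupiedWindows.size();
    }

    public static void main(String[] args) {
        long[] maxTimestamps = {0L, 1800L, 3599L, 3600L, 7199L, 7200L};
        // Six segments spread over three one-hour windows leave three segments.
        System.out.println(segmentsAfterForceMerge(maxTimestamps, 3600L)); // 3
    }
}
```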

NOTE: Very old segments (older than the age set by setMaxAgeSeconds(long)) are not merged, to avoid unnecessary I/O on cold data.

WARNING: This API is experimental and might change in incompatible ways in the next release.
  • Constructor Details

    • TemporalMergePolicy

      public TemporalMergePolicy()
      Sole constructor, setting all settings to their defaults.
  • Method Details

    • setTemporalField

      public TemporalMergePolicy setTemporalField(String temporalField)
      Sets the name of the timestamp field used for temporal bucketing. This field must be indexed as a LongPoint and contain timestamp values in seconds, milliseconds, or microseconds (auto-detected based on value magnitude).

      This parameter is required and must be set before the policy can schedule any merges. The merge policy will extract the minimum and maximum timestamps from each segment to determine which time window the segment belongs to.

      Default is empty (no temporal field configured, policy is inactive).

    • getTemporalField

      public String getTemporalField()
      Returns the current temporal field name.
    • setBaseTimeInSeconds

      public TemporalMergePolicy setBaseTimeInSeconds(long baseTimeInSeconds)
      Sets the base time window size in seconds. This determines the size of the smallest (most recent) time buckets.

      By default, window sizes grow exponentially: baseTime, baseTime * minThreshold, baseTime * minThreshold^2, etc. If you call disableExponentialBuckets(), all windows will have the same size equal to baseTime.

      Smaller values create finer-grained time windows, which can improve query performance for time-range queries but may result in more segments. Larger values reduce the number of time windows but may mix data from a wider time range in the same segment.

      Default is 3600 seconds (1 hour).

    • getBaseTimeInSeconds

      public long getBaseTimeInSeconds()
      Returns the current base time window size in seconds.
    • setMinThreshold

      public TemporalMergePolicy setMinThreshold(int minThreshold)
      Sets the minimum number of segments required in a time window to trigger a merge. Higher values reduce merge frequency and I/O but allow more segments to accumulate. Lower values keep segment counts lower but increase write amplification.

      This threshold is also used as the growth factor for exponential bucketing (which is enabled by default). For example, with minThreshold=4, window sizes will be: baseTime, baseTime * 4, baseTime * 16, etc.

      Must be at least 2 and cannot exceed the value set by setMaxThreshold(int). Default is 4.

    • getMinThreshold

      public int getMinThreshold()
      Returns the current minimum threshold for merging.
    • disableExponentialBuckets

      public TemporalMergePolicy disableExponentialBuckets()
      Disables exponentially growing time windows. By default, older data is grouped into progressively larger time buckets: baseTime, baseTime * minThreshold, baseTime * minThreshold^2, etc.

      Calling this method changes the behavior so that all time windows have a fixed size equal to baseTime, which can be useful for workloads with uniform query patterns across all time ranges.

      Exponential bucketing (the default) is recommended for typical time-series use cases where recent data is accessed more frequently than older data.

    • getUseExponentialBuckets

      public boolean getUseExponentialBuckets()
      Returns whether exponential bucketing is enabled.
    • setMaxThreshold

      public TemporalMergePolicy setMaxThreshold(int maxThreshold)
      Sets the maximum number of segments to merge at once within a time window. Larger values allow more aggressive merging (reducing segment count faster) but increase the cost of individual merge operations.

      Must be at least the value set by setMinThreshold(int). When a time window accumulates more segments than this threshold, the policy will schedule multiple smaller merges rather than one large merge.
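One way to picture the splitting of an oversized window into several merges is simple batching. This sketch is an assumption about the general shape of the behavior, not the policy's actual selection logic (which may order or pick segments differently); the class MergeBatchSketch and its method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class MergeBatchSketch {
    /**
     * Splits the segments of one time window into batches of at most maxThreshold.
     * A trailing batch smaller than minThreshold is left unmerged.
     */
    static <T> List<List<T>> batches(List<T> windowSegments, int minThreshold, int maxThreshold) {
        List<List<T>> merges = new ArrayList<>();
        for (int start = 0; start < windowSegments.size(); start += maxThreshold) {
            int end = Math.min(start + maxThreshold, windowSegments.size());
            if (end - start >= minThreshold) {
                merges.add(new ArrayList<>(windowSegments.subList(start, end)));
            }
        }
        return merges;
    }

    public static void main(String[] args) {
        List<String> segments = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            segments.add("_" + i);
        }
        // 20 segments with minThreshold = 4 and maxThreshold = 8: batches of 8, 8, and 4.
        for (List<String> merge : batches(segments, 4, 8)) {
            System.out.println(merge.size());
        }
    }
}
```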

      Default is 8.

    • getMaxThreshold

      public int getMaxThreshold()
      Returns the current maximum threshold for merging.
    • setCompactionRatio

      public TemporalMergePolicy setCompactionRatio(double compactionRatio)
      Sets the compaction ratio that controls when merges are triggered based on segment size distribution. A merge is considered when the total document count of candidate segments exceeds largestSegment * compactionRatio.

      Lower values (e.g., 1.2) trigger merges more aggressively: the candidate segments other than the largest need only add up to more than 20% of the largest segment's document count. Higher values (e.g., 2.0 or higher) require the smaller segments to accumulate more documents relative to the largest before merging, which allows more segments to build up but reduces write amplification.

      Setting this to exactly 1.0 enables the most aggressive merging mode, where merges occur whenever the minimum threshold is met, regardless of segment size distribution.

      This parameter works together with setMinThreshold(int): a time window must have both (1) at least minThreshold segments, and (2) satisfy the compaction ratio, before a merge is triggered.

      Default is 1.2.
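The two conditions can be combined into a single check. The following is a minimal sketch based only on the formula stated above (total document count compared against largestSegment * compactionRatio); the class CompactionRatioSketch and its method are hypothetical and do not reflect the policy's internal code.

```java
final class CompactionRatioSketch {
    /**
     * Returns true when a window has at least minThreshold segments and their
     * total document count exceeds largestSegment * compactionRatio.
     */
    static boolean shouldMerge(long[] segmentDocCounts, int minThreshold, double compactionRatio) {
        if (segmentDocCounts.length < minThreshold) {
            return false; // condition (1): not enough segments in the window
        }
        long total = 0;
        long largest = 0;
        for (long docCount : segmentDocCounts) {
            total += docCount;
            largest = Math.max(largest, docCount);
        }
        return total > largest * compactionRatio; // condition (2): compaction ratio
    }

    public static void main(String[] args) {
        long[] docCounts = {1000, 100, 100, 100}; // total 1300, largest 1000
        System.out.println(shouldMerge(docCounts, 4, 1.2)); // true:  1300 > 1200
        System.out.println(shouldMerge(docCounts, 4, 2.0)); // false: 1300 <= 2000
    }
}
```

In this example, raising the ratio from 1.2 to 2.0 defers the merge until the three smaller segments together hold more documents than the largest one.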

    • getCompactionRatio

      public double getCompactionRatio()
      Returns the current compaction ratio.
    • setMaxWindowSizeSeconds

      public TemporalMergePolicy setMaxWindowSizeSeconds(long maxWindowSizeSeconds)
      Sets the maximum size for exponentially growing time windows. When exponential bucketing is enabled (the default), window sizes grow exponentially but are capped at this value.

      This prevents extremely large time windows for very old data, which could mix data from vastly different time periods. Once window size reaches this limit, all older data uses fixed-size windows of this duration.

      Default is 31536000 seconds (365 days).
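The effect of the cap can be checked with a short calculation. This is a sketch under the same assumptions as the exponential progression described for setBaseTimeInSeconds: the class CappedWindowSketch, its method, and the numbered "level" are hypothetical illustration names, not part of the API.

```java
final class CappedWindowSketch {
    /**
     * Hypothetical helper: baseTime * minThreshold^level, capped at maxWindowSizeSeconds.
     * Level 0 is the smallest (most recent) window.
     */
    static long cappedWindowSize(long baseTimeSeconds, int minThreshold, int level, long maxWindowSizeSeconds) {
        long size = baseTimeSeconds;
        for (int i = 0; i < level; i++) {
            // Stop before the next multiplication would pass the cap (also avoids overflow).
            if (size > maxWindowSizeSeconds / minThreshold) {
                return maxWindowSizeSeconds;
            }
            size *= minThreshold;
        }
        return Math.min(size, maxWindowSizeSeconds);
    }

    public static void main(String[] args) {
        long maxWindow = 31_536_000L; // default cap: 365 days
        // Level 6 is 3600 * 4^6 = 14745600 s; level 7 would be 58982400 s and is capped.
        System.out.println(cappedWindowSize(3600, 4, 6, maxWindow)); // 14745600
        System.out.println(cappedWindowSize(3600, 4, 7, maxWindow)); // 31536000
    }
}
```

With the default settings, the uncapped progression would exceed 365 days at the eighth step, so every window from that point on uses the capped size.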

    • getMaxWindowSizeSeconds

      public long getMaxWindowSizeSeconds()
      Returns the current maximum window size in seconds.
    • setMaxAgeSeconds

      public TemporalMergePolicy setMaxAgeSeconds(long maxAgeSeconds)
      Sets the maximum age threshold for merging segments. Segments containing data older than this threshold (based on current time minus the segment's maximum timestamp) will not be merged.

      This is useful for preventing unnecessary I/O on cold, historical data that is rarely queried. These old segments are placed in a special "old data" bucket and skipped during merge selection.

      Default is Long.MAX_VALUE (no age limit, all segments are merge candidates).
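The age test described above reduces to one subtraction and one comparison. The sketch below assumes all values are already in epoch seconds; the class MaxAgeSketch and its method are hypothetical names used only to illustrate the rule.

```java
final class MaxAgeSketch {
    /**
     * Returns true when a segment's newest document is older than maxAgeSeconds,
     * meaning the segment would be skipped during merge selection.
     */
    static boolean isTooOldToMerge(long nowSeconds, long segmentMaxTimestampSeconds, long maxAgeSeconds) {
        long ageSeconds = nowSeconds - segmentMaxTimestampSeconds;
        return ageSeconds > maxAgeSeconds;
    }

    public static void main(String[] args) {
        long now = 1_700_000_000L;
        long segmentMax = 1_699_000_000L; // newest document is 1,000,000 s (about 11.6 days) old
        System.out.println(isTooOldToMerge(now, segmentMax, 604_800L));       // true:  older than 7 days
        System.out.println(isTooOldToMerge(now, segmentMax, 2_592_000L));     // false: within 30 days
        System.out.println(isTooOldToMerge(now, segmentMax, Long.MAX_VALUE)); // false: default, no limit
    }
}
```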

    • getMaxAgeSeconds

      public long getMaxAgeSeconds()
      Returns the current maximum age threshold in seconds.
    • setForceMergeDeletesPctAllowed

      public TemporalMergePolicy setForceMergeDeletesPctAllowed(double pct)
      Sets the delete-percentage threshold used when IndexWriter.forceMergeDeletes() is called: only segments whose delete percentage exceeds this threshold are merged. Lower values merge more aggressively to reclaim space from deleted documents, but increase I/O and write amplification.

      The delete percentage is calculated as: (deleted docs / total docs) * 100.

      Default is 10.0 (merge segments with more than 10% deleted documents).
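The threshold test follows directly from the formula above. This is a minimal sketch of that arithmetic; the class DeletePctSketch and its methods are hypothetical, and treating an empty segment as 0% deleted is an assumption made here to avoid dividing by zero.

```java
final class DeletePctSketch {
    /** (deleted docs / total docs) * 100, with an empty segment treated as 0%. */
    static double deletePct(int deletedDocs, int totalDocs) {
        if (totalDocs == 0) {
            return 0.0;
        }
        return 100.0 * deletedDocs / totalDocs;
    }

    /** True when the segment's delete percentage strictly exceeds the allowed threshold. */
    static boolean eligibleForForceMergeDeletes(int deletedDocs, int totalDocs, double pctAllowed) {
        return deletePct(deletedDocs, totalDocs) > pctAllowed;
    }

    public static void main(String[] args) {
        System.out.println(eligibleForForceMergeDeletes(150, 1000, 10.0)); // true:  15% > 10%
        System.out.println(eligibleForForceMergeDeletes(100, 1000, 10.0)); // false: 10% is not above 10%
    }
}
```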

    • getForceMergeDeletesPctAllowed

      public double getForceMergeDeletesPctAllowed()
      Returns the current force merge deletes percentage threshold.
    • findMerges

      public MergePolicy.MergeSpecification findMerges(MergeTrigger trigger, SegmentInfos segments, MergePolicy.MergeContext context) throws IOException
      Description copied from class: MergePolicy
      Determine what set of merge operations are now necessary on the index. IndexWriter calls this whenever there is a change to the segments. This call is always synchronized on the IndexWriter instance so only one thread at a time will call this method.
      Specified by:
      findMerges in class MergePolicy
      Parameters:
      trigger - the event that triggered the merge
      segments - the total set of segments in the index
      context - the MergeContext to find the merges on
      Throws:
      IOException
    • findForcedMerges

      public MergePolicy.MergeSpecification findForcedMerges(SegmentInfos segmentInfos, int maxSegmentCount, Map<SegmentCommitInfo,Boolean> segmentsToMerge, MergePolicy.MergeContext mergeContext) throws IOException
      Description copied from class: MergePolicy
      Determine what set of merge operations is necessary in order to merge to <= the specified segment count. IndexWriter calls this when its IndexWriter.forceMerge(int) method is called. This call is always synchronized on the IndexWriter instance so only one thread at a time will call this method.
      Specified by:
      findForcedMerges in class MergePolicy
      Parameters:
      segmentInfos - the total set of segments in the index
      maxSegmentCount - requested maximum number of segments in the index
      segmentsToMerge - contains the specific SegmentInfo instances that must be merged away. This may be a subset of all SegmentInfos. If the value is True for a given SegmentInfo, that means this segment was an original segment present in the to-be-merged index; else, it was a segment produced by a cascaded merge.
      mergeContext - the MergeContext to find the merges on
      Throws:
      IOException
    • findForcedDeletesMerges

      public MergePolicy.MergeSpecification findForcedDeletesMerges(SegmentInfos segmentInfos, MergePolicy.MergeContext mergeContext) throws IOException
      Description copied from class: MergePolicy
      Determine what set of merge operations is necessary in order to expunge all deletes from the index.
      Specified by:
      findForcedDeletesMerges in class MergePolicy
      Parameters:
      segmentInfos - the total set of segments in the index
      mergeContext - the MergeContext to find the merges on
      Throws:
      IOException
    • toString

      public String toString()
      Overrides:
      toString in class Object