JavaDoc Search Specification (JDK 24)

This document specifies the behaviour of the JavaDoc search feature for JDK 24.

Overview

The JavaDoc Search feature was introduced in JDK 9 with JEP 225. The initial JEP did not include a fine-grained specification of the search algorithm, and the algorithm has evolved considerably since the initial implementation. The purpose of this document is to provide an up-to-date specification of the search algorithm in documentation generated by the JavaDoc standard doclet.

Definitions

In this document, the term entity is used to describe an artifact in documented code that is discoverable through the JavaDoc Search feature. This includes program elements and other items defined in documentation comments.

The term signature is used to describe the exact text used to represent an entity in JavaDoc Search.

The terms separator, identifier, name, simple name, qualified name, and fully qualified name are used as defined in the Java Language Specification sections 3.8, 3.11, 6.2, and 6.7. This implies the use of Unicode for the encoding and processing of entity signatures.

The terms letter, uppercase letter, lowercase letter, digit, and white space refer to the "Letter", "Uppercase_Letter", "Lowercase_Letter", "Decimal_Number", and "Space_Separator" general categories of the Unicode standard.

The term camel-case is used to describe mixed-case identifiers that make use of uppercase letters to mark word boundaries within the identifier.

The term query string is used to describe the characters entered in the search input box by the user.

Examples in the following sections refer to or are taken from the standard Java SE class libraries.

Categories of Searchable Entities

The following sections list the kinds of program elements and other entities covered by JavaDoc Search and the format of their signatures.

Modules

The signature of a named module is the module name.

Example signature:

Packages

If a package is in a named module, the signature of the package is the name of the module, followed by '/', followed by the fully qualified name of the package; if a package is not in a named module, the signature is the fully qualified name of the package.

Example signature:

Types

The signature of a class or interface type is its fully qualified type name.

Example signatures:

Members

The signature of a member is the fully qualified name of its containing type, followed by '.', followed by the simple name of the member, followed by a list of parameter types if the member is a constructor or method. The list of parameter types is '(', followed by the simple names of formal parameter types of the constructor or method separated by ', ', followed by ')'.
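The signature formats for packages and members can be illustrated with a minimal sketch. The class and method names below are hypothetical and are not part of the JavaDoc tool; the code only assembles strings according to the rules stated in this and the preceding sections.

```java
import java.util.List;

// Hypothetical helper, for illustration only.
public final class SignatureSketch {

    /** "module/package" for a package in a named module, otherwise just the package name. */
    static String packageSignature(String moduleName, String packageName) {
        return (moduleName == null || moduleName.isEmpty())
                ? packageName
                : moduleName + "/" + packageName;
    }

    /** Member that is neither a constructor nor a method: containing type, '.', simple name. */
    static String memberSignature(String typeName, String memberName) {
        return typeName + "." + memberName;
    }

    /** Constructor or method: simple names of the parameter types, joined by ", ", in parentheses. */
    static String memberSignature(String typeName, String memberName, List<String> paramTypes) {
        return typeName + "." + memberName + "(" + String.join(", ", paramTypes) + ")";
    }
}
```

For instance, `memberSignature("java.util.Objects", "equals", List.of("Object", "Object"))` yields `java.util.Objects.equals(Object, Object)`.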

Example signatures:

JavaDoc Tags

Various JavaDoc tags can be used to create searchable entities.

The signature for each of these tags is a string provided in the tag.

Example signatures and tags:

Search Rules

The following sections describe the rules used to search entity signatures for a given query string.

Although the purpose of this document is not to describe the implementation of the Search feature, knowing the sequence of actions helps in understanding the rules that are applied. A search involves the following steps:

  1. The query string is parsed and compiled into a pattern.
  2. The pattern is matched against the signatures of all searchable entities.
  3. Entities with matching signatures are filtered and scored according to the rules described in the sections below.
  4. Entities with a score exceeding a certain threshold are presented to the user ordered by their score.

The scoring mechanism in step 3 is subtractive: it starts with a high score for all matching entities and diminishes the score to rank entities lower or exclude them from the results.
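The four steps and the subtractive scoring can be sketched as follows. This is not the actual implementation: the initial score, the threshold, and the single penalty (one point per signature character left unmatched after the end of the match) are hypothetical values chosen only to show the shape of the process, and the pattern is a plain case-insensitive substring match.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch; all constants are assumptions, not values from the specification.
public final class SearchPipelineSketch {
    static final double INITIAL_SCORE = 100.0; // hypothetical starting score
    static final double THRESHOLD = 50.0;      // hypothetical cut-off

    static List<String> search(String query, List<String> signatures) {
        // Step 1: compile the query string into a pattern.
        Pattern pattern = Pattern.compile(Pattern.quote(query.trim()),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

        Map<String, Double> scores = new LinkedHashMap<>();
        for (String signature : signatures) {
            // Step 2: match the pattern against the signature.
            Matcher m = pattern.matcher(signature);
            if (!m.find()) {
                continue;
            }
            // Step 3: start high and subtract penalties.
            double score = INITIAL_SCORE;
            score -= signature.length() - m.end(); // hypothetical penalty for an unmatched tail
            // Step 4 (first half): drop entities at or below the threshold.
            if (score > THRESHOLD) {
                scores.put(signature, score);
            }
        }
        // Step 4 (second half): present the remaining entities ordered by score, best first.
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
        return ranked;
    }
}
```

With these assumed penalties, the query `object` ranks `java.lang.Object` ahead of `java.util.Objects`, because the latter leaves one trailing character unmatched.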

Case Sensitivity

The search pattern is matched against signatures in a case-insensitive manner. If the query string contains uppercase letters, signatures with matching capitalization are scored higher than ones with non-matching capitalization. Additionally, query strings containing uppercase letters cause the rules in the Camel-Case Matches section to be applied.

Examples of Query Strings and Matches
Query String  Matches
Object        type java.lang.Object
object        type java.lang.Object
obJECT        type java.lang.Object
MAX_VALUE     member java.lang.Byte.MAX_VALUE
max_value     member java.lang.Byte.MAX_VALUE
max_VALUE     member java.lang.Byte.MAX_VALUE
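The interaction between case-insensitive matching and capitalization-aware scoring might be sketched like this. The penalty value is hypothetical, the sketch inspects only the first occurrence of the query string, and it assumes that lowercasing does not change the length of the strings (true for the ASCII signatures used in the examples).

```java
import java.util.Locale;

// Illustrative sketch; the penalty value is an assumption.
public final class CaseScoreSketch {
    static final int CASE_MISMATCH_PENALTY = 1; // hypothetical

    /**
     * Returns -1 if the query does not occur in the signature (ignoring case),
     * otherwise the penalty to subtract from the score of the match.
     */
    static int casePenalty(String signature, String query) {
        int at = signature.toLowerCase(Locale.ROOT).indexOf(query.toLowerCase(Locale.ROOT));
        if (at < 0) {
            return -1;
        }
        // Capitalization only influences the score if the query contains uppercase letters.
        boolean queryHasUppercase = query.chars().anyMatch(Character::isUpperCase);
        if (!queryHasUppercase) {
            return 0;
        }
        // Exact capitalization scores best; any mismatch is penalized.
        return signature.startsWith(query, at) ? 0 : CASE_MISMATCH_PENALTY;
    }
}
```

All six query strings in the table above match, but under this sketch only `obJECT` and `max_VALUE` would be penalized.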

Word Boundaries

Word boundaries play an important role in determining the score of matching entities. The following are considered word boundaries in entity signatures:

Left Word Boundaries

The beginning of a match in an entity's signature must be a left word boundary, or a separator preceding a left word boundary, in order for the entity to be included in the search results.

Examples of Query Strings and Matches
Query String  Matches                                 Does not match
base          module java.base                        type java.sql.DatabaseMetaData
.util         package java.util                       type javax.swing.SwingUtilities
map           types java.util.Map, java.util.HashMap  type javax.swing.text.Keymap
.map          type java.util.Map                      types java.util.HashMap, javax.swing.text.Keymap
val           member java.lang.Byte.MAX_VALUE         type java.nio.InvalidMarkException
32            type java.util.zip.Adler32              @spec tag for RFC 1323
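The inclusion rule can be sketched as a check on the position where a match begins. The boundary definition used here is an assumption inferred from the examples in the table (start of the signature, a position after a non-alphanumeric character, an uppercase letter after a non-uppercase character, or a digit after a letter), and the set of separator characters is a hypothetical subset.

```java
import java.util.Locale;

// Illustrative sketch; the boundary rules are inferred from the examples, not normative.
public final class LeftBoundarySketch {
    // Hypothetical subset of the characters treated as separators in signatures.
    private static final String SEPARATORS = "./(,";

    /** True if position i in the signature is assumed to be a left word boundary. */
    static boolean isLeftBoundary(String sig, int i) {
        if (i == 0) {
            return true; // beginning of the signature
        }
        char prev = sig.charAt(i - 1);
        char cur = sig.charAt(i);
        if (!Character.isLetterOrDigit(prev)) {
            return true; // after a separator, underscore, or white space
        }
        if (Character.isUpperCase(cur) && !Character.isUpperCase(prev)) {
            return true; // start of a camel-case word
        }
        return Character.isDigit(cur) && Character.isLetter(prev); // start of a digit run
    }

    /** A match may begin at a left word boundary or at a separator preceding one. */
    static boolean isAcceptableStart(String sig, int i) {
        if (SEPARATORS.indexOf(sig.charAt(i)) >= 0) {
            return i + 1 < sig.length() && isLeftBoundary(sig, i + 1);
        }
        return isLeftBoundary(sig, i);
    }

    /** Case-insensitive substring search that only accepts matches at acceptable starts. */
    static boolean matches(String sig, String query) {
        String s = sig.toLowerCase(Locale.ROOT);
        String q = query.toLowerCase(Locale.ROOT);
        if (q.isEmpty()) {
            return false;
        }
        for (int at = s.indexOf(q); at >= 0; at = s.indexOf(q, at + 1)) {
            if (isAcceptableStart(sig, at)) {
                return true;
            }
        }
        return false;
    }
}
```

Under these assumptions, `matches("java.util.HashMap", "map")` is true while `matches("javax.swing.text.Keymap", "map")` is false, in line with the table.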

Matches may be scored differently depending on the type of the left word boundary they begin at. For example, a match starting at the beginning of an identifier may be scored higher than one starting in the middle of a camel-case identifier.

Example: Query string set matches types java.util.Set and java.util.HashSet, but the former is ranked higher than the latter.

Right Word Boundaries

The end of the query string is not required to match a right word boundary in an entity's signature in order for the entity to be included in the search results.

Examples of Query Strings and Matches
Query String  Matches
Obj           type java.lang.Object
j.l.o         type java.lang.Object

However, matches that include a right word boundary are scored higher than matches which do not (and therefore only match part of an identifier, name or word).

Example:

Camel-Case Matches

Since uppercase letters followed by lowercase letters or digits are also considered word boundaries, the rules for left and right word boundaries also apply to camel-case signatures.

In addition, when searching for camel-case signatures, some or all of the lowercase letters or digits between the uppercase characters can be omitted from the query string.

Examples of Query Strings and Matches
Query String   Matches
FileInStr      type java.io.FileInputStream
FIS            type java.io.FileInputStream
j.io.FileInpS  type java.io.FileInputStream
FileInStr(FiD  member java.io.FileInputStream.FileInputStream(FileDescriptor)
FInpS(FD       member java.io.FileInputStream.FileInputStream(FileDescriptor)
FINPS(FD       no match as not all uppercase letters match a camel-case name
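One way to picture the omission of lowercase letters and digits is to compile the query string into a regular expression that allows a run of such characters to be skipped before each uppercase letter. The sketch below does that. It is not the actual implementation: it matches case-sensitively, ignores the left word boundary requirement, and also allows skipping before non-alphanumeric characters, which is an assumption made so that the `j.io.FileInpS` and `FileInStr(FiD` rows are covered.

```java
import java.util.regex.Pattern;

// Illustrative sketch; processes UTF-16 chars, so it assumes BMP-only query strings.
public final class CamelCaseSketch {
    // Zero or more lowercase letters or decimal digits that the query may omit.
    private static final String SKIP = "[\\p{Ll}\\p{Nd}]*";

    static Pattern compile(String query) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            // Allow omitted characters before each new camel-case word or separator.
            if (i > 0 && (Character.isUpperCase(c) || !Character.isLetterOrDigit(c))) {
                regex.append(SKIP);
            }
            regex.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(regex.toString());
    }

    static boolean matches(String signature, String query) {
        return compile(query).matcher(signature).find();
    }
}
```

With this pattern, `FIS` matches `java.io.FileInputStream`, whereas `FINPS(FD` finds no match in the constructor signature because the uppercase `N` and `P` have no uppercase counterpart there.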

White Space and Multiple Search Terms

White space in the query string is significant if it occurs between non-space characters. Regions of non-space characters in the query string separated by white space are considered search terms. The following rules apply for query strings with multiple search terms:

The number of white space characters between search terms is not significant. Leading and trailing white space in the query string is never significant. A query string must contain at least one non-space character to trigger a search.

Examples of Query Strings and Matches
Query String        Matches
string append long  methods java.lang.StringBuffer.append(long) and java.lang.StringBuilder.append(long)
obj eq o o          methods java.util.Objects.equals(Object, Object) and java.util.Objects.deepEquals(Object, Object)
java frame          type java.awt.Frame and search tag Java Collections Framework
ob ject             no match as ject does not match a left word boundary in Java SE
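Splitting a query string into search terms can be sketched as below. The sketch covers only the tokenization described above (runs of white space of any length separate terms, leading and trailing white space is ignored, an all-space query triggers no search); it does not show how the terms are then matched against a signature. White space is taken to be the Unicode "Space_Separator" category, as in the Definitions section, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch of query tokenization only.
public final class QueryTermsSketch {

    /** Splits the query on runs of Space_Separator characters and drops empty pieces. */
    static List<String> searchTerms(String query) {
        return Arrays.stream(query.split("\\p{Zs}+"))
                .filter(term -> !term.isEmpty())
                .collect(Collectors.toList());
    }

    /** A search is triggered only if the query contains at least one non-space character. */
    static boolean triggersSearch(String query) {
        return !searchTerms(query).isEmpty();
    }
}
```

For example, `searchTerms("  obj   eq o o ")` returns the four terms `obj`, `eq`, `o`, and `o`.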

Core Signature Regions

For some kinds of program elements, additional filtering rules are applied depending on the location of the match within the signature. This is done to prioritize matches in parts of the signature that are particular to the program element over matches in less significant areas.

The part of a signature considered most significant for a program element is called the core region. The table below lists the program elements for which this mechanism is applied, and the core regions of their signatures.

Program Element  Core signature region
Package          Fully qualified package name
Type             Simple type name
Member           Simple member name

With the exceptions documented below, a match is excluded from the search results unless it covers at least part of the core signature region. If the query string has multiple search terms, at least one of them must match a part of the core signature region.

Examples of Query Strings and Matches
Query String   Matches
java.base      module java.base but not any packages contained therein
java.lang      package java.lang and its subpackages but not types contained therein
java.util.Map  type java.util.Map but not type java.util.Map.Entry
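The core region filter can be sketched as an overlap test between the matched range and the core region of a signature. The enum, the method names, and the way the region is located (by the last '.' and the first '(' or '/') are assumptions made for illustration; the sketch handles a single search term and none of the exceptions described in this document.

```java
// Illustrative sketch; names and region lookup are assumptions.
public final class CoreRegionSketch {
    enum Kind { PACKAGE, TYPE, MEMBER }

    /** Returns {start, end} (end exclusive) of the core region within the signature. */
    static int[] coreRange(Kind kind, String sig) {
        int start;
        int end = sig.length();
        switch (kind) {
            case PACKAGE:
                // Fully qualified package name: everything after an optional "module/" prefix.
                start = sig.indexOf('/') + 1;
                break;
            case TYPE:
                // Simple type name: everything after the last '.'.
                start = sig.lastIndexOf('.') + 1;
                break;
            case MEMBER: {
                // Simple member name: between the last '.' and the parameter list, if any.
                int paren = sig.indexOf('(');
                if (paren >= 0) {
                    end = paren;
                }
                start = sig.lastIndexOf('.', end - 1) + 1;
                break;
            }
            default:
                throw new IllegalArgumentException("unknown kind: " + kind);
        }
        return new int[] {start, end};
    }

    /** True if the match [matchStart, matchEnd) covers at least part of the core region. */
    static boolean coversCore(Kind kind, String sig, int matchStart, int matchEnd) {
        int[] core = coreRange(kind, sig);
        return matchStart < core[1] && matchEnd > core[0];
    }
}
```

The query `java.util.Map` covers characters 0 to 13 of both `java.util.Map` and `java.util.Map.Entry`. It overlaps the core region `Map` of the former, but not the core region `Entry` of the latter, which is why only the former is reported.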

While formal parameter types of executable members are not part of the core signature region, omitting the member name and starting the query string with ( bypasses core region filtering to allow searching for executable members with specific parameter types.

Examples of Query Strings and Matches
Query String  Matches
int           type java.lang.Integer but not method java.lang.String.valueOf(int)
(int          methods and constructors with int as first parameter type

Listing Child Elements

Any query string that matches a program element can be turned into a query string that matches all its child elements by appending the separator character used in the signatures of child elements. This bypasses core region filtering for contained elements to include them in the search results.

Examples of Query Strings and Matches
Query String  Matches
j.b           module java.base but not the packages contained therein
j.b/          packages contained in module java.base
java.lang     package java.lang and its subpackages, but not types contained therein
java.lang.    types in and subpackages of package java.lang
system        type java.lang.System but not its members
system.       members of type java.lang.System

Supported Browsers

The search feature is supported in the following browsers.

Supported Browsers
Browser          Version  Platform
Apple Safari     tbd      macOS
Google Chrome    tbd      All supported OSs
Microsoft Edge   tbd      Windows OSs
Mozilla Firefox  tbd      All supported OSs