Semantic Search
By default, LumisXP uses a keyword search mechanism. The keyword search attempts to extract the roots of the relevant words
typed by the user and finds content that contains words with the same roots.
Semantic search, on the other hand, uses embeddings computed from the text extracted from content when it is registered in the LumisXP CMS and from the words typed by the user when performing a search, and finds content by semantic similarity.
The sections below explain the configurations related to semantic search and how each of them influences both the content indexing phase
and the search phase.
Configurations:
When the global configuration for AI integration is enabled, LumisXP can search by the semantics of the searched words in addition to keywords.
For this to be possible, the following fields in the global configuration must be set:
- API Key
- AI Model for embedding calculation
- Block size for embedding calculation (characters)
- Enable embedding calculation during indexing for search
In addition to these fields, there is the field Enable embedding calculation during indexing for private content search. It is not required for using the LumisXP semantic search, but it influences the behavior of the semantic search. Its use is explained further below.
These configurations are global for the entire LumisXP cluster.
In addition to the global configurations, there are configurations at the search service instance level.
These configurations are:
- Number of results obtained in the semantic search
- Weight of semantics in relevance calculation
- Minimum similarity
The sections below explain how each of these configurations affects LumisXP's semantic search.
Definition of whether AI is enabled for a given service instance:
A search service instance has AI enabled if the following conditions are met:
- The API Key is defined.
- The AI Model for embedding calculation is defined.
- The option Enable embedding calculation during indexing for search is enabled.
- The service instance does not have a property bag lumis.portal.ai.enabled defined (or it is set to the value true).
- The service instance has search enabled.
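The five conditions above can be read as a single boolean check. The following is a minimal sketch in Python, not LumisXP code: the function name, its parameters, and the representation of the property bags as a string-to-string mapping are assumptions made for illustration. Only the property bag name lumis.portal.ai.enabled comes from the text.

```python
from typing import Mapping, Optional

# Property bag name taken from the text above.
AI_ENABLED_PROPERTY = "lumis.portal.ai.enabled"


def is_ai_enabled(
    api_key: Optional[str],
    embedding_model: Optional[str],
    embedding_on_indexing: bool,
    search_enabled: bool,
    property_bag: Mapping[str, str],
) -> bool:
    """Hypothetical helper: True when all five conditions listed above hold."""
    # Assumption: an empty string counts as "not defined".
    if not api_key or not embedding_model:
        return False
    if not embedding_on_indexing or not search_enabled:
        return False
    # The property bag may be absent; if present, it must hold the value true.
    # Assumption: the value is stored as text and compared case-insensitively.
    value = property_bag.get(AI_ENABLED_PROPERTY)
    return value is None or value.strip().lower() == "true"
```

Under these assumptions, an instance with every global field set and no property bag is treated as AI-enabled, while the same instance with lumis.portal.ai.enabled set to false is not.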
Definition of public content:
Public content is content that is published for the group All Users.
Definition of a public service instance:
A public service instance is one in which the group All Users has viewing permission.
Content indexing phase:
When content is created or updated in a service instance using the LumisXP CMS (or when the Content Reindexing tool is used), the embeddings of the content are generated and stored in the Big Data Repository if:
- The service instance has AI enabled.
- The content being registered or updated is a public content.
- The service instance is a public service instance.
If the service instance has AI enabled, but the content being registered or updated is not public content or the service instance is not a public service instance, the embeddings of the content are generated and stored in the Big Data Repository if the option Enable embedding calculation during indexing for private content search is enabled.
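The indexing rules above reduce to a small decision function. The sketch below is illustrative only: the function and parameter names are hypothetical, and it assumes that no embeddings are generated when the service instance does not have AI enabled, which the text implies but does not state outright.

```python
def should_generate_embeddings(
    ai_enabled: bool,
    content_is_public: bool,
    instance_is_public: bool,
    private_embedding_enabled: bool,
) -> bool:
    """Hypothetical helper mirroring the indexing rules described above."""
    # Assumption: without AI enabled on the instance, nothing is generated.
    if not ai_enabled:
        return False
    # Public content in a public service instance is always embedded.
    if content_is_public and instance_is_public:
        return True
    # Otherwise the private-content option decides.
    return private_embedding_enabled
```

For example, non-public content in a public service instance gets embeddings only when Enable embedding calculation during indexing for private content search is enabled.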
Search phase:
When a user performs a search using a search service instance (either via the user interfaces Search and Search with Results, or via the REST interfaces search and headlessSearch), and both the Number of results obtained in semantic search and the Weight of semantics in relevance calculation are greater than zero, the results of the keyword search (however many there are) are combined with the results of the semantic search (limited to Number of results obtained in semantic search results whose similarity is at least Minimum similarity), and a weight is calculated for each result.
The weight of each result is the sum of its keyword weight and its semantic weight.
Before being summed, these weights are multiplied by:
- Weight of semantics in relevance calculation, for the semantic weight.
- (1 - Weight of semantics in relevance calculation), for the keyword weight.
Once the results are sorted (assuming that the default sorting, by relevance, is being used), the most relevant results are displayed first.
In other words, if the Weight of semantics in relevance calculation is set to 0, only the keyword search is taken into account.
If the Weight of semantics in relevance calculation is set to 1, the keyword search is disregarded and only the semantic search is taken into account.
Intermediate values of Weight of semantics in relevance calculation cause both searches to be considered.
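The combination described in this section can be sketched as a weighted sum. The Python below is an illustration under stated assumptions, not the LumisXP implementation: the function and parameter names are hypothetical, the semantic weight of a result is taken to be its similarity, a result missing from one of the two searches contributes 0 for that search, and the search falls back to keyword-only scoring when either setting is zero.

```python
from typing import Dict, List, Tuple


def combine_results(
    keyword_scores: Dict[str, float],
    semantic_scores: Dict[str, float],
    semantic_weight: float,
    max_semantic_results: int,
    min_similarity: float,
) -> List[Tuple[str, float]]:
    """Hypothetical sketch: merge keyword and semantic hits, best first."""
    # Semantic search takes part only when both settings are greater than zero.
    use_semantic = max_semantic_results > 0 and semantic_weight > 0
    semantic_hits: Dict[str, float] = {}
    if use_semantic:
        # Keep hits at or above the minimum similarity, then the top N.
        eligible = [
            (doc, score)
            for doc, score in semantic_scores.items()
            if score >= min_similarity
        ]
        eligible.sort(key=lambda item: item[1], reverse=True)
        semantic_hits = dict(eligible[:max_semantic_results])
    else:
        # Assumption: keyword-only scoring when semantic search is not used.
        semantic_weight = 0.0

    combined: Dict[str, float] = {}
    for doc in set(keyword_scores) | set(semantic_hits):
        # Assumption: a result absent from one search scores 0 for it.
        keyword = keyword_scores.get(doc, 0.0)
        semantic = semantic_hits.get(doc, 0.0)
        combined[doc] = semantic_weight * semantic + (1.0 - semantic_weight) * keyword

    # Default sorting by relevance: highest combined weight first.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))
```

With a weight of 0.5, a result found by both searches can outrank one that scores higher on keywords alone, which is the intended effect of intermediate weights.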