Using the Elasticsearch Search API with Query DSL for retrieving documents
The Elasticsearch search API is available in the Dev Tools console in
Kibana or via the API at elastic.bafg.de. Refer to the
extensive documentation
for Query DSL. For the NTSPortal schema, see Structure of
NTSPortal.
Simple search example with the term query
The following minimal example illustrates the structure of a query. It retrieves all detections from the station “mosel_139”.
Using ntsportal
tb <- ntsportal::getTableByQuery(
  tableName = "ntsp25.3_feature*",
  searchBlock = list(
    query = list(
      term = list(
        station = "mosel_139"
      )
    )
  )
)
getTableByQuery() uses the Python elasticsearch package in the back end
(via the ntsportal::DbComm interface). The article Connecting to the
NTSPortal Elasticsearch Search API from R or Python has more details
on using the API from R.
See also ntsportal::getTableByEsql() for a simpler query
language.
Using Python
Using elasticsearch package
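import os

from elasticsearch import Elasticsearch

client = Elasticsearch(hosts="https://elastic.bafg.de", api_key=os.environ.get("ELASTICSEARCH_API_KEY"))

# List the distinct values of "name" with one example document per value
response = client.search(
    index="ntsp25.2_dbas*",
    body={
        # Only consider documents that have a "name" field
        "query": {"exists": {"field": "name"}},
        "aggs": {
            # One bucket per distinct name (at most 950 buckets)
            "all_names": {
                "terms": {"field": "name", "size": 950},
                "aggs": {
                    # Per bucket, return three fields of a single document
                    "metadata": {
                        "top_hits": {
                            "size": 1,
                            "_source": ["name", "cas", "inchikey"]
                        }
                    }
                }
            }
        },
        # Return only the aggregation, no search hits
        "size": 0
    }
)
Using elasticsearch-dsl package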
This is the recommended method for programming in Python and is the
basis for the getTable* functions in
ntsportal.
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
client = Elasticsearch(hosts="https://hosturl.mydomain.de", api_key="<secret>")
s = Search(using=client, index="ntsp25.2_dbas*").update_from_dict(
    {
        "query": {
            "term": {
                "station": "mosel_139"
            }
        }
    }
)
for hit in s.iterate():
    print(hit.to_dict())
Note: This example uses elasticsearch 8.17.2 and
elasticsearch-dsl 8.17.1. In newer versions,
elasticsearch-dsl is integrated into
elasticsearch.
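The term queries in this article are written in two ways: a short form that maps the field directly to the value, and an expanded form that wraps the value in a "value" key (as in the console requests). Elasticsearch accepts both. The following is a minimal sketch that builds either form as a plain Python dict; the helper name term_query() is an assumption made for illustration and is not part of ntsportal or the elasticsearch package.

```python
def term_query(field, value, expanded=False):
    """Return a search body containing a single term query on `field`.

    Hypothetical helper for illustration only.
    """
    # Short form: {"term": {field: value}}
    # Expanded form: {"term": {field: {"value": value}}}
    clause = {field: {"value": value}} if expanded else {field: value}
    return {"query": {"term": clause}}


short_body = term_query("station", "mosel_139")
long_body = term_query("station", "mosel_ko_r", expanded=True)

print(short_body)
print(long_body)
```

Either dict can be handed to client.search(..., body=...) or to Search.update_from_dict() in the same way as the literal dicts used in the examples.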
Combining search parameters with bool
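Search queries are usually more complex than in the example above. In this example we search for data files measured in negative polarity (must) that are not blanks (must_not) and that come from either of the stations Ulm or Wettin (should, with minimum_should_match set to 1).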
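For readers who build requests in Python, the same bool logic can be assembled from small pieces before it is sent. This is only a sketch under assumptions: the helper names term() and bool_query() are invented here for illustration, while the field names and values are those of the console request that follows.

```python
def term(field, value):
    """Expanded-form term clause (hypothetical helper)."""
    return {"term": {field: {"value": value}}}


def bool_query(must=(), must_not=(), should=(), minimum_should_match=None):
    """Assemble a search body with a bool query (hypothetical helper)."""
    clause = {}
    if must:
        clause["must"] = list(must)          # every clause has to match
    if must_not:
        clause["must_not"] = list(must_not)  # no clause may match
    if should:
        clause["should"] = list(should)      # optional unless a minimum is set
        if minimum_should_match is not None:
            clause["minimum_should_match"] = minimum_should_match
    return {"query": {"bool": clause}}


body = bool_query(
    must=[term("pol", "neg")],
    must_not=[term("blank", "true")],
    should=[term("station", "donau_ul_m"), term("station", "saale_wettin_m")],
    minimum_should_match=1,
)
print(body)
```

The minimum_should_match setting matters here: when a bool query also contains must clauses, should clauses are optional by default, so without it the station condition would influence only the ranking and not which documents are returned.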
GET ntsp_msrawfiles/_search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "pol": { "value": "neg" } } }
      ],
      "must_not": [
        { "term": { "blank": { "value": "true" } } }
      ],
      "should": [
        { "term": { "station": { "value": "donau_ul_m" } } },
        { "term": { "station": { "value": "saale_wettin_m" } } }
      ],
      "minimum_should_match": 1
    }
  }
}
Paginating search results, selecting fields and converting to a data.frame in R
In this example, we are retrieving all the documents from the measurement station “mosel_ko_r” (Moselle, Koblenz, right bank), but only select certain fields. Using the Dev Tools console, we make the following request:
GET ntsp25.2_dbas*/_search
{
  "query": {
    "term": {
      "station": {
        "value": "mosel_ko_r"
      }
    }
  },
  "_source": ["name", "inchikey", "pol", "start", "duration",
    "area", "area_is", "area_normalized"]
}
The response shows that the total number of hits is greater than or equal to (gte) 10000, meaning the maximum number of documents was reached. We therefore cannot retrieve all documents with one request and must paginate the search results.
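One common way to page through a large result set is Elasticsearch's search_after parameter: each request is sorted, and the sort values of the last hit on a page are passed to the next request. The following is a rough sketch of that idea in Python, not a description of how ntsportal paginates internally. The function name search_all() and the in-memory stand-in fake_search() are assumptions made so that the example runs without a cluster; with a real client, search_fn could be something like lambda b: client.search(index="ntsp25.2_dbas*", body=b).

```python
def search_all(search_fn, body, sort, page_size=1000):
    """Yield every hit of a query, one page at a time, using search_after.

    search_fn is any callable that takes a request body (dict) and
    returns the search response. Hypothetical helper for illustration.
    """
    request = dict(body, sort=sort, size=page_size)
    while True:
        hits = search_fn(request)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # Continue after the sort values of the last hit on this page
        request = dict(request, search_after=hits[-1]["sort"])


# In-memory stand-in for a cluster: 25 made-up documents sorted by "mz".
# It ignores the query and only honours "size" and "search_after".
_docs = [{"_source": {"mz": float(i)}, "sort": [float(i)]} for i in range(25)]


def fake_search(request):
    after = request.get("search_after")
    remaining = [d for d in _docs if after is None or d["sort"] > after]
    return {"hits": {"hits": remaining[: request["size"]]}}


body = {"query": {"term": {"station": {"value": "mosel_ko_r"}}}}
all_hits = list(search_all(fake_search, body, sort=["mz"], page_size=10))
print(len(all_hits))  # 25
```

On a real index the sort should order documents uniquely, for example by adding a tiebreaker field after "mz"; otherwise documents with identical sort values at a page boundary can be skipped.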
The function ntsportal::esSearchPaged() works similarly
to elastic::Search but paginates the results. The
response list is then transformed into a data.frame. Note:
esSearchPaged() is deprecated in favor of
getTableAsTibble().
connectNtsportal()
res <- esSearchPaged(
  "ntsp25.2_dbas*",
  searchBody = list(query = list(term = list(station = "mosel_ko_r"))),
  source = c("name", "inchikey", "pol", "start", "duration", "area"),
  sort = "mz"
)
# Convert the returned list to a data.frame
temp <- lapply(res$hits$hits, function(x) as.data.frame(x[["_source"]]))
df <- plyr::rbind.fill(temp)
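The same list-to-table step can be sketched in Python. plyr::rbind.fill() fills columns that are missing from a document with NA; the sketch below does the equivalent with None. The helper name hits_to_rows() and the two sample hits are invented for illustration and do not come from NTSPortal.

```python
def hits_to_rows(hits, fields):
    """Return one dict per hit with every requested field present.

    Fields missing from a document's _source are filled with None.
    Hypothetical helper for illustration.
    """
    return [{f: hit.get("_source", {}).get(f) for f in fields} for hit in hits]


# Made-up hits in the shape returned under response["hits"]["hits"]
hits = [
    {"_source": {"name": "compound_a", "pol": "pos", "area": 1200.5}},
    {"_source": {"name": "compound_b", "pol": "neg"}},  # no "area" field
]

rows = hits_to_rows(hits, ["name", "pol", "area"])
for row in rows:
    print(row)
```

A list of dicts like rows can be passed directly to pandas.DataFrame() if a tabular object is needed.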