The Elasticsearch search API can be accessed through the Dev Tools console in Kibana or directly at elastic.bafg.de. For the query syntax, refer to the extensive Query DSL documentation. For the NTSPortal schema, see Structure of NTSPortal.

Simple search example with the term query

The following minimal example shows the structure of a query. It retrieves all detections from the station “mosel_139”.

Using the Kibana Dev Tools console

GET ntsp25.1_dbas*/_search
{
  "query": {
    "term": {
      "station": "mosel_139"
    }
  }
}

Using curl

curl -X GET "https://<hostname>:<port>/ntsp25.1_dbas*/_search" \
     -H 'Content-Type: application/json' \
     -H 'Authorization: ApiKey <secret>' \
     -d'
{
  "query": {
    "term": {
      "station": "mosel_139"
    }
  }
}'

Using ntsportal

library(ntsportal)
connectNtsportal()
dbComm <- getDbComm()
tb <- getTableAsTibble(
  dbComm, 
  tableName = "ntsp25.1_dbas*", 
  searchBlock = list(
    query = list(
      term = list(
        station = "mosel_139"
      )
    )
  )
)

getTableAsTibble() uses the Python elasticsearch-dsl package in the back end.

Using Python

Using elasticsearch package

from elasticsearch import Elasticsearch
client = Elasticsearch(hosts="https://hosturl.mydomain.de", api_key="<secret>")
response = client.search(
  index="ntsp25.1_dbas*", 
  body={
    "query": {
      "term": {
        "station": "mosel_139"
      }
    }
  }
)

Using elasticsearch-dsl package

This is the recommended method for programming in Python and is the basis for the getTable* functions in ntsportal.

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
client = Elasticsearch(hosts="https://hosturl.mydomain.de", api_key="<secret>")
s = Search(using=client, index="ntsp25.1_dbas*").update_from_dict(
  {
    "query":{
      "term":{
        "station": "mosel_139"
      }
    }
  }
)
for hit in s.iterate():
  print(hit.to_dict())

Note: This example uses elasticsearch 8.17.2 and elasticsearch-dsl 8.17.1. In newer versions, elasticsearch-dsl is integrated into elasticsearch.

Combining search parameters with bool

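Search queries are usually more complex than the example above. In this example, we search for data files measured in negative polarity that are not blanks and that come from either of the stations “donau_ul_m” (Ulm) or “saale_wettin_m” (Wettin). Because the bool query contains a must clause, minimum_should_match is set to 1; otherwise the should clauses would only influence scoring and would not restrict the results.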

GET ntsp_msrawfiles/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "pol": {
              "value": "neg"
            }
          }
        }
      ],
      "must_not": [
        {
          "term": {
            "blank": {
              "value": "true"
            }
          }
        }
      ],
      "should": [
        {
          "term": {
            "station": {
              "value": "donau_ul_m"
            }
          }
        },
        {
          "term": {
            "station": {
              "value": "saale_wettin_m"
            }
          }
        }
      ], 
      "minimum_should_match": 1
    }
  }
}
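
When the list of stations varies, the same request body can be assembled programmatically. The following is a minimal Python sketch: the helper name build_bool_query and its parameters are hypothetical and not part of ntsportal or of the Elasticsearch client. It only builds a dictionary, which could then be passed on in the same way as the request bodies in the Python examples above.

```python
import json


def build_bool_query(polarity, stations, exclude_blanks=True):
    """Build a bool query body (hypothetical helper, for illustration only)."""
    bool_clause = {
        # must: every returned document has this polarity
        "must": [{"term": {"pol": {"value": polarity}}}],
        # should: one term clause per station
        "should": [{"term": {"station": {"value": s}}} for s in stations],
        # require at least one of the station clauses to match
        "minimum_should_match": 1,
    }
    if exclude_blanks:
        # must_not: drop documents flagged as blanks
        bool_clause["must_not"] = [{"term": {"blank": {"value": "true"}}}]
    return {"query": {"bool": bool_clause}}


body = build_bool_query("neg", ["donau_ul_m", "saale_wettin_m"])
print(json.dumps(body, indent=2))
```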

Paginating search results, selecting fields and converting to a data.frame in R

In this example, we retrieve all documents from the measurement station “mosel_ko_r” (Moselle, Koblenz, right bank), but select only certain fields. Using the Dev Tools console, we make the following request:

GET ntsp25.1_dbas*/_search
{
  "query": {
    "term": {
      "station": {
        "value": "mosel_ko_r"
      }
    }
  },
  "_source": ["name", "inchikey", "pol", "start", "duration", 
              "area", "area_is", "area_normalized"]
}

The response reports the total number of hits as greater than or equal to (gte) 10000, i.e. the maximum number of documents has been reached. Therefore, we cannot retrieve all documents with a single request and must paginate the search results.
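
The general idea of pagination can be sketched independently of any client library. The following minimal Python example uses the search_after parameter of the search API: each request is sorted, and the sort values of the last hit of one page are sent as search_after in the next request. The names fetch_all, search_fn and fake_search are assumptions made for this sketch; fake_search merely stands in for a real cluster so that the example runs without a connection.

```python
def fetch_all(search_fn, body, sort, page_size=1000):
    """Collect all hits by paging with search_after.

    search_fn is any callable that takes a request body (dict) and returns
    the parsed search response (dict). It is an assumption of this sketch.
    """
    hits = []
    request = dict(body, size=page_size, sort=sort)
    while True:
        page = search_fn(request)["hits"]["hits"]
        if not page:
            break  # an empty page means everything has been retrieved
        hits.extend(page)
        # continue after the sort values of the last hit on this page
        request = dict(request, search_after=page[-1]["sort"])
    return hits


def fake_search(request):
    """Stand-in for a real cluster: 25 documents sorted by a unique 'id'.

    The 'sort' and 'query' entries of the request are ignored here.
    """
    docs = [{"_source": {"id": i}, "sort": [i]} for i in range(25)]
    after = request.get("search_after", [-1])[0]
    remaining = [d for d in docs if d["sort"][0] > after]
    return {"hits": {"hits": remaining[: request["size"]]}}


query = {"query": {"term": {"station": "mosel_ko_r"}}}
all_hits = fetch_all(fake_search, query, sort=[{"id": "asc"}], page_size=10)
print(len(all_hits))  # prints 25
```

With a real cluster, search_fn could for instance wrap client.search(index=..., body=request) from the elasticsearch package. In that case the sort should end in a field with unique values so that no documents are skipped between pages.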

The function ntsportal::esSearchPaged() works similarly to elastic::Search() but paginates the results. The response list is then transformed into a data.frame. Note: esSearchPaged() is deprecated in favor of getTableAsTibble().

library(ntsportal)
connectNtsportal()
res <- esSearchPaged(
  "ntsp25.1_dbas*",
  searchBody = list(query = list(term = list(station = "mosel_ko_r"))),
  source = c("name", "inchikey", "pol", "start", "duration", "area"),
  sort = "mz"
)

# Convert the returned list to a data.frame
temp <- lapply(res$hits$hits, function(x) as.data.frame(x[["_source"]]))
df <- plyr::rbind.fill(temp)
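
For readers working in Python, the conversion step can be sketched in a similar way. The helper hits_to_rows is hypothetical and the two example hits are invented placeholders, not data from NTSPortal. As with plyr::rbind.fill(), fields that are missing in a document are filled with a missing value (here None), so that all rows have the same columns.

```python
def hits_to_rows(hits, fields):
    """Turn a list of hits into rows with identical keys.

    Fields that a document does not contain are set to None.
    """
    return [{f: hit["_source"].get(f) for f in fields} for hit in hits]


# Invented placeholder hits, shaped like the entries of hits.hits in a response
example_hits = [
    {"_source": {"name": "compound_a", "pol": "pos", "area": 1200.5}},
    {"_source": {"name": "compound_b", "pol": "neg"}},  # no "area" field
]

rows = hits_to_rows(example_hits, ["name", "pol", "area"])
for row in rows:
    print(row)
```

The resulting list of dictionaries can be handed to a tabular library of choice, for example as input to a data frame constructor.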