Quickwit on Kubernetes
Installation guide 🦮
Prerequisites
- Access to a Kubernetes cluster (you can easily create a local cluster with Minikube or Kind).
- kubectl: not strictly required for installing packages via Glasskube, but it is the recommended way to interact with the cluster and is used later in this guide, so installing it is highly recommended. Installation instructions are available for macOS, Linux and Windows.
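A quick way to confirm the prerequisites is a small pre-flight check. The sketch below is plain POSIX shell. `require_tools` is a hypothetical helper name, and the tool list in the example comment is an assumption based on the commands used in this guide.

```shell
#!/bin/sh
# Hypothetical pre-flight helper: report which CLIs are missing from PATH.
# Prints "all tools found" and returns 0, or prints the missing names and returns 1.
require_tools() {
  missing=""
  for tool in "$@"; do
    # `command -v` is the POSIX way to test whether a command can be resolved.
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "all tools found"
}

# Example (tool list assumed from this guide):
#   require_tools kubectl minikube glasskube
```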
Install Glasskube
If you've already installed Glasskube, you can skip this step. Otherwise, install it by following the instructions for your operating system.
For this demo I'm using macOS:
brew install glasskube/tap/glasskube # install the glasskube cli
minikube start # start a minikube Kubernetes cluster
glasskube bootstrap # install glasskube on the minikube cluster
Installation guides for other platforms are available in the Glasskube documentation.
Once Glasskube is installed, open its UI with:
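If you script this setup, it can help to wait until the Glasskube pods are Ready before running `glasskube serve`. A minimal sketch using the standard `kubectl wait` command; the helper name `wait_for_namespaces`, the `WAIT_TIMEOUT` variable and its 180s default are assumptions, not part of Glasskube.

```shell
#!/bin/sh
# Hypothetical helper: wait until every pod in each given namespace is Ready.
# Only standard `kubectl wait` flags are used; WAIT_TIMEOUT (default 180s) is an assumption.
wait_for_namespaces() {
  wait_timeout="${WAIT_TIMEOUT:-180s}"
  for ns in "$@"; do
    echo "waiting for pods in $ns (timeout $wait_timeout)"
    kubectl wait --for=condition=Ready pod --all \
      --namespace "$ns" --timeout="$wait_timeout" || return 1
  done
}

# Example, after `glasskube bootstrap`:
#   wait_for_namespaces glasskube-system
```

Note that pods belonging to completed Jobs never report Ready, so this simple approach fits namespaces that only run long-lived pods.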
glasskube serve
The dashboard opens at http://localhost:8580/.
Creating an S3-Compatible Bucket
Before installing Quickwit, you'll need to create an object storage bucket to hold your Quickwit indexes. You can use the S3-compatible storage of your choice, such as AWS S3, Scaleway or MinIO. Refer to our official Quickwit documentation for storage configuration details.
Here I'll create an AWS S3 bucket to store the Quickwit indexes.
Steps:
- Navigate to the AWS management console and create a new S3 bucket.
- In IAM, create an access key for a user with S3 permissions on that bucket, and save the 'Access Key ID' and 'Secret Access Key'; you'll need them shortly.
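The two steps above can also be scripted with the AWS CLI. This is an assumption-laden sketch rather than an official recipe: `quickwit_s3_policy` is a hypothetical helper, the IAM user name `quickwit` is made up, and the action list reflects the S3 permissions listed in Quickwit's storage documentation, so check it against the version you deploy. Bucket names are globally unique, so `quickwit-indexes` may already be taken.

```shell
#!/bin/sh
# Hypothetical helper: print a least-privilege IAM policy for one Quickwit bucket.
# The action list mirrors the S3 permissions in Quickwit's storage docs
# (verify against the Quickwit version you deploy).
quickwit_s3_policy() {
  bucket="$1"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::${bucket}"]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListMultipartUploadParts",
        "s3:AbortMultipartUpload"
      ],
      "Resource": ["arn:aws:s3:::${bucket}/*"]
    }
  ]
}
EOF
}

# Example with the AWS CLI (the user name "quickwit" is an assumption):
#   aws s3api create-bucket --bucket quickwit-indexes --region us-east-1
#   aws iam create-user --user-name quickwit
#   aws iam put-user-policy --user-name quickwit --policy-name quickwit-s3 \
#     --policy-document "$(quickwit_s3_policy quickwit-indexes)"
#   aws iam create-access-key --user-name quickwit
```

For any region other than us-east-1, `create-bucket` also needs `--create-bucket-configuration LocationConstraint=<region>`. The output of `create-access-key` contains the two keys used in the next section.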
Deploy Quickwit
From the Glasskube dashboard, find the Quickwit package and add your custom configuration parameters:
- defaultIndexRootUri: for this demo it's s3://quickwit-indexes.
- metastoreUri: we won't use PostgreSQL, so let's use the same value as defaultIndexRootUri.
- s3AccessKeyId: the "Access Key ID" we generated in AWS earlier.
- s3Endpoint: custom endpoint for S3-compatible providers. Not needed for AWS S3.
- s3Flavor: we keep the default empty value, used for AWS S3 and fully S3-compatible object storage.
- s3Region: us-east-1 in my case (AWS region codes are lowercase).
- s3SecretAccessKey: the "Secret Access Key" we generated in AWS earlier.
Refer to the official Quickwit documentation for a full description of each parameter.
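Before submitting the form, a small local check can catch obvious mistakes in these values. This is a hypothetical sketch: `check_quickwit_values` is not part of Glasskube or Quickwit, the accepted metastore schemes are an assumption, and it only checks URI prefixes and that the region code is lowercase.

```shell
#!/bin/sh
# Hypothetical sanity check for the values entered in the Glasskube form.
# Usage: check_quickwit_values <defaultIndexRootUri> <metastoreUri> <s3Region>
# Prints one line per problem and returns 1 if any check fails.
check_quickwit_values() {
  root="$1" metastore="$2" region="$3" status=0
  case "$root" in
    s3://?*) ;;
    *) echo "defaultIndexRootUri should look like s3://<bucket>: '$root'"; status=1 ;;
  esac
  # Accepted schemes (s3 file-backed or PostgreSQL) are an assumption.
  case "$metastore" in
    s3://?*|postgres://?*|postgresql://?*) ;;
    *) echo "metastoreUri should be an s3:// or PostgreSQL URI: '$metastore'"; status=1 ;;
  esac
  case "$region" in
    "") echo "s3Region is empty"; status=1 ;;
    *[[:upper:]]*) echo "s3Region should be lowercase, e.g. us-east-1: '$region'"; status=1 ;;
  esac
  return "$status"
}

# Example with this demo's values:
#   check_quickwit_values s3://quickwit-indexes s3://quickwit-indexes us-east-1
```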
It's also possible to install and configure Quickwit using the Glasskube CLI by running:
glasskube install quickwit
Once installed, running kubectl get namespaces shows that a quickwit namespace has been created:
default
flux-system
glasskube-system
kube-node-lease
kube-public
kube-system
kubernetes-dashboard
quickwit
Now, check that the pods are running with kubectl -n quickwit get pods:
NAME READY STATUS RESTARTS AGE
quickwit-quickwit-control-plane-86bd9955f7-bwm2r 1/1 Running 1 (27m ago) 29m
quickwit-quickwit-indexer-0 1/1 Running 1 (27m ago) 29m
quickwit-quickwit-janitor-9479697ff-x4x2c 1/1 Running 1 (27m ago) 29m
quickwit-quickwit-metastore-56ff74df9f-k6d2g 1/1 Running 0 29m
quickwit-quickwit-searcher-0 1/1 Running 1 (27m ago) 29m
quickwit-quickwit-searcher-1 1/1 Running 0 27m
quickwit-quickwit-searcher-2 1/1 Running 0 27m
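Rather than eyeballing the READY and STATUS columns, you can filter the `kubectl get pods` output. A minimal awk-based sketch; `not_ready_pods` is a hypothetical helper, and it assumes the default column order shown above (NAME, READY, STATUS, ...).

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get pods` output on stdin and print the
# names of pods that are not fully ready or not in the Running phase.
not_ready_pods() {
  awk 'NR > 1 {
    split($2, r, "/")            # READY column, e.g. "1/1"
    if (r[1] != r[2] || $3 != "Running") print $1
  }'
}

# Example:
#   kubectl -n quickwit get pods | not_ready_pods
```

An empty result means every pod is Running and fully ready. Pods of completed Jobs would also be listed by this simple filter.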
To access the Quickwit UI, port-forward the Quickwit searcher pod, which serves the dashboard:
kubectl -n quickwit port-forward pod/quickwit-quickwit-searcher-0 7280
Head over to http://localhost:7280 and you should be ready to go!
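To check from a script that the port-forward works and the node is ready, you can poll Quickwit's readiness endpoint. A hedged sketch: it assumes the `/health/readyz` endpoint of the Quickwit REST API and `curl` on your PATH; `wait_for_quickwit` and its default of 30 one-second attempts are illustrative choices.

```shell
#!/bin/sh
# Hypothetical helper: poll a Quickwit node through the port-forward until its
# readiness endpoint answers with a 2xx status.
# Assumes Quickwit's /health/readyz endpoint and curl on PATH.
wait_for_quickwit() {
  base_url="${1:-http://localhost:7280}"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f: treat HTTP errors as failures, -s: silent; the body is discarded.
    if curl -fs "$base_url/health/readyz" >/dev/null; then
      echo "ready: $base_url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "not ready after $attempts attempts: $base_url"
  return 1
}

# Example, with the port-forward running in another terminal:
#   wait_for_quickwit http://localhost:7280
```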
Create your first index
Before adding documents to Quickwit, you need to create an index configured with a YAML config file. This config file notably lets you define how to map your input documents to your index fields and whether these fields should be stored and indexed. See the index config documentation.
Let's create an index configured to receive Stack Overflow posts (questions and answers).
# First, download the stackoverflow dataset config from the Quickwit repository.
curl -o stackoverflow-index-config.yaml https://raw.githubusercontent.com/quickwit-oss/quickwit/main/config/tutorials/stackoverflow/index-config.yaml
The index config defines three fields: title, body and creationDate. title and body are indexed and tokenized, and they are also used as default search fields, which means they will be used for search if you do not target a specific field in your query. creationDate serves as the timestamp for each record. No other fields are declared explicitly because the index uses the default dynamic mode: undeclared fields are still indexed, fast fields are enabled by default so they can be used in aggregation queries, and the raw tokenizer is used for text.
And here is the complete config:
# Index config file for stackoverflow dataset.
#
version: 0.7
index_id: stackoverflow
doc_mapping:
  field_mappings:
    - name: title
      type: text
      tokenizer: default
      record: position
      stored: true
    - name: body
      type: text
      tokenizer: default
      record: position
      stored: true
    - name: creationDate
      type: datetime
      fast: true
      input_formats:
        - rfc3339
      fast_precision: seconds
  timestamp_field: creationDate
search_settings:
  default_search_fields: [title, body]
indexing_settings:
  commit_timeout_secs: 30
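Before creating the index, you can read the key settings back out of the downloaded file to make sure it is the config you expect. A small sed-based sketch; `index_summary` is a hypothetical helper that assumes `index_id` and `timestamp_field` each appear once, as they do in this config.

```shell
#!/bin/sh
# Hypothetical helper: print the index_id and timestamp_field declared in a
# Quickwit index config, as a quick check that the download is the expected file.
index_summary() {
  sed -n \
    -e 's/^index_id:[[:space:]]*/index_id=/p' \
    -e 's/^[[:space:]]*timestamp_field:[[:space:]]*/timestamp_field=/p' \
    "$1"
}

# Example:
#   index_summary stackoverflow-index-config.yaml
```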
With the port-forward still running, we can create the index from a local Quickwit binary, whose CLI targets http://127.0.0.1:7280 by default:
./quickwit index create --index-config ./stackoverflow-index-config.yaml
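If you don't have the Quickwit binary installed locally, the same index can be created through the REST API exposed by the port-forward. A sketch assuming the `POST /api/v1/indexes` endpoint, which accepts a YAML body sent with `Content-Type: application/yaml`; `create_index_rest` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: create an index by POSTing its YAML config to the
# Quickwit REST API (port-forwarded to localhost:7280 in this guide).
create_index_rest() {
  config="$1"
  base_url="${2:-http://localhost:7280}"
  # -f: fail on HTTP errors, -sS: quiet but still print errors.
  curl -fsS -X POST "$base_url/api/v1/indexes" \
    --header "Content-Type: application/yaml" \
    --data-binary "@$config"
}

# Example:
#   create_index_rest ./stackoverflow-index-config.yaml
```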
Because defaultIndexRootUri and metastoreUri both point to s3://quickwit-indexes, Quickwit writes the index files under s3://quickwit-indexes/stackoverflow in your bucket, along with a metastore.json file that contains the index metadata. You're now ready to fill the index.
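To confirm from the command line that Quickwit wrote to the bucket, you can list the index prefix with the AWS CLI. A hypothetical sketch: `check_index_in_s3` is not a Quickwit tool, and the check for `metastore.json` relies on the file name mentioned above, which may differ between Quickwit versions.

```shell
#!/bin/sh
# Hypothetical helper: list what Quickwit wrote for an index under the bucket
# used as defaultIndexRootUri, and succeed only if a metastore.json is present.
# The metastore.json file name is taken from this guide and may vary by version.
check_index_in_s3() {
  bucket="$1" index="$2"
  listing=$(aws s3 ls "s3://$bucket/$index/") || return 1
  printf '%s\n' "$listing"
  printf '%s\n' "$listing" | grep -q 'metastore\.json$'
}

# Example:
#   check_index_in_s3 quickwit-indexes stackoverflow
```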
Continue on to the Quickwit documentation to add your first documents and execute your first search queries.
If you like this sort of content and would like to see more of it, please consider supporting us by giving us a Star on GitHub 🙏