Deployment
The deploy directory contains an example dockerized deployment setup of the customised NiFi image, along with related services for document processing, NLP, and text analytics.
Make sure you have read the Prerequisites section before proceeding.
Key files
services.yml - defines the core services that are orchestrated directly from this repository via Docker Compose.
Makefile - provides convenient commands for starting, stopping, and managing the deployment.
.env files in ./deploy/ - environment variables used across services, with the following specifics:
These environment variables apply only to the services defined inside services.yml.
Security-related .env files (certificates, users) are under /security.
These variables configure NiFi, Elasticsearch/OpenSearch, Kibana, Jupyter, Metricbeat, the sample DB, etc.
Important: If you run docker compose directly (instead of make), first load the envs with:
source ./deploy/export_env_vars.sh
The Makefile targets already do this for you.
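As a minimal sketch of the direct Docker Compose route described above, the helper below loads the environment script and then invokes docker compose in the same shell. The function name compose_with_env and the ENV_SCRIPT, COMPOSE_FILE, and DRY_RUN variables are hypothetical and not part of the repository; the default paths assume the command is run from the repository root and that services.yml sits in ./deploy/.

```shell
# Hypothetical helper (not part of the repository): load the deployment
# environment variables, then call docker compose in the same shell.
compose_with_env() {
  env_script="${ENV_SCRIPT:-./deploy/export_env_vars.sh}"   # assumed default path
  compose_file="${COMPOSE_FILE:-./deploy/services.yml}"     # assumed default path

  if [ ! -f "$env_script" ]; then
    echo "env script not found: $env_script" >&2
    return 1
  fi

  # "." is the POSIX spelling of "source": it runs the script in the current
  # shell, so the variables it exports are visible to docker compose.
  . "$env_script"

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it.
    echo "docker compose -f $compose_file $*"
  else
    docker compose -f "$compose_file" "$@"
  fi
}
```

For example, compose_with_env up -d nifi nifi-nginx would start the two NiFi services, and setting DRY_RUN=1 first only prints the command that would be run.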
Modular service design (important)
This repository follows a modular deployment model:
Only the services defined in services.yml use the environment files located in ./deploy/*.env.
All other services included in the ecosystem are launched via docker-compose commands inside their own directories, for example:
./services/<service_name>/docker/docker-compose.yml
Each of these standalone services maintains its own environment configuration in:
./services/<service_name>/env/
This design allows each service to be:
independently configurable
versioned and deployed in isolation
consumed by other projects without modifying the core deployment
These are the files you will most commonly modify when creating or adjusting a deployment.
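The directory convention for standalone services can be sketched as a small helper that prints where a given service keeps its compose file and environment configuration. The function name service_paths is hypothetical, and the sketch only restates the layout described above; how each service consumes the files in its env directory is defined by that service and is not assumed here.

```shell
# Hypothetical helper: print the compose file and env directory that the
# modular layout implies for a standalone service.
service_paths() {
  svc="${1:-}"
  if [ -z "$svc" ]; then
    echo "usage: service_paths <service_name>" >&2
    return 1
  fi
  echo "compose_file=./services/$svc/docker/docker-compose.yml"
  echo "env_dir=./services/$svc/env/"
}
```

The printed compose file path is what you would pass to docker compose -f when launching that service from the repository root.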
Additional service configuration
Service-specific configurations are located under:
./services
NiFi-specific configuration (properties, custom processors, drivers, Python scripts, etc.) is under:
./nifi
Helm (OpenSearch)
An initial Helm chart for OpenSearch + OpenSearch Dashboards is available at:
./deploy/charts/opensearch
Quick usage:
# render manifests
helm template cogstack-opensearch ./deploy/charts/opensearch \
--set-file envFile.raw=./deploy/elasticsearch.env
# install or upgrade
helm upgrade --install cogstack-opensearch ./deploy/charts/opensearch \
--set-file envFile.raw=./deploy/elasticsearch.env \
--namespace cogstack --create-namespace
The chart expects pre-created Kubernetes Secrets for TLS materials (see the chart README). The --set-file envFile.raw=... flag injects values from deploy/elasticsearch.env into pod environment variables. Only keys in envFile.includeKeys are imported into the chart ConfigMap.
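To illustrate the effect of the envFile.includeKeys allow-list, the sketch below filters an env file down to a chosen set of keys. It is a conceptual illustration only, not the chart's template logic: the function name filter_env_keys is hypothetical, and it assumes the env file contains plain KEY=value lines.

```shell
# Conceptual illustration (not the chart's implementation): keep only the
# lines of an env file whose key is in an allow-list.
filter_env_keys() {
  env_file="$1"
  shift
  for key in "$@"; do
    # Print the line that defines this key, if present; ignore misses.
    grep -E "^${key}=" "$env_file" || true
  done
}
```

Calling filter_env_keys ./deploy/elasticsearch.env followed by a list of key names prints only the matching lines, which gives a quick preview of which values an allow-list would let through.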
Makefile Command Overview
A concise reference for controlling the full CogStack deployment stack (NiFi, Elasticsearch, JupyterHub, MedCAT, OCR-service, GitEA, Beats, DB, etc.).
All commands automatically load environment variables via export_env_vars.sh.
Discover available Make targets
You can list all available deploy/Makefile targets with descriptions:
# from repository root
make -C deploy help
# from ./deploy
make help
Manage a specific service on a specific machine
Use remote targets to run Docker Compose on a remote host over SSH.
Prerequisites:
SSH access to the target machine
this repository checked out on the target machine
Docker + Docker Compose available on the target machine
# deploy (up -d)
make -C deploy remote-deploy-service \
REMOTE_HOST=ubuntu@10.20.0.15 \
REMOTE_REPO_DIR=/opt/cogstack_nifi \
REMOTE_SERVICES="nifi nifi-nginx" \
REMOTE_SSH_KEY=$HOME/.ssh/cogstack_prod.pem \
REMOTE_COMPOSE_FILE=services.yml
# stop
make -C deploy remote-stop-service \
REMOTE_HOST=ubuntu@10.20.0.15 \
REMOTE_REPO_DIR=/opt/cogstack_nifi \
REMOTE_SERVICES="nifi nifi-nginx" \
REMOTE_SSH_KEY=$HOME/.ssh/cogstack_prod.pem \
REMOTE_COMPOSE_FILE=services.yml
# delete containers (docker compose rm -f -s)
make -C deploy remote-delete-service \
REMOTE_HOST=ubuntu@10.20.0.15 \
REMOTE_REPO_DIR=/opt/cogstack_nifi \
REMOTE_SERVICES="nifi nifi-nginx" \
REMOTE_SSH_KEY=$HOME/.ssh/cogstack_prod.pem \
REMOTE_COMPOSE_FILE=services.yml
Set REMOTE_SERVICES to one service (for example kibana) or multiple services.
Use services.dev.yml by setting REMOTE_COMPOSE_FILE=services.dev.yml.
REMOTE_SSH_KEY is optional; if omitted, normal SSH config/agent auth is used.
REMOTE_SSH_OPTS is optional for extra flags (for example -p 2222 -o StrictHostKeyChecking=accept-new).
remote-delete-service removes containers; it does not remove volumes.
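For orientation, a rough manual equivalent of a remote deploy might look like the sketch below. This is an assumption about the general shape of such a command (SSH in, change to the checkout, load the env vars, run docker compose), not the Makefile's actual implementation. The function name remote_compose_up_cmd is hypothetical, and it only prints the command so it can be inspected before anything is run.

```shell
# Assumed manual equivalent of a remote deploy; prints the command only.
# Arguments: <user@host> <remote repo dir> <compose file> <service>...
remote_compose_up_cmd() {
  host="$1"
  repo_dir="$2"
  compose_file="$3"
  shift 3
  # The remote side changes into the checkout, loads the env vars, and
  # starts the requested services in detached mode.
  printf "ssh %s 'cd %s && . ./deploy/export_env_vars.sh && docker compose -f ./deploy/%s up -d %s'\n" \
    "$host" "$repo_dir" "$compose_file" "$*"
}
```

For example, remote_compose_up_cmd ubuntu@10.20.0.15 /opt/cogstack_nifi services.yml nifi nifi-nginx prints a single ssh command line that mirrors the remote-deploy-service example above.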
Utilities
| Command | Description |
|---|---|
| | Load all environment variables |
| | Print environment variables (sorted) |
| | Freeze all security submodules (read-only) |
| | Unfreeze security submodules |
| | Update all submodules |
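Start Services
| Command | Description |
|---|---|
| make start-nifi | Start NiFi and NiFi-Nginx |
| | Start NiFi dev services from |
| | Build and start NiFi dev services from |
| make start-elastic | Start ES-1, ES-2, Kibana |
| make start-elastic-cluster | Start ES-1, ES-2, ES-3 |
| make start-elastic-1, make start-elastic-2, make start-elastic-3 | Start individual Elasticsearch nodes |
| | Start Metricbeat agents |
| | Start Filebeat agents |
| make start-kibana | Start Kibana only |
| make start-samples | Start samples DB |
| make start-jupyter | Start JupyterHub (prod config) |
| make start-medcat-service | Start MedCAT service |
| make start-medcat-service-deid | Start DE-ID MedCAT service |
| make start-medcat-trainer | Start MedCAT Trainer + Solr + Nginx |
| make start-ocr-services | Start OCR-service (full + text-only) |
| make start-git-ea | Start GitEA |
| make start-production-db | Start Databank DB |
| make start-data-infra | Start NiFi + Elastic + Samples DB |
| make start-all | Full stack: data infra + NLP + JupyterHub + OCR |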
Stop Services
| Command | Description |
|---|---|
| | Stop NiFi stack |
| | Stop NiFi dev services ( |
| | Stop ES-1, ES-2, Kibana |
| | Stop ES-1, ES-2 |
| | Stop individual ES nodes |
| | Stop Metricbeat agents |
| | Stop Filebeat agents |
| | Stop Kibana |
| | Stop samples DB |
| | Stop JupyterHub |
| | Stop MedCAT service |
| | Stop DE-ID MedCAT service |
| | Stop MedCAT Trainer stack |
| | Stop OCR-service stack |
| | Stop GitEA |
| | Stop Databank DB |
| | Stop NiFi + Elastic + Samples |
| make stop-all | Stop entire stack |
Delete Services
| Command | Description |
|---|---|
| | Delete NiFi and NiFi-Nginx containers |
| | Delete NiFi and NiFi-Nginx containers |
| | Delete NiFi dev containers ( |
| | Delete NiFi/NiFi-Nginx images from |
| | Delete NiFi/NiFi-Nginx images from |
| | Remove NiFi-related volumes (via compose down |
| | Delete Elasticsearch and Kibana containers |
| | Remove Elasticsearch and Kibana volumes (via compose down |
| | Delete Databank DB containers |
| | Remove Databank DB volumes (via compose down |
| | Delete samples DB container |
| | Remove samples DB volumes (via compose down |
| | Delete MedCAT Trainer containers ( |
| | Remove MedCAT Trainer volumes (via compose down |
| | Delete JupyterHub container (alias: |
| | Delete MedCAT service container (alias: |
| | Delete DE-ID MedCAT service container (alias: |
| | Delete OCR-service containers (alias: |
Cleanup
| Command | Description |
|---|---|
| | Docker Compose |
| | Full teardown, including volumes |
Notes
All start-* commands use docker compose -f services.yml unless referencing a specific service's Dockerfile.
start-all and stop-all act as the top-level orchestration entry points.
Environment variables are always sourced using the integrated WITH_ENV macro.
Starting the Services
All core services defined in services.yml can be started using the Makefile in the deploy/ directory.
Most services in the services folder that are not part of the core stack defined in services.yml, and that are pulled from external git submodule repositories, are started the same way.
Start each service individually
You can start individual components of the CogStack-NiFi stack using the make start-* commands.
Each target loads all required environment variables automatically via export_env_vars.sh.
This is useful for:
debugging a single service
restarting only one component after config changes
running lightweight subsets of the stack
isolating problems or logs per service
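When only a subset of the stack is needed, several start-* targets can be run in sequence. The sketch below is one way to do that from the repository root, stopping at the first failure. The function name start_targets and the MAKE_BIN override are hypothetical conveniences, not part of the repository.

```shell
# Hypothetical helper: run several make targets from ./deploy in order and
# stop at the first one that fails.
start_targets() {
  make_bin="${MAKE_BIN:-make}"   # override is only useful for dry runs/tests
  for target in "$@"; do
    echo "==> $target"
    if ! "$make_bin" -C deploy "$target"; then
      echo "target failed: $target" >&2
      return 1
    fi
  done
}
```

For example, start_targets start-nifi start-elastic start-samples brings up NiFi, the two-node Elasticsearch cluster with Kibana, and the samples DB, one target at a time.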
Core NiFi Services
make start-nifi
Starts:
nifi - the Apache NiFi instance (main ETL/orchestration engine)
nifi-nginx - reverse proxy/front-end for NiFi
Use when you want to run, debug, or modify NiFi workflows without bringing up the entire ecosystem.
Start Core Data Infrastructure
make start-data-infra
Starts:
NiFi
NiFi Nginx
Elasticsearch
Samples DB
Ideal for running ingestion pipelines and ETL workflows.
Elasticsearch / OpenSearch Services
Note: switching from OpenSearch (Amazon's open-source fork) to Elasticsearch requires changing some environment variables; see the configuration section.
make start-elastic
Starts the standard 2-node Elasticsearch cluster + Kibana.
make start-elastic-cluster
Starts all 3 ES nodes. Useful for testing clustering, sharding, and replication.
make start-elastic-1
make start-elastic-2
make start-elastic-3
Start individual Elasticsearch nodes for debugging or failure-scenario testing.
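After starting or stopping individual nodes, it can help to check the cluster status. The sketch below extracts the status field from the JSON returned by the _cluster/health API, which both Elasticsearch and OpenSearch provide. The function name health_status is hypothetical, and the URL, port, and credential variables in the usage note are assumptions that must be adapted to your deployment.

```shell
# Hypothetical helper: read _cluster/health JSON on stdin and print the
# value of its "status" field (green, yellow, or red).
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}
```

A possible invocation, assuming the node listens on https://localhost:9200 and that ES_USER and ES_PASSWORD hold valid credentials, is curl -sk -u "$ES_USER:$ES_PASSWORD" "https://localhost:9200/_cluster/health" | health_status.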
Kibana
make start-kibana
Starts Kibana for inspecting logs, checking index mappings, monitoring ES health, and debugging pipelines.
Databases
make start-samples
Starts samples-db, the small example DB used for demo flows.
make start-production-db
Starts the cogstack-databank-db production database.
Use when testing SQL ingestion or verifying DB-driven NiFi flows.
JupyterHub
make start-jupyter
Starts the CogStack JupyterHub instance. Used for notebooks, analysis, model testing, and visualisation.
NLP Services (MedCAT Service & Trainer)
make start-medcat-service
Starts the MedCAT concept extraction inference API.
make start-medcat-service-deid
Starts the MedCAT DEID (de-identification) inference API.
make start-medcat-trainer
Starts the full MedCAT Trainer stack (Trainer UI + Solr + NGINX). Useful for annotation and supervised training tasks.
OCR Services
make start-ocr-services
Starts:
ocr-service - main OCR pipeline
ocr-service-text-only - lightweight OCR/text extraction
Use for PDF ingestion, OCR debugging, and pipeline validation.
Miscellaneous Services (Gitea)
make start-git-ea
Starts the internal Gitea Git server used for local code/config storage.
Start the Entire Stack
make start-all
Starts everything:
Core infra
JupyterHub
MedCAT NLP services
OCR services
Use for complete deployments, demos, or full-stack development.