# πŸ“¦ Deployment The [`deploy`](https://github.com/CogStack/CogStack-NiFi/tree/main/deploy/) directory contains an example dockerized deployment setup of the customised NiFi image, along with related services for document processing, NLP, and text analytics. Make sure you have read the [Prerequisites](./main.md) section before proceeding. ## πŸ—‚οΈ Key files - **`services.yml`** – defines the *core* services that are orchestrated directly from this repository via Docker Compose. - **`Makefile`** – provides convenient commands for starting, stopping, and managing the deployment. - **`.env` files in `./deploy/`** , environment variables used across services, specifications: - environment variables that apply **only to the services defined inside `services.yml`**. - Security-related `.env` files (certificates, users) are under **`/security`** These variables configure NiFi, Elasticsearch/OpenSearch, Kibana, Jupyter, Metricbeat, the sample DB, etc. > **Important:** If you run `docker compose` directly (instead of `make`), first load the envs with: > > ```bash > source ./deploy/export_env_vars.sh > ``` > > The Makefile targets already do this for you. ## 🧩 Modular service design (important) This repository follows a **modular deployment model**: - Only the services defined in **`services.yml`** use the environment files located in **`./deploy/*.env`**. - **All other services** included in the ecosystem are launched via `docker-compose` commands inside their own directories, for example: ```bash ./services//docker/docker-compose.yml ``` - Each of these standalone services maintains **its own environment configuration** in: ```bash ./services//env/ ``` This design allows each service to be: - independently configurable - versioned and deployed in isolation - consumed by other projects without modifying the core deployment > These are the files you will most commonly modify when creating or adjusting a deployment. ## βš™οΈ Additional service configuration - Service-specific configurations are located under: [`./services`](https://github.com/CogStack/CogStack-NiFi/tree/main/services/) - NiFi-specific configuration (properties, custom processors, drivers, Python scripts, etc.) is under: [`./nifi`](https://github.com/CogStack/CogStack-NiFi/tree/main/nifi/) ## ⎈ Helm (OpenSearch) An initial Helm chart for OpenSearch + OpenSearch Dashboards is available at: ```bash ./deploy/charts/opensearch ``` Quick usage: ```bash # render manifests helm template cogstack-opensearch ./deploy/charts/opensearch \ --set-file envFile.raw=./deploy/elasticsearch.env # install or upgrade helm upgrade --install cogstack-opensearch ./deploy/charts/opensearch \ --set-file envFile.raw=./deploy/elasticsearch.env \ --namespace cogstack --create-namespace ``` > The chart expects pre-created Kubernetes Secrets for TLS materials (see the chart README). > The `--set-file envFile.raw=...` flag injects values from `deploy/elasticsearch.env` into pod environment variables. > Only keys in `envFile.includeKeys` are imported into the chart ConfigMap. ## 🧰 Makefile Command Overview A concise reference for controlling the full CogStack deployment stack (NiFi, Elasticsearch, JupyterHub, MedCAT, OCR-service, GitEA, Beats, DB, etc.). All commands automatically load environment variables via `export_env_vars.sh`. ### πŸ”Ž Discover available Make targets You can list all available `deploy/Makefile` targets with descriptions: ```bash # from repository root make -C deploy help ``` ```bash # from ./deploy make help ``` ### 🌐 Manage a specific service on a specific machine Use remote targets to run Docker Compose on a remote host over SSH. Prerequisites: - SSH access to the target machine - this repository checked out on the target machine - Docker + Docker Compose available on the target machine ```bash # deploy (up -d) make -C deploy remote-deploy-service \ REMOTE_HOST=ubuntu@10.20.0.15 \ REMOTE_REPO_DIR=/opt/cogstack_nifi \ REMOTE_SERVICES="nifi nifi-nginx" \ REMOTE_SSH_KEY=$HOME/.ssh/cogstack_prod.pem \ REMOTE_COMPOSE_FILE=services.yml ``` ```bash # stop make -C deploy remote-stop-service \ REMOTE_HOST=ubuntu@10.20.0.15 \ REMOTE_REPO_DIR=/opt/cogstack_nifi \ REMOTE_SERVICES="nifi nifi-nginx" \ REMOTE_SSH_KEY=$HOME/.ssh/cogstack_prod.pem \ REMOTE_COMPOSE_FILE=services.yml ``` ```bash # delete containers (docker compose rm -f -s) make -C deploy remote-delete-service \ REMOTE_HOST=ubuntu@10.20.0.15 \ REMOTE_REPO_DIR=/opt/cogstack_nifi \ REMOTE_SERVICES="nifi nifi-nginx" \ REMOTE_SSH_KEY=$HOME/.ssh/cogstack_prod.pem \ REMOTE_COMPOSE_FILE=services.yml ``` - Set `REMOTE_SERVICES` to one service (for example `kibana`) or multiple services. - Use `services.dev.yml` by setting `REMOTE_COMPOSE_FILE=services.dev.yml`. - `REMOTE_SSH_KEY` is optional; if omitted, normal SSH config/agent auth is used. - `REMOTE_SSH_OPTS` is optional for extra flags (for example `-p 2222 -o StrictHostKeyChecking=accept-new`). - `remote-delete-service` removes containers; it does not remove volumes. --- ### πŸ”§ Utilities | Command | Description | |------------------------|---------------------------------------------| | `make load-env` | Load all environment variables | | `make show-env` | Print environment variables (sorted) | | `make git-freeze-security` | Freeze all security submodules (read-only) | | `make git-unfreeze-security` | Unfreeze security submodules | | `make git-update-submodules` | Update all submodules | --- ### πŸš€ Start Services | Command | Description | |---------------------------------|-------------| | `make start-nifi` | Start NiFi and NiFi-Nginx | | `make start-nifi-dev` | Start NiFi dev services from `services.dev.yml` | | `make start-nifi-dev-build` | Build and start NiFi dev services from `services.dev.yml` | | `make start-elastic` | Start ES-1, ES-2, Kibana | | `make start-elastic-cluster` | Start ES-1, ES-2, ES-3 | | `make start-elastic-1/2/3` | Start individual Elasticsearch nodes | | `make start-metricbeat-1/2/3` | Start Metricbeat agents | | `make start-filebeat-1/2/3` | Start Filebeat agents | | `make start-kibana` | Start Kibana only | | `make start-samples` | Start samples DB | | `make start-jupyter` | Start JupyterHub (prod config) | | `make start-medcat-service` | Start MedCAT service | | `make start-medcat-service-deid`| Start DE-ID MedCAT service | | `make start-medcat-trainer` | Start MedCAT Trainer + Solr + Nginx | | `make start-ocr-services` | Start OCR-service (full + text-only) | | `make start-git-ea` | Start GitEA | | `make start-production-db` | Start Databank DB | | **`make start-data-infra`** | Start NiFi + Elastic + Samples DB | | **`make start-all`** | Full stack: data infra + NLP + JupyterHub + OCR | --- ### πŸ›‘ Stop Services | Command | Description | |---------------------------------|-------------| | `make stop-nifi` | Stop NiFi stack | | `make stop-nifi-dev` | Stop NiFi dev services (`services.dev.yml`) | | `make stop-elastic` | Stop ES-1, ES-2, Kibana | | `make stop-elastic-cluster` | Stop ES-1, ES-2 | | `make stop-elastic-1/2/3` | Stop individual ES nodes | | `make stop-metricbeat-1/2/3` | Stop Metricbeat agents | | `make stop-filebeat-1/2/3` | Stop Filebeat agents | | `make stop-kibana` | Stop Kibana | | `make stop-samples` | Stop samples DB | | `make stop-jupyter` | Stop JupyterHub | | `make stop-medcat-service` | Stop MedCAT service | | `make stop-medcat-service-deid` | Stop DE-ID MedCAT service | | `make stop-medcat-trainer` | Stop MedCAT Trainer stack | | `make stop-ocr-services` | Stop OCR-service stack | | `make stop-git-ea` | Stop GitEA | | `make stop-production-db` | Stop Databank DB | | **`make stop-data-infra`** | Stop NiFi + Elastic + Samples | | **`make stop-all`** | Stop entire stack | --- ### πŸ—‘οΈ Delete Services | Command | Description | |---------------------------------|-------------| | `make delete-nifi` | Delete NiFi and NiFi-Nginx containers | | `make delete-nifi-containers` | Delete NiFi and NiFi-Nginx containers | | `make delete-nifi-dev-containers` | Delete NiFi dev containers (`services.dev.yml`) | | `make delete-nifi-images` | Delete NiFi/NiFi-Nginx images from `services.yml` | | `make delete-nifi-dev-images` | Delete NiFi/NiFi-Nginx images from `services.dev.yml` | | `make delete-nifi-volumes` | Remove NiFi-related volumes (via compose down `-v`) | | `make delete-elastic` | Delete Elasticsearch and Kibana containers | | `make delete-elastic-volumes` | Remove Elasticsearch and Kibana volumes (via compose down `-v`) | | `make delete-databank` | Delete Databank DB containers | | `make delete-databank-volumes` | Remove Databank DB volumes (via compose down `-v`) | | `make delete-samples-db` | Delete samples DB container | | `make delete-samples-db-volumes`| Remove samples DB volumes (via compose down `-v`) | | `make delete-medcat-trainer` | Delete MedCAT Trainer containers (`medcattrainer`, `nginx`, `solr`) | | `make delete-medcat-trainer-volumes`| Remove MedCAT Trainer volumes (via compose down `-v`) | | `make delete-jupyter` | Delete JupyterHub container (alias: `make make-delete-jupyter`) | | `make delete-medcat-service` | Delete MedCAT service container (alias: `make make-delete-medcat-service`) | | `make delete-medcat-service-deid`| Delete DE-ID MedCAT service container (alias: `make make-delete-medcat-service-deid`) | | `make delete-ocr-services` | Delete OCR-service containers (alias: `make make-delete-ocr-services`) | --- ### 🧹 Cleanup | Command | Description | |------------------|---------------------------------------------| | `make down-all` | Docker Compose `down` for all core services | | `make cleanup` | Full teardown, including volumes | --- ### πŸ“ Notes - All `start-*` commands use `docker compose -f services.yml` unless referencing a specific service’s Dockerfile. - `start-all` and `stop-all` act as the top-level orchestration entry points. - Environment variables are **always sourced** using the integrated `WITH_ENV` macro. --- If you want, I can also generate a **minimal cheat sheet**, or an **ASCII tree diagram** that shows how `start-all` expands into all services. ## πŸš€ Starting the Services All core services defined in `services.yml` can be started using the Makefile in the `deploy/` directory. For most services in the `services` folder that are not part of the core stack defined in `services.yml` and are pulled from external git submodule repositories, the start-up process is the same. ### ▢️ Start each service individually You can start individual components of the CogStack-NiFi stack using the `make start-*` commands. Each target loads all required environment variables automatically via `export_env_vars.sh`. This is useful for: - debugging a single service - restarting only one component after config changes - running lightweight subsets of the stack - isolating problems or logs per service --- ### 🧩 Core NiFi Services ```bash make start-nifi ``` Starts: - **nifi** β€” the Apache NiFi instance (main ETL/orchestration engine) - **nifi-nginx** β€” reverse proxy/front-end for NiFi Use when you want to run, debug, or modify NiFi workflows without bringing up the entire ecosystem. --- ### πŸ—οΈ Start Core Data Infrastructure ```bash make start-data-infra ``` Starts: - NiFi - NiFi Nginx - Elasticsearch - Samples DB Ideal for running ingestion pipelines and ETL workflows. --- #### πŸ›’οΈ Elasticsearch / OpenSearch Services Please note that to switch from OpenSearch (Amazon open-source fork) to ElasticSearch you will need to change some environment variables, see the [configuration](./configuration.md) section. ```bash make start-elastic ``` Starts the standard 2-node Elasticsearch cluster + Kibana. ```bash make start-elastic-cluster ``` Starts all 3 ES nodes. Useful for testing clustering, sharding, and replication. ```bash make start-elastic-1 make start-elastic-2 make start-elastic-3 ``` Start individual Elasticsearch nodes for debugging or failure-scenario testing. --- #### πŸ“ˆ Kibana ```bash make start-kibana ``` Starts Kibana for inspecting logs, checking index mappings, monitoring ES health, and debugging pipelines. --- ### πŸ—„οΈ Databases ```bash make start-samples ``` Starts **samples-db**, the small example DB used for demo flows. ```bash make start-production-db ``` Starts the **cogstack-databank-db** production database. Use when testing SQL ingestion or verifying DB-driven NiFi flows. --- ### πŸ“š JupyterHub ```bash make start-jupyter ``` Starts the CogStack JupyterHub instance. Used for notebooks, analysis, model testing, and visualisation. --- ### 🧠 NLP Services (MedCAT Service & Trainer) ```bash make start-medcat-service ``` Starts the MedCAT concept extraction inference API. ```bash make start-medcat-service-deid ``` Starts the MedCAT DEID (de-identification) inference API. ```bash make start-medcat-trainer ``` Starts the full MedCAT Trainer stack (Trainer UI + Solr + NGINX). Useful for annotation and supervised training tasks. --- ### πŸ“ OCR Services ```bash make start-ocr-services ``` Starts: - **ocr-service** β€” main OCR pipeline - **ocr-service-text-only** β€” lightweight OCR/text extraction Use for PDF ingestion, OCR debugging, and pipeline validation. --- ### πŸ› οΈ Miscellaneous Services (GIT EA) ```bash make start-git-ea ``` Starts the internal Gitea Git server used for local code/config storage. --- ### πŸš€ Start the Entire Stack ```bash make start-all ``` Starts everything: - Core infra - JupyterHub - MedCAT NLP services - OCR services Use for complete deployments, demos, or full-stack development.