Services
This section provides a complete overview of all services included in the CogStack-NiFi deployment.
All services run in Docker and interact within a shared internal Docker network.
Overview
Below is a high-level architecture diagram illustrating how CogStack services communicate when all components are enabled:

Primary Services
The core services defined in services.yml include:
samples-db - PostgreSQL database populated with demo datasets.
cogstack-databank-db / cogstack-databank-db-mssql - Production-grade PostgreSQL and optional MSSQL instances.
elasticsearch-1 / elasticsearch-2 / elasticsearch-3 - Multi-node Elasticsearch or OpenSearch cluster.
metricbeat / filebeat - Elastic monitoring and log forwarder services.
nifi - Apache NiFi single-node instance with embedded ZooKeeper.
nifi-nginx - Reverse proxy providing secure access to NiFi.
ocr-service / ocr-service-text-only - High-performance Python OCR and text extraction services.
nlp-medcat-service-production - MedCAT NLP model service with REST API.
medcat-trainer-ui / medcat-trainer-nginx - Web UI and reverse proxy for model training and refinement.
kibana - OpenSearch Dashboards UI.
jupyter-hub - Fully featured data science interface.
git-ea - Self-hosted Git service (Gitea).
Note: Important configuration options and environment variables for these services are managed in services.yml and the associated .env files under deploy/ and security/.
Service Definitions
All core services are defined in:
deploy/services.yml
They run inside the internal Docker network cognet.
Some services expose ports to the host for convenience.
NLP/OCR and other services API Endpoints
Most web ETL and data-enrichment API services that we use offer the following endpoints for querying:
GET /api/info
POST /api/process
POST /api/process_bulk
Useful for NiFi workflows (see workflows.md).
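The three endpoints can be exercised from a script before wiring them into a NiFi flow. The following is a minimal sketch that only builds the HTTP requests. The JSON payload shape (`{"content": {"text": ...}}`) is an assumption modelled on the MedCAT service convention and may differ for other services, and the helper names are hypothetical.

```python
import json
from urllib.request import Request


def _endpoint(base_url: str, path: str) -> str:
    """Join a service base URL and an API path without doubling slashes."""
    return f"{base_url.rstrip('/')}{path}"


def info_request(base_url: str) -> Request:
    """Build the GET /api/info request (service and model metadata)."""
    return Request(_endpoint(base_url, "/api/info"), method="GET")


def process_request(base_url: str, text: str) -> Request:
    """Build the POST /api/process request for a single document.

    ASSUMPTION: the body is {"content": {"text": ...}}; check the
    README of the target service for its actual schema.
    """
    payload = {"content": {"text": text}}
    return Request(
        _endpoint(base_url, "/api/process"),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def process_bulk_request(base_url: str, texts: list[str]) -> Request:
    """Build the POST /api/process_bulk request for several documents.

    ASSUMPTION: the body is {"content": [{"text": ...}, ...]}.
    """
    payload = {"content": [{"text": t} for t in texts]}
    return Request(
        _endpoint(base_url, "/api/process_bulk"),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Each returned object can be passed to `urllib.request.urlopen`. Services reached over HTTPS with locally generated certificates will likely need an `ssl.SSLContext` that trusts the deployment's root CA.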
MedCAT Service
Runs a REST API for model inference using the MedCAT library, which performs clinical concept extraction and linking.
The service has two operation modes:
concept detection: extracts medical concepts and outputs the original text plus a list of annotations.
de-id mode (also known as AnonCAT mode), for de-identifying documents: outputs the de-identified text (a future version will also output annotations describing what was de-identified).
Access
https://localhost:5555/api/info - NER container; use it to check that the model loads successfully
https://localhost:5556/api/info - DE-ID/AnonCAT container
Containers
cogstack-medcat-service-production - for concept NER
cogstack-medcat-service-production-deid - for DE-ID/AnonCAT
Service location & files
dir: /services/cogstack-nlp/medcat-service/
docker compose file: /services/cogstack-nlp/medcat-service/docker/docker-compose.yml
env: located in services/cogstack-nlp/medcat-service/env/
app.env - controls app settings (number of CPUs used, log level, etc.) used by the NER container cogstack-medcat-service-production
medcat.env - used by the NER container, controls MedCAT settings directly
app_deid.env - used by the DE-ID container; same app settings, the main difference being `APP_DEID_MODE`
medcat_deid.env - used by the DE-ID container, controls MedCAT settings directly
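Because the NER and DE-ID containers share most app settings, it can help to compare their env files side by side. The following is a minimal sketch, assuming the files contain plain `KEY=VALUE` lines with `#` comments. `parse_env` and `diff_env` are hypothetical helpers, not part of the service.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and '#' comments.

    ASSUMPTION: no multi-line values or variable interpolation.
    """
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment, ignore
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def diff_env(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple]:
    """Return the keys whose values differ, mapped to (a_value, b_value)."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

For example, reading `app.env` and `app_deid.env` with `pathlib.Path.read_text()` and passing the parsed results to `diff_env` should list `APP_DEID_MODE` among the differences.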
Ports
| Service | External Port | Internal Port |
|---|---|---|
| NER (MedCAT) | 5555 | |
| DE-ID / AnonCAT | 5556 | |
Models
A default MedMentions (MedMen) NER+L model (includes MetaCAT models) is available for public use but needs to be downloaded.
To download the model, go to the service's scripts directory:
services/cogstack-nlp/medcat-service/scripts
Execute:
bash download_medmen.sh
and wait for the download to complete.
README
Please check the service's own README.md.
MedCAT Trainer
Provides UI workflows for annotation, correction, and iterative model training.
Access
https://localhost:8001
Containers
medcattrainer
medcattrainer_nginx
mct_solr
Service location & files
dir: services/cogstack-nlp/medcat-trainer/
docker compose file: services/cogstack-nlp/medcat-trainer/docker-compose-prod.yml
env: services/cogstack-nlp/medcat-trainer/envs/env-prod
Ports
external: 8001
README
Please check the service's own README.md.
Jupyter Hub
A multi-user JupyterHub instance deployed via Docker.
Access
https://localhost:8888
Containers
cogstack-jupyter-hub
cogstack-jupyter-singleuser-<USERNAME> (per-user container, started by each user once the hub is up)
Service location & files
dir: services/cogstack-jupyter-hub/
docker compose file: services/cogstack-jupyter-hub/docker/
env: services/cogstack-jupyter-hub/env/jupyter.env
Supports
Per-user containers
CPU/RAM limits (via services/cogstack-jupyter-hub/env/jupyter.env)
Optional GPU support
Notebook image selection
Ports
| Component | External Port | Internal Port(s) |
|---|---|---|
| JupyterHub | 8888 | |
README
Please check the service's own README.md file.
Samples DB (PostgreSQL)
Demo dataset with:
patients
encounters
observations
raw medical reports
cleaned reports
annotation tables
Access
localhost:5555
Ports
external: 5432
internal: 5432
Credentials
user - test, password - test
CogStack databank production DB (Production only: PgSQL, MSSQL)
Empty database for production ingestion pipelines.
Supports both PostgreSQL and MSSQL.
Place schema files in the following directory and they will be loaded on container startup:
services/cogstack-db/<DB_PROVIDER>/schemas/
where <DB_PROVIDER> is one of: mssql, pgsql.
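When schema files are copied into place by a script, the provider name is worth validating first so that files do not land in a directory the containers never read. The following is a minimal sketch, assuming paths are relative to the repository root. `schema_dir` is a hypothetical helper.

```python
from pathlib import PurePosixPath

# The two providers named in this section.
SUPPORTED_PROVIDERS = ("mssql", "pgsql")


def schema_dir(provider: str) -> PurePosixPath:
    """Return the schema directory for a DB provider.

    Raises ValueError for anything other than 'mssql' or 'pgsql'.
    """
    name = provider.strip().lower()
    if name not in SUPPORTED_PROVIDERS:
        raise ValueError(
            f"unsupported DB provider {provider!r}; "
            f"expected one of {SUPPORTED_PROVIDERS}"
        )
    return PurePosixPath("services/cogstack-db") / name / "schemas"
```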
Credentials
PgSQL: user - admin, password - admin
MSSQL: user - admin, password - admin!COGSTACK2022
Access
PgSQL: localhost:5558 → container 5432
MSSQL: localhost:1443 → container 1433
Containers
PgSQL: cogstack-databank-db
MSSQL: cogstack-databank-db-mssql
Service location & files
docker compose file: services.yml
dir: services/cogstack-db/
env:
security/users/users_database.env - controls DB user credentials
deploy/database.env - general DB configs
Ports
| Database | External Port | Internal Port |
|---|---|---|
| PgSQL | 5558 | 5432 |
| MSSQL | 1443 | 1433 |
Apache NiFi
Primary ETL/processing engine.
This service is complex and is described in full in its own section of the documentation.
Credentials
Default user: user - admin, password - cogstackNiFi
Access
https://localhost:8443 (via nifi-nginx)
Containers
NiFi: cogstack-nifi
Service location & files
docker compose file: services.yml
dir: nifi/
env:
/deploy/nifi.env - general NiFi settings, JVM memory, etc.
/security/nifi_users.env - controls user credentials
/security/certificates_nifi.env
Ports
| Component | External Port | Internal Port |
|---|---|---|
| NiFi | 8443 | |
ELK Stack (Elasticsearch / OpenSearch)
Backend search and indexing engine powering document storage, query, analytics, and NLP output retrieval.
This service is fully described in the Elasticsearch section of the documentation.
The repo supports both:
Elasticsearch (native)
OpenSearch (Amazon fork)
Switch between modes via environment variables in deploy/elasticsearch.env.
Elasticsearch / OpenSearch
Credentials
OpenSearch: user - admin, password - admin
Elasticsearch: user - elastic, password - kibanaserver
Access
http://localhost:9200 - Node 1
http://localhost:9201 - Node 2
http://localhost:9202 - Node 3
Containers
elasticsearch-1
elasticsearch-2
elasticsearch-3
Ports
All ports need to be opened in the firewall to allow inter-cluster communication. If the nodes are hosted on the same machine/VM, each node is assumed to use a different port. In production mode, where the nodes live on separate VMs/machines, every machine can use the same ports: 9200, 9300, 9600.
internal: 9300, 9301, 9302, 9600, 9601, 9602, 9200, 9201, 9202
external: 9300, 9301, 9302, 9600, 9601, 9602, 9200, 9201, 9202
| Node | HTTP | Transport | Analyzer |
|---|---|---|---|
| ES1 | 9200 | 9300 | 9600 |
| ES2 | 9201 | 9301 | 9601 |
| ES3 | 9202 | 9302 | 9602 |
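The port scheme described above (one port offset per node on a shared host, identical ports on separate hosts) can be expressed as a small helper, for example to generate firewall rules. The following is a minimal sketch: `NodePorts` and `node_ports` are hypothetical names, and the offset rule is inferred from the port lists above, not read from the deployment files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodePorts:
    http: int
    transport: int
    analyzer: int


# Base ports used by the first node (and by every node on separate hosts).
BASE = NodePorts(http=9200, transport=9300, analyzer=9600)


def node_ports(node_index: int, same_host: bool = True) -> NodePorts:
    """Return the ports for a 1-based node index.

    same_host=True  -> each node gets its own offset (9200, 9201, ...).
    same_host=False -> every node uses the base ports on its own machine.
    """
    if node_index < 1:
        raise ValueError("node_index is 1-based")
    offset = node_index - 1 if same_host else 0
    return NodePorts(
        http=BASE.http + offset,
        transport=BASE.transport + offset,
        analyzer=BASE.analyzer + offset,
    )
```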
Service Location & files
docker compose: deploy/services.yml
config: services/elasticsearch/config/
env:
/deploy/elasticsearch.env
/security/certificates_elasticsearch.env
/security/elasticsearch_users.env
SSL & Certificates
Certificates stored in:
/security/certificates/elastic/<ELASTICSEARCH_VERSION>/
Settings in:
certificates_elasticsearch.env
Metricbeat & Filebeat
Lightweight Elastic stack agents used for monitoring and log forwarding.
They run alongside Elasticsearch to provide observability of the cluster and ingestion pipelines.
Purpose:
Metricbeat - collects system & Elasticsearch metrics (CPU, memory, JVM, node health).
Filebeat - ships container and service logs into Elasticsearch.
Both run as independent containers in the deployment.
Containers
Metricbeat:
metricbeat-1
metricbeat-2
metricbeat-3
Filebeat:
filebeat-1
filebeat-2
filebeat-3
Service Location & Files
compose: deploy/services.yml
config:
services/metricbeat/metricbeat.yml
services/filebeat/filebeat.yml
env:
/deploy/elasticsearch.env
/security/elasticsearch_users.env
Ports
No external ports exposed.
All communication occurs internally within the cogstack-net Docker network.
Notes
Elasticsearch must be running before Metricbeat or Filebeat start.
Only Elastic-native Beats are available; OpenSearch-native Beats do not exist.
Authentication/credentials come from elasticsearch_users.env.
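Since the Beats depend on Elasticsearch being up, a startup script can wait for the HTTP port before launching them. The following is a minimal sketch of such a check. `wait_for_port` is a hypothetical helper, and a successful TCP connection only shows that the port accepts connections, not that the cluster is healthy.

```python
import socket
import time


def wait_for_port(host: str, port: int,
                  timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll host:port until a TCP connection succeeds or timeout elapses.

    Returns True once a connection is accepted, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Each attempt is bounded by `interval` seconds.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For instance, `wait_for_port("localhost", 9200, timeout=120)` could gate the start of the Metricbeat and Filebeat containers in a wrapper script.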
Kibana / OpenSearch Dashboards
Web UI for exploring indexed data, visualising documents, managing index templates, monitoring the cluster, and debugging ingestion pipelines.
Purpose:
Search & browse Elasticsearch/OpenSearch indices
Visualise ingestion outputs and cluster metrics
Manage index patterns, dashboards, and Dev Tools
Validate mappings and test queries used in NiFi flows
Host Access
URL: https://localhost:5601
Credentials
OpenSearch Dashboards: admin / admin
Elasticsearch native: elastic / kibanaserver
Containers
cogstack-kibana (OpenSearch Dashboards or Kibana depending on configuration)
Service Location & Files
docker compose: deploy/services.yml
config files:
services/kibana/config/elasticsearch.yml (Elasticsearch)
services/kibana/config/opensearch.yml (OpenSearch Dashboards)
env:
/deploy/elasticsearch.env
/security/certificates_elasticsearch.env
/security/elasticsearch_users.env
Image selection controlled by:
${ELASTICSEARCH_KIBANA_DOCKER_IMAGE}
${KIBANA_VERSION}
${KIBANA_CONFIG_FILE_VERSION}
Ports
| Component | External | Internal |
|---|---|---|
| Kibana / OpenSearch Dashboards | 5601 | |
Notes
Must be started after Elasticsearch/OpenSearch
Connects automatically using ELASTICSEARCH_HOSTS
TLS/user settings are applied from the /security env files
OCR Service
High-performance document text extraction engine replacing legacy Tika for OCR and text processing. In the near future it will be possible to use LLMs/custom models for OCR (pending the v2 release, ETA 2026).
The service comes in two variants:
ocr-service - full OCR pipeline (images → text)
ocr-service-text-only - lightweight mode (text extraction only, no OCR)
Both expose a simple REST API.
Purpose:
Extract text from PDFs, images, and scanned documents
Provide OCR via Tesseract (wrapped in an optimised Python service)
Provide fast plain text extraction for digital PDFs (text-only variant)
Designed for large-scale throughput within NiFi ingestion pipelines
Access
ocr-service: http://localhost:8090/api/process
ocr-service-text-only: http://localhost:8091/api/process
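A client can route each document to the appropriate variant depending on whether it needs OCR. The following is a minimal sketch that only builds the request. Sending the raw file bytes as the request body with an `application/octet-stream` content type is an assumption, so check the service's own README for the accepted upload formats. `build_ocr_request` is a hypothetical helper.

```python
from urllib.request import Request

# Host-side URLs listed in the Access section above.
OCR_URL = "http://localhost:8090/api/process"
TEXT_ONLY_URL = "http://localhost:8091/api/process"


def build_ocr_request(document: bytes, needs_ocr: bool = True) -> Request:
    """Build a POST request for the full or text-only OCR service.

    ASSUMPTION: the service accepts the raw document bytes as the body.
    """
    url = OCR_URL if needs_ocr else TEXT_ONLY_URL
    return Request(
        url,
        data=document,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )
```

Scanned documents and images would go through the default path, while digital PDFs with an embedded text layer can use `needs_ocr=False` to reach the lighter text-only variant.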
Containers
ocr-service
ocr-service-text-only
Both built from:
cogstacksystems/cogstack-ocr-service:<release>
Service Location & Files
docker compose file: services/ocr-service/docker/docker-compose.yml
service directory: services/ocr-service/
logs:
Host: services/ocr-service/log/
Container: /ocr_service/log/
env files:
deploy/general.env - shared variables
services/ocr-service/env/ocr_service.env - full OCR config
services/ocr-service/env/ocr_service_text_only.env - overrides for text-only pipeline
Ports
| Service | External | Internal |
|---|---|---|
| ocr-service | 8090 | 8090 |
| ocr-service-text-only | 8091 | 8090 |
Both expose the API internally on port 8090.
Please check the service's own README.md.
Git-ea
Self-hosted Git instance (Gitea). Lightweight GitHub/GitLab-style service used for hosting repositories inside secure or offline environments.
Purpose:
Internal code hosting for organisations without external Git access
Repository management, issue tracking, wiki, and basic CI hooks
Ideal for notebooks, configs, workflows, and internal project code
Access
URL: http://localhost:3000 (default Gitea port)
Containers
gitea
Service Location & Files
docker compose file: deploy/services.yml
config file: services/gitea/app.ini
env files: /security/certificates_general.env
Persistent repository data is stored in the volume defined in services.yml.
Ports
| Service | External | Internal |
|---|---|---|
| Git-ea | 3000 | |
Notes
Supports repository migration from external Git servers
Mirroring available when external access is allowed
Can use CogStack certificates for HTTPS if configured
NGINX
Note: this component may eventually be replaced by Traefik as the preferred reverse-proxy and ingress layer for CogStack deployments.
NGINX is used as a lightweight reverse proxy to provide secure, unified access to internal CogStack services.
It handles HTTPS, routing, and access control for NiFi, MedCAT Trainer, and other components.
MedCAT-Trainer has its own nginx instance that runs independently.
Purpose:
Secure external access to internal services
Reverse proxy for NiFi, MedCAT Trainer, and service UIs
TLS termination (optional)
Basic auth / access control where required
Two variants are included:
nginx-nifi - main proxy for NiFi and related services
nginx-medcat-trainer - specialized proxy for MedCAT Trainer
Access
Examples (actual paths depend on config):
NiFi: https://localhost:8443
MedCAT Trainer: https://localhost:8001
Routing rules are defined in the NGINX configuration files.
Containers
nifi-nginx - main proxy for NiFi
medcat-trainer-nginx - proxy dedicated to MedCAT Trainer
Service Location & Files
docker compose file: deploy/services.yml; trainer - deploy/cogstack-nlp/medcat-trainer
config files:
services/nginx/config/nifi.conf
services/nginx/config/medcat-trainer.conf
additional templates under services/nginx/config/
env / certificates:
/security/certificates_general.env
/security/certificates_nifi.env
Uses shared CogStack Root CA & NiFi certs (root-ca.p12, root-ca.key, nifi.key, nifi.pem)
Ports
| Proxy Target | External | Internal |
|---|---|---|
| NiFi | 8443 | |
Notes
Provides HTTPS entrypoints for internal services
Works with CogStack certificate bundle
Trainer uses a separate NGINX instance for routing differences
Modify NGINX configs only if comfortable with its syntax