🛠 This page is for engineering teams self-hosting their own Lightdash instance. If you want to monitor usage and analytics, go to the Usage analytics guide.
Lightdash can expose Prometheus metrics to help you monitor the performance and health of your Lightdash instance. This guide explains how to enable and configure Prometheus metrics for your self-hosted Lightdash deployment.

Enabling Prometheus metrics

By default, Prometheus metrics are disabled in Lightdash. To enable them, set the following environment variable:
LIGHTDASH_PROMETHEUS_ENABLED=true

Configuration options

You can customize the Prometheus metrics endpoint using the following environment variables:
| Variable | Description | Required? | Default |
| --- | --- | --- | --- |
| LIGHTDASH_PROMETHEUS_ENABLED | Enables/disables the Prometheus metrics endpoint | No | false |
| LIGHTDASH_PROMETHEUS_PORT | Port for the Prometheus metrics endpoint | No | 9090 |
| LIGHTDASH_PROMETHEUS_PATH | Path for the Prometheus metrics endpoint | No | /metrics |
| LIGHTDASH_PROMETHEUS_PREFIX | Prefix for metric names | No | |
| LIGHTDASH_GC_DURATION_BUCKETS | Buckets for the garbage collection duration histogram, in seconds | No | 0.001, 0.01, 0.1, 1, 2, 5 |
| LIGHTDASH_EVENT_LOOP_MONITORING_PRECISION | Precision for event loop monitoring, in milliseconds. Must be greater than zero | No | 10 |
| LIGHTDASH_PROMETHEUS_LABELS | Labels to add to all metrics. Must be valid JSON | No | |
| LIGHTDASH_CUSTOM_METRICS_CONFIG_PATH | Path to a JSON config file for custom event-driven counter metrics | No | |
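For example, a deployment might enable the endpoint on a non-default port and attach a static label to every metric. The values below are illustrative, not recommendations:

```shell
# Illustrative Lightdash Prometheus configuration via environment variables
export LIGHTDASH_PROMETHEUS_ENABLED=true
export LIGHTDASH_PROMETHEUS_PORT=9091          # default is 9090
export LIGHTDASH_PROMETHEUS_PATH=/metrics      # default path
export LIGHTDASH_PROMETHEUS_LABELS='{"env":"production"}'  # must be valid JSON
```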

Available metrics

Lightdash exposes the following metrics:

Process metrics

These metrics provide information about the Node.js process running Lightdash:
| Metric | Type | Description |
| --- | --- | --- |
| process_cpu_user_seconds_total | counter | Total user CPU time spent in seconds |
| process_cpu_system_seconds_total | counter | Total system CPU time spent in seconds |
| process_cpu_seconds_total | counter | Total user and system CPU time spent in seconds |
| process_start_time_seconds | gauge | Start time of the process since the Unix epoch, in seconds |
| process_resident_memory_bytes | gauge | Resident memory size in bytes |
| process_virtual_memory_bytes | gauge | Virtual memory size in bytes |
| process_heap_bytes | gauge | Process heap size in bytes |
| process_open_fds | gauge | Number of open file descriptors |
| process_max_fds | gauge | Maximum number of open file descriptors |

Node.js metrics

These metrics provide information about the Node.js runtime:
| Metric | Type | Description |
| --- | --- | --- |
| nodejs_eventloop_lag_seconds | gauge | Lag of event loop in seconds |
| nodejs_eventloop_lag_min_seconds | gauge | The minimum recorded event loop delay |
| nodejs_eventloop_lag_max_seconds | gauge | The maximum recorded event loop delay |
| nodejs_eventloop_lag_mean_seconds | gauge | The mean of the recorded event loop delays |
| nodejs_eventloop_lag_stddev_seconds | gauge | The standard deviation of the recorded event loop delays |
| nodejs_eventloop_lag_p50_seconds | gauge | The 50th percentile of the recorded event loop delays |
| nodejs_eventloop_lag_p90_seconds | gauge | The 90th percentile of the recorded event loop delays |
| nodejs_eventloop_lag_p99_seconds | gauge | The 99th percentile of the recorded event loop delays |
| nodejs_active_resources | gauge | Number of active resources that are currently keeping the event loop alive, grouped by async resource type |
| nodejs_active_resources_total | gauge | Total number of active resources |
| nodejs_active_handles | gauge | Number of active libuv handles grouped by handle type |
| nodejs_active_handles_total | gauge | Total number of active handles |
| nodejs_active_requests | gauge | Number of active libuv requests grouped by request type |
| nodejs_active_requests_total | gauge | Total number of active requests |
| nodejs_heap_size_total_bytes | gauge | Process heap size from Node.js in bytes |
| nodejs_heap_size_used_bytes | gauge | Process heap size used from Node.js in bytes |
| nodejs_external_memory_bytes | gauge | Node.js external memory size in bytes |
| nodejs_heap_space_size_total_bytes | gauge | Process heap space size total from Node.js in bytes |
| nodejs_heap_space_size_used_bytes | gauge | Process heap space size used from Node.js in bytes |
| nodejs_heap_space_size_available_bytes | gauge | Process heap space size available from Node.js in bytes |
| nodejs_version_info | gauge | Node.js version info |
| nodejs_gc_duration_seconds | histogram | Garbage collection duration by kind |
| nodejs_eventloop_utilization | gauge | The calculated Event Loop Utilization (ELU) as a percentage |

PostgreSQL metrics

These metrics provide information about the PostgreSQL connection pool:
| Metric | Type | Description | Labels |
| --- | --- | --- | --- |
| pg_pool_max_size | gauge | Max size of the PG pool | |
| pg_pool_size | gauge | Current size of the PG pool | |
| pg_active_connections | gauge | Number of active connections in the PG pool | |
| pg_idle_connections | gauge | Number of idle connections in the PG pool | |
| pg_queued_queries | gauge | Number of queries waiting in the PG pool queue | |
| pg_connection_acquire_time | histogram | Time to acquire a connection from the PG pool in milliseconds | |
| pg_query_duration | histogram | Histogram of PG query execution time in milliseconds | |
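As a sketch of how these pool gauges might be consumed outside Prometheus, the snippet below parses the plain `name value` lines of the Prometheus text exposition format and derives a saturation ratio. The sample payload and its values are invented for illustration:

```python
import re

# Sample Prometheus text-format payload, shaped like the Lightdash
# /metrics output; the values here are invented for illustration.
SAMPLE = """\
pg_pool_max_size 10
pg_pool_size 10
pg_active_connections 8
pg_idle_connections 2
pg_queued_queries 3
"""

def parse_gauges(text):
    """Parse simple `name value` lines from Prometheus text format."""
    gauges = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        m = re.match(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)\s+([0-9.eE+-]+)$", line)
        if m:
            gauges[m.group(1)] = float(m.group(2))
    return gauges

gauges = parse_gauges(SAMPLE)
saturation = gauges["pg_active_connections"] / gauges["pg_pool_max_size"]
print(f"pool saturation: {saturation:.0%}")  # → pool saturation: 80%
```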

Queue metrics

| Metric | Type | Description |
| --- | --- | --- |
| queue_size | gauge | Number of jobs in the queue |

Query metrics

These metrics track query execution performance. The context label on each metric is either scheduled or interactive, depending on the execution context.
| Metric | Type | Description | Labels |
| --- | --- | --- | --- |
| lightdash_query_status_total | counter | Total number of queries by terminal status | status, context |
| lightdash_query_state_transitions_total | counter | Query state transitions | from, to, context |
| lightdash_query_queue_wait_duration_seconds | histogram | Time spent waiting in queue before execution | context |
| lightdash_query_total_duration_seconds | histogram | Total query duration from creation to results ready | context |
| lightdash_query_warehouse_duration_seconds | histogram | Warehouse query execution duration | warehouse_type, context |
| lightdash_query_overhead_duration_seconds | histogram | Lightdash overhead: total duration minus warehouse execution time | context |
| lightdash_query_cache_hit_total | counter | Total number of query cache hits and misses | result, context, has_pre_aggregate_match |
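As an illustration, the cache counter can be collapsed into a hit-rate with a Prometheus recording rule. This sketch assumes the result label uses a value such as "hit"; check the label values your instance actually emits:

```yaml
# Illustrative recording rule; verify the `result` label values first.
groups:
  - name: lightdash-query
    rules:
      - record: lightdash:query_cache_hit_ratio:rate5m
        expr: |
          sum(rate(lightdash_query_cache_hit_total{result="hit"}[5m]))
            /
          sum(rate(lightdash_query_cache_hit_total[5m]))
```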

Pre-aggregate metrics

These metrics track the pre-aggregate system, including materialization, DuckDB resolution, and file management:
| Metric | Type | Description | Labels |
| --- | --- | --- | --- |
| lightdash_pre_aggregate_match_total | counter | Total number of pre-aggregate match attempts | result, miss_reason, format |
| lightdash_pre_aggregate_materialization_total | counter | Total number of pre-aggregate materializations by outcome | status, trigger |
| lightdash_pre_aggregate_active_materializations | gauge | Current number of active pre-aggregate materializations | |
| lightdash_pre_aggregate_materialization_duration_seconds | histogram | Pre-aggregate materialization duration | status, trigger |
| lightdash_pre_aggregate_materialization_poll_duration_seconds | histogram | Time spent polling for materialization query completion in seconds | status, trigger |
| lightdash_pre_aggregate_materialization_warehouse_duration_seconds | histogram | Warehouse execution time during materialization in seconds | status, trigger |
| lightdash_pre_aggregate_materialization_promote_duration_seconds | histogram | Time to check file size and promote materialization to active in seconds | status, trigger |
| lightdash_pre_aggregate_materialization_file_size_bytes | histogram | File size of pre-aggregate materialization in bytes | format |
| lightdash_pre_aggregate_parquet_conversion_duration_seconds | histogram | Duration of JSONL to Parquet conversion | status |
| lightdash_pre_aggregate_duckdb_resolution_total | counter | Total number of DuckDB pre-aggregate resolution attempts | status, reason |
| lightdash_pre_aggregate_duckdb_resolution_duration_seconds | histogram | DuckDB pre-aggregate resolution duration | status |
| lightdash_pre_aggregate_duckdb_query_latency_seconds | histogram | Total DuckDB query latency in seconds | |
| lightdash_pre_aggregate_duckdb_parquet_read_duration_seconds | histogram | Time spent in READ_PARQUET operators in seconds | |
| lightdash_pre_aggregate_duckdb_bytes_read | histogram | Bytes read from S3/parquet by DuckDB queries | |
| lightdash_pre_aggregate_duckdb_scan_amplification | histogram | Ratio of rows scanned to rows returned in DuckDB queries | |
| lightdash_pre_aggregate_fallback_total | counter | Total number of opportunistic pre-aggregate fallbacks to warehouse | reason |

AI agent metrics

These metrics track the performance of the AI agent:
| Metric | Type | Description | Labels |
| --- | --- | --- | --- |
| ai_agent_generate_response_duration_ms | histogram | AI agent generate response time in milliseconds | |
| ai_agent_stream_response_duration_ms | histogram | AI agent stream response time in milliseconds | |
| ai_agent_stream_first_chunk_ms | histogram | AI agent time to first chunk (any type) | |
| ai_agent_ttft_ms | histogram | AI agent time to first token (TTFT) | model, mode |

S3 metrics

| Metric | Type | Description | Labels |
| --- | --- | --- | --- |
| lightdash_s3_results_upload_duration_seconds | histogram | S3 results upload duration | source |

Custom event metrics

Lightdash supports operator-configurable Prometheus counter metrics that are driven by application events. These are defined via a JSON configuration file specified by the LIGHTDASH_CUSTOM_METRICS_CONFIG_PATH environment variable. Each entry in the config file creates a counter metric that increments when a matching application event fires. This allows you to track custom business-level metrics such as user logins or query executions without modifying the application code.

Using metrics for monitoring and alerting

You can use these metrics to create dashboards and alerts in your monitoring system. Some common use cases include:
  • Monitoring memory usage and setting alerts for potential memory leaks
  • Tracking PostgreSQL connection pool utilization
  • Monitoring event loop lag to detect performance issues
  • Setting up alerts for high CPU usage
For example, you might want to create alerts for:
  • High memory usage: process_resident_memory_bytes > threshold
  • Event loop lag: nodejs_eventloop_lag_p99_seconds > threshold
  • Database connection pool saturation: pg_active_connections / pg_pool_max_size > 0.8
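Expressed as Prometheus alerting rules, those conditions might look like the following; the thresholds and rule names here are illustrative placeholders to tune for your deployment:

```yaml
# alerts.yml — illustrative alerting rules; adjust thresholds to your workload
groups:
  - name: lightdash
    rules:
      - alert: LightdashHighMemoryUsage
        expr: process_resident_memory_bytes > 2e9   # ~2 GB, placeholder
        for: 10m
        labels:
          severity: warning
      - alert: LightdashEventLoopLag
        expr: nodejs_eventloop_lag_p99_seconds > 0.5  # placeholder
        for: 5m
        labels:
          severity: warning
      - alert: LightdashPgPoolSaturation
        expr: pg_active_connections / pg_pool_max_size > 0.8
        for: 5m
        labels:
          severity: warning
```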

OpenTelemetry support

Lightdash metrics are also compatible with OpenTelemetry. You can use the OpenTelemetry Collector with the Prometheus receiver to scrape Lightdash’s Prometheus metrics endpoint and export them to any OpenTelemetry-compatible backend. Example OpenTelemetry Collector configuration:
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'lightdash'
          scrape_interval: 15s
          static_configs:
            - targets: ['lightdash:9090']

exporters:
  # Configure your preferred exporter (e.g., OTLP, Jaeger, etc.)
  otlp:
    endpoint: "your-otlp-endpoint:4317"

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [otlp]

Setting up a Prometheus server

If you don’t already have a Prometheus server set up, here are some resources to help you get started:

  • General Prometheus setup
  • Setting up Prometheus in Google Cloud Platform (GCP)
  • Setting up Prometheus in Amazon Web Services (AWS)
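
Once a server is running, a minimal prometheus.yml scrape configuration for Lightdash might look like this, assuming the default port and path:

```yaml
# prometheus.yml — minimal scrape config for a Lightdash instance
scrape_configs:
  - job_name: lightdash
    scrape_interval: 15s
    metrics_path: /metrics             # LIGHTDASH_PROMETHEUS_PATH default
    static_configs:
      - targets: ['lightdash:9090']    # LIGHTDASH_PROMETHEUS_PORT default
```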