Monitoring

This document covers the monitoring approach for zkIdentity, including key metrics to track, health checks, and alerting guidance. Effective monitoring helps maintain high availability and shortens the time needed to detect issues in the verification pipeline.

Key Metrics

The following categories of metrics should be tracked for a production zkIdentity deployment:

Verification Metrics

  • Total verification requests (by provider and outcome status)
  • Verification success rate (rolling, per provider)
  • End-to-end verification duration
  • Active verification sessions
  • Expired sessions
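
As an illustration of the rolling, per-provider success rate listed above, here is a minimal sketch. The class name, the window size, and the provider label are assumptions made for the example and are not part of zkIdentity; a production deployment would more likely export counters to a metrics backend and compute the rate there.

```python
from collections import defaultdict, deque
from typing import Optional


class RollingSuccessRate:
    """Success rate over the last `window` outcomes, tracked per provider.

    Hypothetical helper for illustration only.
    """

    def __init__(self, window: int = 100) -> None:
        # provider name -> bounded queue of booleans (True = success);
        # the deque drops the oldest outcome once `window` is reached
        self._outcomes = defaultdict(lambda: deque(maxlen=window))

    def record(self, provider: str, success: bool) -> None:
        self._outcomes[provider].append(success)

    def rate(self, provider: str) -> Optional[float]:
        outcomes = self._outcomes.get(provider)
        if not outcomes:
            return None  # no data yet; avoid reporting a misleading 0%
        return sum(outcomes) / len(outcomes)


tracker = RollingSuccessRate(window=4)
for ok in [True, True, False, True, False]:
    tracker.record("provider_a", ok)  # "provider_a" is a placeholder name
print(tracker.rate("provider_a"))
```

Returning `None` for a provider with no recorded outcomes keeps an idle provider from looking like a failing one.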

Proof Generation Metrics

  • Proof generation duration (time to generate all 14 ZK proofs)
  • Proof generation failures
  • Retry attempts for proof generation
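
The three proof generation metrics can be captured by wrapping the proof call, as in the sketch below. `generate` stands in for whatever function produces the full batch of proofs; it, `ProofMetrics`, and the default of three attempts are assumptions for illustration, not the attestor's actual interface.

```python
import time
from dataclasses import dataclass
from typing import Callable, TypeVar

T = TypeVar("T")


@dataclass
class ProofMetrics:
    attempts: int = 0           # every call to the proof generator
    failures: int = 0           # calls that raised
    retries: int = 0            # attempts after the first one
    last_duration_s: float = 0.0  # wall time of the last run, retries included


def generate_with_metrics(
    generate: Callable[[], T], metrics: ProofMetrics, max_attempts: int = 3
) -> T:
    """Run `generate`, retrying on failure, and record metrics as a side effect."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    start = time.monotonic()
    try:
        for attempt in range(1, max_attempts + 1):
            metrics.attempts += 1
            if attempt > 1:
                metrics.retries += 1
            try:
                return generate()
            except Exception:
                metrics.failures += 1
                if attempt == max_attempts:
                    raise  # out of attempts; surface the last error
        raise AssertionError("unreachable")
    finally:
        # Recorded on both success and final failure
        metrics.last_duration_s = time.monotonic() - start
```

The duration here spans all attempts. If per-attempt timing matters more, move the timer inside the loop.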

Provider Metrics

  • Provider API latency (per provider)
  • Time from session creation to webhook receipt
  • Provider API errors (per provider and error type)
  • Provider availability
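
One way to derive latency, error counts, and availability from the same stream of provider calls is sketched below. The class, the error type strings, and the definition of availability as the share of calls that returned without error are all assumptions; a deployment may prefer a separate active probe for availability.

```python
from collections import Counter, defaultdict
from statistics import median
from typing import Optional


class ProviderStats:
    """Hypothetical in-memory aggregator for per-provider call outcomes."""

    def __init__(self) -> None:
        self.calls = Counter()                 # provider -> number of calls
        self.errors = Counter()                # (provider, error_type) -> count
        self.latencies_ms = defaultdict(list)  # provider -> observed latencies

    def record(self, provider: str, latency_ms: float,
               error_type: Optional[str] = None) -> None:
        self.calls[provider] += 1
        self.latencies_ms[provider].append(latency_ms)
        if error_type is not None:
            self.errors[(provider, error_type)] += 1

    def availability(self, provider: str) -> Optional[float]:
        total = self.calls[provider]
        if total == 0:
            return None  # no calls observed for this provider
        failed = sum(n for (p, _), n in self.errors.items() if p == provider)
        return (total - failed) / total

    def median_latency_ms(self, provider: str) -> Optional[float]:
        samples = self.latencies_ms.get(provider)
        return median(samples) if samples else None
```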

Infrastructure Metrics

  • Attestor uptime
  • TEE attestation validity (whether the attestor is running in a genuine TEE)
  • Cartesi rollup submission latency and failures
  • IPFS upload latency and failures
  • Attestor wallet balance on Arbitrum

Health Check Endpoints

The attestor should expose health check endpoints for monitoring and orchestration:

Liveness Probe

A basic endpoint that returns a success response if the attestor process is running. This check does not verify external dependencies.

Readiness Probe

A more thorough endpoint that returns success only if the attestor is fully operational. It should check:

  • TEE is initialized and attestation is valid
  • At least one KYC provider is reachable
  • Cartesi rollup endpoint is reachable
  • IPFS (Pinata) is reachable
  • Attestor wallet has sufficient balance

If any check fails, this endpoint should return an error response with details about which component is unhealthy.
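
The readiness logic described above can be sketched as a function that runs each dependency check and reports which ones failed. The check names, the response shape, and the use of HTTP status codes 200 and 503 are assumptions for illustration; the real checks would call the TEE, provider, rollup, IPFS, and wallet clients.

```python
from typing import Callable, Dict, Tuple

Check = Callable[[], bool]


def readiness(checks: Dict[str, Check]) -> Tuple[int, dict]:
    """Run every check; return (status_code, body) for a readiness endpoint."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "unhealthy"
        except Exception as exc:
            # A check that raises counts as unhealthy, with the reason attached
            results[name] = f"error: {exc}"
    unhealthy = [name for name, result in results.items() if result != "ok"]
    status = 200 if not unhealthy else 503
    return status, {"ready": not unhealthy, "unhealthy": unhealthy, "checks": results}


# Placeholder checks; each lambda stands in for a real dependency probe.
status, body = readiness({
    "tee_attestation": lambda: True,
    "kyc_provider": lambda: True,
    "cartesi_rollup": lambda: True,
    "ipfs": lambda: False,
    "wallet_balance": lambda: True,
})
print(status, body["unhealthy"])
```

Running all checks rather than stopping at the first failure means a single response lists every unhealthy component.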

Alerting Guidance

Critical Alerts

The following conditions warrant immediate attention:

  • Attestor is down: The process is unreachable.
  • TEE attestation invalid: The attestor may be in simulation mode or the TEE has failed.
  • Wallet balance critically low: The attestor cannot submit rollup inputs.
  • Verification success rate drops significantly below its recent baseline: May indicate a provider issue or system problem.

Warning Alerts

The following conditions should be investigated but are not immediately critical:

  • Proof generation time is elevated: May indicate resource contention.
  • Provider API latency is high: May indicate provider degradation.
  • Session expiry rate is elevated: Users may be timing out before completing verification.
  • Wallet balance is approaching low threshold: Needs replenishment soon.
  • IPFS upload failures detected: Pinata may be experiencing issues.
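
The wallet balance appears in both alert tiers, so a two-threshold classification is a natural fit. The sketch below assumes the balance is measured in ETH; the threshold values are placeholders and not recommendations, and should be set from observed submission costs.

```python
from enum import Enum


class Severity(Enum):
    OK = "ok"
    WARNING = "warning"
    CRITICAL = "critical"


def classify_wallet_balance(
    balance_eth: float,
    warn_below: float = 0.05,      # placeholder threshold
    critical_below: float = 0.01,  # placeholder threshold
) -> Severity:
    """Map a wallet balance to an alert severity using two thresholds."""
    if critical_below > warn_below:
        raise ValueError("critical_below must not exceed warn_below")
    if balance_eth < critical_below:
        return Severity.CRITICAL
    if balance_eth < warn_below:
        return Severity.WARNING
    return Severity.OK
```

The same pattern applies to other metrics that have both a warning and a critical condition.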

Log Monitoring

The attestor outputs structured logs. Key events to monitor include:

| Event | Significance |
|---|---|
| Verification completed successfully | Normal operation |
| Verification failed (provider rejection) | Normal (user did not pass provider checks) |
| Verification system error | Requires investigation |
| ZK proof generation failure | Requires investigation |
| Invalid or spoofed webhook received | Potential security issue |
| Failed rollup submission | Requires investigation |
| Wallet balance low | Needs replenishment |
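
A log monitor can map each event to a follow-up action. The sketch below assumes the structured logs are JSON lines with an `event` field; that field name, the event identifiers, and the action labels are all hypothetical, since the attestor's actual log schema is not specified here.

```python
import json

# Hypothetical event identifiers mapped to the action each one calls for.
EVENT_ACTIONS = {
    "verification_completed": "none",
    "verification_failed_provider_rejection": "none",
    "verification_system_error": "investigate",
    "proof_generation_failed": "investigate",
    "webhook_invalid": "security_review",
    "rollup_submission_failed": "investigate",
    "wallet_balance_low": "replenish",
}


def triage(log_line: str) -> str:
    """Return the action for one log line, or "unknown" if unrecognized."""
    try:
        record = json.loads(log_line)
    except json.JSONDecodeError:
        return "unknown"  # not valid JSON
    if not isinstance(record, dict):
        return "unknown"  # valid JSON, but not an object
    return EVENT_ACTIONS.get(record.get("event"), "unknown")


print(triage('{"event": "webhook_invalid", "source_ip": "203.0.113.7"}'))
```

Treating unrecognized lines as "unknown" rather than "none" makes schema drift visible.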

Rollup State Monitoring

Periodically check the Cartesi rollup state to confirm that attestations are being processed correctly. Track the total number of attestations, their status distribution, and the latest epoch.
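
The three values to track can be computed from a list of attestation records, as sketched below. The record shape (a dict with `status` and `epoch` keys) and the status strings are assumptions for illustration. How the records are fetched from the rollup is left out because it depends on the deployment.

```python
from collections import Counter
from typing import Iterable, Mapping


def summarize_attestations(attestations: Iterable[Mapping]) -> dict:
    """Summarize total count, status distribution, and latest epoch."""
    records = list(attestations)
    by_status = Counter(record["status"] for record in records)
    # default=None covers the case where no attestations exist yet
    latest_epoch = max((record["epoch"] for record in records), default=None)
    return {
        "total": len(records),
        "by_status": dict(by_status),
        "latest_epoch": latest_epoch,
    }


# Sample records with hypothetical status values.
sample = [
    {"status": "accepted", "epoch": 11},
    {"status": "accepted", "epoch": 12},
    {"status": "pending", "epoch": 12},
]
print(summarize_attestations(sample))
```

Comparing successive summaries shows whether the total and the latest epoch are advancing, which is a simple signal that the rollup is still processing inputs.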