Monitoring
This document covers the monitoring approach for zkIdentity, including key metrics to track, health checks, and alerting guidance. Effective monitoring supports high availability and helps detect issues in the verification pipeline quickly.
Key Metrics
The following categories of metrics should be tracked for a production zkIdentity deployment:
Verification Metrics
- Total verification requests (by provider and outcome status)
- Verification success rate (rolling, per provider)
- End-to-end verification duration
- Active verification sessions
- Expired sessions
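As an illustration of the rolling, per-provider success rate listed above, here is a minimal Python sketch. The class name `RollingSuccessRate`, the count-based window, and the default window size are assumptions made for the example; they do not describe how zkIdentity computes this metric.

```python
from collections import defaultdict, deque
from typing import Optional


class RollingSuccessRate:
    """Tracks per-provider success rate over the last `window` outcomes.

    Hypothetical helper: the window is count-based, not time-based.
    """

    def __init__(self, window: int = 100) -> None:
        # One bounded deque per provider; the oldest outcome is dropped
        # automatically once the window is full.
        self._outcomes = defaultdict(lambda: deque(maxlen=window))

    def record(self, provider: str, success: bool) -> None:
        """Record the outcome of one verification for a provider."""
        self._outcomes[provider].append(success)

    def rate(self, provider: str) -> Optional[float]:
        """Return the success fraction, or None if nothing was recorded."""
        outcomes = self._outcomes.get(provider)
        if not outcomes:
            return None
        return sum(outcomes) / len(outcomes)
```

A time-based window (for example, the last 15 minutes) may suit alerting better for low-traffic providers, where a fixed count can span many hours.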
Proof Generation Metrics
- Proof generation duration (time to generate all 14 ZK proofs)
- Proof generation failures
- Retry attempts for proof generation
Provider Metrics
- Provider API latency (per provider)
- Time from session creation to webhook receipt
- Provider API errors (per provider and error type)
- Provider availability
Infrastructure Metrics
- Attestor uptime
- TEE attestation validity (is the attestor running in a genuine TEE?)
- Cartesi rollup submission latency and failures
- IPFS upload latency and failures
- Attestor wallet balance on Arbitrum
Health Check Endpoints
The attestor should expose health check endpoints for monitoring and orchestration:
Liveness Probe
A basic endpoint that returns a success response if the attestor process is running. This check does not verify external dependencies.
Readiness Probe
A more thorough endpoint that returns success only if the attestor is fully operational. It should check:
- TEE is initialized and attestation is valid
- At least one KYC provider is reachable
- Cartesi rollup endpoint is reachable
- IPFS (Pinata) is reachable
- Attestor wallet has sufficient balance
If any check fails, this endpoint should return an error response with details about which component is unhealthy.
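The readiness logic described above could be sketched as follows. The function name `readiness`, the check names, and the use of HTTP status codes 200 and 503 are assumptions for illustration; the actual attestor endpoint may be shaped differently.

```python
from typing import Callable, Dict, Tuple

# A check is any zero-argument callable returning True when healthy.
Check = Callable[[], bool]


def readiness(checks: Dict[str, Check]) -> Tuple[int, dict]:
    """Run every check and report which components are unhealthy.

    Returns a (status_code, body) pair: 200 when all checks pass,
    503 otherwise (assumed codes, not confirmed by the source).
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "unhealthy"
        except Exception as exc:
            # A check that raises (e.g. a timeout) counts as a failure.
            results[name] = f"error: {exc}"
    failed = [name for name, state in results.items() if state != "ok"]
    status = 200 if not failed else 503
    return status, {"ready": not failed, "unhealthy": failed, "checks": results}
```

In a real deployment, each entry in `checks` would wrap one of the dependencies listed above (TEE attestation, a KYC provider, the Cartesi rollup endpoint, Pinata, and the wallet balance), ideally with a short timeout so a slow dependency cannot stall the probe.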
Alerting Guidance
Critical Alerts
The following conditions warrant immediate attention:
- Attestor is down: The process is unreachable.
- TEE attestation invalid: The attestor may be running in simulation mode, or the TEE may have failed.
- Wallet balance critically low: The attestor cannot submit rollup inputs.
- Verification success rate drops significantly below its normal baseline: May indicate a provider issue or system problem.
Warning Alerts
The following conditions should be investigated but are not immediately critical:
- Proof generation time is elevated: May indicate resource contention.
- Provider API latency is high: May indicate provider degradation.
- Session expiry rate is elevated: Users may be timing out before completing verification.
- Wallet balance is approaching low threshold: Needs replenishment soon.
- IPFS upload failures detected: Pinata may be experiencing issues.
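One way to turn several of the critical and warning conditions above into an automated check is sketched below. Every threshold value, field name, and alert name here is a hypothetical placeholder; this document does not specify concrete limits, so they should be tuned per deployment.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Snapshot:
    """A point-in-time view of the monitored values (hypothetical shape)."""
    attestor_reachable: bool
    tee_attestation_valid: bool
    wallet_balance: float        # in the chain's native token units
    success_rate: float          # fraction between 0.0 and 1.0
    proof_seconds: float         # time to generate the proofs
    ipfs_upload_failures: int    # failures in the evaluation interval


# Placeholder thresholds, chosen only so the example runs.
WALLET_CRITICAL = 0.01
WALLET_WARNING = 0.05
SUCCESS_RATE_CRITICAL = 0.80
PROOF_SECONDS_WARNING = 60.0


def evaluate(s: Snapshot) -> List[Tuple[str, str]]:
    """Return (severity, alert_name) pairs for every condition that fires."""
    alerts = []
    if not s.attestor_reachable:
        alerts.append(("critical", "attestor_down"))
    if not s.tee_attestation_valid:
        alerts.append(("critical", "tee_attestation_invalid"))
    # The wallet raises at most one alert: critical takes precedence.
    if s.wallet_balance < WALLET_CRITICAL:
        alerts.append(("critical", "wallet_balance_critical"))
    elif s.wallet_balance < WALLET_WARNING:
        alerts.append(("warning", "wallet_balance_low"))
    if s.success_rate < SUCCESS_RATE_CRITICAL:
        alerts.append(("critical", "success_rate_drop"))
    if s.proof_seconds > PROOF_SECONDS_WARNING:
        alerts.append(("warning", "proof_generation_slow"))
    if s.ipfs_upload_failures > 0:
        alerts.append(("warning", "ipfs_upload_failures"))
    return alerts
```

Provider latency and session expiry rate are omitted for brevity; they would follow the same pattern as the proof generation check.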
Log Monitoring
The attestor outputs structured logs. Key events to monitor include:
| Event | Significance |
|---|---|
| Verification completed successfully | Normal operation |
| Verification failed (provider rejection) | Normal -- user did not pass provider checks |
| Verification system error | Requires investigation |
| ZK proof generation failure | Requires investigation |
| Invalid or spoofed webhook received | Potential security issue |
| Failed rollup submission | Requires investigation |
| Wallet balance low | Needs replenishment |
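The table above can be applied mechanically to a log stream. The sketch below assumes the structured logs are JSON lines with an `event` field, and the event names in `SIGNIFICANCE` are invented for illustration; the attestor's actual log schema may differ.

```python
import json
from collections import Counter
from typing import Iterable

# Hypothetical event names mapped to the significance column of the table.
SIGNIFICANCE = {
    "verification_completed": "normal",
    "verification_failed_provider": "normal",
    "verification_system_error": "investigate",
    "proof_generation_failed": "investigate",
    "webhook_invalid": "security",
    "rollup_submission_failed": "investigate",
    "wallet_balance_low": "replenish",
}


def summarize(lines: Iterable[str]) -> Counter:
    """Count log lines by significance bucket.

    Lines that are not valid JSON are counted as "unparseable";
    events missing from the mapping are counted as "unknown".
    """
    counts = Counter()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            counts["unparseable"] += 1
            continue
        event = record.get("event") if isinstance(record, dict) else None
        counts[SIGNIFICANCE.get(event, "unknown")] += 1
    return counts
```

A non-zero "security" or "investigate" count over a short interval is a natural input to the alerting rules described earlier.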
Rollup State Monitoring
Periodically check the Cartesi rollup state to ensure attestations are being processed correctly. Track the total number of attestations, their status distribution, and the latest epoch.
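A periodic job could reduce the fetched rollup state to the three figures mentioned above. The sketch below assumes attestations have already been retrieved as a list of records with `status` and `epoch` fields; both the record shape and the function name are hypothetical, and fetching from the Cartesi rollup is left out.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Mapping


def summarize_attestations(records: Iterable[Mapping[str, Any]]) -> Dict[str, Any]:
    """Return the total count, status distribution, and latest epoch.

    `latest_epoch` is None when there are no attestations at all.
    """
    records = list(records)
    by_status = Counter(r["status"] for r in records)
    latest_epoch = max((r["epoch"] for r in records), default=None)
    return {
        "total": len(records),
        "by_status": dict(by_status),
        "latest_epoch": latest_epoch,
    }
```

Comparing successive summaries is one way to spot a stalled pipeline: if verifications keep completing but `total` or `latest_epoch` stops advancing, rollup submissions are probably not being processed.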
Related Documentation
- Troubleshooting -- Diagnosing issues flagged by monitoring.
- Configuration -- Configuring log levels and monitoring endpoints.