ICON Blockchain Node Monitoring¶
Business case idea¶
This project aims to monitor ICON Blockchain node metrics and building alarms to respond to different events so that node operators recieve notifications when certain health checks fail, thereby alerting when problems occur. This is essential if node operators should ever want to leave their desk.
Right now there is no consensus on what the best monitoring tool we should use (ie Prometheus vs TICK stack). The fellow will first have to pick a technology and determine the cheapest way to run that configuraration on a VM. Then the whole configuration needs to be made immutable such that nodes are bootstrapped with the proper configuration files. This should take little time once the fellow is oriented.
The main challenge is going to be in setting up alarms that send emails, slack notifications, and text messages when various health checks fail. The fellow will then be wise to write tests that validate the different notifications which will be the main proof that the project was a success. When the fellow reaches this point, the E2E testing project (if it is taken up), will be providing interesting simulations that will be a great demonstration of the tool in action.
prometheus / influxdb
grafana / telegraf
CI tooling for tests
Alarms triggered when health metrics fail
Immutable deployment in terraform
For the basic setup
For simple notifications
For fully tested notifications in CI
For integrations with identity management we want to restrict access to everything based on an authenticated identity and authorization provided via RBAC. This will make things harder.