In the 2024 priorities, within the Key Results for Product (Delivery Excellence), the following is established:
Q2: Establish OLA between FD & BD (Completed)
Q3: Establish SLAs for business-critical solutions and track uptime
We currently have a dashboard in Splunk that allows us to track the uptime of operating system infrastructure components.
Objective:
To evolve the existing dashboard to provide an end-to-end availability view that includes:
- Additional components such as middleware and databases
- Specific availability metrics for each application
Important Considerations:
- Availability should reflect the actual service provided to the end user
- In high-availability configurations (such as clusters), the failure of one node does not necessarily impact the application's availability
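To illustrate the two considerations above, here is a minimal sketch of how application availability could be derived from component state rather than from individual node state. All names (`Component`, `min_nodes_required`, `application_available`) are hypothetical, and the rule that a component is available while at least a minimum number of its nodes are up is an assumption that would need to be confirmed per application.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Component:
    """One layer of an application (OS, middleware, database, ...)."""
    name: str
    nodes_up: int
    nodes_total: int
    # Assumption: the component still serves users while at least this
    # many nodes are up. A standalone server has min_nodes_required=1.
    min_nodes_required: int = 1

    def is_available(self) -> bool:
        return self.nodes_up >= self.min_nodes_required


def application_available(components: Iterable[Component]) -> bool:
    """The application is available only if every component it
    depends on is still able to serve the end user."""
    return all(c.is_available() for c in components)


# Hypothetical example: one database node is down, but the cluster
# still serves requests, so the application counts as available.
db_cluster = Component("database", nodes_up=2, nodes_total=3)
middleware = Component("middleware", nodes_up=1, nodes_total=1)
print(application_available([db_cluster, middleware]))
```

With this approach a single node failure in a cluster is still visible at infrastructure level, but it only counts against the application's availability when the component as a whole can no longer serve the end user.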
We need your collaboration to:
- Identify additional components to monitor
- Define availability criteria for each application
- Establish mechanisms to measure and log times when the application is not providing service to the end user
- Integrate this information into our existing dashboard
- Check the possibilities for providing this information to SNOW
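As a starting point for the discussion on measuring and logging downtime, the sketch below shows one way an availability percentage could be computed from logged outage intervals. The function name `availability_percent` and the input format (a list of start/end timestamps) are assumptions for illustration; they do not reflect how the existing Splunk dashboard stores its data.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Interval = Tuple[datetime, datetime]


def availability_percent(window_start: datetime,
                         window_end: datetime,
                         outages: List[Interval]) -> float:
    """Percentage of the reporting window in which the application
    was serving end users. Overlapping outages are counted once."""
    # Keep only outages that touch the window and clip them to it.
    clipped = sorted(
        (max(start, window_start), min(end, window_end))
        for start, end in outages
        if end > window_start and start < window_end
    )

    downtime = timedelta(0)
    current_start = current_end = None
    for start, end in clipped:
        if current_end is None or start > current_end:
            # A new, separate outage begins: close the previous one.
            if current_end is not None:
                downtime += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlaps the outage being tracked: extend it.
            current_end = max(current_end, end)
    if current_end is not None:
        downtime += current_end - current_start

    return 100.0 * (1 - downtime / (window_end - window_start))


# Hypothetical example: two overlapping outage records in a 24-hour window.
day_start = datetime(2024, 7, 1)
day_end = day_start + timedelta(days=1)
logged = [
    (datetime(2024, 7, 1, 10, 0), datetime(2024, 7, 1, 11, 0)),
    (datetime(2024, 7, 1, 10, 30), datetime(2024, 7, 1, 11, 30)),
]
print(availability_percent(day_start, day_end, logged))
```

Merging overlapping intervals matters when several components report the same incident, since otherwise the same downtime would be counted more than once.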
Let's meet with the BC team to get a more holistic view of their needs, not limited to this idea, so that together we can design a strategy for reliable, future-ready consumption of the available Observability platforms. After that call, we can reshape this into one or more fine-tuned features for future PIs.
Set up a session with the customer to better understand the big picture, as there may be several options for moving forward.