Currently, our LogicMonitor automations run on GitLab runners hosted in two Docker containers on a single virtual machine (VM). This setup is a single point of failure: any issue affecting the VM makes all runners unavailable.
With the deployment of the new LogicMonitor RBAC, the reliability of GitLab runners has become even more critical. A failure in the VM would now significantly impact the LogicMonitor team's operations.
To address this, I have been investigating and developing code to deploy GitLab runners with Terraform in a Kubernetes environment in our CAAS infrastructure. Moving to this solution will improve fault tolerance, because the runners no longer depend on a single VM, and will improve the handling of concurrent jobs, which should increase overall stability.
Additionally, deploying with Terraform means the infrastructure is defined as code and its state is tracked. Resources can therefore be redeployed, reconfigured, or destroyed at any time, and the infrastructure can be scaled as needed.
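As an illustration of what such a deployment can look like, the following is a minimal sketch that installs the official GitLab Runner Helm chart through Terraform. It is not the code actually used here: the source does not say whether Helm is involved, and the namespace, release name, variable name, GitLab URL, and the replica and concurrency values are all assumptions chosen for the example. It also assumes the kubernetes and helm providers are already configured against the target cluster.

```hcl
# Illustrative sketch only. All names and values below are assumptions,
# not the configuration actually deployed.

variable "runner_token" {
  description = "GitLab runner authentication token (hypothetical variable name)"
  type        = string
  sensitive   = true
}

resource "kubernetes_namespace" "runners" {
  metadata {
    name = "gitlab-runners" # hypothetical namespace
  }
}

resource "helm_release" "gitlab_runner" {
  name       = "logicmonitor-runner" # hypothetical release name
  namespace  = kubernetes_namespace.runners.metadata[0].name
  repository = "https://charts.gitlab.io"
  chart      = "gitlab-runner"

  # Chart values passed as YAML; the keys are values exposed by the gitlab-runner chart.
  values = [yamlencode({
    gitlabUrl   = "https://gitlab.example.com/" # placeholder URL
    runnerToken = var.runner_token
    replicas    = 2  # more than one runner pod, so losing one pod does not stop all jobs
    concurrent  = 10 # maximum number of jobs each runner pod handles at once
    rbac        = { create = true }
  })]
}
```

With a definition like this, `terraform apply` creates or updates the runners, changing `replicas` and applying again scales them, and `terraform destroy` removes everything the configuration created.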
GitLab runners have now been deployed with Terraform in a Kubernetes environment in our CAAS infrastructure. These new runners replace those that were running on the VM, which improves the fault tolerance of these components and makes them easy to scale if required.