Problems Solved

Engineering, not just operations

Good DevOps isn't only about keeping the lights on; it's about removing the recurring problems that make infrastructure hard to operate in the first place. Each entry below is a real problem I diagnosed, the approach I took, and the outcome it produced.

Autoscaling

RabbitMQ-driven autoscaling with KEDA

Problem

Workloads couldn't keep up with sudden traffic spikes, while over-provisioning to compensate wasted spend during quiet periods.

Approach

Designed queue-based autoscaling for Kubernetes workloads using KEDA and live RabbitMQ queue metrics, so capacity tracked demand as queue depth changed.

Outcome

Absorbed spikes reliably without manual intervention and removed the need for permanent over-provisioning.

KEDA · RabbitMQ · Kubernetes
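
As a rough illustration of how queue-based scaling behaves, the sketch below approximates the replica count Kubernetes targets when KEDA feeds it a RabbitMQ queue-length metric (the `QueueLength` mode of KEDA's `rabbitmq` trigger, where `value` is the target number of messages per replica). The function name and the replica bounds are hypothetical, and it deliberately ignores HPA tolerance, stabilisation windows, and KEDA's activation step between zero and one replica.

```python
import math


def desired_replicas(queue_length: int, messages_per_replica: int,
                     min_replicas: int = 0, max_replicas: int = 10) -> int:
    """Approximate the replica target for a queue-length metric.

    KEDA exposes the queue length as an external metric with an AverageValue
    target, so the HPA aims for roughly ceil(queue_length / messages_per_replica)
    replicas, clamped to the configured bounds. The default bounds here are
    hypothetical placeholders.
    """
    if messages_per_replica <= 0:
        raise ValueError("messages_per_replica must be positive")
    if queue_length <= 0:
        # An empty queue lets the workload fall back to its minimum (possibly zero).
        return min_replicas
    wanted = math.ceil(queue_length / messages_per_replica)
    return max(min_replicas, min(max_replicas, wanted))
```

For example, with a target of 20 messages per replica, a backlog of 45 messages asks for 3 replicas, and an empty queue lets the workload scale back to its minimum.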
Cost

Karpenter cost optimisation

Problem

Static node groups left clusters paying for idle capacity and missing out on Spot savings.

Approach

Implemented dynamic node provisioning with Karpenter and shifted suitable workloads onto Spot instances with safe fallbacks.

Outcome

Cut infrastructure cost meaningfully while keeping application availability intact.

Karpenter · EC2 Spot · EKS
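
One way to express Spot-first provisioning with a fallback is a Karpenter `NodePool` that allows both `spot` and `on-demand` capacity types; when both are allowed, Karpenter prefers Spot and falls back to On-Demand when Spot capacity is unavailable. The sketch below builds such a manifest in Python and prints it as JSON (which `kubectl` accepts). It assumes the `karpenter.sh/v1` API and an existing `EC2NodeClass` named `default`; the helper name, pool name and CPU limit are placeholders, not values from this work.

```python
import json


def spot_first_nodepool(name: str, node_class: str = "default",
                        cpu_limit: str = "1000") -> dict:
    """Build a karpenter.sh/v1 NodePool that allows Spot with On-Demand fallback."""
    return {
        "apiVersion": "karpenter.sh/v1",
        "kind": "NodePool",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    # Assumes an EC2NodeClass with this name already exists.
                    "nodeClassRef": {
                        "group": "karpenter.k8s.aws",
                        "kind": "EC2NodeClass",
                        "name": node_class,
                    },
                    # Allowing both capacity types lets Karpenter prefer Spot
                    # and fall back to On-Demand when Spot is unavailable.
                    "requirements": [
                        {
                            "key": "karpenter.sh/capacity-type",
                            "operator": "In",
                            "values": ["spot", "on-demand"],
                        },
                    ],
                }
            },
            # Cap total provisioned CPU so costs cannot grow without bound.
            "limits": {"cpu": cpu_limit},
            # Remove empty nodes and repack under-used ones onto cheaper capacity.
            "disruption": {
                "consolidationPolicy": "WhenEmptyOrUnderutilized",
                "consolidateAfter": "1m",
            },
        },
    }


if __name__ == "__main__":
    # JSON is valid YAML, so this can be piped to `kubectl apply -f -`.
    print(json.dumps(spot_first_nodepool("general-purpose"), indent=2))
```

Workloads that cannot tolerate Spot interruptions can still be pinned to On-Demand nodes with a `nodeSelector` on `karpenter.sh/capacity-type: on-demand`.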
Resilience

Aurora failover automation

Problem

During Aurora failovers, load balancer targets kept pointing at the former primary instead of the newly promoted one, causing avoidable disruption.

Approach

Built automation to detect failovers and dynamically update Network Load Balancer (NLB) targets to the new primary.

Outcome

Improved database resilience and reduced recovery time during failover events.

Aurora · NLB · Automation
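
The failover handling could look roughly like the sketch below: find which Aurora instance is currently the writer, resolve its endpoint to an IP, and make it the only registered target in an IP-type NLB target group. The AWS clients are passed in (for example `boto3.client("rds")` and `boto3.client("elbv2")`) so the logic can be tested without AWS. The function names are hypothetical, and how the function is triggered (an RDS event notification or a short polling loop) is an assumption rather than a description of the original automation.

```python
import socket


def current_writer(rds, cluster_id: str) -> dict:
    """Return {"Id": ip, "Port": port} for the cluster's current writer instance."""
    cluster = rds.describe_db_clusters(DBClusterIdentifier=cluster_id)["DBClusters"][0]
    writer_id = next(m["DBInstanceIdentifier"]
                     for m in cluster["DBClusterMembers"] if m["IsClusterWriter"])
    instance = rds.describe_db_instances(DBInstanceIdentifier=writer_id)["DBInstances"][0]
    endpoint = instance["Endpoint"]
    # IP-type NLB targets need an address, so resolve the instance endpoint name.
    return {"Id": socket.gethostbyname(endpoint["Address"]), "Port": endpoint["Port"]}


def repoint_target_group(elbv2, rds, cluster_id: str, target_group_arn: str) -> bool:
    """Make the writer the only registered target. Returns True if anything changed."""
    writer = current_writer(rds, cluster_id)
    health = elbv2.describe_target_health(TargetGroupArn=target_group_arn)
    registered = [{"Id": d["Target"]["Id"], "Port": d["Target"]["Port"]}
                  for d in health["TargetHealthDescriptions"]]
    stale = [t for t in registered if t != writer]
    if writer in registered and not stale:
        return False  # already pointing at the writer
    if writer not in registered:
        # Register the new writer first so the group is never left empty.
        elbv2.register_targets(TargetGroupArn=target_group_arn, Targets=[writer])
    if stale:
        elbv2.deregister_targets(TargetGroupArn=target_group_arn, Targets=stale)
    return True
```

Registering the new writer before deregistering stale targets avoids a window in which the target group has no targets at all, and running the function again after a successful update is a no-op.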
Observability

Centralised observability platform

Problem

Logs and metrics were scattered across systems, making incidents slow to diagnose.

Approach

Designed a unified logging and monitoring stack with Prometheus, Grafana, Fluentd, and Elasticsearch.

Outcome

Gave teams a single place to understand system health and shortened time-to-diagnosis.

Prometheus · Grafana · ELK
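
A common way to make logs straightforward for Fluentd to parse and for Elasticsearch to index is to have every service write one JSON object per line to stdout. The sketch below assumes a Python service and shows a standard-library `logging` formatter for that; the field names (`service`, `trace_id`) and the `configure_logging` helper are hypothetical, and the matching Fluentd parser configuration is not shown.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical correlation fields, supplied via logging's `extra=` argument.
        for key in ("service", "trace_id"):
            if hasattr(record, key):
                entry[key] = getattr(record, key)
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def configure_logging(name: str) -> logging.Logger:
    """Send JSON lines to stdout, where a node-level collector can tail them."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as `logger.info("order processed", extra={"service": "checkout", "trace_id": "abc123"})` then produces a single JSON line with those fields alongside the timestamp and level.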
Delivery

GitOps deployment automation

Problem

Manual deployments were error-prone and lacked a reliable audit trail.

Approach

Moved deployments to a GitOps model where the desired state lives in version control and is reconciled automatically.

Outcome

Made releases repeatable, reviewable, and easy to roll back.

GitOps · Helm · CI/CD
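
At the heart of GitOps is a reconcile loop: a controller repeatedly compares the desired state committed to Git with what is running and applies the difference. The sketch below reduces that idea to plain dictionaries. It is not the code of any particular tool (the original does not say which GitOps controller was used), and the resource names and fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str  # "create", "update" or "delete"
    name: str


def plan(desired: dict[str, dict], live: dict[str, dict]) -> list[Action]:
    """Compare desired state (from Git) with live state and list the changes needed."""
    actions = [Action("create", n) for n in sorted(desired.keys() - live.keys())]
    actions += [Action("update", n) for n in sorted(desired.keys() & live.keys())
                if desired[n] != live[n]]
    # Pruning: anything running that is not declared in Git gets removed.
    actions += [Action("delete", n) for n in sorted(live.keys() - desired.keys())]
    return actions


def reconcile(desired: dict[str, dict], live: dict[str, dict]) -> dict[str, dict]:
    """Apply the planned actions to a simulated live state until it matches Git."""
    for action in plan(desired, live):
        if action.kind == "delete":
            del live[action.name]
        else:
            live[action.name] = dict(desired[action.name])
    return live
```

Rolling back then amounts to reverting the commit: the desired state changes, and the next reconcile pass moves the live state back to match it.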
Migration

EC2 to EKS migration

Problem

Legacy EC2 workloads were hard to scale and operationally heavy to maintain.

Approach

Containerised the workloads and migrated them onto managed EKS with appropriate scaling and health checks.

Outcome

Simplified operations and unlocked elastic scaling for the migrated services.

EKS · Docker · Terraform
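
Kubernetes health checks usually hit HTTP endpoints that the service exposes itself: a liveness probe restarts a container that stops responding, while a readiness probe keeps traffic away until the container reports it can serve. The sketch below shows such endpoints with Python's standard library; the `/healthz` and `/readyz` paths, port 8080 and the `HealthState` helper are illustrative assumptions, not details from the original migration.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthState:
    """Tracks whether the service has finished start-up and can take traffic."""

    def __init__(self) -> None:
        self._ready = threading.Event()

    def mark_ready(self) -> None:
        self._ready.set()

    def is_ready(self) -> bool:
        return self._ready.is_set()


def make_handler(state: HealthState):
    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/healthz":
                status = 200  # liveness: the process is up and answering HTTP
            elif self.path == "/readyz":
                # readiness: only route traffic once start-up work is done
                status = 200 if state.is_ready() else 503
            else:
                status = 404
            body = b"ok" if status == 200 else b"not ok"
            self.send_response(status)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep frequent probe requests out of the logs

    return Handler


if __name__ == "__main__":
    state = HealthState()
    state.mark_ready()  # a real service would call this after start-up completes
    ThreadingHTTPServer(("0.0.0.0", 8080), make_handler(state)).serve_forever()
```

The pod spec would then point `livenessProbe.httpGet.path` at `/healthz` and `readinessProbe.httpGet.path` at `/readyz`, with the service calling `mark_ready()` once it can actually handle requests.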