eventlogic
multi-tenant SaaS · eu-north-1
Running
Swedish multi-tenant event-management SaaS. Sole platform owner. ECS Fargate services behind ALB, Aurora PostgreSQL with schema-per-tenant, ElastiCache Redis, Amazon MQ. Tenant routing via DynamoDB registry. Customers across Europe.
- Region: eu-north-1
- Tenancy: schema-per-tenant
ECS Fargate · Aurora PostgreSQL · ElastiCache · Amazon MQ · DynamoDB · CloudFront
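Tenant routing on a schema-per-tenant layout can be sketched roughly as follows. This is an illustration under stated assumptions, not the platform's code: the in-memory `REGISTRY` dict stands in for the DynamoDB registry, and the hostnames, schema names, and the `resolve_schema` / `search_path_sql` helpers are all hypothetical.

```python
import re

# Hypothetical stand-in for the DynamoDB tenant registry: host -> schema name.
REGISTRY = {
    "acme.example.com": "tenant_acme",
    "globex.example.com": "tenant_globex",
}

# Schema names are interpolated into SQL (identifiers cannot be bound as
# parameters), so only plain lowercase identifiers are accepted.
_SCHEMA_RE = re.compile(r"[a-z_][a-z0-9_]{0,62}")


class UnknownTenant(LookupError):
    """Raised when the request host has no registry entry."""


def resolve_schema(host: str, registry: dict = REGISTRY) -> str:
    """Map a request host to its tenant schema, validating the name."""
    key = host.strip().lower().split(":")[0]  # normalise case, drop any port
    schema = registry.get(key)
    if schema is None:
        raise UnknownTenant(key)
    if not _SCHEMA_RE.fullmatch(schema):
        raise ValueError(f"unsafe schema name: {schema!r}")
    return schema


def search_path_sql(schema: str) -> str:
    """Build the statement that scopes a PostgreSQL session to one tenant."""
    return f'SET search_path TO "{schema}", public'
```

Validating the schema name before building the statement matters because `search_path` takes an identifier, which cannot be passed as a bound query parameter.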
dr-failover
cross-region disaster recovery
Running
Cross-region disaster recovery from eu-north-1 to eu-west-1, built where none existed. Aurora Global Database for sub-second cross-region replication, EFS and ECR replication, and KMS keys shared across regions. Documented runbook for promotion.
- Primary: eu-north-1
- Failover: eu-west-1
Aurora Global DB · EFS · ECR · KMS · Route 53
tenant-orch
provisioning service
Running
Python tenant-provisioning orchestrator. One API call sets up schema-per-tenant on Aurora, wires SQS and EventBridge, creates ALB listener rules, provisions a CloudFront / S3 distribution, configures Route 53 records, and registers the tenant in DynamoDB.
- Per tenant: one call
- Steps: 6+
Python · FastAPI · Aurora · SQS · EventBridge · Route 53
reliability
incident · 19m outage fix
Running
Diagnosed a 19-minute full-platform outage caused by blocking Redis KEYS calls exhausting the Tomcat/JDBC thread pool. Added connection-pool checkout timeouts, tuned RDS parameters, and drove a 68-task reliability program across 11 epics and 7 sprints to prevent recurrence.
- Outage: 19 min
- Tasks: 68 / 11 epics
Aurora · Redis · JDBC · Postmortem · SLOs
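The idea behind a checkout timeout can be shown with a small bounded pool. This is a language-neutral sketch written in Python, not the Tomcat/JDBC configuration used in the fix; `BoundedPool` and `PoolTimeout` are hypothetical names. With a timeout, a caller that cannot get a connection fails fast instead of holding a request thread indefinitely while a slow call occupies the pool.

```python
import queue
from contextlib import contextmanager


class PoolTimeout(Exception):
    """Raised when no connection becomes free within the checkout timeout."""


class BoundedPool:
    """Fixed-size pool whose checkout gives up after a deadline."""

    def __init__(self, connections, checkout_timeout: float):
        self._free = queue.Queue()
        for conn in connections:
            self._free.put(conn)
        self._timeout = checkout_timeout

    @contextmanager
    def checkout(self):
        try:
            # Block for at most checkout_timeout seconds, never forever.
            conn = self._free.get(timeout=self._timeout)
        except queue.Empty:
            raise PoolTimeout(
                f"no connection free after {self._timeout}s"
            ) from None
        try:
            yield conn
        finally:
            # Always return the connection, even if the caller raised.
            self._free.put(conn)
```

A caller would wrap work in `with pool.checkout() as conn:` and treat `PoolTimeout` as a signal to shed load or return an error quickly.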
oneuptime
self-hosted SRE platform
Running
Self-hosted OneUptime on K3s in eu-central-1 (separate region from primary). Status pages, uptime monitoring, on-call scheduling, incident management. Designed so observability survives a primary-region outage.
- Region: eu-central-1
- Surface: status / on-call
K3s · OneUptime · OpenTelemetry · Loki
otel
observability pipeline
Running
OpenTelemetry Collector dual-exports metrics, logs, and traces to OneUptime and Loki. Consolidated fragmented monitoring into one observability stack, with Grafana on top.
OpenTelemetry · Loki · Grafana · Prometheus
es-cluster
3-node Elasticsearch
Maintenance
Three-node Elasticsearch cluster, self-managed with Terraform and Ansible. Split-deploy and split-restart playbooks keep a single config change from cascading across the cluster.
- Nodes: 3
- Deploy: split-restart
Elasticsearch · Terraform · Ansible
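One way to read "split-restart" is a node-at-a-time restart gated on cluster health. That reading is an assumption, and the sketch below is Python rather than the Ansible playbooks themselves; `rolling_restart`, `restart`, and `is_healthy` are hypothetical names, with the two callables standing in for whatever actually restarts a node and checks the cluster.

```python
import time
from typing import Callable, Iterable, List


def rolling_restart(
    nodes: Iterable[str],
    restart: Callable[[str], None],
    is_healthy: Callable[[], bool],
    max_checks: int = 3,
    poll_interval: float = 0.0,
) -> List[str]:
    """Restart nodes one at a time, stopping at the first unhealthy result."""
    restarted: List[str] = []
    for node in nodes:
        restart(node)
        for _ in range(max_checks):
            if is_healthy():
                break
            time.sleep(poll_interval)
        else:
            # Health never recovered: halt so the change cannot cascade.
            raise RuntimeError(
                f"cluster unhealthy after restarting {node}; "
                f"already restarted: {restarted}"
            )
        restarted.append(node)
    return restarted
```

Halting on the first failed health check leaves the remaining nodes on the old configuration, so a bad change affects at most one node.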
ci-cd
pipelines · 3 platforms
Running
CI/CD pipelines spanning Jenkins, GitLab CI, and AWS CodePipeline / CodeBuild. Targets include ECS, Lambda, CloudFront, and EC2 deployments. App and infra share pipeline patterns.
- Platforms: 3
- Targets: ECS · λ · CF · EC2
Jenkins · GitLab CI · CodePipeline · CodeBuild · Docker
finops
AWS cost optimization
Maintenance
Cut monthly AWS spend by removing orphaned NAT Gateways, adding S3 and DynamoDB gateway endpoints to drop data-transfer cost, setting log-retention policies on CloudWatch, and right-sizing EBS volumes.
VPC Endpoints · CloudWatch Logs · EBS · NAT
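Finding orphaned NAT Gateways can be sketched as a comparison between the gateways that exist and the ones any route table still points at. Treating "orphaned" as "available but not referenced by any route" is an assumption, as is the helper name `orphaned_nat_gateways`. The input dicts follow the shape of the EC2 `DescribeNatGateways` and `DescribeRouteTables` responses, but the function takes plain data so it runs without AWS access.

```python
from typing import Dict, List


def orphaned_nat_gateways(
    nat_gateways: List[Dict], route_tables: List[Dict]
) -> List[str]:
    """Return IDs of available NAT gateways that no route targets."""
    referenced = {
        route["NatGatewayId"]
        for table in route_tables
        for route in table.get("Routes", [])
        if "NatGatewayId" in route
    }
    return sorted(
        gw["NatGatewayId"]
        for gw in nat_gateways
        # Skip gateways that are already deleting or deleted.
        if gw.get("State") == "available"
        and gw["NatGatewayId"] not in referenced
    )
```

The result is only a candidate list; each gateway would still need a manual check before removal.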
hashnode-mcp
AI assistant integration
Completed
Open-source Model Context Protocol server that wires AI assistants like Claude into the Hashnode content API. The pattern carries into the broader agentic-DevOps work. (Note: Hashnode has since wound down public API access.)
Python · MCP · Hashnode API
github.com/sbmagar13/hashnode-mcp-server
vqgan-clip
text-to-image · 2021
Completed
Multimodal text-to-image generation using VQGAN + CLIP architectures in PyTorch. From the AI/ML era of the career, kept here as an artifact.
PyTorch · CLIP · VQGAN · Python
github.com/sbmagar13/VQGAN-CLIP-Text-to-Image