Cloud Infrastructure Engineering

I build infrastructure that stays out of the way.

Solo infrastructure engineer for teams who'd rather ship features than debug cluster networking. Production clusters, AI inference platforms, GitOps pipelines, hardened databases. Everything versioned, everything automated.

100+ days: continuous uptime
24/7: always running
Multi-region: cluster topology
GPU-accelerated: inference ready
kubectl — production
% kubectl get nodes
NAME               STATUS   ROLES     AGE
prod-pool-1-7xk2   Ready    general   142d
prod-pool-2-m9z1   Ready    general   142d
gpu-pool-a3f8      Ready    gpu       89d
gpu-pool-b2k1      Ready    gpu       89d

% flux get kustomizations
NAME    READY   MESSAGE                   AGE
infra   True    reconciled successfully   2m
apps    True    reconciled successfully   2m

%
Tech Stack

Technologies I work with daily

  1. Orchestration & Clusters

    Multi-region Kubernetes, auto-provisioning node pools, workload identity federation. Clusters designed to run reliably from day one.

    Kubernetes, GKE Autopilot, kubeadm, Multi-cluster
  2. CI/CD & GitOps

    Every commit propagates to the cluster: auditable, repeatable, fast. Automated dependency updates, zero-downtime rollouts, instant rollback if something breaks.

    GitHub Actions, Flux CD, Zero-downtime, Auto-rollout
  3. AI/ML Infrastructure

    GPU-accelerated serving with scale-to-zero autoscaling, model routing, multi-tenant isolation, and cost tracking. From lightweight dispatch to heavy inference.

    GPU Serving, vLLM, Model Routing, Scale-to-Zero
  4. Data & Storage

    High-availability PostgreSQL with automated backups, point-in-time recovery, and connection pooling. Time-series databases and message queues for event-driven architectures.

    PostgreSQL, CNPG, PgBouncer, Time-series, Message Queues
  5. Networking & Security

    Network segmentation, mTLS, encrypted secrets with KMS unsealing, and CIS benchmark compliance. Defense in depth applied by default.

    Cilium, Gateway API, mTLS, Encrypted Secrets, KMS
  6. Observability

    Prometheus metrics, Grafana dashboards, structured logging, and alerting pipelines that notify before you notice. Full traceability from symptom to root cause.

    Prometheus, Grafana, Structured Logging, Alerting
  7. Infrastructure as Code

    Declarative configurations with strict separation between provisioning and application manifests. Two-repo discipline — infrastructure and apps never mix.

    Terraform, Kustomize, Declarative, Two-repo discipline
Selected Work

Systems that are actually running

Not portfolio pieces or demos. Production systems processing real data, serving real workloads, running 24/7.

Multi-agent AI orchestration platform (production)

Specialist agent teams coordinated through kanban-driven task decomposition. Persistent memory, autonomous coding, research, and operations — infrastructure managed by the very agents it runs.

multi-agent, kanban-driven, self-operating
Real-time financial data infrastructure (production)

Market data feeds, whale tracking, position scanners, and trading signal pipelines. Low-latency infrastructure running continuously on dedicated clusters with high-availability databases.

hundreds of feeds, 24/7, low-latency
Managed cloud infrastructure (multi-region)

Production clusters, high-availability databases, and GitOps deployments across multiple workloads. Terraform provisioning, automated rollouts, and continuous monitoring.

multi-cluster, GitOps, 100+ day uptime
Private AI inference platform (production)

GPU-accelerated serving with autoscaling, model routing, and cost tracking. Fully isolated environments with multi-tenant support and scale-to-zero for cost efficiency.

GPU-accelerated, scale-to-zero, multi-tenant
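
For readers unfamiliar with scale-to-zero, the idea can be pictured as a small decision function. The Python sketch below is illustrative only: `ScalePolicy`, `desired_replicas`, and every threshold are hypothetical names and values chosen for the example, not the settings or the autoscaler of the platform described above.

```python
from dataclasses import dataclass


@dataclass
class ScalePolicy:
    # All values are illustrative assumptions, not real platform settings.
    target_inflight_per_replica: int = 4  # concurrent requests one replica should handle
    max_replicas: int = 8                 # upper bound, e.g. the number of available GPUs
    idle_grace_seconds: float = 300.0     # how long to stay warm after the last request


def desired_replicas(inflight: int, seconds_since_last_request: float,
                     policy: ScalePolicy) -> int:
    """Return the replica count a scale-to-zero autoscaler might request."""
    if inflight > 0:
        # Ceiling division: enough replicas to keep each at or below the target load.
        needed = -(-inflight // policy.target_inflight_per_replica)
        return min(needed, policy.max_replicas)
    # No traffic: keep one warm replica during the grace window, then release everything.
    if seconds_since_last_request < policy.idle_grace_seconds:
        return 1
    return 0
```

With the default policy, nine in-flight requests ask for three replicas, and ten idle minutes bring the count back to zero so no GPU is billed while nothing is being served.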
Services

What I deliver

  1. End-to-end infrastructure design and management

    Production clusters designed from scratch, provisioned with IaC, and managed with GitOps. Automated builds, zero-downtime updates, encrypted secrets, and monitoring that alerts before you notice. I operate what I build.

  2. AI/ML platform deployment and scaling

    GPU-accelerated inference platforms with autoscaling, model routing, multi-tenant isolation, and cost controls. From prototype to production — your models, your data, fully isolated.

  3. Autonomous agent system architecture

    Multi-agent systems with specialist teams, shared orchestration, persistent memory, and versioned operations. Agents that plan, research, code, review, and execute — with full traceability.

  4. CI/CD and GitOps implementation

    Automated pipelines from commit to cluster. Dependency updates, zero-downtime rollouts, instant rollback, and auditable history. No click-ops, no manual deployments.

  5. Security audits and hardening

    Network segmentation, non-root containers, encrypted secrets, mTLS, CIS benchmarks, and vulnerability scanning. Defense in depth applied systematically, not retrofitted after an incident.
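
One of the hardening checks listed above, non-root containers, is easy to show in miniature. The Python sketch below is an assumption-laden illustration, not the audit tooling actually used: the function name `containers_allowed_root` is hypothetical, and it only inspects the standard Kubernetes `securityContext` fields `runAsNonRoot` and `runAsUser` on a pod spec that has already been loaded into a dict.

```python
def containers_allowed_root(pod_spec: dict) -> list[str]:
    """Return names of containers that are not forced to run as a non-root user."""
    pod_ctx = pod_spec.get("securityContext") or {}
    containers = (pod_spec.get("containers") or []) + (pod_spec.get("initContainers") or [])
    flagged = []
    for container in containers:
        ctx = container.get("securityContext") or {}
        # Container-level settings take precedence over pod-level ones.
        non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False))
        run_as_user = ctx.get("runAsUser", pod_ctx.get("runAsUser"))
        explicit_root = run_as_user == 0
        enforced = bool(non_root) or (run_as_user is not None and run_as_user != 0)
        if explicit_root or not enforced:
            flagged.append(container.get("name", "<unnamed>"))
    return flagged
```

A real audit would also look at image defaults, admission policies, and capabilities; this only covers the declared security context.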

Available for consulting and contracts

Whether you need a production cluster designed from scratch, an existing setup hardened for security, or an autonomous agent system built for your domain, reach out. I work remotely and communicate through code reviews and commit messages.

Quick reference

GitHub: sirius0xdev
Availability: Available for new projects
Location: Remote / Async-first