Cloudbeds Senior Site Reliability Engineer

via RemoteOK

Cloudbeds

What Makes Us Unique 

At Cloudbeds, we're not just building software; we're transforming hospitality. Our intelligently designed platform powers properties across 150 countries, processing billions in bookings annually. From independent properties to hotel groups, we help hoteliers transform operations and uplevel their commercial strategy through a unified platform that integrates with hundreds of partners. And we do it with a completely remote team. Imagine working alongside global innovators to build AI-powered solutions that solve hoteliers' biggest challenges. Since our founding in 2012, we've become the World's Best Hotel PMS Solutions Provider and landed on Deloitte's Technology Fast 500 again in 2024, but we're just getting started.

As a Sr. Site Reliability Engineer, you'll be the guardian of our platform's reliability and performance, ensuring millions of hospitality transactions flow seamlessly across the globe. You'll architect and implement scalable AWS cloud solutions that keep the most ambitious hotels running 24/7, while fostering a culture of automation, resilience, and continuous improvement across our engineering teams.

Our SRE Team:

We're a bottom-up, collaborative team that thrives on healthy debate and shared ownership of our infrastructure. You'll have endless opportunities to influence architecture decisions while working with cutting-edge cloud technologies at scale. We believe the best solutions come from engineers who are empowered to innovate, experiment, and challenge the status quo.

What You Bring to the Team:

  • Design and implement reliable and scalable AWS architecture to meet the needs of the organization.
  • Maintain and support highly loaded Kubernetes (EKS) clusters and infrastructure-related components.
  • Support the CI/CD process with ArgoCD and GitOps.
  • Automate the platform deployments with Terraform infrastructure-as-code.
  • Develop and continuously improve product observability and monitoring systems based on Grafana, Prometheus, Datadog, and CloudWatch.
  • Respond to incidents and participate in Incident Management and Root Cause Analysis, ensuring minimal impact on services.
  • Optimize system performance and troubleshoot issues as they arise.


  • Please mention the word **INSIGHTFUL** and tag RMzguNjguMTM0LjE5NA== when applying to show you read the job post completely (#RMzguNjguMTM0LjE5NA==). This is a beta feature to avoid spam applicants. Companies can search these words to find applicants that read this and see they're human.

Posted Cloudbeds Senior Site Reliability Engineer on January 31, 2026 via RemoteOK
