Headquarters: London, UK
URL: https://canonical.com/
Site Reliability Engineer - Work from Home in the Americas
What is Canonical?
Canonical is a growing international
software company that works with the open-source community to deliver
Ubuntu, “the world’s best free software platform”. Our mission is to
realise the potential of free software in the lives of individuals and
organisations. Our services are helping individuals and businesses
worldwide to reduce costs, improve efficiency and enhance security with
Ubuntu.
Job Summary:
The IS team at Canonical supports and
maintains all of Canonical’s production services. IS team members use
real-life operational experiences to contribute to product improvements.
The IS team at Canonical runs the services used by over 60 million
Ubuntu users.
As an SRE you’ll be in a unique
position to improve Canonical products and the open-source technologies
they’re based on. You’ll do this by providing critical feedback to
developers on how their products operate at scale as well as writing
code, submitting bugs, and working with other teams within the company.
You will also be encouraged to develop and submit fixes and enhancements
directly and to collaborate with development teams during design and
implementation phases.
You’ll be part of a global team of
SREs who work together and support each other to provide the best
possible services to Canonical the company, Canonical’s customers and
the Ubuntu Community.
As a Site Reliability Engineer you will:
- Understand and operate cloud and container technology from kernel to dashboard (OpenStack and Kubernetes), both for Canonical and its clients
- Maintain operational responsibility for all of Canonical’s core services, networks, and infrastructure
- Develop new features and improve the resilience and scalability of the existing cloud and container portfolio at Canonical
- Automate operations for reuse across the world’s largest companies, taking into consideration the complexities of distributed systems
- Develop skills in troubleshooting, capacity planning, and performance analysis
- Collaborate with development teams to design service architecture, documentation, playbooks, policies and operational procedures
- Assist and collaborate with globally distributed engineering, operations, and support peers
- Be given uninterrupted software development time to collaborate on larger coding projects and automate manual tasks
- Carry final responsibility for time-critical escalations
The successful Site Reliability Engineer candidate will have:
- Engineering degree, preferably in computer science or software engineering
- Python software development experience on large projects
- Strong modern engineering background (peer-review, unit testing, SCM, CI/CD, Agile)
- Preference for treating configuration as code and automating to reliably solve problems
- Extensive knowledge of cloud computing concepts and technologies
- Practical knowledge of Linux networking, routing, and firewalls
- Hands-on experience administering Linux servers for personal or professional use
- Able to communicate clearly and effectively in English over email, IRC, video or voice calls and in person
- Self-driven, able to troubleshoot from kernel to web, and willing to ask others when appropriate
- Willingness to be flexible and the ability to learn new things quickly
- The ability to work under pressure and solve difficult problems
- The ability to be productive and organized and capable of working from home full time
- Familiarity with Ubuntu or Debian
To apply: https://boards.greenhouse.io/canonical/jobs/1747487