Site reliability engineer
Grade SEO
What DevOps engineers do
Commonly referred to as ‘DevOps engineers’, development operations engineers support the development and operation of software through tools, environments and practices.
In this role, you will be responsible for underpinning good development processes including managing tools and testing environments, central code control, maintaining development standards and writing software that automates systems.
Role responsibilities
You'll be part of a team that shares a vision of making public services digital by default, simpler, clearer and faster to use. As a Site Reliability Engineer, you will be using the latest technologies and trends, whilst delivering working software early and often. Working as part of a multi-disciplinary team, you'll develop your skills to build a career as a Site Reliability Engineer. You will be helping to build digital services for a diverse set of users, including citizens, teachers, social workers and school professionals.
The technologies you will be using as a Site Reliability Engineer in DfE include: Docker, Linux, Git, GitHub Actions, Azure, Azure DevOps, Ruby, Ruby on Rails, PowerShell, Terraform, Prometheus, Grafana and the ELK stack.
In the Department for Education you will:
- be part of a team that runs and supports Government digital services for teachers
- help automate tasks, deployments, and tests by creating infrastructure as code
- implement resilient, highly available systems
- implement modern software development practices, such as CI/CD and DevOps, as well as modern development workflows using GitHub and Azure DevOps
- work in a fully Agile environment
- use development skills to maintain applications as well as create powerful automation and monitoring scripts
- use infrastructure skills to deploy and integrate services in the cloud
- with the support of senior SREs and the wider community, learn to build secure, reliable and scalable systems, automate processes to increase delivery efficiency, and help developers troubleshoot live systems
- share knowledge of tools and techniques with the wider team and community, both developers and non-developers
- be part of a diverse, inclusive culture across the development community, growing awareness, inclusivity, and balance
Skills you need
Essential:
- experience in software development or scripting, ideally with Ruby, Bash, PowerShell or similar
- experience troubleshooting web applications
- experience with Linux or other Unix-based operating systems
- experience building, troubleshooting and automating applications in public cloud-based systems
- basic understanding of networking
- enthusiasm to learn, share knowledge and work collaboratively in an inclusive and diverse multi-disciplinary team environment
Desirable:
- experience using version control (ideally with Git)
- experience in analysing systems performance and configuration
- experience building, running and optimising Docker container images
Desirable criteria will only be assessed in the event of a tie-break, to help make an informed decision.
Technical skills:
We'll assess you against these technical skills during the selection process:
- a pragmatic approach to troubleshooting
- knowledge of Linux command line
- knowledge of public clouds
- programming logic