Connecting Talent to opportunity

Site Reliability Engineer

Expired

Job Description:

Site Reliability Engineer – AWS, Monitoring Tools, PaaS, Terraform, Grafana, Redshift, Remote

This is a permanent position working with an existing team of passionate full-stack developers. As the Senior Site Reliability Engineer (SRE), you will be expected to provide leadership on reliability, performance, scalability and observability while working with an experienced and talented team. You will develop your skills and work with the latest technologies as you progress with the company.

You will be working in a start-up culture using a Lean/Kanban approach and Continuous Delivery. The company runs a distributed microservices architecture across multiple technology stacks (including Ruby, Elixir, Node.js and Scala).

Responsibilities

  • Observability of applications, using monitoring, logging, tracing, and alerting solutions
  • Driving operational quality through common SRE best practices
  • Infrastructure as code
  • Cloud engineering, including building fault-tolerant systems
  • Disaster recovery, backups, security
  • Incident response

Skills

  • 4+ years' experience as an SRE
  • Ideally experienced in at least one programming language (such as Ruby, Java or Python)
  • Experienced with Infrastructure as Code practices using tools like Terraform
  • Experience with observability and monitoring tools like Kibana, Elasticsearch, Prometheus and Grafana on a PaaS
  • Strong knowledge of operating databases at scale (relational and NoSQL)
  • Experience supporting high-volume, high-transaction websites
  • Strong problem-solving skills with an emphasis on reliability and performance
  • Experience using cloud services, both IaaS (e.g. AWS) and PaaS (e.g. Heroku)
  • Experience supporting queues and messaging systems (RabbitMQ, Kafka, etc.)

This is a remote position with occasional visits to the office for meet-ups, workshops, socials etc. You will be the sole SRE in the company at present, with a view to building a team around you after approximately the first 6 months, once you have begun to plan and to identify shortfalls, DR and incident response needs, and the tooling that could be used to improve the business.

  • Job Type

    Permanent, Full Time

  • Work Authorisation

    United Kingdom

  • Industry Sector

    IT & Internet