Site Reliability Engineer (Engineering Support) at BlueLabs

BlueLabs

DevOps / Sysadmin | Contract

Job details

BlueLabs is a dynamic and fast-growing startup operating in the sports betting industry. We are committed to delivering innovative and cutting-edge solutions to our customers, providing an unparalleled betting experience. As we continue to expand to new markets and evolve our product, we are looking for a highly skilled and passionate Site Reliability Engineer to join our brand new Engineering Support team.

Our Technology Stack

Our technology stack includes over 30 microservices written in Go, TypeScript and Scala, implementing an event-driven architecture that has allowed us to extend and scale our product quickly and efficiently to meet the demands of our customers. We leverage Apache Pulsar for event-driven messaging, Kubernetes for container orchestration, Google Cloud Platform (GCP) for cloud infrastructure, PostgreSQL for data storage and Cloudflare as our network perimeter and for our edge-computing needs.

Our release process is fully automated, enabling our small engineering team to perform several deployments per day. A typical deployment takes only a few seconds to complete. This automation spans the provisioning of infrastructure, the deployment of applications and their configuration, and the configuration of monitoring dashboards and alerts. The tools we use include, but are not limited to, Terraform, Helmfile and GitHub Actions.

We use Grafana, Loki and Mimir to monitor the performance of our platform, enabling us to automatically detect and escalate any potential issues to the affected teams.

As a technology company, we are dedicated to maintaining a modern and agile technology environment that empowers our engineers to thrive and build the future of sports betting technology.

About the Role

We are looking for a Site Reliability Engineer with a strong technical background in Software Engineering to help us bootstrap our brand new Engineering Support team. In this role, you will participate in the 24/7 on-call rotation to help us troubleshoot any live incidents we may face, while also directly contributing to a few key projects aimed at improving the development and operational experience, scalability and reliability of our product.

Responsibilities

Proactive monitoring of all our systems and services.
Participation in the 24/7 incident response on-call rotation.
Documentation of incident postmortems.
Efficient and timely incident communication, including external communication.
Implementation of preventive measures identified in previous incidents, including occasional contributions to our services.
Promotion of a culture of transparency and collaboration with development and infrastructure teams to raise awareness of, and address, known reliability, availability and performance bottlenecks.
Communication with external service providers.
Definition and execution, in proactive collaboration with other teams, of a wide range of projects aimed at improving the development and operational experience of our systems and services.
Definition and execution, in proactive collaboration with other teams, of a wide range of projects aimed at improving the performance and reliability of our systems and services.

Compensation

The compensation range for this role is €70,000 - €110,000 annually, depending on your skills, experience and form of engagement (employee or independent contractor). Additional perks include a new 16" MacBook Pro or Linux laptop, and 40 days of paid annual leave (including public holidays). For more details, please refer to our Recruitment FAQs.

Requirements

5+ years of experience in Software Engineering, working with relevant technologies.
Proven experience developing software solutions using Go and TypeScript, as well as working with event-driven architectures and distributed systems.
Proven experience participating in on-call rotations and troubleshooting live production issues.
Experience working with PostgreSQL and developing scalable and performant data solutions.
Experience working with Kubernetes and Google Cloud Platform.
Experience automating workflows using tools like Terraform.
Familiarity with concepts such as rate limiting, caching and cloud networking.
Knowledge of software architecture and design principles.
Excellent problem-solving skills and the ability to work independently and as part of a team.
Strong communication skills and the ability to mentor and lead junior engineers.
Passion for technology and willingness to learn and adapt to new technology trends.

Updated 2024-03-10T22:51:08.000Z
