SENIOR DEVOPS / SITE RELIABILITY ENGINEER
Bestarion is a subsidiary of Larion, a well-established software outsourcing company in Vietnam with decades of experience delivering high-quality technology solutions. Inheriting Larion’s strong foundation and technical expertise, Bestarion continues to grow as a trusted partner for clients worldwide.
For over 15 years, Bestarion has provided innovative outsourcing services and business solutions to successful clients in more than 15 countries. Our diverse range of services includes Big Data & Data Analytics, Securities Trading Solutions, Surround Core Banking Solutions, E-commerce and Social Network App Development, and Web Application Development. We focus on today’s emerging trends such as Big Data, Cloud Computing, Social Networks, Mobility, and the Internet of Things.
- Location: QTSC Building, 3rd Floor, 1 Quang Trung, Software City, HCMC
- Working Location: Remote or at the Bestarion office, with US onsite opportunities
- Working Time:
  - Monday – Friday, 8:00 AM – 5:30 PM (flexible, depending on the project)
  - 1-hour daily standup, Tuesday – Friday, likely from 9 PM to 10 PM Vietnam time
- Travel to the USA: 1 to 4 trips per year are expected, each lasting 1 to 2 weeks.
- Maintenance Work Hours: You will need to work US hours for three days every three months to perform maintenance on key production systems.
- About the project: We’re looking for a skilled and motivated DevOps/Site Reliability Engineer (SRE) to join our growing team. In this exciting role, you will be responsible for building and maintaining our cloud infrastructure, automating our CI/CD pipelines, and ensuring the reliability, performance, and scalability of our services. The ideal candidate will have a strong background in both software development and systems engineering, with a focus on GCP and automation tools, and a strong sense of ownership.
JOB DESCRIPTIONS
- Design and manage infrastructure on Google Cloud Platform (GCP) using Terraform for Infrastructure as Code (IaC).
- Build, configure, and maintain CI/CD pipelines using Jenkins and Groovy scripts to automate software delivery from code commit to production deployment.
- Manage Jenkins plugins, master/agent nodes, and pipeline libraries to ensure the stability and scalability of our CI/CD platform.
- Troubleshoot and debug automation code and interconnected systems to quickly identify and resolve issues, ensuring minimal disruption to services.
- Manage core GCP services including Compute Engine, Managed Instance Groups (MIG), Disk Snapshots, Storage, and Artifact Registry to support our application ecosystem.
- Containerize applications using Docker to ensure consistency across development, testing, and production environments.
- Implement and manage infrastructure as code, monitoring, and logging solutions to ensure high availability and performance of our systems.
- Collaborate with development teams to improve the entire software development lifecycle, from code to production.
- Develop and maintain workflows in Airflow to orchestrate complex data and application tasks.
- Troubleshoot and resolve production incidents, participate in the on-call rotation, perform root cause analysis, and carry out key maintenance activities quarterly.
- Effectively communicate complex technical concepts to both technical and non-technical stakeholders through clear written and verbal communication.
- Manage and repave Windows and Linux machines, ensuring security compliance through automated processes.
- Implement security compliance measures, including infrastructure repaving, key rotation, and periodic updates, to meet industry standards.
- Operate monitoring and alerting systems, including Prometheus, Cloud Monitoring, and PagerDuty, to ensure system reliability and proactive incident response.
JOB QUALIFICATIONS
- Bachelor’s degree in Computer Science, Information Technology, or a related field.
- 5+ years of experience as a DevOps Engineer, SRE, or in a similar role.
- Excellent verbal and written English communication skills are essential. You must be able to clearly document processes, write concise reports, and articulate technical issues to various audiences.
- Strong proficiency with Terraform for managing cloud resources.
- Hands-on experience with Jenkins, including managing Jenkins masters and agents, and writing Groovy scripts for pipeline automation.
- Proven ability to troubleshoot and resolve issues in complex, interconnected systems quickly and efficiently.
- Expertise in GCP services, including Compute Engine, MIG, Disk Snapshots, Storage, and Artifact Registry.
- Solid experience with Docker and containerization principles.
- Familiarity with Airflow for workflow management and orchestration.
- Strong understanding of Linux/Unix systems, networking, and security principles.
- A proactive, “can-do” attitude with a strong sense of ownership and a desire to take on new challenges.
- Excellent problem-solving skills and a collaborative, team-oriented mindset.
DEFINE YOURSELF AT BESTARION WITH ATTRACTIVE BENEFITS
- Performance appraisal twice a year.
- Attractive benefits (13th salary, distinguished employee of the quarter and year, seniority award…)
- 12 days off
- Lunch and parking allowance
- Healthcare and accident insurance
- Annual health check
- Work equipment provided: laptop and screen (if needed)
- Team-building activities every summer, company trips, a big annual year-end party, etc.
- Fitness & sports activities: football, tennis, table tennis, badminton…
- Commitment to community development: quarterly charity activities, blood donation, public seminars, career orientation talks…
- Support for personal loans, such as home loans, vehicle loans, and tuition fees