
Site Reliability Engineer at Bandwidth
Raleigh, United States


Job Description

Site Reliability Engineer (Raleigh, NC) Duties: Work closely with leadership and internal partners to ensure that software meets security, SLA, performance, and capacity requirements. Set up and maintain monitoring tools and systems to detect issues using Datadog monitors, and alert using OpsGenie. Configure Datadog and Grafana alerts and application health monitors to notify the team when anomalies or problems occur. Work closely with other Site Reliability Engineers, DevOps Engineers, and System Administrators to achieve common goals. Analyze system performance data using Snowflake to plan for capacity upgrades or optimizations. Ensure the system can handle expected growth in traffic and data by using these tools to track application lag and behavior. Manage Kubernetes clusters and OpenShift environments for deploying and scaling containerized applications. Implement and manage infrastructure using Ansible, and maintain version-controlled infrastructure code in GitLab for consistency and repeatability. Use Terraform and Ansible scripts to define and provision infrastructure resources in a repeatable and automated manner. Create and maintain Ansible playbooks to automate routine tasks, configurations, and deployments. Use GitHub Actions for CI/CD to continuously build and deploy code, and implement CI/CD pipelines to streamline application updates. Build and maintain deployment pipelines using Ansible playbooks; ensure smooth and reliable deployments and rollback procedures; and create production releases, using ServiceNow to track the records. Maintain detailed documentation on system architecture, configurations, and processes using Confluence, and share knowledge and best practices with team members. Plan resource allocation using Red Hat OpenShift, including servers, storage, and network capacity, following the Kubernetes architecture to ensure the system is equipped to handle traffic spikes and growth.
Develop and test disaster recovery plans to ensure data and service availability in case of major failures or disasters, building the supporting tools in Go. Work closely with development teams to promote a DevOps culture and ensure reliability is built into software from the start by following best practices. Collaborate with other Site Reliability Engineers on a weekly basis to share knowledge, solve complex problems, and touch base on all points. Monitor and manage cloud resource costs in AWS to optimize spending while maintaining performance.

 

Required: Master’s degree or foreign equivalent in Computer Science, Electrical Engineering, or a related field of study, plus 2 years of experience in the job offered or a related position. Must have 2 years of experience with: infrastructure and networking concepts, including virtualization, load balancing, and DNS; at least one of the following cloud infrastructure technologies: AWS, Google Cloud, or Azure; REST APIs using one or more of JSON, XML, or YAML; designing, building, and operating large-scale production systems; Continuous Integration and Continuous Deployment (CI/CD) concepts and technologies using one or more of Jenkins, GHA, or Circle; containerization technologies (Docker, Docker Compose, Docker Swarm, Kubernetes); configuration and management techniques in large distributed environments; monitoring and observability techniques with one or more of Datadog, Sensu, New Relic, or Nagios; general use of open-source databases (MySQL, Postgres, Redis, Cassandra); Unix/Linux administration, troubleshooting, and shell scripting; one or more of the following programming languages: Python, Java, Go, Rust, or similar; source control (Git, GitHub) and feature branching strategies; automating infrastructure, testing, and deployment using Ansible, Chef, or Terraform; and the Infrastructure as Code paradigm.

 

In the alternative, the employer will accept a Bachelor’s degree or foreign equivalent in Computer Science, Electrical Engineering, or a related field of study, plus 5 years of experience in the job offered or a related position. Must have 2 years of experience with: infrastructure and networking concepts, including virtualization, load balancing, and DNS; at least one of the following cloud infrastructure technologies: AWS, Google Cloud, or Azure; REST APIs using one or more of JSON, XML, or YAML; designing, building, and operating large-scale production systems; Continuous Integration and Continuous Deployment (CI/CD) concepts and technologies using one or more of Jenkins, GHA, or Circle; containerization technologies (Docker, Docker Compose, Docker Swarm, Kubernetes); configuration and management techniques in large distributed environments; monitoring and observability techniques with one or more of Datadog, Sensu, New Relic, or Nagios; general use of open-source databases (MySQL, Postgres, Redis, Cassandra); Unix/Linux administration, troubleshooting, and shell scripting; one or more of the following programming languages: Python, Java, Go, Rust, or similar; source control (Git, GitHub) and feature branching strategies; automating infrastructure, testing, and deployment using Ansible, Chef, or Terraform; and the Infrastructure as Code paradigm.

 

Submit resumes to: Bandwidth, Inc, 2230 Bandmate Way, Raleigh, NC 27607, Attn: Kellie Sigmon, Sr. Manager People Services or apply at www.bandwidth.com/careers/openings/. Must reference “Site Reliability Engineer” when applying.

 

 

 

 
