Site Reliability Engineer at Bandwidth

Raleigh, United States

Job Description

Site Reliability Engineer (Raleigh, NC) Duties: Work closely with leadership and internal partners to ensure that software meets security, SLA, performance, and capacity requirements. Set up and maintain monitoring tools and systems to detect issues using Datadog monitors, and alert using OpsGenie. Configure Datadog and Grafana alerts and application health monitors to notify the team when anomalies or problems occur. Work closely with other Site Reliability Engineers, DevOps Engineers, and System Administrators to achieve common goals. Analyze system performance data using Snowflake to plan for capacity upgrades or optimizations. Ensure the system can handle expected growth in traffic and data by using these tools to track application lag and behavior. Manage Kubernetes clusters and OpenShift environments for deploying and scaling containerized applications. Implement and manage infrastructure using Ansible, and maintain version-controlled infrastructure code in GitLab for consistency and repeatability. Use Terraform and Ansible scripts to define and provision infrastructure resources in a repeatable and automated manner. Create and maintain Ansible playbooks to automate routine tasks, configurations, and deployments. Use GitHub Actions for CI/CD to continuously build and deploy code, and implement CI/CD pipelines to streamline application updates. Build and maintain deployment pipelines using Ansible playbooks, ensure smooth and reliable deployments and rollback procedures, and create production releases using ServiceNow for record tracking. Maintain detailed documentation on system architecture, configurations, and processes using Confluence, and share knowledge and best practices with team members. Plan resource allocation using Red Hat OpenShift, including servers, storage, and network capacity, following the Kubernetes architecture to ensure the system is equipped to handle traffic spikes and growth.
Develop and test disaster recovery plans to ensure data and service availability in case of major failures or disasters, building the supporting tools in Go. Work closely with development teams to promote a DevOps culture and ensure reliability is built into software from the start by following best practices. Collaborate with other Site Reliability Engineers on a weekly basis to share knowledge, solve complex problems, and touch base on all open points. Monitor and manage cloud resource costs in AWS to optimize spending while maintaining performance.

Required: Master’s degree or foreign equivalent in Computer Science, Electrical Engineering, or related field of study plus 2 years of experience in the job offered or related position. Must have 2 years of experience with: Infrastructure and networking concepts including virtualization, load balancing, and DNS. At least one of the following cloud infrastructure technologies: AWS, Google Cloud, Azure. REST APIs using one or more of the following: JSON, XML, YAML. Designing, building, and operating large-scale production systems. Continuous Integration and Continuous Deployment (CI/CD) concepts and technologies using one or more of the following: Jenkins, GHA, Circle. Containerization technologies (Docker, Docker Compose, Docker Swarm, Kubernetes). Configuration and management techniques in large distributed environments. Monitoring and observability techniques with one or more of the following tools: Datadog, Sensu, New Relic, Nagios. General use of open-source databases: MySQL, Postgres, Redis, Cassandra. Unix/Linux administration, troubleshooting, and shell scripting. One or more of the following programming languages: Python, Java, Go, Rust, or similar. Source control (Git, GitHub) and feature branching strategies. Automating infrastructure, testing, and deployment using Ansible, Chef, or Terraform. Infrastructure as Code paradigm.

In the alternative, will accept a Bachelor’s degree or foreign equivalent in Computer Science, Electrical Engineering, or related field of study plus 5 years of experience in the job offered or related position. Must have 2 years of experience with: Infrastructure and networking concepts including virtualization, load balancing, and DNS. At least one of the following cloud infrastructure technologies: AWS, Google Cloud, Azure. REST APIs using one or more of the following: JSON, XML, YAML. Designing, building, and operating large-scale production systems. Continuous Integration and Continuous Deployment (CI/CD) concepts and technologies using one or more of the following: Jenkins, GHA, Circle. Containerization technologies (Docker, Docker Compose, Docker Swarm, Kubernetes). Configuration and management techniques in large distributed environments. Monitoring and observability techniques with one or more of the following tools: Datadog, Sensu, New Relic, Nagios. General use of open-source databases: MySQL, Postgres, Redis, Cassandra. Unix/Linux administration, troubleshooting, and shell scripting. One or more of the following programming languages: Python, Java, Go, Rust, or similar. Source control (Git, GitHub) and feature branching strategies. Automating infrastructure, testing, and deployment using Ansible, Chef, or Terraform. Infrastructure as Code paradigm.

Submit resumes to: Bandwidth, Inc, 2230 Bandmate Way, Raleigh, NC 27607, Attn: Kellie Sigmon, Sr. Manager People Services or apply at www.bandwidth.com/careers/openings/. Must reference “Site Reliability Engineer” when applying.


