Site Reliability Engineer job at Africa's Talking

Vacancy title:
Site Reliability Engineer

[Type: Full-time, Industry: Nonprofit and NGO, Category: Science & Engineering]

Jobs at:

Africa's Talking

Deadline of this Job:
30 December 2021  

Duty Station:
Within Kenya, Nairobi, East Africa

Summary
Date Posted: Friday, December 17, 2021, Base Salary: Not Disclosed


JOB DETAILS:
Site Reliability Engineer, Kenya

Position Overview

The SRE will be responsible for building, maintaining, upgrading and managing infrastructure software and hardware, and for continuously monitoring and reporting on the status of the different software layers of the stack, including all observability, monitoring and reporting tools.
The SRE will be required to automate and continuously improve all processes so that they are efficient and fault tolerant, and will also design solutions to serve the data infrastructure teams while ensuring that deployed services are well monitored.
The primary goal is to ensure that our technology infrastructure runs smoothly and efficiently.

Role and Responsibilities
• Support and improve internal developer tooling, from software delivery tooling to architecting new solutions that reduce downtime and improve the customer experience.
• Prioritize and execute infrastructure requests from operations, development and product teams.
• Design and run new, innovative infrastructure projects as they arise
• Write secure custom code (e.g., in Python and Go) to automate work
• Automate deployment of system configurations and security settings
• Build and maintain a CI/CD pipeline
• Collaborate with developers to make sure new environments meet requirements and conform to best practices
• Maintain services once they are live by measuring and monitoring availability, latency and overall system observability
• Research and evaluate new technologies and tools to grow the agile development environment at AT
• Manage and monitor all installed systems and infrastructure
• Proactively ensure the highest levels of systems and infrastructure availability
• Monitor and test application performance for potential bottlenecks, identify possible solutions, and work with developers to implement those fixes
• Maintain security, backup, and redundancy strategies
• Write and maintain custom scripts to increase system efficiency and reduce manual intervention time on tasks
• Participate in the design of information and operational support systems
• Provide 2nd and 3rd level support

Key Performance Indicators

• Achieve 99.99% reliability by increasing MTBF (mean time between failures), reducing MTTA (mean time to acknowledge) to less than 5 minutes, and reducing MTTR (mean time to resolution). MTTR can be reduced by, among other things, building team members' expertise and maintaining proper documentation.
• Achieve adequate redundancy for both processes and data storage across the system. Where necessary, redundancy should be 100%.
• Reduce environment-related latency to less than 5 ms and keep it there where possible, i.e., where it is not constrained by external factors such as location.
• Keep software deployments fast (under 1 minute) and improve incident reporting and recovery in case of deployment failure, as described in the first point.
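To make the first KPI more concrete: a 99.99% target leaves only a small downtime budget, and a common rule of thumb relates availability to failure and repair times as MTBF / (MTBF + MTTR). The short Python sketch here assumes that formula and treats MTTR as the full outage duration; the function names and example figures are illustrative assumptions, not Africa's Talking's actual tooling or numbers.

```python
# Minimal sketch: relate a 99.99% availability target to a downtime budget
# and to MTBF/MTTR. Assumes steady-state availability = MTBF / (MTBF + MTTR);
# the function names and example numbers are hypothetical.

MINUTES_PER_YEAR = 365 * 24 * 60  # ignores leap years
MINUTES_PER_30_DAYS = 30 * 24 * 60


def downtime_budget_minutes(availability: float, period_minutes: float) -> float:
    """Maximum total downtime allowed in a period for a given availability target."""
    return (1.0 - availability) * period_minutes


def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the service is up, given mean time between failures
    and mean time to resolution (both in hours)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Downtime allowed by a 99.99% target
    print(f"{downtime_budget_minutes(0.9999, MINUTES_PER_YEAR):.2f} min per year")  # 52.56
    print(f"{downtime_budget_minutes(0.9999, MINUTES_PER_30_DAYS):.2f} min per 30 days")  # 4.32
    # Hypothetical: one failure every 30 days (720 h), each resolved in 4 minutes
    print(f"{steady_state_availability(720, 4 / 60):.5f}")  # 0.99991
```

Under these assumptions, a 99.99% target allows roughly four minutes of total outage per 30 days, which is why a sub-5-minute MTTA and a short MTTR matter as much as a long MTBF.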

Experience Profile
• BS/MS degree in Computer Science, Engineering or a related subject
• Proven working experience in installing, configuring and troubleshooting UNIX/Linux-based environments.
• Solid experience in the administration and performance tuning of application stacks (e.g., Tomcat, JBoss, Apache, NGINX)
• Experience with virtualization and containerization (e.g., VMware, VirtualBox, KVM)
• Experience with monitoring systems (e.g., Prometheus, the TICK stack (Telegraf, InfluxDB, Chronograf/Grafana, Kapacitor), Zabbix)
• Experience with automation software (e.g., Puppet, CFEngine, Chef)
• Solid scripting skills (e.g., shell scripts, Perl, Ruby, Python)
• Solid networking knowledge (OSI network layers, TCP/IP)

Added advantages
• 3+ years progressive experience in DevOps or as a Site Reliability Engineer (SRE)
• Solid Cloud experience, preferably in AWS

Personal Attributes
• Problem-solving skills and the ability to work under pressure
• Agile and resilient with good work ethic
• Outstanding communication skills, both oral and written, and both technical and non-technical.
• Able to prioritize work in a fast-paced, multitasking environment
• Able to meet tight deadlines and follow up on commitments
• Analytical mind
• Attention to detail

Education Requirement: No Requirements

Job Experience: No Requirements

Work Hours: 8

Job application procedure
Click here to Apply Now

Job Info
Job Category: Engineering jobs in Kenya
Job Type: Full-time
Deadline of this Job: 30 December 2021
Duty Station: Nairobi
Posted: 17-12-2021
No of Jobs: 1