Vacancy title:
Site Reliability Engineer
Jobs at:
Africa's Talking
Deadline of this Job:
30 December 2021
Summary
Date Posted: Friday, December 17, 2021, Base Salary: Not Disclosed
JOB DETAILS:
Site Reliability Engineer, Kenya
Position Overview
The SRE will be responsible for building, maintaining, upgrading and managing infrastructure software and hardware, and for continuously monitoring and reporting on the status of the different software layers of the stack. This includes all observability, monitoring and reporting tools.
You will be required to automate and continuously improve all processes so that they are efficient and fault tolerant. You will also design solutions to serve the data infrastructure teams, while ensuring that deployed services are well monitored.
The primary goal is to ensure that our technology infrastructure runs smoothly and efficiently.
Role and Responsibilities
• Serve and improve internal developer tooling, from software delivery tooling to architecting new solutions that reduce downtime and improve the customer experience.
• Prioritize and execute different Infra requests from operations, development and product teams.
• Design and run new, innovative Infra projects as they arise
• Write secure custom code (e.g., in Python and Go) to automate work
• Automate deployment of system configurations and security settings
• Build and maintain a CI/CD pipeline
• Collaborate with developers to make sure new environments meet requirements and conform to best practices
• Maintain services once they are live by measuring and monitoring availability, latency and overall system observability
• Learn about and gather new technologies and tools to grow the agile development environment at AT
• Manage and monitor all installed systems and infrastructure
• Proactively ensure the highest levels of systems and infrastructure availability
• Monitor and test application performance for potential bottlenecks, identify possible solutions, and work with developers to implement those fixes
• Maintain security, backup, and redundancy strategies
• Write and maintain custom scripts to increase system efficiency and reduce the human intervention needed for any task
• Participate in the design of information and operational support systems
• Provide 2nd and 3rd level support
Key Performance Indicators
• Achieve 99.99% reliability by increasing MTBF (mean time between failures), reducing MTTA (mean time to acknowledge) to less than 5 minutes, and reducing MTTR (mean time to resolution). MTTR can be reduced by, among other things, increasing team members' expertise and keeping proper documentation.
• Achieve adequate redundancy for both processes and data storage across the system. Where necessary, redundancy should be 100%.
• Reduce environment-related latency to less than 5 ms and keep it there where possible, i.e., where it is not constrained by external factors such as location.
• Keep software deployments fast, at less than 1 minute, and improve incident reporting and recovery when a deployment fails, as described in the first KPI above.
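To give a feel for what the 99.99% target implies, the short Python sketch below works through the arithmetic. It assumes that "reliability" here means steady-state availability, computed as MTBF / (MTBF + MTTR); the posting does not state a formula, so that reading and the function names are illustrative assumptions only.

```python
# Illustrative sketch only: assumes "99.99% reliability" means steady-state
# availability A = MTBF / (MTBF + MTTR). The function names are hypothetical.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(availability: float,
                            period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Total downtime allowed in a period at the given availability."""
    return (1.0 - availability) * period_minutes


def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability implied by a mean time between failures and a mean time to resolution."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def max_mttr_hours(mtbf_hours: float, target: float) -> float:
    """Largest MTTR that still meets the target, from rearranging A = MTBF / (MTBF + MTTR)."""
    return mtbf_hours * (1.0 - target) / target


if __name__ == "__main__":
    budget = downtime_budget_minutes(0.9999)
    print(f"Yearly downtime budget at 99.99%: {budget:.2f} minutes")

    # Example input (an assumption, not from the posting): one failure every 30 days.
    mttr_minutes = max_mttr_hours(mtbf_hours=30 * 24, target=0.9999) * 60
    print(f"Maximum MTTR with a 30-day MTBF: {mttr_minutes:.2f} minutes")
```

Under these assumptions, 99.99% leaves roughly 52.56 minutes of downtime per year, which is why the target is paired with a short MTTA and MTTR as well as a longer MTBF.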
Experience Profile
• BS/MS degree in Computer Science, Engineering or a related subject
• Proven working experience in installing, configuring and troubleshooting UNIX/Linux-based environments
• Solid experience in the administration and performance tuning of application stacks (e.g., Tomcat, JBoss, Apache, NGINX)
• Experience with virtualization and containerization (e.g., VMware, VirtualBox, KVM)
• Experience with monitoring systems (e.g., Prometheus, the TICK stack (Telegraf, InfluxDB, Chronograf/Grafana, Kapacitor), Zabbix)
• Experience with automation software (e.g., Puppet, CFEngine, Chef)
• Solid scripting skills (e.g., shell scripts, Perl, Ruby, Python)
• Solid networking knowledge (OSI network layers, TCP/IP)
Added advantages
• 3+ years progressive experience in DevOps or as a Site Reliability Engineer (SRE)
• Solid Cloud experience, preferably in AWS
Personal Attributes
• Problem solving skills and ability to work under pressure
• Agile and resilient, with a good work ethic
• Outstanding communication skills, both oral and written, and both technical and non-technical
• Able to prioritize work in a fast-paced, multitasking environment
• Able to meet tight deadlines and follow up on commitments
• Analytical mind
• Attention to detail
Education Requirement: No Requirements
Job Experience: No Requirements
Work Hours: 8
Job application procedure
Click here to Apply Now