Senior Site Reliability Engineer job at PesaLink

 
 
 

Vacancy title:
Senior Site Reliability Engineer

[Type: FULL_TIME, Industry: Financial Services, Category: Computer & IT]

Jobs at:
PesaLink

Deadline of this Job:
Tuesday, October 21 2025

Duty Station:
Kenya | Nairobi | Kenya

Summary
Date Posted: Wednesday, October 15 2025, Base Salary: Not Disclosed


JOB DETAILS:

The Senior SRE will be responsible for driving the maturity of SRE practices such as SLAs, SLOs, incident reporting, RCA documentation, and problem management. They will troubleshoot and resolve issues (within SLAs and with the aim of reducing MTTR) and put in place measures to prevent recurrence (thereby increasing MTBF). They will also help ensure that observability spans all systems and recommend improvements to our observability and monitoring posture. In addition, they will support API integrations, our developer portal and sandbox, and the general improvement of our integration workflow.

DUTIES AND RESPONSIBILITIES

  • Investigate, troubleshoot and resolve incidents; determine root causes through log analysis, using a mix of observability tools and manual log analysis; and report on lessons learned.
  • Support workflows and service management: close incidents within service level agreements (SLAs), engage third-line support where necessary, and manage problems, ensuring that bug records and root cause analyses (RCAs) are kept up to date.
  • Assist developers consuming various APIs (synchronous, asynchronous, REST and SOAP), and enhance our integration workflows, documentation, developer portal and sandbox.
  • Work in collaboration with other engineering/IT teams to provide 24/7 support in line with ITIL and SRE principles.
  • Take part in testing, such as UAT, SIT and unit testing, before rollout.
  • Plan and execute business continuity planning (BCP) and disaster recovery planning.
  • Stay up to date with industry best practices and emerging technologies in APIs, infrastructure management, and monitoring.
  • Create, develop, and maintain comprehensive documentation for payment systems, including detailed architectural diagrams, technical specifications, integration plans, user guides, and troubleshooting procedures, ensuring clear and up-to-date resources for system implementation, integration and support.
  • Ensure the stability and performance of all platforms by making sure that monitoring is designed into the solution from the start, from logging through to ensuring that all items are monitored, rather than added as an afterthought.
  • Help ensure that observability and monitoring span all our systems, allowing for easy correlation of faults and identification of cascading failures.
  • Take the lead in complex integrations of payment systems with internal and external applications, including financial institutions, banks, and third-party payment processors.
  • System Design/Solution Architecture: Provide input into designs for availability, scalability, resilience, fault tolerance and elasticity.
  • Train and mentor new engineers, ensuring that they develop and grow their expertise in reliability engineering and IT governance principles such as change and incident management.
  • Evangelize reliability practices across the organization so that teams are familiar with reliability engineering.
  • Support containerised workloads, troubleshoot issues, and recommend improvements to our systems.
  • Develop automation scripts using any scripting language (e.g., Python, Bash, Ruby, or others) to streamline deployment, monitoring, and management tasks.
  • Troubleshoot and resolve API issues, security concerns, and system failures.

EDUCATION, SKILLS & COMPETENCIES REQUIRED

  • Bachelor’s degree in Computer Science, Software Engineering, Information Technology or a related field.
  • Proficiency in APIs and API integrations, and the ability to support API-first solutions.
  • 5+ years of experience troubleshooting and resolving production issues, particularly for API-based systems.
  • Knowledge of ITIL and SRE principles such as change management, incident management, SLAs, SLOs, blameless postmortems and problem management.
  • Good understanding of API technologies such as REST/JSON, REST/XML and SOAP.

 

Work Hours: 8

Experience in Months: 60

Level of Education: Bachelor’s degree

Job application procedure

Please submit your CV and cover letter (in PDF format), addressed to the Hiring Manager and clearly explaining why you are the ideal candidate, to recruit@ipsl.co.ke. Ensure the subject line of your email reads: “Senior Site Reliability Engineer”.

All applications will be subjected to a fair and competitive recruitment process. Only shortlisted candidates will be contacted. Application Deadline: Tuesday, 21st October 2025 at 5:00 PM (EAT).

 


Job Info
Job Category: Computer/ IT jobs in Kenya
Job Type: Full-time
Deadline of this Job: Tuesday, October 21 2025
Duty Station: Kenya | Nairobi | Kenya
Posted: 15-10-2025
No of Jobs: 1