Head – DR Orchestration and Testing
2025-10-16T15:09:00+00:00
Equity Bank
https://cdn.greatkenyanjobs.com/jsjobsdata/data/employer/comp_7833/logo/Equity%20Bank.png
https://equitygroupholdings.com/ke/
FULL_TIME
Kenya
Nairobi
00100
Kenya
Banking
Computer & IT
2025-10-30T17:00:00+00:00
Kenya
8
A senior leadership role accountable for an enterprise-wide DR orchestration and testing program spanning data centers, cloud, networks, applications, data platforms, and third-party services. The role builds automated runbooks, governs recovery scenarios, executes end-to-end exercises (tabletop to full failover), and drives remediation. It tightly integrates with Change/Release, Backup & Recovery, Cybersecurity, SRE/Operations, and Business Units to assure recoverability for core banking, payments, digital channels, and shared services across all subsidiaries.
Key Accountabilities
- Strategy, Policy & Governance
- Define and maintain the Group DR Orchestration & Testing Policy, Standards, and Playbooks aligned to ITIL v4, ISO 22301/27031, and NIST SP 800-34.
- Institutionalize governance anchored by the IT Steering Committee and the Service Continuity/DR Working Group as the mechanisms for cadence, accountability, and reporting.
- Establish decision rights, RACI, and acceptance criteria for “go-live” recoverability (RTO/RPO, data integrity, service dependencies).
- Embed DR impact assessment in Change, Release, and Architecture review gates.
- Orchestration & Automation
- Design and implement automated recovery runbooks (e.g., infra, platform, DB, app, network/DNS, identity) leveraging workflow/orchestration tools, Infrastructure-as-Code, and CI/CD.
- Engineer repeatable failover/failback patterns (active–active, active–standby, zonal/region/site) for on‑prem, hybrid, and cloud workloads.
- Integrate observability (APM, logs, synthetics) to validate service health during exercises and real events.
- Testing Program Management
- Own the Group DR Test Calendar (annual/quarterly/monthly) covering tabletop, technical component tests, integrated service tests, and full-scale exercises.
- Define test scenarios based on BIAs, risk scenarios (e.g., ransomware, DC outage, carrier failure, major release rollback), and regulatory expectations.
- Measure and certify recoverability per service; track defects, action owners, and closure SLAs.
- Data, Backup & Cyber Recovery Assurance
- Align backup/restore testing with application-level recovery (including immutable/air-gapped copies, vaulting, and key management).
- Validate data integrity, transaction reconciliation, and journal consistency post-recovery (e.g., core banking, card switch, channels).
- Coordinate with Cybersecurity on ransomware readiness, clean‑room recovery, and malware‑free restore procedures.
- Third-Party & Cloud Resilience
- Assess and test DR commitments of critical vendors/fintech partners; verify evidence of recoverability and exit/failover options.
- Govern SaaS and cloud region/zone strategies, data residency constraints, and cross‑border implications for subsidiaries.
- Service Mapping & Readiness
- Maintain service dependency maps (CMDB) linking business services to applications, platforms, data stores, integrations, and infrastructure.
- Define minimal viable service (MVS) configurations for recovery and ensure runbooks reflect current state.
- Metrics, Reporting & Continuous Improvement
- Define and report KPIs/KRIs: test coverage %, pass rate, RTO/RPO adherence, MTTR (exercises/incidents), % automated runbooks, restore success rate, findings aging, and resilience confidence score.
- Produce executive dashboards and Monthly/Quarterly Resilience Reports to Group CIO, CFO, Risk, and Executive Committees.
- Run post-exercise/post-incident reviews and drive structural fixes (automation, design changes, capacity).
- Subsidiary Coordination & Incident Readiness
- Coordinate DR readiness across Banking, Insurance, Fintech, Health, and Foundation; tailor scenarios to local contexts while enforcing Group standards.
- Lead or support technical recovery command during major incidents and planned DR events.
- Financial Planning & Value Optimization
- Quantify cost‑to‑recover vs. risk; recommend right‑sized patterns (active–active vs. warm/cold) by criticality.
- Support budgeting for resilience tooling, testing, and automation; demonstrate ROI through reduced downtime and faster recovery.
Key Deliverables
- Group DR Orchestration & Testing Policy, Standards, and Runbook Library.
- Annual DR Test Calendar with scenario catalog and success criteria.
- Service-level Recovery Certificates (per critical service) and remediation tracker.
- Enterprise Resilience Dashboard (RTO/RPO, coverage, pass rate, MTTR, confidence score).
- Quarterly Executive Resilience Reports and Board-ready summaries.
- Post-Exercise/Incident Review reports with prioritized corrective actions.
- Up-to-date Service Dependency Maps and MVS definitions.
Required Qualifications & Experience
Education
- Bachelor’s in Computer Science, Engineering, Information Systems, or related field.
- Master’s in IT Management, Business Continuity/Resilience, or Operations is an advantage.
Certifications (Preferred)
JOB-68f10a8c76a5c
Vacancy title:
Head – DR Orchestration and Testing
[Type: FULL_TIME, Industry: Banking, Category: Computer & IT]
Jobs at:
Equity Bank
Deadline of this Job:
Thursday, October 30 2025
Duty Station:
Kenya | Nairobi | Kenya
Summary
Date Posted: Thursday, October 16 2025, Base Salary: Not Disclosed
Work Hours: 8
Experience in Months: 36
Level of Education: bachelor degree
Job application procedure
Interested and qualified candidates can click to apply.