SRE Engineer
2025-09-19T21:46:35+00:00
Equity Bank
https://cdn.greatkenyanjobs.com/jsjobsdata/data/employer/comp_7833/logo/Equity%20Bank.png
https://equitygroupholdings.com/ke/
FULL_TIME
Nairobi
Nairobi
00100
Kenya
Banking
Science & Engineering
2025-10-03T17:00:00+00:00
Kenya
8
The Role Purpose
We are seeking a highly skilled and experienced ELK SRE Engineer to join our dynamic team. In this role, you will be responsible for the design, implementation, maintenance, and optimization of our Elasticsearch, Logstash, and Kibana (ELK) stack, ensuring its reliability, scalability, and performance. You will play a crucial part in providing robust logging, monitoring, and analytics solutions that are critical to our operational insights and incident response.
Responsibilities:
ELK Stack Management:
- Design, deploy, configure, and manage large-scale ELK clusters (Elasticsearch, Logstash, Kibana, Beats).
- Ensure the high availability, scalability, and disaster recovery of ELK environments.
- Monitor ELK cluster health, performance, and resource utilization, proactively identifying and resolving issues.
- Perform regular upgrades and patching of ELK components.
- Manage Elasticsearch indices, shards, mappings, and lifecycle policies.
- Optimize Elasticsearch query performance and indexing strategies.
- Troubleshoot complex issues related to data ingestion, search performance, and Kibana visualizations.
Site Reliability Engineering (SRE) Principles:
- Apply SRE principles to the ELK stack, focusing on automation, observability, and continuous improvement.
- Develop and implement monitoring and alerting solutions for the ELK infrastructure and data pipelines.
- Define and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for ELK services.
- Conduct post-incident reviews to identify root causes and implement preventative measures.
Data Ingestion and Pipelines:
- Design, implement, and optimize data ingestion pipelines using Logstash, Beats (Filebeat, Metricbeat, Heartbeat, etc.), Kafka, or other relevant technologies.
- Develop custom Logstash filters and configurations to parse, enrich, and transform log data.
- Ensure data quality, integrity, and security throughout the ingestion process.
Collaboration & Mentorship:
- Work closely with development, operations, and security teams to understand their logging and monitoring requirements.
- Provide expertise and guidance on best practices for using the ELK stack.
- Create documentation, runbooks, and training materials for ELK users and administrators.
- Mentor junior engineers and contribute to a culture of knowledge sharing.
Automation:
- Automate ELK deployment, configuration, and operational tasks using tools like Ansible, Terraform, Puppet, or Chef.
- Develop scripts (Python, Go, Bash) to streamline common ELK administration tasks.
Qualifications
Required Qualifications:
- Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent practical experience.
- 5+ years of experience working with and managing large-scale ELK (Elasticsearch, Logstash, Kibana) deployments.
- Strong understanding of Elasticsearch architecture, performance tuning, and scaling strategies.
- Proficiency in configuring Logstash pipelines and Beats for various data sources.
- Experience with Kibana for dashboard creation, visualization, and alerting.
- Solid experience with Linux/Unix operating systems.
- Experience with the Azure cloud platform.
- Familiarity with SRE principles and practices, including SLOs, SLIs, and error budgets.
- Strong problem-solving skills and the ability to troubleshoot complex distributed systems.
- Excellent communication and collaboration skills.
Preferred Qualifications:
- Experience with Kafka or other message queuing systems.
- Knowledge of other monitoring tools (Dynatrace, Datadog).
- Familiarity with security best practices for the ELK stack.
- Certifications related to Elasticsearch or cloud platforms.
JOB-68cdcf3b8ae9d
Vacancy title:
SRE Engineer
[Type: FULL_TIME, Industry: Banking, Category: Science & Engineering]
Jobs at:
Equity Bank
Deadline of this Job:
Friday, October 3, 2025
Duty Station:
Nairobi | Nairobi | Kenya
Summary
Date Posted: Friday, September 19, 2025; Base Salary: Not Disclosed
Work Hours: 8
Experience in Months: 60
Level of Education: Bachelor's degree