Site Reliability Engineer Intern
2026-03-03T12:18:29+00:00
Interintel Technologies Limited
https://cdn.greatkenyanjobs.com/jsjobsdata/data/employer/comp_4949/logo/InterIntel%20Technologies%20Limited.png
https://www.interintel.org/
INTERN
Nairobi
Nairobi
00100
Kenya
Information Technology
Computer & IT, Science & Engineering
2026-03-09T17:00:00+00:00
8
About the Company
We are a team of passionate individuals who aspire to fuse the future and present I.T.-based challenges by offering cutting-edge software design and development, infrastructure, mobile-commerce solutions, and go-to-market services to our clients. Our key strength lies in the ability to build innovative systems that can be easily integrated with client network...
Job Summary
The Site Reliability Engineer Intern will help apply software engineering principles to IT operations to ensure the company's platforms are reliable, scalable, observable, and efficient. The role focuses on automation, monitoring, incident management, infrastructure as code, and measurable reliability targets (SLIs/SLOs) to guarantee high availability and performance across all products.
Duties and Responsibilities
- Assist in designing, implementing, and continuously improving system reliability, availability, and performance by helping define and monitor SLIs, SLOs, and error budgets across all assigned platforms.
- Support building and managing a robust monitoring and observability framework using Prometheus, Grafana, and Loki to track latency, traffic, errors, system health, and user impact.
- Assist in automating infrastructure provisioning, scaling, and configuration management using Infrastructure as Code principles with Terraform and Kubernetes to ensure consistency, scalability, and disaster recovery readiness.
- Participate in incident response processes, including detection, escalation, resolution, communication, and conducting blameless postmortems to prevent recurrence.
- Help reduce manual operational workload through automation, scripting, and process optimization to improve efficiency and release velocity.
- Help ensure high availability and performance of business-critical systems.
- Collaborate with Engineering, Product, and DevOps teams to assist in improving deployment safety, capacity planning, cost optimization, and system scalability.
- Assist in establishing alerting strategies and reliability standards that minimize alert fatigue while ensuring rapid detection and resolution of production issues.
Required Knowledge, Qualification and Experience
- Bachelor's Degree in Computer Science, Information Technology, or a related field.
- Some exposure to Kubernetes and cloud networking.
- Some experience with monitoring and observability tools.
- Good exposure to managing production systems in cloud environments.
- Some exposure to implementing and managing CI/CD pipelines using tools like Jenkins, GitLab CI/CD, or equivalent.
- Some exposure to cloud platforms (AWS, Azure, Google Cloud) and containerization tools like Docker and Kubernetes.
- Basic hands-on exposure to monitoring and metrics systems such as Prometheus.
- Basic familiarity with dashboarding and visualization tools such as Grafana.
- Foundational understanding of log aggregation systems such as Loki.
- Familiarity with Linux environments and basic system commands.
- Exposure to scripting concepts using Python, Bash, or similar languages.
- Foundational knowledge of Artificial Intelligence (AI) and good exposure to AI agents; relevant certifications in AI or related disciplines will be an added advantage.
Skills
- Kubernetes
- Cloud networking
- Monitoring and observability tools
- Production systems management in cloud environments
- CI/CD pipelines
- Terraform
- Docker
- Prometheus
- Grafana
- Loki
- Linux environments
- Python scripting
- Bash scripting
- Artificial Intelligence (AI)
- AI agents
No Requirements
JOB-69a6d195d7f4e
Vacancy title:
Site Reliability Engineer Intern
[Type: INTERN, Industry: Information Technology, Category: Computer & IT, Science & Engineering]
Jobs at:
Interintel Technologies Limited
Deadline of this Job:
Monday, March 9 2026
Duty Station:
Nairobi | Nairobi
Summary
Date Posted: Tuesday, March 3 2026, Base Salary: Not Disclosed
Work Hours: 8
Experience: No Requirements
Level of Education: bachelor degree
Job application procedure
Interested in applying for this job? Click here to submit your application now.
Send your resume and portfolio with the subject line SITE RELIABILITY ENGINEER INTERN.
Submission deadline: 9th March 2026