Big Data Platform Operations Engineer
2025-06-20T10:02:23+00:00
Safaricom Kenya
https://cdn.greatkenyanjobs.com/jsjobsdata/data/employer/comp_8023/logo/safaricom.png
https://www.safaricom.co.ke/
FULL_TIME
Nairobi
Kenya
00100
Kenya
Telecommunications
Computer & IT
2025-06-27T17:00:00+00:00
Kenya
8
Key Responsibilities
- Platform Architecture: Design and implement scalable and fault-tolerant big data platforms using distributed computing technologies such as Apache Hadoop, Apache Spark, or Apache Flink. Architect data storage solutions, including distributed file systems, NoSQL databases, and data warehouses.
- Data Ingestion and Integration: Develop data ingestion pipelines to collect, process, and ingest data from various sources, including streaming data sources, databases, APIs, and log files. Implement ETL (Extract, Transform, Load) processes to preprocess and cleanse raw data for analysis (see the PySpark ingestion sketch after this list).
- Data Processing and Analysis: Optimize data processing workflows for efficiency, performance, and scalability. Develop and maintain data processing jobs, queries, and analytics workflows using distributed computing frameworks and query languages such as SQL, Hive, or Spark SQL.
- Scalability and Performance: Design and implement strategies for scaling big data platforms to handle large volumes of data and diverse workloads. Optimize resource utilization, data partitioning, and parallel processing techniques to maximize performance and minimize latency.
- Monitoring and Optimization: Develop monitoring and alerting solutions to track the health, performance, and availability of big data platforms. Implement automated scaling, load balancing, and resource management mechanisms to optimize platform utilization and performance (see the metrics sketch after this list).
- Security and Governance: Ensure compliance with data governance policies, security requirements, and regulatory standards. Implement access controls, encryption, and auditing mechanisms to protect sensitive data and ensure data privacy and confidentiality.
- Infrastructure as Code (IaC): Implement infrastructure automation using tools such as Terraform, Ansible, or CloudFormation. Define infrastructure configurations, provisioning scripts, and deployment pipelines to enable reproducible and consistent deployments of big data platforms.
- Documentation and Training: Document platform architecture, configurations, and best practices. Provide training and support to data engineers and data scientists to ensure effective use of big data platforms and tools.
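To give candidates a feel for the day-to-day work, here is a minimal PySpark sketch of the kind of ingestion/ETL job described above: it reads raw JSON logs, cleanses them, writes date-partitioned Parquet for efficient downstream queries, and runs a simple Spark SQL aggregation. The paths, column names, and job name are illustrative assumptions, not part of the role.

```python
# Minimal ingestion/ETL sketch -- paths and column names are assumed for illustration.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("events-ingestion")  # hypothetical job name
    .getOrCreate()
)

# Extract: read raw JSON log files landed by an upstream collector (assumed layout).
raw = spark.read.json("hdfs:///landing/events/2025/06/20/")

# Transform: basic cleansing -- drop malformed rows, normalise timestamps,
# and de-duplicate on the event identifier (all column names assumed).
clean = (
    raw
    .dropna(subset=["event_id", "event_time"])
    .withColumn("event_time", F.to_timestamp("event_time"))
    .withColumn("event_date", F.to_date("event_time"))
    .dropDuplicates(["event_id"])
)

# Load: write date-partitioned Parquet so later Spark SQL / Hive queries can prune by date.
(
    clean.write
    .mode("append")
    .partitionBy("event_date")
    .parquet("hdfs:///warehouse/events/")
)

# Downstream analysis with Spark SQL, e.g. daily event counts.
clean.createOrReplaceTempView("events")
spark.sql(
    "SELECT event_date, COUNT(*) AS events FROM events GROUP BY event_date"
).show()
```

Partitioning the output by event_date is one concrete instance of the data-partitioning strategies mentioned under Scalability and Performance.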
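Likewise, a minimal sketch of the monitoring idea, using the prometheus_client library to expose a pipeline-health metric that an external alerting stack (for example Prometheus with Alertmanager) could scrape. The metric name, port, and the check_ingestion_lag() helper are assumptions for illustration only.

```python
# Expose a pipeline-health metric for scraping; names and port are assumed.
import time
from prometheus_client import Gauge, start_http_server

INGESTION_LAG_SECONDS = Gauge(
    "ingestion_lag_seconds",
    "Seconds between the newest source record and the newest record in the warehouse",
)

def check_ingestion_lag() -> float:
    """Hypothetical check; a real implementation would compare source and
    warehouse watermarks (e.g. Kafka offsets vs. the latest Parquet partition)."""
    return 42.0  # placeholder value

if __name__ == "__main__":
    start_http_server(9108)  # serve /metrics on an assumed port
    while True:
        INGESTION_LAG_SECONDS.set(check_ingestion_lag())
        time.sleep(60)
```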
Qualifications
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- Solid understanding of big data technologies, distributed systems, and cloud computing principles.
- Proficiency in programming languages such as Python, Java, or Scala.
- Experience with big data frameworks such as Apache Hadoop, Apache Spark, or Apache Flink.
- Familiarity with cloud platforms such as AWS, GCP, or Azure.
- System administration skills.
- Strong problem-solving skills and attention to detail.
- Excellent communication and collaboration skills.
- Ability to work independently and manage multiple priorities in a fast-paced environment.
JOB-685531af293fb