Data Architect
Karachi / Islamabad / Lahore, Pakistan
Contracted
Technical Services
Experienced
KalSoft is looking for an experienced Data Architect with 10+ years of expertise in designing, developing, and managing on-premises Data Lake, Lakehouse, and Big Data platforms. The ideal candidate has deep knowledge of Apache Spark, distributed computing, and modern data architectures, and will ensure scalable, high-performance, and governed data environments.
Key Responsibilities:
1. Data Architecture & Strategy
- Design and implement scalable, high-performance Data Lake/Lakehouse architectures to support enterprise analytics and AI workloads.
- Define data partitioning, indexing, and storage strategies for efficient querying and processing (see the partitioning sketch after this list).
- Implement metadata management, data lineage, and data cataloging to ensure governance and compliance.
- Establish data pipeline architectures that support batch, real-time, and streaming data processing.
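For illustration, a minimal PySpark sketch of the kind of partitioned Lakehouse table design this role covers. It assumes a Spark session with the open-source Delta Lake extensions on the classpath; the paths and column names are hypothetical placeholders.

```python
from pyspark.sql import SparkSession

# Delta-enabled session; both configs are needed for Delta SQL support.
spark = (
    SparkSession.builder
    .appName("lakehouse-partitioning-sketch")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

events = spark.read.parquet("/data/raw/events")  # hypothetical source path

# Partition by a low-cardinality date column so queries filtering on
# event_date prune whole directories instead of scanning the full table.
(
    events.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("event_date")
    .save("/data/lake/events")  # hypothetical on-premises lake path
)
```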
2. Big Data Engineering & Apache Spark
- Architect and optimize large-scale data processing pipelines using Apache Spark (PySpark, Scala, or Java).
- Deploy and manage Spark on distributed computing frameworks such as YARN, Kubernetes, or standalone clusters.
- Lead the development of ETL/ELT pipelines using Apache Spark, Hadoop, Trino (Presto), Apache Iceberg, Delta Lake, or Apache Hudi.
- Enable real-time data streaming using Apache Kafka, Spark Structured Streaming, Apache Flink, or Apache NiFi (a streaming ingestion sketch follows this list).
- Ensure data lake interoperability with data warehouses, BI tools, and AI/ML platforms.
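As referenced above, a minimal sketch of real-time ingestion with Apache Kafka and Spark Structured Streaming landing into a Delta table. It assumes a Delta-enabled session (configured as in the earlier sketch) and the spark-sql-kafka connector on the classpath; the broker address, topic name, and paths are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("streaming-ingest-sketch").getOrCreate()

stream = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "events")                     # hypothetical topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers key/value as binary; cast the payload before persisting.
parsed = stream.select(col("value").cast("string").alias("payload"))

# Checkpointing provides restart safety and exactly-once delivery to the sink.
query = (
    parsed.writeStream
    .format("delta")
    .option("checkpointLocation", "/data/checkpoints/events")
    .start("/data/lake/events_stream")
)
query.awaitTermination()
```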
3. Performance Optimization & Scalability
- Optimize Apache Spark jobs by implementing RDD tuning, partitioning strategies, and caching mechanisms (see the tuning sketch after this list).
- Improve query performance using Apache Spark SQL, Delta Lake optimizations, and Z-Ordering.
- Implement data lifecycle management, compaction, and auto-tuning techniques for large-scale datasets.
- Ensure scalability, fault tolerance, and high availability of data platforms.
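A minimal sketch of the routine optimizations named above: repartitioning ahead of shuffle-heavy work, caching reused data, and compacting with Z-Ordering. It assumes a Delta-enabled session and open-source Delta Lake 2.0+, where OPTIMIZE ... ZORDER BY is available; the paths and column names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("tuning-sketch").getOrCreate()

df = spark.read.format("delta").load("/data/lake/events")

# Right-size partitions before a wide, shuffle-heavy join or aggregation.
df = df.repartition(200, "customer_id")

# Cache only datasets reused across several actions, then materialize it.
df.cache()
df.count()

# Compact small files and cluster by a frequent filter column so data
# skipping can prune files at query time.
spark.sql("OPTIMIZE delta.`/data/lake/events` ZORDER BY (customer_id)")
```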
4. Data Security, Governance & Compliance
- Implement data security policies, role-based access control (RBAC), encryption, and tokenization.
- Ensure compliance with GDPR, HIPAA, or other industry regulatory frameworks.
- Enforce audit logging, data masking, and identity management for enterprise data security.
- Enable data versioning and time-travel capabilities in Lakehouse platforms for compliance and reproducibility (a time-travel sketch follows below).
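As noted in the last item above, a minimal sketch of data versioning and time travel on a Delta Lake table, which supports audits and reproducible reads. It assumes a Delta-enabled session; the path, version number, and timestamp are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("time-travel-sketch").getOrCreate()

# Read the table as of an earlier commit, by version or by timestamp.
v3 = (
    spark.read.format("delta")
    .option("versionAsOf", 3)
    .load("/data/lake/events")
)
snapshot = (
    spark.read.format("delta")
    .option("timestampAsOf", "2024-01-01 00:00:00")
    .load("/data/lake/events")
)

# The transaction log doubles as an audit trail of table changes.
spark.sql("DESCRIBE HISTORY delta.`/data/lake/events`").show(truncate=False)
```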
5. Collaboration & Leadership
- Work closely with Data Engineers, Data Scientists, DevOps, and Business Analysts to align on data needs.
- Guide teams on modern data engineering best practices and Apache Spark optimizations.
- Engage with stakeholders and leadership to define data architecture roadmaps.
Required Skills & Qualifications:
- 10+ years of experience in data architecture, big data engineering, and data management.
- Deep expertise in Apache Spark (PySpark, Scala, Java) for large-scale data processing.
- Strong knowledge of on-premises Data Lake/Lakehouse architectures using Apache Iceberg, Delta Lake, or Apache Hudi.
- Experience with Hadoop ecosystem (HDFS, YARN, Hive, Impala, HBase, Ozone).
- Hands-on experience with distributed query engines (Trino, Presto, Apache Drill).
- Experience with workflow orchestration tools (Apache Airflow, Oozie, Prefect).
- Strong knowledge of data lake governance frameworks and metadata management.
- Familiarity with containerization and orchestration (Docker, Kubernetes) for Spark-based workloads.
- Experience with enterprise data security, access control, and data compliance regulations.
- Programming skills in Python, Scala, Java, or SQL.
- Experience in highly regulated industries (Oil & Gas, Healthcare, Telecom, Banking) is a plus.