Data Architect

Karachi (Sindh), Islamabad (Islamabad Capital Territory), or Lahore (Punjab), Pakistan
Contracted
Technical Services
Experienced
KalSoft is looking for an experienced Data Architect with 10+ years of expertise in designing, developing, and managing on-premises Data Lake, Lakehouse, and Big Data platforms. The ideal candidate has deep knowledge of Apache Spark, distributed computing, and modern data architectures, and can deliver scalable, high-performance, and well-governed data environments.
Key Responsibilities:
1. Data Architecture & Strategy
  • Design and implement scalable, high-performance Data Lake/Lakehouse architectures to support enterprise analytics and AI workloads.
  • Define data partitioning, indexing, and storage strategies for efficient querying and processing (see the sketch after this list).
  • Implement metadata management, data lineage, and data cataloging to ensure governance and compliance.
  • Establish data pipeline architectures that support batch, real-time, and streaming data processing.
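For illustration, a minimal PySpark sketch of a date-based partitioning and storage strategy for a curated data lake zone; the paths and column names (event_date, region) are hypothetical examples, not part of any specific KalSoft stack:

    # Minimal PySpark sketch: date-partitioned Parquet layout for a curated zone.
    # Paths and column names (event_date, region) are hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("partitioning-sketch").getOrCreate()

    events = spark.read.json("/data/raw/events")  # hypothetical landing zone

    (events
        .repartition("event_date")            # co-locate rows belonging to one partition
        .write
        .partitionBy("event_date", "region")  # enables directory-level partition pruning
        .mode("overwrite")
        .parquet("/data/lake/events"))        # hypothetical curated zone path

Partitioning on a low-cardinality column such as a date keeps directory counts manageable while letting query engines prune irrelevant data at scan time.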
2. Data Engineering & Processing
  • Architect and optimize large-scale data processing pipelines using Apache Spark (PySpark, Scala, or Java).
  • Implement distributed computing frameworks such as Spark on YARN, Kubernetes, or standalone clusters.
  • Lead the development of ETL/ELT pipelines using Apache Spark, Hadoop, Trino (Presto), Apache Iceberg, Delta Lake, or Apache Hudi.
  • Enable real-time data streaming using Apache Kafka, Spark Structured Streaming, Apache Flink, or Apache NiFi (see the sketch after this list).
  • Ensure data lake interoperability with data warehouses, BI tools, and AI/ML platforms.
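As a hedged illustration of the streaming item above, the following PySpark sketch reads from Kafka with Spark Structured Streaming and writes to a Delta Lake table; the broker address, topic, and paths are hypothetical, and the spark-sql-kafka connector and delta-spark packages are assumed to be available on the cluster:

    # Sketch: Kafka -> Spark Structured Streaming -> Delta Lake.
    # Broker, topic, and paths are hypothetical; requires the spark-sql-kafka
    # connector and delta-spark on the classpath.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("streaming-ingest-sketch").getOrCreate()

    raw = (spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
        .option("subscribe", "events")                     # hypothetical topic
        .load())

    parsed = raw.select(col("key").cast("string"),
                        col("value").cast("string"))

    (parsed.writeStream
        .format("delta")
        .option("checkpointLocation", "/data/checkpoints/events")  # exactly-once bookkeeping
        .start("/data/lake/events_stream"))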
3. Performance Optimization & Scalability
  • Optimize Apache Spark jobs through RDD tuning, partitioning strategies, and caching mechanisms (see the sketch after this list).
  • Improve query performance using Apache Spark SQL, Delta Lake optimizations, and Z-Ordering.
  • Implement data lifecycle management, compaction, and auto-tuning techniques for large-scale datasets.
  • Ensure scalability, fault tolerance, and high availability of data platforms.
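A hedged sketch of the tuning techniques above (explicit repartitioning, caching a reused DataFrame, and Z-Ordering); the table path and columns are hypothetical, and the OPTIMIZE ... ZORDER BY statement assumes a Delta Lake-enabled Spark session:

    # Sketch: common Spark/Delta tuning steps. Paths and columns are hypothetical;
    # the OPTIMIZE statement requires Delta Lake's SQL extensions.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("tuning-sketch").getOrCreate()

    df = spark.read.format("delta").load("/data/lake/events")  # hypothetical table
    df = df.repartition(200, "customer_id")  # size shuffle partitions for the cluster
    df.cache()                               # reuse across the two aggregations below

    df.groupBy("event_date").count() \
        .write.mode("overwrite").parquet("/data/marts/daily_counts")
    df.groupBy("customer_id").count() \
        .write.mode("overwrite").parquet("/data/marts/customer_counts")

    # Compact small files and cluster rows for selective customer_id queries.
    spark.sql("OPTIMIZE delta.`/data/lake/events` ZORDER BY (customer_id)")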
4. Security, Compliance & Governance
  • Implement data security policies, role-based access control (RBAC), encryption, and tokenization.
  • Ensure compliance with GDPR, HIPAA, or other industry regulatory frameworks.
  • Enforce audit logging, data masking, and identity management for enterprise data security.
  • Enable data versioning and time-travel capabilities in Lakehouse platforms for compliance and reproducibility (see the sketch after this list).
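As a hedged illustration of the versioning and time-travel item, a short Delta Lake example; the table path is hypothetical, and version numbers and timestamps depend on the table's actual commit history:

    # Sketch: Delta Lake time travel for audits and reproducibility.
    # The path is hypothetical; versions/timestamps depend on the table history.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("time-travel-sketch").getOrCreate()

    v0 = (spark.read.format("delta")
          .option("versionAsOf", 0)               # read the first committed version
          .load("/data/lake/customers"))

    snapshot = (spark.read.format("delta")
                .option("timestampAsOf", "2024-01-01")  # read as of a timestamp
                .load("/data/lake/customers"))

    # Inspect the commit history that backs the audit trail.
    spark.sql("DESCRIBE HISTORY delta.`/data/lake/customers`").show(truncate=False)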
5. Collaboration & Leadership
  • Work closely with Data Engineers, Data Scientists, DevOps, and Business Analysts to align on data needs.
  • Guide teams on modern data engineering best practices and Apache Spark optimizations.
  • Engage with stakeholders and leadership to define data architecture roadmaps.
Required Skills & Experience:
  • 10+ years of experience in data architecture, big data engineering, and data management.
  • Deep expertise in Apache Spark (PySpark, Scala, Java) for large-scale data processing.
  • Strong knowledge of on-premises Data Lake/Lakehouse architectures using Apache Iceberg, Delta Lake, or Apache Hudi.
  • Experience with Hadoop ecosystem (HDFS, YARN, Hive, Impala, HBase, Ozone).
  • Hands-on experience with distributed query engines (Trino, Presto, Apache Drill).
  • Experience with workflow orchestration tools (Apache Airflow, Oozie, Prefect); see the sketch after this list.
  • Strong knowledge of data lake governance frameworks and metadata management.
  • Familiarity with containerization and orchestration (Docker, Kubernetes) for Spark-based workloads.
  • Experience with enterprise data security, access control, and data compliance regulations.
  • Programming skills in Python, Scala, Java, or SQL.
  • Experience in highly regulated industries (Oil & Gas, Healthcare, Telecom, Banking) is a plus.
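For the orchestration requirement above, a hedged Apache Airflow sketch that submits a Spark batch job; the DAG id, schedule, application path, and connection id are hypothetical, SparkSubmitOperator comes from the apache-airflow-providers-apache-spark package, and Airflow 2.4+ is assumed for the schedule argument:

    # Sketch: orchestrating a nightly Spark job with Airflow. All identifiers
    # (dag_id, application path, conn_id) are hypothetical.
    from datetime import datetime

    from airflow import DAG
    from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

    with DAG(
        dag_id="nightly_etl",
        start_date=datetime(2024, 1, 1),
        schedule="0 2 * * *",   # nightly at 02:00
        catchup=False,
    ) as dag:
        run_etl = SparkSubmitOperator(
            task_id="spark_etl",
            application="/jobs/etl_events.py",  # hypothetical Spark application
            conn_id="spark_default",
            conf={"spark.sql.shuffle.partitions": "200"},
        )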