Khoa Nguyen
Lead Data Engineer

Lead Data Engineer and former Technical Lead with 10+ years of experience building and owning end-to-end data platforms from zero to business-critical scale. I focus on scalable pipelines, trusted metrics, self-service analytics, and low-latency data systems.

Current Tech Stack

  • Python, SQL
  • Amazon Redshift, ClickHouse
  • dbt analytics engineering (metrics & semantic layer)
  • Apache Spark, AWS Glue, Athena
  • Data quality, validation, monitoring, cost optimization
  • System design, APIs, distributed systems, performance troubleshooting

About Me

Lead Data Engineer and former Technical Lead with 10+ years of experience designing, building, and owning end-to-end data platforms. I specialize in event-based data modeling, scalable ETL/ELT, and analytics engineering (dbt) to enable trusted metrics and self-service BI across Product, Marketing, and Finance. I also write technical articles on Medium, sharing practical guides and lessons from real-world systems.

  • 10+ Years Experience
  • 100B+ Events Processed
  • 80% Self-service Adoption

Featured Projects

Explore key data engineering projects demonstrating expertise in building scalable, reliable, and efficient data systems.

Company-wide Data Platform & Single Source of Truth

Role: Platform ownership, system design, cross-team alignment

Designed and owned a company-wide analytics and data platform that unified product, marketing, and finance data into a trusted foundation. Established standardized metrics and reporting workflows to accelerate decision-making across teams.

Key Achievements:

  • Enabled self-service analytics with ~80% of frequent users building and maintaining their own reports
  • Implemented dbt-based analytics engineering for consistent metrics and trusted reporting
  • Optimized platform cost and performance via query tuning, retention policies, and workload isolation
Amazon Redshift, ClickHouse, dbt, Python, SQL, AWS

High-volume Event Analytics for Email Marketing

Role: Scalability, reliability, event modeling

Built and maintained scalable pipelines processing 100B+ email and user events, supporting both near real-time and batch analytics. Powered data products and reporting used across the business.

Key Achievements:

  • Processed 100B+ events reliably at scale with strong consistency guarantees
  • Powered low-latency customer-facing analytics features with high availability
  • Improved marketing effectiveness by 30%+ via unified reporting and accurate attribution
Event Modeling, Streaming + Batch, Spark, AWS Glue, Athena, ClickHouse

Smart Car – Government Traffic Optimization (Netherlands)

Role: Real-time ingestion, data processing, system integration

Contributed to a government-backed project aimed at reducing traffic congestion on the A58 highway. Built a system to ingest real-time vehicle telemetry via USB, analyze driving data, and deliver driver-facing recommendations on mobile devices.

Key Achievements:

  • Implemented real-time vehicle telemetry ingestion via USB for continuous data capture
  • Processed and analyzed driving patterns to identify traffic conditions and congestion signals
  • Delivered speed recommendations and driving instructions to help improve traffic flow
Real-time Telemetry, Mobile App, Data Ingestion, Streaming, Netherlands Gov, IoT

Fleet Logic – Connected Car Platform (Netherlands & Germany)

Role: Architecture design, customer collaboration, distributed delivery

Multi-national connected-car platform (web + mobile) that lets drivers control their vehicles and pay parking fees, and that ingests car data for analysis and recommendations across brands such as Audi, Skoda, Volkswagen, and SEAT.

Key Contributions & Impact:

  • Worked onsite in the Netherlands (~10 weeks) to clarify requirements directly with customers and accelerate decision-making
  • Designed architecture for parking requirements and ensured scalability across countries and partners
  • Delivered mobile features in a distributed team spanning Vietnam, Netherlands, Germany, and Spain
Xamarin, Java, Liferay, OBU, REST/SOAP

Check The Price – Real-time Price Tag Recognition (Startup)

Role: Product shaping, system design, technical leadership

A traveler-focused app that detects prices from real-world tags using image processing and converts them to the user’s local currency with up-to-date exchange rates.

Key Contributions & Impact:

  • Partnered with managers/investors to shape product requirements and propose practical, scalable solutions
  • Designed end-to-end architecture (services, database, solution skeleton) and set engineering standards as Tech Lead
  • Built core real-time detection logic and led reviews/mentorship to keep quality high in a fast-moving startup
Xamarin, Firebase ML Kit, Vue.js, GraphQL, .NET Core, Microservices, MySQL

Smart Home Control – Cross-platform IoT Control (Belgium)

Role: System architecture, integrations, mobile delivery

Cross-platform app for smart homes that lets end users configure and schedule device settings without installer support, with device control over LAN or the Internet.

Key Contributions & Impact:

  • Co-designed system architecture with a senior Netherlands-based architect, focusing on reliability and extensibility
  • Implemented core mobile features and integrations for device control and scheduling flows
  • Delivered a single solution running on macOS, Windows, Android, and iOS
Xamarin, MQTT, REST, SOAP, Cross-platform

Danabus – Smart City Bus Routing (Da Nang Government)

Role: Solution design, team leadership, APIs + mobile delivery

A government initiative to encourage public bus usage and reduce urban pollution by helping citizens find the shortest, fastest, and most convenient routes.

Key Contributions & Impact:

  • Owned solution design (architecture + skeleton) and aligned requirements with the government’s smart-city goals
  • Led the team and implemented route-finding logic to optimize travel options for end users
  • Delivered mobile app, APIs, and CMS with clear handover and team knowledge transfer
Xamarin, ASP.NET Web API, ASP.NET MVC, SQL Server, Routing Algorithm

Amoha – On-demand Interpretation Platform (Vietnam)

Role: Requirements discovery, solution architecture, technical leadership

A marketplace-style platform connecting users with interpreters for real-time communication via mobile devices, including an admin CMS for operations.

Key Contributions & Impact:

  • Led requirement discovery with the customer and translated needs into an actionable architecture
  • Built the solution skeleton (data model, API boundaries) and guided implementation as Technical Lead
  • Developed real-time communication features and ensured code quality through mentoring and reviews
React Native, WebRTC, Socket.IO, React, Express, MongoDB

Carpooling – Ride Sharing Platform with Payments (Vietnam)

Role: Project delivery, payments integration, engineering coaching

A web platform and admin CMS enabling ride-sharing between users, including booking management and fee handling.

Key Contributions & Impact:

  • Led project planning and delivery: estimates, work breakdown, task allocation, and tracking
  • Integrated NAPAS payment and established review practices to maintain stable releases
  • Supported algorithm work for route convenience and improved team effectiveness via coaching
React, ASP.NET Web API, MySQL, NAPAS Payments

Bird Control – Airport Safety Operations (Netherlands)

Role: Product delivery, operational reliability, stakeholder alignment

A mobile + web system for airport admins and bird controllers to report tasks in real time, assign work, and track performance to reduce bird strike risk.

Key Contributions & Impact:

  • Delivered both website and mobile app components with attention to operational reliability
  • Implemented real-time reporting flows to keep admin oversight accurate and timely
  • Collaborated with stakeholders to ensure the product fit safety-critical workflows
Xamarin, ASP.NET MVC, SQL Server, Real-time Reporting

React Robot – Embedded Android App for Educational Robots (US)

Role: Embedded development, Bluetooth/BLE, performance constraints

Embedded Android application inside children’s robots, communicating via Bluetooth and reacting to fiducial code cards for interactive behavior.

Key Contributions & Impact:

  • Ported fiducial code recognition library from C to Java for Android compatibility
  • Implemented robot movement/control logic based on recognized codes
  • Delivered stable BLE communication and interactive behaviors under embedded constraints
Android (Java), C, BLE, Computer Vision

Universal Code – Device Error Logging & Analytics (Netherlands)

Role: Web delivery, analytics dashboards, maintainable system design

A web system to collect and visualize microscope device errors worldwide, helping the manufacturer monitor issues and improve product reliability.

Key Contributions & Impact:

  • Built web features for error ingestion and operational dashboards
  • Implemented reporting views for device statistics and trend tracking
  • Collaborated with customer teams to deliver a maintainable solution
ASP.NET MVC, SQL Server, Reporting

Microscope Availability Reporter – Desktop Analytics (Netherlands)

Role: Desktop analytics app, Scrum facilitation, data visualization

Windows application to import years of microscope data and generate charts to analyze device availability and operational performance.

Key Contributions & Impact:

  • Served as Scrum Master to keep iteration cadence and delivery quality consistent
  • Built the Windows application to import historical datasets and render analytical charts
  • Worked with stakeholders to ensure the visuals answered practical operational questions
WPF, C#, Data Visualization, Desktop App

Technical Skills

Expertise across modern data platforms and software systems, from ingestion and processing to analytics, reliability, and delivery.

Data Technologies
Dagster 90%

Workflow orchestration and scheduling

Apache Airflow 90%

Workflow orchestration and scheduling

Apache Spark 80%

Big data processing and analytics

dbt 85%

Analytics engineering, metrics, and semantic layer

Athena 80%

Serverless SQL queries on data lake storage

AWS Glue 75%

Managed ETL jobs and data processing

Programming Languages
Python 90%

Primary language for data processing and automation

SQL 90%

Advanced queries, optimization, and warehouse modeling

Go 85%

High-performance services, data pipelines, and infrastructure tooling

C# 80%

Enterprise backend development, APIs, and long-running production systems

Java 70%

Connectors, services, and enterprise applications

Bash/Shell 85%

System administration and automation scripts

Databases
Amazon Redshift 90%

Warehouse modeling, performance tuning

ClickHouse 88%

Real-time analytics, aggregation workloads

PostgreSQL 85%

OLTP systems and advanced querying

MongoDB 80%

Document stores and aggregation pipelines

Redis 78%

Caching and real-time data structures

Cloud Platforms
AWS 90%

EC2, S3, RDS, Lambda, Glue, Athena

Docker 85%

Containerization and deployment workflows

DevOps & Monitoring
Git/GitHub 95%

Version control and collaboration

CI/CD 80%

GitHub Actions, GitLab CI, Jenkins

Grafana 75%

Dashboards and alerting

Prometheus 70%

Metrics collection and monitoring

Analytics & Visualization
Pandas/NumPy 85%

Data manipulation and analysis

Metabase 80%

Self-service BI, ad-hoc analysis, and business dashboards

Holistics 90%

Modern BI with modeling layer and governed metrics

Certifications

Microsoft Certified Xamarin Developer