Data Engineer
Job Description
- Build and maintain large-scale data platforms processing tens to hundreds of millions of events per day.
- Design and operate real-time and batch pipelines that handle 100k+ events/second (including image, audio, and video streams) using Kafka, Spark Streaming, and/or Flink.
- Develop and scale multimodal pipelines (image/object detection, OCR, face recognition, audio transcription, video processing).
- Work with vector databases (Milvus, Weaviate, or similar) to store and query millions of embeddings with metadata filtering and hybrid search.
- Integrate LLMs and multimodal models (GPT-4, Llama-3, Claude, etc.) into data pipelines for entity extraction, classification, enrichment, and summarization.
- Help solve Entity Resolution challenges: deduplicate and merge entities drawn from multiple news outlets and other sources using fuzzy matching, simple graph techniques, and contextual signals.
- Build and maintain a modern Lakehouse using Apache Iceberg or Delta Lake on S3/MinIO with schema evolution and time-travel capabilities.
- Ensure good data quality, observability, and monitoring (lineage, basic data quality checks, dashboards, alerting).
- Optimize cost and performance of Spark/Flink jobs running on Kubernetes (autoscaling, resource management, basic spot-instance usage).
- Collaborate closely with AI Engineers and Business Analysts to translate business needs into reliable data pipelines.
THE CHALLENGES YOU WILL LOVE
- Keeping real-time pipelines stable and accurate under high throughput (10k–100k+ events/sec).
- Handling noisy, conflicting, and rapidly changing data from many sources with dozens of entities, thousands of attributes, and hundreds of relationships.
- Achieving low-latency enrichment with LLMs and vector search in streaming workflows.
- Maintaining vector indexes with millions of new embeddings daily.
- Building image/video analytics that work reliably on real-world media.
- Performing schema migrations and backfills on large datasets with minimal downtime.
- Ensuring good observability so the team can quickly spot and fix issues.
Candidate Requirements
- 3+ years of hands-on Data Engineering experience.
- Bachelor’s degree in Computer Science, Engineering, Mathematics, or equivalent practical experience.
- Strong hands-on experience with at least 3 of the following: Spark (Structured Streaming or DataFrame API), Kafka, Flink, Airflow, Iceberg/Delta Lake, vector DBs (Milvus, Weaviate, Pinecone, Qdrant…).
- Proven experience shipping production data systems that process tens of millions of events per day or more.
- Practical experience with any form of Entity Resolution / deduplication / record linkage (even at moderate scale).
- Hands-on work with image or multimodal pipelines (OCR, object detection, face recognition, transcription) is a plus.
- Experience calling LLMs or embedding models from data pipelines (LangChain, LlamaIndex, direct API, etc.).
- Good knowledge of vector search concepts and at least one vector database.
- Proficiency in Python (primary) and SQL; Java/Scala is a plus.
- Solid experience with Docker and Kubernetes in production (familiarity with Helm, basic manifests, or ArgoCD is sufficient).
- Strong focus on testing, data quality, monitoring, and automation.
- You enjoy solving messy real-world data problems and making systems reliable and maintainable.
Benefits
- Salary that truly reflects your abilities and is competitive in the market
- Allowances in addition to salary: breakfast and lunch provided at the office
- Quarterly/annual performance reviews with opportunities for job-grade promotion; work and collaborate with a high-caliber team
- Provided with modern work equipment (MacBook, laptop, 24” LCD monitor, etc.)
- 5-day work week (Saturday and Sunday off)
- Opportunities to attend professional training courses, enhance job-related skills and soft skills, and obtain IT certifications
- Young, dynamic, and energetic working environment that encourages employees to maximize their potential, with many career advancement opportunities
- Modern office with open-space design, located in a building exclusively occupied by the company
- Annual leave and social insurance, health insurance, and unemployment insurance in full compliance with Vietnamese labor law
- Comprehensive private health insurance for employees and their family members
- Full support for business trip expenses
- Participation in internal cultural activities, team building, and annual company trip
- Wedding gifts, holiday/Tết bonuses and gifts, and other welfare benefits
Working Hours
- Monday to Friday (09:00 to 18:00)
Work Location
- Phú Nhuận Ward, Hồ Chí Minh City
| Company size: | 50-99 employees |
| Industry: | Information Technology, Other |
| Address: | No. 2 Trương Quốc Dung, Phú Nhuận Ward, Hồ Chí Minh City |
| Company name: | Công ty TNHH Athena AI (Athena AI Co., Ltd.) |
| Posting date: | 03/12/2025 |
| Level: | Staff |
| Education: | College diploma |
| Openings: | 1 |
| Age: | No requirement |
| Gender: | No requirement |
| Employment type: | Full-time |