Machine Learning Engineer

RAG / LLM / On-Prem AI

Bangkok, Thailand
Full-time
Onsite
AI/ML

About Us

We're Keranos Tech — building something ambitious: a blazing-fast, on-premises AI system powered by Retrieval-Augmented Generation (RAG). Our mission is to make AI smarter at finding and understanding knowledge hidden inside documents — with maximum accuracy, privacy, and speed.

We're early-stage, which means every team member will have a huge impact on the tech, the product, and the company's direction.

🤖 Your AI Mission

You'll own the AI pipeline end to end, from document ingestion through embedding generation and vector search to the final LLM response, and be at the forefront of building enterprise-grade AI systems that actually work in production.

What You'll Do

  • Own the AI pipeline: from document ingestion to embedding generation, vector search, and LLM response
  • Evaluate and implement frameworks (LangChain, LlamaIndex, Haystack) and vector DBs (FAISS, Milvus, pgvector, etc.)
  • Deploy, optimize, and fine-tune LLMs (Llama, Mistral, Falcon, etc.) in on-prem/GPU environments — including quantization and other performance optimizations
  • Research and prototype: fine-tune or train models where off-the-shelf solutions don't cut it
  • Benchmark frameworks and models for latency, accuracy, and throughput
  • Work closely with backend, DevOps, and product teams to deliver robust end-to-end AI features
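
The retrieval step of the pipeline described above can be sketched roughly as follows. This is a minimal, illustrative sketch only: the `embed` function is a toy hashing embedding standing in for a real sentence-transformer model, and a production system would use a vector DB such as FAISS, Milvus, or pgvector instead of a linear scan.

```python
import math
import zlib

def embed(text: str, dim: int = 64) -> list[float]:
    # Toy bag-of-words hashing embedding (illustrative only; a real
    # pipeline would call a trained embedding model here).
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are unit-normalized, so the dot product is cosine similarity.
    return sum(x * y for x, y in zip(a, b))

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank ingested documents by similarity to the query; the top-k
    # chunks would then be stuffed into the LLM prompt as context.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

# Hypothetical ingested chunks, for illustration:
docs = [
    "Invoices are stored in the finance archive.",
    "GPU drivers require CUDA 12 or later.",
    "Employee onboarding documents live on the intranet.",
]
top = retrieve("where are invoices stored", docs, k=1)
```

In the real pipeline the same flow holds — embed the query, search the index, pass results to the LLM — just with a trained model and an approximate-nearest-neighbor index in place of the toy pieces.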

What We're Looking For

  • Strong Python skills with experience in ML frameworks (PyTorch, TensorFlow)
  • Hands-on experience with RAG pipelines and vector DBs (FAISS, Milvus, Pinecone, pgvector)
  • Confident with LLM deployment (GPU management, quantization, acceleration)
  • Knowledge of embeddings, semantic search, information retrieval, and re-ranking
  • Comfort with Linux environments, CUDA, and GPU debugging
  • Startup mindset: proactive, fast, scrappy, and able to ship prototypes quickly
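
To give a flavor of the quantization work mentioned above, here is a rough sketch of symmetric per-tensor int8 quantization in NumPy. Real on-prem deployments would reach for dedicated tooling (bitsandbytes, GPTQ, AWQ, and the like); this just shows the core idea of trading precision for memory.

```python
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    # Symmetric quantization: map floats to int8 with a single scale
    # chosen so the largest-magnitude weight lands on +/-127.
    scale = max(float(np.abs(w).max()) / 127.0, 1e-12)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover an approximation of the original weights.
    return q.astype(np.float32) * scale

# Hypothetical weight tensor, for illustration:
w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
max_err = float(np.abs(w - w_hat).max())  # bounded by half the scale
```

The int8 tensor takes a quarter of the memory of float32 at the cost of a small, bounded reconstruction error — the same trade-off that lets large LLMs fit on a single GPU.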

Bonus Points

  • Fine-tuning LLMs or embeddings for domain-specific tasks
  • Familiarity with hybrid search (BM25 + embeddings) or knowledge graphs
  • MLOps skills: monitoring, versioning, deployment
  • Experience with FastAPI/gRPC for serving models
  • Prior work on enterprise-grade semantic search or privacy-first AI
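
One way to picture the hybrid search mentioned above: blend a BM25 lexical score with a dense-embedding score. This is a hedged sketch under simplifying assumptions — the toy hashing embedding stands in for a trained model, and a real stack would get BM25 from Elasticsearch/OpenSearch and dense scores from a vector DB.

```python
import math
import zlib
from collections import Counter

def tokens(text: str) -> list[str]:
    return text.lower().split()

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    # Classic BM25: term-frequency saturation (k1) and length normalization (b).
    doc_toks = [tokens(d) for d in docs]
    n = len(docs)
    avgdl = sum(len(t) for t in doc_toks) / n
    df = Counter()
    for toks in doc_toks:
        df.update(set(toks))
    scores = []
    for toks in doc_toks:
        tf = Counter(toks)
        s = 0.0
        for term in tokens(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            s += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(s)
    return scores

def dense_scores(query: str, docs: list[str], dim: int = 64) -> list[float]:
    # Toy hashing embedding (illustrative stand-in for a trained model).
    def emb(text):
        v = [0.0] * dim
        for t in tokens(text):
            v[zlib.crc32(t.encode()) % dim] += 1.0
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]
    q = emb(query)
    return [sum(a * b for a, b in zip(q, emb(d))) for d in docs]

def hybrid_rank(query: str, docs: list[str], alpha: float = 0.5) -> list[int]:
    # Normalize BM25 to [0, 1], then blend lexical and dense signals.
    bm = bm25_scores(query, docs)
    dn = dense_scores(query, docs)
    top = max(bm) or 1.0
    blended = [alpha * (b_ / top) + (1 - alpha) * d for b_, d in zip(bm, dn)]
    return sorted(range(len(docs)), key=lambda i: blended[i], reverse=True)

# Hypothetical corpus, for illustration:
docs = [
    "the gpu cluster runs nightly training jobs",
    "pgvector stores embeddings inside postgres",
    "conference budget covers one trip per year",
]
order = hybrid_rank("embeddings in postgres", docs)
```

The `alpha` weight is the usual tuning knob: lexical-heavy queries (IDs, exact names) favor BM25, while paraphrased questions favor the dense side.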

Tech stack: Python, PyTorch, TensorFlow, RAG, LLMs, vector DBs, CUDA, FastAPI

Why Join Us?

  • Massive Ownership: Shape the technical direction of our AI stack from day one
  • Cutting-Edge Work: Push the limits of LLMs, RAG, and on-prem performance
  • Immediate Impact: Fast-moving team, minimal bureaucracy, rapid iteration
  • Career Growth: Influence tech culture and future hires
  • Bangkok-based team — competitive salary + potential equity opportunities
  • Access to high-end GPU infrastructure for experimentation
  • Conference budget and learning opportunities