Machine Learning Engineer

RAG / LLM / On-Prem AI

Bangkok, Thailand
Full-time
Onsite
AI/ML

About Us

We're Keranos Tech — building something ambitious: a blazing-fast, on-premises AI system powered by Retrieval-Augmented Generation (RAG). Our mission is to make AI smarter at finding and understanding knowledge hidden inside documents — with maximum accuracy, privacy, and speed.

We're early-stage, which means every team member will have a huge impact on the tech, the product, and the company's direction.

🤖 Your AI Mission

You'll own the AI pipeline end to end, from document ingestion through embedding generation and vector search to the final LLM response, and be at the forefront of building enterprise-grade AI systems that actually work in production.

What You'll Do

  • Own the AI pipeline: from document ingestion to embedding generation, vector search, and LLM response
  • Evaluate and implement frameworks (LangChain, LlamaIndex, Haystack) and vector DBs (FAISS, Milvus, pgvector, etc.)
  • Deploy, optimize, and fine-tune LLMs (Llama, Mistral, Falcon, etc.) in on-prem/GPU environments — including quantization and other performance optimizations
  • Research and prototype: fine-tune or train models where off-the-shelf solutions don't cut it
  • Benchmark frameworks and models for latency, accuracy, and throughput
  • Work closely with backend, DevOps, and product teams to deliver robust end-to-end AI features
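
The retrieval step of the pipeline described above can be sketched roughly as follows. This is a minimal, illustrative sketch only: the `embed` function is a toy hashing embedding standing in for a real sentence-transformer model, and a production system would use a vector DB such as FAISS, Milvus, or pgvector instead of a linear scan.

```python
import math
import zlib

def embed(text: str, dim: int = 64) -> list[float]:
    # Toy bag-of-words hashing embedding (illustrative only; a real
    # pipeline would call a trained embedding model here).
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are unit-normalized, so the dot product is cosine similarity.
    return sum(x * y for x, y in zip(a, b))

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank ingested documents by similarity to the query; the top-k
    # chunks would then be stuffed into the LLM prompt as context.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

# Hypothetical ingested chunks, for illustration:
docs = [
    "Invoices are stored in the finance archive.",
    "GPU drivers require CUDA 12 or later.",
    "Employee onboarding documents live on the intranet.",
]
top = retrieve("where are invoices stored", docs, k=1)
```

In the real pipeline the same flow holds — embed the query, search the index, pass results to the LLM — just with a trained model and an approximate-nearest-neighbor index in place of the toy pieces.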

What We're Looking For

  • Strong Python skills with experience in ML frameworks (PyTorch, TensorFlow)
  • Hands-on experience with RAG pipelines and vector DBs (FAISS, Milvus, Pinecone, pgvector)
  • Confident with LLM deployment (GPU management, quantization, acceleration)
  • Knowledge of embeddings, semantic search, information retrieval, and re-ranking
  • Comfort with Linux environments, CUDA, and GPU debugging
  • Startup mindset: proactive, fast, scrappy, and able to ship prototypes quickly
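
To give a flavor of the quantization work mentioned above, here is a rough sketch of symmetric per-tensor int8 quantization in NumPy. Real on-prem deployments would reach for dedicated tooling (bitsandbytes, GPTQ, AWQ, and the like); this just shows the core idea of trading precision for memory.

```python
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    # Symmetric quantization: map floats to int8 with a single scale
    # chosen so the largest-magnitude weight lands on +/-127.
    scale = max(float(np.abs(w).max()) / 127.0, 1e-12)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover an approximation of the original weights.
    return q.astype(np.float32) * scale

# Hypothetical weight tensor, for illustration:
w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
max_err = float(np.abs(w - w_hat).max())  # bounded by half the scale
```

The int8 tensor takes a quarter of the memory of float32 at the cost of a small, bounded reconstruction error — the same trade-off that lets large LLMs fit on a single GPU.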

Bonus Points

  • Fine-tuning LLMs or embeddings for domain-specific tasks
  • Familiarity with hybrid search (BM25 + embeddings) or knowledge graphs
  • MLOps skills: monitoring, versioning, deployment
  • Experience with FastAPI/gRPC for serving models
  • Prior work on enterprise-grade semantic search or privacy-first AI
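
One way to picture the hybrid search mentioned above: blend a BM25 lexical score with a dense-embedding score. This is a hedged sketch under simplifying assumptions — the toy hashing embedding stands in for a trained model, and a real stack would get BM25 from Elasticsearch/OpenSearch and dense scores from a vector DB.

```python
import math
import zlib
from collections import Counter

def tokens(text: str) -> list[str]:
    return text.lower().split()

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    # Classic BM25: term-frequency saturation (k1) and length normalization (b).
    doc_toks = [tokens(d) for d in docs]
    n = len(docs)
    avgdl = sum(len(t) for t in doc_toks) / n
    df = Counter()
    for toks in doc_toks:
        df.update(set(toks))
    scores = []
    for toks in doc_toks:
        tf = Counter(toks)
        s = 0.0
        for term in tokens(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            s += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(s)
    return scores

def dense_scores(query: str, docs: list[str], dim: int = 64) -> list[float]:
    # Toy hashing embedding (illustrative stand-in for a trained model).
    def emb(text):
        v = [0.0] * dim
        for t in tokens(text):
            v[zlib.crc32(t.encode()) % dim] += 1.0
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]
    q = emb(query)
    return [sum(a * b for a, b in zip(q, emb(d))) for d in docs]

def hybrid_rank(query: str, docs: list[str], alpha: float = 0.5) -> list[int]:
    # Normalize BM25 to [0, 1], then blend lexical and dense signals.
    bm = bm25_scores(query, docs)
    dn = dense_scores(query, docs)
    top = max(bm) or 1.0
    blended = [alpha * (b_ / top) + (1 - alpha) * d for b_, d in zip(bm, dn)]
    return sorted(range(len(docs)), key=lambda i: blended[i], reverse=True)

# Hypothetical corpus, for illustration:
docs = [
    "the gpu cluster runs nightly training jobs",
    "pgvector stores embeddings inside postgres",
    "conference budget covers one trip per year",
]
order = hybrid_rank("embeddings in postgres", docs)
```

The `alpha` weight is the usual tuning knob: lexical-heavy queries (IDs, exact names) favor BM25, while paraphrased questions favor the dense side.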

Tech stack: Python, PyTorch, TensorFlow, RAG, LLMs, vector DBs, CUDA, FastAPI

Why Join Us?

  • Massive Ownership: Shape the technical direction of our AI stack from day one
  • Cutting-Edge Work: Push the limits of LLMs, RAG, and on-prem performance
  • Immediate Impact: Fast-moving team, minimal bureaucracy, rapid iteration
  • Career Growth: Influence tech culture and future hires
  • Bangkok-based team — competitive salary + potential equity opportunities
  • Access to high-end GPU infrastructure for experimentation
  • Conference budget and learning opportunities