About Us
We're Keranos Tech — building something ambitious: a blazing-fast, on-premise AI system powered by Retrieval-Augmented Generation (RAG). Our mission is to make AI smarter at finding and understanding knowledge hidden inside documents — with maximum accuracy, privacy, and speed.
We're early-stage, which means every team member will have a huge impact on the tech, the product, and the company's direction.
🤖 Your AI Mission
You'll own the full AI pipeline end to end and be at the forefront of building enterprise-grade AI systems that actually work in production.
What You'll Do
- Own the AI pipeline: from document ingestion to embedding generation, vector search, and LLM response
- Evaluate and implement frameworks (LangChain, LlamaIndex, Haystack) and vector DBs (FAISS, Milvus, pgvector, etc.)
- Deploy, optimize, and fine-tune LLMs (Llama, Mistral, Falcon, etc.) in on-prem/GPU environments — including quantization and performance tricks
- Research and prototype: fine-tune or train models where off-the-shelf solutions don't cut it
- Benchmark frameworks and models for latency, accuracy, and throughput
- Work closely with backend, DevOps, and product teams to deliver robust end-to-end AI features
What We're Looking For
- Strong Python skills with experience in ML frameworks (PyTorch, TensorFlow)
- Hands-on experience with RAG pipelines and vector DBs (FAISS, Milvus, Pinecone, pgvector)
- Confident with LLM deployment (GPU management, quantization, acceleration)
- Knowledge of embeddings, semantic search, information retrieval, and re-ranking
- Comfort with Linux environments, CUDA, and GPU debugging
- Startup mindset: proactive, fast, scrappy, and able to ship prototypes quickly
Bonus Points
- Fine-tuning LLMs or embeddings for domain-specific tasks
- Familiarity with hybrid search (BM25 + embeddings) or knowledge graphs
- MLOps skills: monitoring, versioning, deployment
- Experience with FastAPI/gRPC for serving models
- Prior work on enterprise-grade semantic search or privacy-first AI
Tech Stack
Python, PyTorch, TensorFlow, RAG, LLMs, vector DBs, CUDA, FastAPI
Why Join Us?
- Massive Ownership: Shape the technical direction of our AI stack from day one
- Cutting-Edge Work: Push the limits of LLMs, RAG, and on-prem performance
- Immediate Impact: Fast-moving team, minimal bureaucracy, rapid iteration
- Career Growth: Influence tech culture and future hires
- Bangkok-based team with competitive salary and potential equity
- Access to high-end GPU infrastructure for experimentation
- Conference budget and learning opportunities