Experience: 0–3 years
Type: Full-time, On-site (Bengaluru)
About Mandrake Bioworks
Mandrake Bioworks is an AI-first biotechnology company reimagining how we design and engineer life itself. Our mission is to build intelligent systems that make biology programmable. We’re unlocking a new generation of gene-editing technologies that will power breakthroughs in longevity, de-aging, sustainable agriculture, and climate resilience.
We’re building the foundational AI stack for biology, spanning large-scale biological data engines, foundation-model training, and generative design systems that can create new molecular tools from first principles.
If you want to build at the intersection of AI, life, and the future of civilization, and to see your models transform what humanity can engineer, this is the place to do it.
What You’ll Work On
- Build, train, and evaluate foundation and generative models for biological data
- Develop data pipelines for large-scale datasets: curation, normalization, clustering, and annotation
- Work hands-on with pretraining, finetuning, and reinforcement-learning (RL/RLHF/DPO) pipelines
- Implement and optimize transformer-based architectures (language + multimodal models)
- Design evaluation and benchmarking frameworks for model performance and representation learning
- Collaborate with the biology and ML teams to deploy AI systems that directly inform experimental design
You’re a Great Fit If You
- Are strong in Python and PyTorch/JAX, and comfortable with deep-learning libraries (Lightning, HuggingFace, etc.)
- Have experience with GenAI / LLMs / diffusion models, especially pretraining or finetuning
- Can handle large datasets end-to-end (data wrangling, deduplication, clustering, sampling)
- Have a solid understanding of transformers, embeddings, attention, and training optimization
- Have hands-on experience with RL / RLHF / DPO and can implement and iterate on them
- Thrive in fast-moving environments and like taking full ownership of what you build