Sanjai Murugan
sanjai_murugan
AI Systems Engineer
Currently exploring new opportunities

Engineering AI Systems

I'm Sanjai, an AI systems engineer and researcher building local-first AI architectures, multi-agent orchestration, RAG pipelines, AI automation, and efficient inference systems.

LoRA Fine-Tuning
RAG Systems
Local Inference
Agentic Workflows
Neural Networks
Machine Learning
AI Automation
Computer Vision
Multi-Agent Arch.
sancodes@local-ai: ~
❯ cat about_me.md
- AI systems engineer from India
- Building production-grade local LLM systems
- Researching efficient inference & agent architectures
❯ ls projects/
- multi-agent-coding-system
- rag-portfolio-assistant
- vision-lowlight-detection
❯ echo "Ready to build"
█ deployment ready_

Building AI at the Systems Level

Name: Sanjai Murugan
Role: AI Systems Engineer
Focus: Local LLMs & Agentic Systems
Location: India

I'm an AI systems engineer with a passion for building efficient, scalable AI systems that work locally and privately. My expertise spans from transformer internals and attention mechanisms to deployment pipelines and inference optimization.

I specialize in local-first AI architectures, multi-agent orchestration, agentic AI, and RAG systems. I don't just use AI tools; I build the infrastructure that makes them work efficiently on consumer hardware.

Currently exploring sparse expert routing, KV cache optimization, and edge deployment strategies for large language models.

AI Systems, Local LLMs, Multi-Agent, RAG, LoRA/QLoRA, Apple Silicon, Python, TensorFlow, PyTorch

Systems I've Built

Each project reflects deep systems-level thinking, spanning agent orchestration, inference optimization, retrieval systems, and computer vision.

πŸ” Research Project

Bike Rental RAG Assistant

"AI-powered retrieval assistant for intelligent bike rental support and real-time customer interaction."

A Retrieval-Augmented Generation (RAG) based AI assistant developed for a bike rental platform to handle real-time customer queries and provide accurate responses regarding pricing, rental policies, bike availability, booking procedures, and platform features. The system leverages semantic search and vector embeddings to retrieve relevant information from a knowledge base, enabling context-aware and reliable conversations.

  • Real-time query handling for bike rental platform users
  • Semantic document retrieval using vector databases and embeddings
  • Context-aware multi-turn conversational support
  • Efficient knowledge retrieval with optimized chunking techniques
  • Accurate response generation with retrieval-based context injection
LangChain, ChromaDB, OpenAI Embeddings, FastAPI
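The retrieval flow described in this project can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the in-memory list stands in for a vector database, and `DOCS`, `chunk_text`, `retrieve`, and `build_prompt` are hypothetical names invented for the example.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge-base entries, invented for this example.
DOCS = [
    "Pricing: city bikes cost 5 dollars per hour and 20 dollars per day.",
    "Policy: helmets are included with every rental and must be returned.",
    "Booking: reserve a bike in the app and unlock it with the QR code.",
]


def chunk_text(text: str, size: int = 12, overlap: int = 4) -> list[str]:
    """Split text into windows of `size` words that share `overlap` words."""
    words = text.split()
    step = size - overlap  # assumes size > overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break
    return chunks


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Inject the retrieved chunks into the prompt sent to the language model."""
    joined = "\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"
```

Calling `retrieve("How much does a bike cost per hour?", DOCS)` ranks the pricing entry first, and `build_prompt` shows where that context would be injected before generation. The overlap in `chunk_text` keeps sentences that straddle a chunk boundary retrievable from either side.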
🧪 Experiment

TinyOllama LoRA Fine-Tuning Research

"Parameter-efficient specialization experiments on lightweight local LLMs."

Explored LoRA-based fine-tuning workflows using MLX and lightweight language models to understand efficient specialization techniques and local deployment optimization. Documented rank exploration, quantization strategies, and quality trade-offs.

  • Low-parameter adaptation with LoRA rank exploration (r=4 to r=32)
  • Inference efficiency benchmarking before and after fine-tuning
  • Specialization behavior analysis across model sizes
  • Local deployment feasibility studies on consumer hardware
~0.1% of total parameters trained
4× reduced compute vs full FT
Optimized experimentation pipeline

Key Findings

  • LoRA rank r=8 achieved 85% of full fine-tuning quality with 97% fewer trainable parameters.
  • MLX on Apple Silicon proved competitive with CUDA for small-scale LoRA experiments.
  • Quantization-aware fine-tuning (QLoRA) enabled effective training on 6GB VRAM hardware.
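The parameter savings behind LoRA follow from simple arithmetic: a frozen weight matrix of shape d_out × d_in gets two small trainable factors, B (d_out × r) and A (r × d_in). The sketch below counts them for a single matrix. The 4096 × 4096 layer size is an assumption chosen for illustration, and the result is a per-matrix figure; it is not meant to reproduce the percentages reported above, which depend on which layers are adapted and on the model used.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the low-rank factors B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in)


def trainable_fraction(d_out: int, d_in: int, rank: int) -> float:
    """LoRA parameters as a fraction of the full d_out x d_in matrix."""
    return lora_trainable_params(d_out, d_in, rank) / (d_out * d_in)


if __name__ == "__main__":
    # Hypothetical 4096 x 4096 projection, swept over the ranks explored above.
    for r in (4, 8, 16, 32):
        frac = trainable_fraction(4096, 4096, r)
        print(f"r={r:>2}: {lora_trainable_params(4096, 4096, r):>7} params ({frac:.3%})")
```

Because the count grows linearly in the rank, doubling r doubles the adapter size, which is why a rank sweep is a cheap way to probe the quality versus cost trade-off.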
📷 Applied AI / Computer Vision

Low-Light Animal Detection Pipeline

"Computer vision system for wildlife detection in difficult lighting environments."

Developed a computer vision pipeline capable of detecting wildlife in low-visibility scenarios using image enhancement and object detection techniques. Explored preprocessing optimization for real-time performance under challenging environmental conditions.

  • Low-light image enhancement as preprocessing stage
  • Detection reliability scoring under varying conditions
  • Preprocessing optimization for real-time performance
  • Evaluation against real-world environmental conditions
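The enhancement method used in this pipeline is not specified here, so the sketch below assumes gamma correction, one of the simplest low-light preprocessing steps. It works on an 8-bit grayscale image represented as nested lists, and precomputes a lookup table so each pixel costs one index operation, which is the kind of optimization that matters for real-time use. The function name and default gamma are illustrative choices.

```python
def gamma_correct(pixels: list[list[int]], gamma: float = 0.5) -> list[list[int]]:
    """Brighten an 8-bit grayscale image; gamma < 1 lifts dark regions most."""
    # Precompute the mapping for all 256 intensity levels once.
    lut = [round(255 * (v / 255) ** gamma) for v in range(256)]
    return [[lut[v] for v in row] for row in pixels]


if __name__ == "__main__":
    dark_frame = [[0, 16, 64], [32, 128, 255]]  # made-up sample values
    print(gamma_correct(dark_frame))
```

The enhanced frame would then be passed to the object detector; pure black and pure white are left unchanged, while mid-dark values are lifted the most.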
💡 Side Project

Semantic ATS Skill Matching Engine

"AI-powered skill extraction and semantic matching for hiring workflows."

A semantic applicant tracking system (ATS) built on agentic AI workflows orchestrated with n8n. The platform analyzes resumes, interprets candidate skills semantically, matches applicants to job requirements, and automates hiring workflows in real time. By combining AI-driven decision making with contextual understanding, it streamlines recruitment operations and improves candidate-job matching accuracy.

  • Semantic resume parsing and intelligent candidate-job matching
  • Agentic AI workflow orchestration using n8n automation
  • Automated recruitment pipeline and candidate screening
  • Context-aware skill evaluation using vector embeddings
  • AI-assisted hiring insights and workflow optimization

n8n, Agentic AI

Pushing Systems Beyond the Tutorial

Exploring experimental AI system architectures, inference optimization techniques, and efficient model specialization beyond standard application development.

01

Neuron Specialization Analysis

Activation Patterns, Sparse Routing, Transformer Internals

Studying activation clustering and sparse routing ideas within transformer layers. Investigating specialization patterns that emerge naturally during training and how they can be exploited for more efficient computation.

02

Efficient Inference Exploration

KV Cache, Selective Attention, Compute Reduction

Exploring KV cache optimization strategies, selective computation methods, and various compute reduction techniques to enable faster, more memory-efficient inference on consumer hardware.
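One common KV cache optimization is storing cached keys and values as 8-bit integers instead of 16-bit floats, which roughly halves cache memory. The sketch below shows symmetric per-tensor INT8 quantization on a plain list of floats. It is a simplified illustration under stated assumptions: real implementations usually quantize per channel or per head and operate on tensors, and the function names are hypothetical.

```python
def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Symmetric quantization: map [-max_abs, max_abs] onto [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs else 1.0  # avoid dividing by zero
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    cached_keys = [0.5, -1.0, 0.25, 0.0]  # made-up sample values
    q, scale = quantize_int8(cached_keys)
    restored = dequantize(q, scale)
    worst = max(abs(a - b) for a, b in zip(cached_keys, restored))
    print(q, f"max error {worst:.5f}")
```

The rounding error per element is bounded by half the scale, so tensors with a few large outliers lose the most precision; that is the usual motivation for finer-grained scales.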

03

Dynamic Expert Routing

MoE, Adaptive Systems, Hierarchical Execution

Designing modular activation systems with adaptive routing logic. Exploring hierarchical execution concepts where different subnetworks handle different complexity levels of tasks.
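Adaptive routing of this kind is often implemented as top-k gating: a gate scores every expert, only the k best are run, and their outputs are mixed using renormalized gate weights. The sketch below assumes that scheme. Gate logits are passed in directly rather than produced by a learned layer, and the experts are plain functions on a scalar, so every name here is illustrative rather than a description of the actual system.

```python
import math
from typing import Callable


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_logits: list[float], k: int = 2) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return [(i, probs[i] / total) for i in top]


def moe_forward(x: float, experts: list[Callable[[float], float]],
                gate_logits: list[float], k: int = 2) -> float:
    """Run only the selected experts and mix their outputs."""
    return sum(weight * experts[i](x) for i, weight in route_top_k(gate_logits, k))


if __name__ == "__main__":
    experts = [lambda x: x, lambda x: 2 * x, lambda x: 0.0, lambda x: -x]
    print(route_top_k([0.1, 2.0, -1.0, 1.0], k=2))
    print(moe_forward(1.0, experts, [0.1, 2.0, -1.0, 1.0], k=2))
```

With four experts and k=2, half of the expert computations are skipped for each input, which is where the compute savings of sparse routing come from.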

04

Agent Memory Architectures

Long-term Memory, RAG, Multi-Agent Sync

Investigating long-term memory mechanisms for autonomous agents. Researching retrieval-based context management and multi-agent synchronization protocols for persistent collaborative systems.

⬑ Recent Experiment Logs

2026-05-10: Sparse MoE routing on Llama-3-8B, 47% FLOPs reduction with <3% quality drop
2026-04-22: KV cache quantization experiments, INT8 maintains 99.1% retrieval accuracy
2026-03-18: LoRA rank ablation study, r=8 optimal for domain adaptation on limited data
Currently documenting benchmark results for publication; an inference optimization paper is in progress.

AI-Powered Skill Intelligence Platform

📄 Resumes & Profiles → 🧠 Embedding Engine → 🔗 Vector Skill Graph → ⚖️ Somatic-Weight Scoring → 🎯 Intelligent Matching

The Problem

Traditional ATS systems rely on keyword matching, which is fundamentally broken for identifying real skill relationships. A backend engineer who knows "distributed systems" should surface for "microservices architecture" roles, but current systems miss these connections entirely.

The Approach

Building an AI-powered platform using vector database semantic matching, embedding-based resume analysis, and contextual skill relationship mapping inspired by somatic-weight principles. The system doesn't just match keywords: it understands the relationships between skills and maps candidate capabilities to role requirements at a semantic level.

🔬 Vector Skill Graph: Multi-dimensional skill representation with relationship edges
🧬 Somatic-Weight Scoring: Contextual skill importance derived from relationship patterns
⚑ Real-time ATS Optimization: Intelligent candidate ranking with explainable scoring
🤖 Bidirectional Intelligence: Recruiter insights + candidate experience optimization
Vector Databases, Embedding Models, Graph Neural Networks, LangChain, FastAPI, PostgreSQL + pgvector
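To make the contrast with keyword matching concrete, here is a minimal sketch of scoring a candidate against a role through a skill graph with weighted relationship edges. It is an illustration only: the edge weights are hand-assigned and hypothetical (a real system would presumably derive them from embeddings or co-occurrence data), and `SKILL_GRAPH`, `relatedness`, `match_score`, and `keyword_score` are names invented for the example, not the platform's scoring method.

```python
# Hypothetical relationship edges with hand-assigned weights in [0, 1].
SKILL_GRAPH = {
    ("distributed systems", "microservices architecture"): 0.8,
    ("python", "fastapi"): 0.7,
}


def relatedness(a: str, b: str, graph: dict = SKILL_GRAPH) -> float:
    """1.0 for identical skills, the edge weight for neighbors, else 0.0."""
    if a == b:
        return 1.0
    return graph.get((a, b), graph.get((b, a), 0.0))


def match_score(candidate: set[str], required: list[str],
                graph: dict = SKILL_GRAPH) -> float:
    """Average, over required skills, of the best-related candidate skill."""
    if not required:
        return 0.0
    best = [max((relatedness(r, c, graph) for c in candidate), default=0.0)
            for r in required]
    return sum(best) / len(best)


def keyword_score(candidate: set[str], required: list[str]) -> float:
    """Baseline: fraction of required skills that appear verbatim."""
    if not required:
        return 0.0
    return sum(r in candidate for r in required) / len(required)


if __name__ == "__main__":
    candidate = {"distributed systems", "python"}
    role = ["microservices architecture", "python"]
    print("keyword:", keyword_score(candidate, role))
    print("graph:  ", match_score(candidate, role))
```

In this toy case the keyword baseline credits only the exact "python" match, while the graph score also credits "distributed systems" toward "microservices architecture". Keeping the per-skill best matches around would also give a simple basis for explainable scoring.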

From Curiosity to Systems Engineering

The evolution from exploring Python to engineering production-grade AI infrastructure, told as it unfolded.

2023
The Spark

Everything started with curiosity. I picked up Python not because I had a plan, but because everyone said it was the language to learn. I started with basics (variables, loops, functions), then quickly fell into the rabbit hole of backend systems and APIs.

The first "hello world" API I built with Flask felt like magic. I had no idea what I was doing, but it worked.

Early experimentation with AI tooling began here: simple scripts, basic ML models, and a growing obsession with understanding how things actually work under the hood.

Python, Flask, APIs, Basic ML
2024
Building Production Systems

This is when things shifted from hobby to serious engineering. I built my first production-ready RAG pipeline, deployed workflow automation with n8n, and integrated various AI APIs into real applications.

First experiments with LangChain, ChromaDB, and local LLM inference via Ollama. I remember the excitement of running a 7B model locally for the first time. It felt like unlocking a superpower.

Key realization: The gap between "AI demos" and "AI systems" is enormous. Most tutorials stop at the demo. I wanted to build what comes after.

LangChain, ChromaDB, Ollama, n8n, FastAPI
2025
Deep Dive into AI Systems

I went deep. LoRA fine-tuning: understanding rank selection, placement strategies, and when parameter-efficient methods break down. Multi-agent orchestration: building systems where multiple LLM agents coordinate on complex tasks.

I optimized local inference pipelines, experimented with MLX on Apple Silicon, and began researching efficient attention mechanisms. I started documenting everything: not just the wins, but the failures and dead ends that taught me the most.

Built a multi-agent coding system from scratch. It was buggy, slow, and frustrating. I learned more from that project than from any tutorial.

LoRA/QLoRA, MLX, Agent Systems, Inference Opt., RAG
2024-25
The Pivot That Didn't Happen

I once tried to build a startup around an AI-powered hiring intelligence platform. The vision was compelling: semantic skill matching, contextual relationship scoring, and an ATS that actually understood candidates beyond keywords.

We prototyped the core engine: vector-based skill graphs, embedding-driven resume analysis, and early experiments with graph neural networks for skill relationship mapping. The tech worked. The GTM didn't.

Sometimes the best engineering decisions are the ones you don't ship. I learned more about product-market fit, team dynamics, and when to walk away from this project than from any that succeeded.

It's shelved for now, not abandoned. The research lives on in this portfolio as a startup concept I revisit when the timing is right.

Vector DBs, Embeddings, Graph ML, ATS Systems, n8n
2026
Research & Infrastructure at the Edge

Current focus: pushing AI systems to their limits. Sparse expert routing for 47% FLOPs reduction. KV cache quantization maintaining 99.1% accuracy at INT8. MoE architectures designed for edge deployment.

I'm building infrastructure that bridges the gap between research papers and production systems, taking experimental architectures and making them actually work at scale.

The goal isn't just to use AI. It's to understand it deeply enough to build what doesn't exist yet.

MoE, Sparse Routing, KV Cache, Edge AI, Infra
Beyond
What's Next

I'm exploring the convergence of efficient inference, agent memory architectures, and AI-driven developer tools. The future isn't just bigger models; it's smarter systems that use intelligence efficiently.

Always open to research collaborations, ambitious projects, and conversations about where AI infrastructure is heading.

If you're working on something interesting in AI systems, I'd love to hear about it.

Open to Ideas

Tools of the Trade

A curated stack focused on building, optimizing, and deploying AI systems from the ground up.

🧠
AI / Machine Learning
12 tools
PyTorch, Transformers, LangChain, LangGraph, Ollama, MLX, ChromaDB, LoRA, QLoRA, RAG, Semantic Search, Vector DBs
PyTorch: Advanced
LLM Fine-Tuning: Intermediate
RAG Architectures: Advanced
Agent Systems: Intermediate
⚙️
Backend & Infrastructure
10 tools
Python, FastAPI, REST APIs, gRPC, Docker, PostgreSQL, Redis, Nginx, Linux, CI/CD
Python: Advanced
FastAPI / APIs: Advanced
Docker / DevOps: Intermediate
💻
Frontend & UI
7 tools
Next.js, TypeScript, React, TailwindCSS, Framer Motion, Three.js, HTML5 / CSS3
Next.js / React: Advanced
TypeScript: Advanced
UI/UX Design: Intermediate
⚑
Automation & Tools
6 tools
n8n, GitHub Actions, Cloudflare, Vercel, Workflow Systems, AI Pipelines

Certifications

Verified certifications across AI, ML, and cloud platforms.

Let's Build Something Ambitious

I'm always interested in connecting with people who share my passion for building intelligent systems, whether it's AI engineering internships, ambitious AI projects, research collaborations, or experimental infrastructure work.