Flagship Projects

Four in-depth systems spanning six areas of interest

Each project spans two or three of six domains (LLMs, robotics, quantum AI, energy systems, brain-computer interfaces, and GPU compute) and ships with real benchmarks, profiling artifacts, and reproducible code.

Hardware/GPU × LLM

FlashKernel

Custom CUDA C++ and Triton kernels for transformer inference — tiled FlashAttention, fused GeLU+Linear, RoPE, paged KV-cache — benchmarked with Nsight Compute on NVIDIA T4.

CUDA C++ · Triton · Nsight Compute · PyTorch
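
A core idea behind the paged KV-cache is worth a sketch: logical token positions map through a per-sequence block table onto fixed-size physical blocks, so the cache grows without contiguous reallocation. The class and names below are illustrative only, assuming a toy block size, and are not the project's actual CUDA kernel:

```python
BLOCK_SIZE = 4  # tokens per physical block (real systems use 16-256)

class PagedKVCache:
    """Toy paged KV-cache: block-table indirection, no real tensors."""

    def __init__(self):
        self.blocks = []       # physical storage: one list of K/V entries per block
        self.block_table = []  # logical block index -> physical block index
        self.length = 0        # number of tokens cached so far

    def append(self, kv):
        # Allocate a fresh physical block whenever the current one is full.
        if self.length % BLOCK_SIZE == 0:
            self.block_table.append(len(self.blocks))
            self.blocks.append([None] * BLOCK_SIZE)
        phys = self.block_table[self.length // BLOCK_SIZE]
        self.blocks[phys][self.length % BLOCK_SIZE] = kv
        self.length += 1

    def get(self, pos):
        # Translate a logical position through the block table.
        phys = self.block_table[pos // BLOCK_SIZE]
        return self.blocks[phys][pos % BLOCK_SIZE]

cache = PagedKVCache()
for t in range(10):
    cache.append(("k%d" % t, "v%d" % t))
print(cache.get(7))        # -> ('k7', 'v7')
print(len(cache.blocks))   # 10 tokens / 4 per block -> 3 blocks
```

The same indirection is what lets an attention kernel gather keys and values block by block instead of from one monolithic buffer.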
LLM × Robotics × GPU

RoboLLM

Language-grounded robotic manipulation — a VLM planner decomposes natural language instructions into sub-tasks, and RL-trained policies execute each step in MuJoCo simulation.

MuJoCo · PaliGemma-3B · SAC/PPO · PyTorch
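
The plan-then-execute pattern can be sketched in a few lines. Everything here is a hypothetical stand-in: the stubbed `plan` function replaces the VLM call, and the policy table replaces the RL-trained skills:

```python
def plan(instruction):
    # Stand-in for the VLM planner: a real system would prompt the model
    # with the instruction plus a camera image and parse its response.
    return ["locate red block", "grasp red block", "place red block"]

# One callable per skill; real policies would step a MuJoCo environment.
POLICIES = {
    "locate": lambda task: f"done: {task}",
    "grasp":  lambda task: f"done: {task}",
    "place":  lambda task: f"done: {task}",
}

def execute(instruction):
    results = []
    for subtask in plan(instruction):
        skill = subtask.split()[0]          # route by leading verb
        results.append(POLICIES[skill](subtask))
    return results

print(execute("put the red block on the tray"))
```

The routing-by-verb step is the seam where the language side hands off to the control side; a failed sub-task would trigger replanning in a full system.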
BCI × LLM × GPU

NeuroLLM

Foundation model for neural signal decoding — pre-train a transformer on large-scale EEG data, fine-tune for motor imagery BCI with a custom frequency-band attention kernel.

PyTorch · MNE-Python · CUDA · EEG
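
The frequency-band idea rests on a standard decomposition: EEG power is summarized per canonical band (delta, theta, alpha, beta). The project's kernel fuses a band-wise weighting into attention, but the underlying per-band computation looks like this NumPy sketch (band edges and sampling rate are illustrative assumptions):

```python
import numpy as np

# Canonical EEG bands in Hz (approximate, illustrative edges).
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def band_powers(signal, fs):
    """Mean spectral power of a 1-D signal within each EEG band."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return {
        name: float(spectrum[(freqs >= lo) & (freqs < hi)].mean())
        for name, (lo, hi) in BANDS.items()
    }

fs = 256                                  # Hz, a common EEG sampling rate
t = np.arange(fs * 2) / fs                # 2 seconds of signal
sig = np.sin(2 * np.pi * 10 * t)          # pure 10 Hz tone -> alpha band
powers = band_powers(sig, fs)
print(max(powers, key=powers.get))        # -> alpha
```

In the motor-imagery setting, band-limited features like these carry the discriminative signal, which is what motivates biasing attention by frequency band.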
Quantum AI × Energy

QuantumGrid

Quantum-classical hybrid optimization for energy grids — QAOA and VQE circuits applied to unit commitment on real ENTSO-E data, benchmarked against classical MILP solvers.

PennyLane · QAOA/VQE · OR-Tools · ENTSO-E
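
Unit commitment reduces to choosing which generators to switch on so capacity meets demand at minimum cost, with the demand constraint folded into the objective as a penalty term, the same shape QAOA optimizes once it is encoded as a QUBO. The toy instance below uses made-up numbers and a brute-force search standing in for the classical baseline:

```python
from itertools import product

# Hypothetical 3-generator instance: fixed running costs, capacities (MW),
# demand to cover, and a penalty weight for unmet demand (soft constraint).
COST = [10.0, 6.0, 3.0]
CAP = [50.0, 30.0, 20.0]
DEMAND = 60.0
PENALTY = 100.0

def objective(bits):
    """Total cost of an on/off assignment, penalizing any shortfall."""
    cost = sum(c for c, on in zip(COST, bits) if on)
    cap = sum(p for p, on in zip(CAP, bits) if on)
    shortfall = max(0.0, DEMAND - cap)
    return cost + PENALTY * shortfall

# Brute force over all 2^3 on/off bitstrings; QAOA searches the same
# landscape with a parameterized circuit instead of enumeration.
best = min(product([0, 1], repeat=3), key=objective)
print(best, objective(best))   # -> (1, 0, 1) 13.0
```

Generators 1 and 3 together supply 70 MW for a cost of 13, beating any other feasible combination; at realistic problem sizes the enumeration is replaced by the MILP solver on the classical side and the QAOA circuit on the quantum side.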