FlashKernel
Custom CUDA C++ and Triton kernels for transformer inference — tiled FlashAttention, fused GeLU, RoPE, paged KV-cache — benchmarked with Nsight Compute on T4.
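A minimal NumPy sketch of the tiling idea behind a FlashAttention-style kernel (single head, no masking, hypothetical block size): keys and values are streamed in blocks and the softmax statistics are accumulated online, so the full N x N score matrix is never materialized. It illustrates the algorithm only, not the CUDA C++/Triton implementation itself.

```python
import numpy as np

def tiled_attention(q, k, v, block=64):
    """Single-head attention computed block-by-block over K/V with an
    online softmax, so the full (N x N) score matrix is never stored."""
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(q)
    row_max = np.full(n, -np.inf)   # running max of scores per query row
    row_sum = np.zeros(n)           # running softmax denominator per query row
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = (q @ kb.T) * scale                      # (N, B) scores for this block
        new_max = np.maximum(row_max, s.max(axis=1))
        correction = np.exp(row_max - new_max)      # rescale old accumulators
        p = np.exp(s - new_max[:, None])            # unnormalized probabilities
        row_sum = row_sum * correction + p.sum(axis=1)
        out = out * correction[:, None] + p @ vb
        row_max = new_max
    return out / row_sum[:, None]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k, v = (rng.standard_normal((256, 32)) for _ in range(3))
    s = (q @ k.T) / np.sqrt(32)
    p = np.exp(s - s.max(axis=1, keepdims=True))
    reference = (p / p.sum(axis=1, keepdims=True)) @ v
    assert np.allclose(tiled_attention(q, k, v), reference)
```

The GPU kernel performs the same accumulation with tiles held in shared memory and registers, which is where the memory-bandwidth savings come from.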
Building reproducible, benchmarked projects at the intersection of large language models, robotic manipulation, brain–computer interfaces, quantum optimization, and energy systems.
Deep, reproducible systems spanning six domains: large language models, robotics, brain–computer interfaces, quantum optimization, energy systems, and GPU compute.
Language-grounded robotic manipulation — VLM planner decomposes instructions into sub-tasks, RL policies execute each step in MuJoCo simulation.
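A hedged Python sketch of the planner-executor split described above, assuming a Gymnasium-style MuJoCo environment. The names `plan`, `policies`, `Subtask`, and the `subtask_done` info key are hypothetical placeholders, not the project's actual interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence

@dataclass
class Subtask:
    skill: str   # e.g. "reach", "grasp", "place"  (hypothetical skill names)
    goal: dict   # skill-specific goal parameters

def execute_instruction(
    instruction: str,
    plan: Callable[[str], Sequence[Subtask]],                  # VLM planner: text -> sub-tasks
    policies: Mapping[str, Callable[[object, dict], object]],  # skill -> policy(obs, goal) -> action
    env,                                                       # Gymnasium-style MuJoCo env
    max_steps_per_subtask: int = 200,
) -> bool:
    """Decompose an instruction with the planner, then roll out one RL
    policy per sub-task until the episode ends or all sub-tasks finish."""
    obs, _ = env.reset()
    for subtask in plan(instruction):
        policy = policies[subtask.skill]
        for _ in range(max_steps_per_subtask):
            action = policy(obs, subtask.goal)
            obs, _, terminated, truncated, info = env.step(action)
            if info.get("subtask_done"):   # assumed success signal from an env wrapper
                break
            if terminated or truncated:
                return False
    return True
```

Keeping the planner behind a plain callable makes it easy to swap the VLM for a scripted decomposition when debugging the low-level policies.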
Foundation model for neural signal decoding — pre-train a transformer on large-scale EEG, fine-tune for motor imagery BCI with frequency-band attention.
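A minimal PyTorch sketch of one way to realize frequency-band attention for motor-imagery decoding, assuming the EEG has already been band-pass filtered into canonical bands upstream. The layer sizes, learned band embedding, and mean-pooled classification head are illustrative choices, not the project's architecture.

```python
import torch
import torch.nn as nn

class BandAttentionDecoder(nn.Module):
    def __init__(self, n_channels=64, n_bands=5, d_model=128,
                 n_heads=4, n_layers=4, n_classes=4):
        super().__init__()
        self.proj = nn.Linear(n_channels, d_model)        # per-band, per-step embedding
        self.band_embed = nn.Embedding(n_bands, d_model)  # lets attention distinguish bands
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):
        # x: (batch, bands, time, channels), band-filtered EEG features
        b, n_bands, t, _ = x.shape
        tokens = self.proj(x)                                   # (b, bands, t, d)
        band_ids = torch.arange(n_bands, device=x.device)
        tokens = tokens + self.band_embed(band_ids)[None, :, None, :]
        tokens = tokens.flatten(1, 2)                           # (b, bands*t, d) token sequence
        encoded = self.encoder(tokens)                          # attention across bands and time
        return self.head(encoded.mean(dim=1))                   # (b, n_classes) logits

if __name__ == "__main__":
    model = BandAttentionDecoder()
    logits = model(torch.randn(2, 5, 100, 64))
    print(logits.shape)  # torch.Size([2, 4])
```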
Quantum-classical hybrid optimization for energy grids — QAOA and VQE applied to unit commitment on real ENTSO-E data, benchmarked against MILP solvers.
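A toy statevector QAOA sketch in NumPy on a three-unit, QUBO-style commitment objective. The cost function, penalty term, and random parameter search are placeholders and do not reflect the project's ENTSO-E formulation or its MILP baseline.

```python
import numpy as np
from itertools import product

def qubo_cost(bits, costs=(3.0, 2.0, 4.0), demand=2, penalty=10.0):
    """Hypothetical objective: pay per committed unit, penalize unmet demand."""
    on = sum(bits)
    return sum(c * b for c, b in zip(costs, bits)) + penalty * max(0, demand - on) ** 2

N_UNITS = 3
BITSTRINGS = list(product([0, 1], repeat=N_UNITS))
COST_DIAG = np.array([qubo_cost(b) for b in BITSTRINGS])   # diagonal cost Hamiltonian

def rx(beta):
    """Single-qubit mixer rotation exp(-i * beta * X)."""
    return np.array([[np.cos(beta), -1j * np.sin(beta)],
                     [-1j * np.sin(beta), np.cos(beta)]])

def qaoa_state(gammas, betas):
    """Apply p alternating cost/mixer layers to the uniform superposition."""
    state = np.full(2 ** N_UNITS, 1 / np.sqrt(2 ** N_UNITS), dtype=complex)
    for gamma, beta in zip(gammas, betas):
        state = np.exp(-1j * gamma * COST_DIAG) * state     # cost layer: C is diagonal
        mixer = rx(beta)
        for _ in range(N_UNITS - 1):                        # same RX on every qubit
            mixer = np.kron(mixer, rx(beta))
        state = mixer @ state                               # mixer layer
    return state

def expected_cost(params):
    p = len(params) // 2
    state = qaoa_state(params[:p], params[p:])
    return float(np.real(np.sum(np.abs(state) ** 2 * COST_DIAG)))

if __name__ == "__main__":
    # Crude random search over (gamma, beta) at depth p=1; a real run would
    # put a classical optimizer (e.g. COBYLA) in this loop instead.
    rng = np.random.default_rng(0)
    best = min(expected_cost(rng.uniform(0.0, np.pi, 2)) for _ in range(200))
    print("best expected cost:", round(best, 3), "| classical optimum:", COST_DIAG.min())
```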
Leading applied ML and AI systems in startups, NGOs, and banking.
Available for collaborations, advisory work, and technical leadership roles.