BCI × LLM × GPU

NeuroLLM

A foundation model approach to EEG decoding: pre-train a small transformer on large-scale clinical EEG data, then fine-tune for motor imagery classification with a custom frequency-band attention kernel.

PyTorch · MNE-Python · CUDA · EEG
Brain-computer interfaces traditionally rely on hand-crafted features (CSP, band-power) fed into classical classifiers. Foundation models for neural signals — pre-trained transformers that learn general EEG representations — are an active research frontier (LaBraM, BrainBERT, NeurIPS 2023-2024). NeuroLLM demonstrates that even a small model (~10M params) pre-trained on TUH EEG data can improve over standard BCI baselines, with a custom CUDA kernel for EEG-specific frequency-band attention.
PyTorch MNE-Python CUDA C++ Triton EEG / BCI Transformers

Datasets

DatasetUseSizeDetails
TUH EEG Corpus Pre-training ~25K sessions Clinical EEG, self-supervised masked channel modeling
BCI Competition IV 2a Fine-tuning + eval 9 subjects × 576 trials 4-class motor imagery (left/right hand, feet, tongue)

Technical approach

  • Patch embedding: EEG channels split into 200ms temporal windows → tokens. Learnable channel + temporal position encodings capture electrode topology.
  • Pre-training: Masked Channel Modeling — randomly mask 30% of channel-time patches, reconstruct from context. Self-supervised, no labels needed.
  • Frequency-band attention: Custom attention mechanism that decomposes patches into frequency bands (delta, theta, alpha, beta, gamma) and applies band-specific attention biases. Implemented as a fused CUDA kernel.
  • Fine-tuning: [CLS] token → MLP → 4-class softmax. Session 1 train, Session 2 test (standard BCI protocol).
  • Baselines: CSP+SVM (hand-crafted features), EEGNet (compact CNN), vanilla transformer (same arch, no pre-training).

Benchmarks

Per-subject accuracy across 9 subjects. Session 1 → train, Session 2 → test.

MethodAvg AccuracyCohen's KappaParameters
CSP + SVM~65-70% (lit.)N/A
EEGNet~70-75% (lit.)2.6K
Vanilla Transformer~10M
NeuroLLM (ours)~10M

Results will be populated from real training runs. Attention heatmaps over electrode positions committed to repository.

Architecture diagram

flowchart TB EEG["EEG Signal
22 channels × 1000 samples"] --> Patch["Patch Embedding
Channel × time → tokens"] Patch --> Pos["Position + Channel Encoding
Electrode topology"] Pos --> Pretrain{"Pre-trained?"} Pretrain -->|Yes| Encoder["Transformer Encoder
6 layers, freq-band attention
Custom CUDA kernel"] Pretrain -->|No| Encoder Encoder --> CLS["[CLS] Token"] CLS --> Head["Classification Head
MLP → 4 classes"] Head --> Output["Motor Imagery Class
Left/Right/Feet/Tongue"] style Encoder fill:#eff6ff,stroke:#2563eb,color:#0f172a style Head fill:#eff6ff,stroke:#2563eb,color:#0f172a