Optimized Transformer Inference Engine

C++, CUDA, TensorRT, FasterTransformer

A high-performance inference engine for transformer models, built to minimize latency and maximize throughput in production deployments.

Features
