MLIR Softmax Backend
An end-to-end compiler backend that takes an MLIR softmax kernel through lowering, NVPTX code generation, PTX emission, and kernel launch via the CUDA Driver API.
What I built
Built the C++17 pipeline, a custom pass that combines loop-invariant code motion (LICM) with strength reduction, CUDA runtime integration, and FileCheck and CTest coverage of the full compiler path.
Impact
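When the denominator is loop-invariant, the optimization hoists a single reciprocal out of the loop and replaces the N per-iteration divisions with N multiplications (1 division plus N multiplications in total). The pipeline then runs the generated kernel on the GPU and verifies that the output is correct.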
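The rewrite described above can be illustrated at the source level with a minimal C++17 sketch. This is not the pass itself, which operates on compiler IR; the function names normalize_div, normalize_recip, and max_abs_diff are hypothetical and chosen only for illustration. It assumes the softmax denominator (the sum of exponentials) has already been computed and does not change inside the loop.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Baseline form: the loop performs one division per element (N divisions).
// `sum` is assumed to be the precomputed, loop-invariant softmax denominator.
std::vector<float> normalize_div(const std::vector<float>& exps, float sum) {
    assert(sum != 0.0f);
    std::vector<float> out(exps.size());
    for (std::size_t i = 0; i < exps.size(); ++i) {
        out[i] = exps[i] / sum;
    }
    return out;
}

// Rewritten form: the reciprocal is computed once outside the loop
// (1 division), and the loop body uses a multiplication instead
// (N multiplications).
std::vector<float> normalize_recip(const std::vector<float>& exps, float sum) {
    assert(sum != 0.0f);
    const float inv = 1.0f / sum;  // hoisted: invariant across iterations
    std::vector<float> out(exps.size());
    for (std::size_t i = 0; i < exps.size(); ++i) {
        out[i] = exps[i] * inv;
    }
    return out;
}

// Hypothetical helper: largest element-wise absolute difference,
// used to compare the two forms within a tolerance.
float max_abs_diff(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        worst = std::fmax(worst, std::fabs(a[i] - b[i]));
    }
    return worst;
}
```

Because x * (1 / s) is not guaranteed to round identically to x / s in floating point, a check of this kind would compare the two results within a small tolerance, for example max_abs_diff(normalize_div(e, s), normalize_recip(e, s)) < 1e-6f, rather than expecting bit-exact equality.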
