Executive Summary : | This project aims to develop accelerated systems for transformer-based vision tasks, targeting high performance and energy efficiency while preserving the contextual information in images. The project will design custom hardware that provides (1) an efficient dataflow for parallel processing of vision transformer (ViT) workloads, (2) load-balance-aware sparsity exploitation that does not sacrifice interpretability, (3) fewer redundant computations through approximation, and (4) elimination of off-chip latency and energy overhead through near-memory processing. These capabilities will be implemented in field-programmable gate array (FPGA) / application-specific integrated circuit (ASIC) prototypes. The hardware will be developed incrementally: the first version will implement the efficient dataflow, and later versions will incorporate the remaining optimizations. The complexity of each hardware module will be studied in detail, and suitable versions will be integrated with emerging main memory technologies to reduce data movement costs. The goal is to design optimized ViT systems that outperform baseline and state-of-the-art designs in system throughput while preserving interpretability for computer vision. |