Istanbul Technical University
ABSTRACT: Deep learning architectures such as Transformers and Convolutional Neural Networks (CNNs) have led to groundbreaking advances across numerous fields. However, their large parameter counts pose challenges for deployment in resource-constrained environments. In this work, we propose a strategy that exploits the column and row spaces of weight matrices, significantly reducing the number of required model parameters without substantially affecting performance. We apply this technique to both Bottleneck and Attention layers, achieving a notable reduction in parameters with minimal impact on model efficacy. Our proposed model, HaLViT, exemplifies a parameter-efficient Vision Transformer. Rigorous experiments on the ImageNet and COCO datasets validate the effectiveness of our method, yielding results comparable to those of conventional models.
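The abstract only sketches the mechanism, so the following is a minimal illustration rather than the paper's actual implementation. It assumes that "utilizing the column and row spaces of a weight matrix" amounts to sharing a single matrix W between the two projections of a bottleneck MLP, applying W for one projection and its transpose for the other; the class name SharedBottleneck and the parameters dim and hidden_dim are hypothetical.

```python
import torch
import torch.nn as nn


class SharedBottleneck(nn.Module):
    """Hypothetical bottleneck MLP reusing one weight matrix.

    The first projection applies W^T (mapping into the row space of W)
    and the second applies W (mapping back via its column space), so a
    single matrix serves where a standard bottleneck stores two,
    roughly halving the layer's parameter count.
    """

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        # One shared weight instead of separate up/down projections.
        self.weight = nn.Parameter(torch.empty(hidden_dim, dim))
        nn.init.xavier_uniform_(self.weight)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.act(x @ self.weight.t())  # (..., dim) -> (..., hidden_dim)
        return h @ self.weight             # (..., hidden_dim) -> (..., dim)


# Usage sketch: a 768 -> 3072 -> 768 block stores one 3072x768 matrix
# instead of two, e.g. applied to a batch of ViT tokens.
block = SharedBottleneck(dim=768, hidden_dim=3072)
out = block(torch.randn(4, 197, 768))
```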
This work was supported by the Scientific and Technological Research Council of Türkiye (TUBITAK) under the 1515 Frontier R&D Laboratories Support Program for the BTS Advanced AI Hub: BTS Autonomous Networks and Data Innovation Lab (Project 5239903, grant number 121E378); in part by the Scientific Research Projects Coordination Department (BAP), Istanbul Technical University, under Projects ITU-BAP MGA-2024-45372 and HIZDEP; and in part by the National Center for High Performance Computing (UHEM) under grant numbers 1016682023 and 4016562023.