Application of Vision Transformers in Online Advertisement Identification

dc.contributor.author Liyanage, C.R.
dc.contributor.author Madushika, M.K.S.
dc.contributor.author Nawarathna, R.D.
dc.date.accessioned 2022-04-22T04:00:35Z
dc.date.available 2022-04-22T04:00:35Z
dc.date.issued 2022-03-02
dc.identifier.citation Liyanage, C. R., Madushika, M. K. S., & Nawarathna, R. D. (2022). Application of Vision Transformers in Online Advertisement Identification. 19th Academic Sessions, University of Ruhuna, Matara, Sri Lanka. 13.
dc.identifier.issn 2362-0412
dc.identifier.uri http://ir.lib.ruh.ac.lk/xmlui/handle/iruor/5709
dc.description.abstract Advertisements (ads) play an important role in many sectors, such as business, education, and government, as they can influence the cultural and religious aspects of a society by disseminating important messages to people. Image-based advertisements are generally more creative than, and distinct from, other images, as they contain slogans explaining the message of the ad, symbolic and atypical objects, and unusual placements of objects within an image. Distinguishing advertisements from other images on digital media is important both for capturing customer attention and for blocking ads from websites. This study proposes a supervised learning approach to classify images as ads or non-ads. A further objective of this study is to verify the applicability of Vision Transformers (ViT) in the domain of image-based ad analysis. ViT is a novel image classification architecture that, as an alternative to the Convolutional Neural Network (CNN), divides images into patches and processes them with a technique called multi-head self-attention. The experiment was conducted on 19,700 images labelled as ad or non-ad. Two ViT models with different patch sizes, both pre-trained on the ImageNet-21k dataset, were used for classification. The models were trained with a batch size of 10 for a maximum of 20 epochs. The dataset was split into training and testing sets, and a validation split of 0.2 was applied to the training set. The highest validation accuracy of 82% was achieved by the model with 32×32 patches; during its testing phase, the same model achieved an accuracy of 84%, precision of 85%, and recall of 84%. The results of this study were compared with state-of-the-art CNN-based research. The study demonstrates that the ViT architecture can achieve comparable results even with limited computational resources. en_US
dc.language.iso en en_US
dc.publisher University of Ruhuna, Matara, Sri Lanka en_US
dc.subject Advertisements en_US
dc.subject Classification en_US
dc.subject Vision Transformers en_US
dc.title Application of Vision Transformers in Online Advertisement Identification en_US
dc.type Article en_US
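
The following is a minimal sketch, not the authors' published code, of the fine-tuning setup described in the abstract: a ViT pre-trained on ImageNet-21k adapted for binary ad / non-ad classification. The Hugging Face transformers and torchvision libraries, the checkpoint name, the dataset path, and the folder layout are illustrative assumptions; the batch size of 10, the 20-epoch cap, the 0.2 validation split, and the 32x32 patch size follow the abstract.

    # Minimal fine-tuning sketch; framework, paths, and layout are assumptions.
    import torch
    from torch.utils.data import DataLoader, random_split
    from torchvision import datasets, transforms
    from transformers import ViTForImageClassification

    # ViT-Base with 32x32 patches, pre-trained on ImageNet-21k; a fresh
    # two-class head is attached for the ad / non-ad task.
    model = ViTForImageClassification.from_pretrained(
        "google/vit-base-patch32-224-in21k", num_labels=2
    )

    # Resize and normalise images to the 224x224 input the checkpoint expects.
    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),
    ])

    # Hypothetical layout: data/train/ad/*.jpg and data/train/not_ad/*.jpg
    full_train = datasets.ImageFolder("data/train", transform=preprocess)
    n_val = int(0.2 * len(full_train))              # 0.2 validation split
    train_set, val_set = random_split(
        full_train, [len(full_train) - n_val, n_val]
    )
    train_loader = DataLoader(train_set, batch_size=10, shuffle=True)

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    model.train()
    for epoch in range(20):                         # maximum of 20 epochs
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            out = model(pixel_values=images, labels=labels)  # built-in CE loss
            out.loss.backward()
            optimizer.step()
            optimizer.zero_grad()

Swapping the checkpoint for google/vit-base-patch16-224-in21k would give the second model with 16x16 patches; accuracy on the held-out val_set would decide which epoch's weights to keep before the 20-epoch cap.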

