Title :	Enhanced Signature Scheme for Malware Analysis and Other Assorted Attacks
Area of research :	Computer Sciences and Information Technology
Focus area :	Cyber Security, Malware Analysis
Principal Investigator :	Dr. Raman Kumar, I.K. Gujral Punjab Technical University, Punjab
Timeline Start Year :	2023
Timeline End Year :	2026
Contact info :	dav.raman@gmail.com
Equipments :	ArcGIS

Details

Executive Summary :

Malicious software has expanded dramatically alongside the online world. Because most malware is developed by reusing code from existing threats, it follows a familiar pattern, and classifiers can identify this pattern similarity. In this work, five malware features (API calls, U-API calls, PE imports, Proc memory addresses, and Strings) are extracted from a data set of 15217 samples drawn from 14 malware families. Three experiments were run on this data set to ensure its validity.
The first experiment used the machine learning classifiers k-NN, Multinomial NB, Gaussian NB, Decision Tree, and Random Forest, implemented with the scikit-learn library. The data set was split 70:30, with 70% used to train the classifiers and 30% used for validation. Random Forest was found to be more accurate than the other classifiers tested (96.19% for API calls, 93.43% for U-API calls, 94.42% for Proc memory addresses, 92.9% for PE imports, and 88.11% for Strings), so its best result, 96.19%, comes from the API calls feature.
The second experiment used these features as inputs to deep learning models. In addition to the individual features, hybrid features were employed. Among the individual features, API calls perform best.
In the third experiment, two CNN models are combined using API calls and their hybrid features. Grayscale images are generated from the API calls, two different CNN models are applied to them, and the features the models produce are aggregated into a single feature vector. This merged feature vector is then used for training. The proposed work is compared against well-established methods, namely VGG-16, ResNet-50, and AlexNet, and is found to be superior.
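The classifier comparison in the first experiment can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the function name `compare_classifiers`, the hyperparameters, the stratified split, and the synthetic count data standing in for an extracted feature matrix (for example, API call counts per sample) are all assumptions made for the example. Only the five classifier types, the use of scikit-learn, and the 70:30 split come from the summary.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X, y, seed=0):
    """Train each classifier on 70% of the data and score it on the other 30%."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=seed
    )
    # Hyperparameters here are illustrative defaults, not the project's settings.
    classifiers = {
        "k-NN": KNeighborsClassifier(n_neighbors=5),
        "Multinomial NB": MultinomialNB(),
        "Gaussian NB": GaussianNB(),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    scores = {}
    for name, clf in classifiers.items():
        clf.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, clf.predict(X_test))
    return scores


# Synthetic stand-in for a real feature matrix: 14 "families", 50 samples each,
# 30 non-negative count features (MultinomialNB requires non-negative inputs).
rng = np.random.default_rng(0)
n_families, n_features, per_family = 14, 30, 50
rates = rng.uniform(0.5, 10.0, size=(n_families, n_features))
y = np.repeat(np.arange(n_families), per_family)
X = rng.poisson(rates[y])

scores = compare_classifiers(X, y)
for name, accuracy in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {accuracy:.4f}")
```

The accuracies printed for the synthetic data say nothing about the reported results; the sketch only shows the shape of the comparison, one held-out accuracy per classifier on the same split.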
This hybrid feature vector is run on MCBTTL, and a similar operation involving API calls, PE imports, and Proc memory addresses is also carried out on the platform. The combined feature vector of API calls, PE imports, and Proc memory addresses yields the highest accuracy across all three experiments, 99.92%.
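To make the two data-preparation ideas above concrete, the sketch below shows one possible way to turn an API call sequence into a grayscale image and one way to merge several feature vectors into a single vector. The summary does not specify how the images are encoded or how the vectors are aggregated, so the vocabulary-index encoding, the image width, simple concatenation as the merge step, and the helper names `api_calls_to_grayscale` and `merge_feature_vectors` are all assumptions for illustration.

```python
import math


def api_calls_to_grayscale(calls, vocabulary, width=4):
    """Map an API call sequence to rows of 0-255 pixel intensities.

    Assumed encoding: each known call gets an intensity proportional to its
    position in the vocabulary; unknown calls and padding map to 0.
    """
    index = {name: i + 1 for i, name in enumerate(vocabulary)}
    scale = 255 / max(len(vocabulary), 1)
    pixels = [round(index.get(call, 0) * scale) for call in calls]
    # Pad the final row with zeros so the image is rectangular.
    height = max(math.ceil(len(pixels) / width), 1)
    pixels += [0] * (height * width - len(pixels))
    return [pixels[r * width:(r + 1) * width] for r in range(height)]


def merge_feature_vectors(*vectors):
    """Concatenate per-feature vectors into one hybrid feature vector."""
    merged = []
    for vector in vectors:
        merged.extend(vector)
    return merged


# Hypothetical vocabulary and call trace, used only for this example.
vocabulary = ["CreateFileW", "ReadFile", "WriteFile", "RegOpenKeyExW", "VirtualAlloc"]
calls = ["CreateFileW", "ReadFile", "VirtualAlloc", "WriteFile", "CreateFileW", "UnknownCall"]
image = api_calls_to_grayscale(calls, vocabulary, width=4)

# Placeholder vectors standing in for API call, PE import, and Proc memory features.
merged = merge_feature_vectors([0.1, 0.2], [0.3], [0.4, 0.5])

print(image)
print(merged)
```

Concatenation keeps each source feature's values intact and lets the downstream model weigh them; other aggregation schemes (averaging, learned fusion layers) are equally plausible readings of the summary.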

Total Budget (INR):

27,81,240

Organizations involved

Implementing Agency :	I.K. Gujral Punjab Technical University, Punjab
Funding Agency :	Anusandhan National Research Foundation (ANRF)/Science and Engineering Research Board (SERB)
Source:	Anusandhan National Research Foundation/Science and Engineering Research Board (SERB), DST 2023-24