Research on Encrypted Traffic Classification and Sparse Traffic Recognition Based on Feature Extraction and Deep Learning

Ruiya Qi; Junxi Wang; Chengyi Chen; Fangyu Yang

doi:10.62177/apemr.v2i5.632

Authors

Ruiya Qi Information Engineering College Minzu University of China
Junxi Wang Information Engineering College Minzu University of China
Chengyi Chen Science College Minzu University of China
Fangyu Yang Information Engineering College Minzu University of China

Keywords:

Encrypted Traffic Classification, Protocol Feature Matrix, Clustering Analysis (KMeans/GMM), Random Forest Feature Scoring, Multi-layer Perceptron (MLP), Transformer Architecture

Abstract

This study addresses the classification of encrypted network traffic and proposes a classification model based on feature extraction and deep learning. To address the difficulty of extracting features from encrypted traffic, we adopt a session-level feature extraction method grounded in information theory. Analyzing statistical, temporal, spatial, and semantic features together with protocol feature matrices reveals fundamental differences in entropy, periodicity, and hierarchical structure among traffic types. Cluster analysis (KMeans/GMM) and random forest feature scoring are then used to identify, evaluate, and rank the key features. For model construction, a multi-layer perceptron (MLP) classifier with ReLU activation and Dropout regularization achieves 95.94% accuracy on the test set. To handle class imbalance, we propose an integrated approach that combines ADASYN over-sampling, the focal loss function, and a Transformer architecture, raising sparse-traffic recognition accuracy to 97.22%. The model correctly detects 39 sparse samples, and recall for category 9 (vpn_icq_chat1a) increases from 24.6% to 92.3%. The model also performs well in feature robustness (over 90% accuracy at a noise intensity of 0.3), computational efficiency (single-sample prediction in under 1 ms), and interpretability (quantified contributions of the core features). The approach offers a theoretically grounded and practical solution for encrypted traffic analysis and a reference for future research.
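The abstract names focal loss as one component of the imbalance-handling approach but does not give its form or hyperparameters. As a rough illustration only, the sketch below implements the standard multi-class focal loss, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), in plain Python. The function names, the default `gamma=2.0`, and the per-sample `alpha` weight are assumptions made for this example and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, label, gamma=2.0, alpha=1.0):
    """Focal loss for one sample.

    gamma and alpha are illustrative defaults, not values from the paper.
    With gamma = 0 and alpha = 1 this reduces to ordinary cross-entropy.
    """
    p_t = max(softmax(logits)[label], 1e-12)  # probability of the true class
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(batch_logits, labels, gamma=2.0):
    """Average focal loss over a batch of samples."""
    losses = [focal_loss(z, y, gamma=gamma) for z, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    confident = [4.0, 0.0, 0.0]  # model strongly prefers class 0
    print("well-classified sample:", focal_loss(confident, 0))
    print("misclassified sample:  ", focal_loss(confident, 2))
```

The factor (1 - p_t)^gamma shrinks the loss of samples the model already classifies confidently, so rare, hard samples contribute relatively more to the gradient. That is the usual motivation for pairing focal loss with over-sampling on imbalanced data.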
