10. Machine Learning

The Data Plane Development Kit (DPDK) is an open-source software project managed by the Linux Foundation. It is designed to offload packet processing from the operating system kernel to user-space processes, which improves computing efficiency and packet throughput.

The dpdk-test-mldev tool is a DPDK application for testing machine learning device (mldev) use cases. As part of the DAO package, it lets users run inference operations with specific inputs. The current DAO release provides two models, ResNet50 and LUCID, for running inference operations.

Note

For details on dpdk-test-mldev, refer to the DPDK documentation for the tool.

10.1. ResNet50

10.1.1. Introduction

ResNet50 is a deep learning model used for image classification. It uses residual blocks to improve training efficiency and accuracy, and it is widely used for recognizing and categorizing objects in images. The release includes int8 (resnet50_int8_t08_b01) and fp16 (resnet50_fp16_t08_b01) quantized versions of the ResNet50 model, which are optimized for running inference operations.

10.1.2. Preprocessing of Input

Preprocessing converts the input image into a binary format that the model accepts as input. This is done using the image2bin.py Python script.

# Convert input in Image format to binary format
python image2bin.py \
--image_file input.jpeg \
--bin_file output.bin
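The internals of image2bin.py are not described here, so the following is only a minimal sketch of the general idea of serializing pixel data into a raw binary file. The channel-first (CHW) layout, the unsigned 8-bit data type, and the helper name pixels_to_bin are assumptions made for illustration; the layout the shipped models actually expect may differ.

```python
import os
import tempfile
from array import array


def pixels_to_bin(pixels, bin_path):
    """Write an H x W x 3 nested list of 0-255 values as raw CHW bytes.

    Hypothetical helper: the layout and data type are assumptions,
    not the documented behavior of image2bin.py.
    """
    height, width = len(pixels), len(pixels[0])
    flat = array("B")  # unsigned 8-bit values
    for channel in range(3):
        for row in range(height):
            for col in range(width):
                flat.append(pixels[row][col][channel])
    with open(bin_path, "wb") as handle:
        flat.tofile(handle)
    return len(flat)


# Tiny 2 x 2 synthetic "image" so the sketch runs without an image library.
demo_pixels = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (10, 20, 30)],
]
demo_path = os.path.join(tempfile.mkdtemp(), "output.bin")
written = pixels_to_bin(demo_pixels, demo_path)
print(written, "bytes written to", demo_path)
```

A real preprocessing step would also decode the image file and typically resize and normalize it before writing the tensor.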

10.1.3. Model Execution

The preprocessed binary is given as an input to the model. The model processes the input, runs the inference operation, and generates the output in binary format.

# Run inferences with dpdk-test-mldev application
dpdk-test-mldev --lcores=4-23 -a 0000:00:10.0,fw_path=/lib/firmware/mlip-fw.bin -- \
--test inference_ordered \
--filelist model.tar,input.bin,output.bin,reference.bin \
--tolerance 5 \
--stats \
--repetitions 1000
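When the same invocation has to be repeated for several models or inputs, it can help to assemble the argument list programmatically. The sketch below rebuilds exactly the command shown above; the function name build_mldev_command and its parameters are hypothetical, and the device address, core list, and firmware path are simply the values from the example.

```python
import shlex


def build_mldev_command(model, input_bin, output_bin, reference_bin,
                        tolerance=5, repetitions=1000):
    """Return the dpdk-test-mldev argument list used in the example above."""
    # The file list is a single comma-separated argument, in the same
    # order as in the example: model, input, output, reference.
    filelist = ",".join([model, input_bin, output_bin, reference_bin])
    return [
        "dpdk-test-mldev",
        "--lcores=4-23",
        "-a", "0000:00:10.0,fw_path=/lib/firmware/mlip-fw.bin",
        "--",  # separates EAL options from application options
        "--test", "inference_ordered",
        "--filelist", filelist,
        "--tolerance", str(tolerance),
        "--stats",
        "--repetitions", str(repetitions),
    ]


command = build_mldev_command("model.tar", "input.bin",
                              "output.bin", "reference.bin")
print(shlex.join(command))
# On a target with the ML device present, the command could be launched with:
# subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting issues with the comma-separated device and file-list values.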

10.1.4. Postprocessing of Output

The binary output generated by the model is converted into a JSON file for easier interpretation and analysis. This is done using the bin2json.py Python script.

# Convert output in binary format to JSON format
python bin2json.py \
--bin_file output.bin \
--json_file output.json
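The format that bin2json.py reads and writes is not specified here. As a rough illustration of this kind of conversion, the sketch below assumes the binary holds little-endian float32 class scores and writes the highest-scoring classes to a JSON file. The function name scores_to_json, the JSON keys, and the data type are all assumptions.

```python
import json
import os
import struct
import tempfile


def scores_to_json(bin_path, json_path, top_k=5):
    """Convert raw float32 scores into a JSON list of the top-k classes.

    Hypothetical helper: assumes little-endian float32 scores, which
    may not match the real output layout of the models.
    """
    with open(bin_path, "rb") as handle:
        raw = handle.read()
    count = len(raw) // 4  # 4 bytes per float32 value
    scores = struct.unpack("<%df" % count, raw[:count * 4])
    ranked = sorted(range(count), key=lambda i: scores[i], reverse=True)
    result = {
        "top_k": [{"class_index": i, "score": scores[i]}
                  for i in ranked[:top_k]]
    }
    with open(json_path, "w") as handle:
        json.dump(result, handle, indent=2)
    return result


# Synthetic scores (exactly representable in float32) for a dry run.
work_dir = tempfile.mkdtemp()
demo_bin = os.path.join(work_dir, "output.bin")
demo_json = os.path.join(work_dir, "output.json")
with open(demo_bin, "wb") as handle:
    handle.write(struct.pack("<4f", 0.125, 0.75, 0.25, 0.5))

result = scores_to_json(demo_bin, demo_json, top_k=2)
print(json.dumps(result))
```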

10.2. LUCID

10.2.1. Introduction

LUCID (Lightweight, Usable CNN in DDoS Detection) is a deep learning framework designed to detect DDoS attacks. It uses Convolutional Neural Networks (CNNs) to distinguish between malicious and benign traffic flows. The release includes int8 (10t-10n-lucid_int8_t08_b01) and fp16 (10t-10n-lucid_fp16_t08_b01) quantized versions of the LUCID model, which are optimized for running inference operations.

10.2.2. Training Model

The models were trained on the CIC-DDoS-2019 dataset and compiled using the TVM compiler with INT8 and FP16 quantization to generate model binaries for the MLIP target architecture, as part of the DAO release. There are two model binaries available for running inference operations.

Hyperparameters used for training:

Maximum number of packets/sample (n): 10

Time window (t): 10 seconds
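To make the two hyperparameters concrete, the sketch below groups packets into per-flow samples: packets of the same flow that fall into the same t-second window form one sample, at most n packets are kept, and shorter samples are padded with zero rows. It loosely follows the sample construction described in [2]; the packet tuple layout, the two-feature rows, and the function name build_samples are assumptions for illustration, not the preprocessing code shipped with the release.

```python
from collections import defaultdict

MAX_PACKETS = 10    # n: maximum number of packets per sample
TIME_WINDOW = 10.0  # t: time window length in seconds


def build_samples(packets, n=MAX_PACKETS, t=TIME_WINDOW, num_features=2):
    """Group (timestamp, flow_key, features) packets into fixed-size samples."""
    grouped = defaultdict(list)
    for timestamp, flow_key, features in packets:
        window_index = int(timestamp // t)
        bucket = grouped[(flow_key, window_index)]
        if len(bucket) < n:  # drop packets beyond the first n in a window
            bucket.append(list(features))

    samples = {}
    for key, rows in grouped.items():
        # Pad with all-zero rows so every sample has exactly n rows.
        padding = [[0.0] * num_features for _ in range(n - len(rows))]
        samples[key] = rows + padding
    return samples


# Hypothetical packets: (timestamp in seconds, flow id, (length, flag)).
demo_packets = [
    (0.5, "A", (60.0, 1.0)),
    (3.0, "A", (1500.0, 0.0)),
    (12.0, "A", (40.0, 1.0)),  # falls into the second window of flow A
    (1.0, "B", (52.0, 0.0)),
]
samples = build_samples(demo_packets)
for key in sorted(samples):
    print(key, samples[key][:2])
```

With n = 10 and t = 10 seconds, every sample is a 10-row matrix regardless of how many packets the flow actually sent in that window.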

Note

For further information on training, please refer to the References section below.

10.2.3. Run Script

The DAO release provides a lucid_run.py script for testing the models with any pcap dataset. The script processes the pcap file, runs inference using the specified model, and generates the output.

Command Line Options:

To run inference using the lucid_run.py script with a sample dataset, use the following command line options:

python lucid_run.py [-h, --help]
                    -pl PCAP_FILE, --pcap_file PCAP_FILE
                    -m MODEL, --model MODEL
                    [-y DATASET_TYPE, --dataset_type DATASET_TYPE]

Descriptions:

-h, --help: Display this help message and exit.

-pl PCAP_FILE, --pcap_file PCAP_FILE: Perform a prediction on a pcap file. Follow this option with a pcap file path (e.g., /path/to/traffic_dataset.pcap).

-m MODEL, --model MODEL: Specify the model file for prediction. The model should be a trained model in binary format.

-y DATASET_TYPE, --dataset_type DATASET_TYPE: Choose the dataset type. Options are DOS2017, DOS2018, DOS2019, SYN2020. This is used to generate classification statistics (e.g., accuracy, F1 score) by comparing the ground truth labels with LUCID’s output.

The confusion matrix is printed in the following format:

TP  FN
FP  TN
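The classification statistics mentioned for --dataset_type follow directly from the four confusion matrix counts. The sketch below uses the standard definitions of accuracy, precision, recall, and F1 score; it is not taken from lucid_run.py, the function name classification_stats is hypothetical, the sample counts are made up, and the DDoS class is assumed to be the positive class.

```python
def classification_stats(tp, fn, fp, tn):
    """Compute standard metrics from confusion matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up counts, only to exercise the formulas.
stats = classification_stats(tp=90, fn=10, fp=5, tn=95)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```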

Example Run:

This example demonstrates how to predict network traffic from the CIC-DDoS-2019-DNS.pcap file using the 10t-10n-lucid_fp16_t08_b01.bin model:

python lucid_run.py \
    --pcap_file CIC-DDoS-2019-DNS.pcap \
    --model 10t-10n-lucid_fp16_t08_b01.bin

10.2.4. References

[1] LUCID repository on GitHub

[2] R. Doriguzzi-Corin, S. Millar, S. Scott-Hayward, J. Martínez-del-Rincón, and D. Siracusa, “Lucid: A Practical, Lightweight Deep Learning Solution for DDoS Attack Detection,” IEEE Transactions on Network and Service Management, vol. 17, no. 2, pp. 876-889, June 2020. doi: 10.1109/TNSM.2020.2971776. Available: IEEE Xplore

[3] CIC-DDoS-2019 dataset