.. SPDX-License-Identifier: Marvell-MIT
   Copyright (c) 2024 Marvell.

****************
Machine Learning
****************

The Data Plane Development Kit (DPDK) is an open-source software project managed by the
Linux Foundation. It moves packet processing out of the operating system kernel into
user-space processes, thereby enhancing computing efficiency and packet throughput.

The ``dpdk-test-mldev`` tool is a DPDK application designed to test various machine learning
(mldev) use cases. As part of the DAO package, it provides a way for users to run inference
operations with specific inputs.

The current DAO release provides steps for two example models, ``Resnet50`` and ``LUCID DDoS``,
to give users hands-on experience with Marvell's Machine Learning Accelerator and tools.

.. note:: For detailed documentation related to dpdk-test-mldev, refer to `documentation `_

Introduction
============

The Marvell Machine Learning Inference Processor (MLIP) is a hardware acceleration engine
designed to speed up inference workloads. TVM supports MLIP through the mrvl library,
allowing models to be compiled for execution on MLIP hardware or simulator. During
compilation, the model is partitioned into MLIP and CPU executable regions, depending on
the operators and other characteristics of the model.

.. note:: The compiler aims to run the entire model, or as much of it as possible, on the MLIP.

The figure shows the OCTEON 10 architecture, including both the ML/AI accelerator (MLIP) and
the Arm Neoverse N2 CPUs. This hardware layout enables efficient hybrid execution, where
supported model layers are offloaded to the MLIP while the remaining parts of the model run
on the CPU.

.. figure:: ./img/mlip.png
   :width: 500px
   :align: center

Setting Up The TVM Compiler Framework
=====================================

This section provides step-by-step instructions to set up the TVM compiler environment with
Marvell's MMLC backend support. It covers prerequisites, environment setup, and installation
of the TVM and MMLC binaries, followed by configuration and build procedures. The setup
requires TVM version ``0.19.0``.

Prerequisites
-------------

The MMLC binaries are built to work with Ubuntu 20.04 or newer. Additionally, TVM requires
``CMake (>= 3.18)`` and ``LLVM (recommended >= 15)``.

Installing CMake
~~~~~~~~~~~~~~~~

Below are the steps to build and install ``CMake-3.27.8``:

.. code-block:: bash

   # Get archives
   wget https://github.com/Kitware/CMake/releases/download/v3.27.8/cmake-3.27.8.tar.gz
   tar -xzf cmake-3.27.8.tar.gz

   # Build
   cd cmake-3.27.8
   ./configure --prefix=${INSTALL_PREFIX_HOST}
   make && make install

Installing LLVM
~~~~~~~~~~~~~~~

Below are the steps to install ``llvm-15``:

.. code-block:: bash

   # Add LLVM repo
   apt-get install python3-venv lsb-release software-properties-common
   sh -c 'echo "deb http://apt.llvm.org/$(lsb_release -s -c)/ llvm-toolchain-$(lsb_release -s -c)-15 main" >> /etc/apt/sources.list'

   # Setup gpg key
   wget -qO- https://apt.llvm.org/llvm-snapshot.gpg.key | tee /etc/apt/trusted.gpg.d/apt.llvm.org.asc

   # Install LLVM
   add-apt-repository -y ppa:ubuntu-toolchain-r/test
   apt-get install -y llvm-15-dev

Setting up the Python Environment for TVM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TVM and its dependencies require ``Python >= 3.8``. The recommended version of Python is ``3.10``.

.. code-block:: bash

   # Create virtual environment
   python3.8 -m venv tvm-venv
   source tvm-venv/bin/activate
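
   # Note: the recommended Python version is 3.10; if it is installed on the host,
   # the environment can equivalently be created with "python3.10 -m venv tvm-venv".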

   # Create `requirements.txt` containing the list of all required Python packages with versions
   cat > requirements.txt << EOF
   attrs==23.2.0
   cloudpickle==2.2.1
   decorator==5.1.1
   ml-dtypes==0.5.1
   numpy==1.25.0
   onnx==1.16.1
   onnxruntime==1.20.0
   graphviz==0.20.1
   protobuf==4.21.12
   psutil==5.9.7
   pybind11==2.11.1
   scipy==1.11.4
   tornado==6.2
   typing_extensions==4.9.0
   EOF

   # Install python packages
   python -m pip install --upgrade pip wheel setuptools
   python -m pip install -r requirements.txt

Setting Up The Toolchain
~~~~~~~~~~~~~~~~~~~~~~~~

Install the cross-compilation toolchain.

.. code-block:: bash

   # Install the toolchain
   apt-get install g++-aarch64-linux-gnu

Build Variables
~~~~~~~~~~~~~~~

The build procedure assumes that the following environment variables are defined:

+---------------------+--------------------------------------------------------------+
| Variable            | Description                                                  |
+=====================+==============================================================+
| TVM_SOURCE_DIR      | Path to the cloned TVM sources                               |
+---------------------+--------------------------------------------------------------+
| ML_TOOLS_DIR        | Path to the cloned MMLC binaries                             |
+---------------------+--------------------------------------------------------------+
| INSTALL_PREFIX_HOST | Path to install sources compiled for the x86_64 host machine |
+---------------------+--------------------------------------------------------------+

.. note:: Set ``TVM_SOURCE_DIR`` and ``ML_TOOLS_DIR`` after cloning the TVM and MarvellMLTools
   repositories, as described in the `Cloning and Building TVM Sources`_ and
   `Setting MMLC Binaries`_ sections below.

Setup Base Environment
~~~~~~~~~~~~~~~~~~~~~~

To set up the common environment on the x86_64 host machine, follow these steps:

.. code-block:: bash

   # Add libraries to PATH and LD_LIBRARY_PATH
   export PATH=${INSTALL_PREFIX_HOST}/bin:${PATH}
   export LD_LIBRARY_PATH=${INSTALL_PREFIX_HOST}/lib:${LD_LIBRARY_PATH}

Cloning and Building TVM Sources
--------------------------------

Clone the TVM source code and check out the v0.19.0 release tag from TVM's official GitHub repository.

.. code-block:: bash

   git clone https://github.com/apache/tvm.git
   cd tvm
   git checkout legacy-v0.19.post
   git submodule update --init --recursive

.. note:: Ensure that you are using TVM v0.19.0 using ``git describe --tags``.

Configure and build TVM:

.. code-block:: bash

   # Configure TVM
   cmake \
       -S ${TVM_SOURCE_DIR} \
       -B ${TVM_SOURCE_DIR}/build-x86_64 \
       -DCMAKE_INSTALL_PREFIX=${INSTALL_PREFIX_HOST} \
       -DUSE_MRVL=ON \
       -DUSE_LIBBACKTRACE=AUTO \
       -DUSE_LLVM=llvm-config-15 \
       -DSUMMARIZE=ON

   # Build and install
   make -C ${TVM_SOURCE_DIR}/build-x86_64
   make -C ${TVM_SOURCE_DIR}/build-x86_64 install

After building TVM, install the Python bindings to enable scripting and copy the configuration
files required for target-specific settings:

.. code-block:: bash

   # Install Python module
   cd ${TVM_SOURCE_DIR}/python
   python setup.py install

   # Install TVM configs and set TVM_CONFIGS_JSON_DIR
   mkdir -p ${INSTALL_PREFIX_HOST}/share/tvm
   cp -r ${TVM_SOURCE_DIR}/configs ${INSTALL_PREFIX_HOST}/share/tvm

Setting MMLC Binaries
---------------------

Clone the MMLC binaries from the MarvellMLTools GitHub repository:

.. code-block:: bash

   git clone "https://github.com/MarvellEmbeddedProcessors/MarvellMLTools"
   export ML_TOOLS_DIR=$(pwd)/MarvellMLTools

   # Copy MMLC binaries to INSTALL_PREFIX_HOST
   mkdir -p ${INSTALL_PREFIX_HOST}/bin
   cp ${ML_TOOLS_DIR}/bin/* ${INSTALL_PREFIX_HOST}/bin/
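
Before moving on, it can be useful to confirm that the TVM Python bindings built above are
importable from the virtual environment. The quick check below is an optional step, not part
of the original procedure; the exact version string reported depends on the checked-out tag.

.. code-block:: bash

   # Verify that the TVM Python bindings are importable from the virtual environment
   python -c "import tvm; print(tvm.__version__)"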

Model Compilation
=================

Model compilation for inference can be targeted either for simulation or for execution on
Octeon 10 hardware. Each execution mode requires a different compilation process.

Model Layers and Operators
--------------------------

Based on the operators supported by the ML hardware accelerator, the compiler partitions the
neural network graph into multiple regions. Layers that are supported on the MLIP hardware
accelerator are executed entirely on it, while the remaining parts of the graph are executed
on the Arm Neoverse N2 CPU. Depending on the composition of layers in a model, a model can be
categorized into one of three types:

- MRVL-only: All layers are supported by MLIP and are executed exclusively on the MLIP hardware accelerator.
- LLVM-only: All layers are compiled using the LLVM backend for CPU execution.
- Hybrid: The model contains both MRVL and LLVM layers.

The following table lists the operators currently supported by the MLIP hardware accelerator.

.. list-table::
   :widths: 20 40
   :header-rows: 1

   * - Operator
     - Relay Node
   * - Conv2d
     - nn.conv2d
   * - Gemm/FC/Matmul
     - nn.dense
   * - Maxpool2d
     - nn.max_pool2d
   * - Avgpool2d
     - nn.avg_pool2d
   * - Elementwise sum
     - add
   * - Concat
     - concatenate
   * - Relu
     - nn.relu
   * - Batch_flatten
     - nn.batch_flatten
   * - Reshape
     - reshape
   * - Squeeze
     - squeeze

Compiler Options
----------------

By default, TVM generates the compiled model in TAR format. Alternatively, the model's MRVL
regions can also be compiled and generated in a binary format. Set the environment variable
``MRVL_SAVE_MODEL_BIN=1`` to additionally save the model in binary format. You can then compile
the model for the MLIP along with LLVM targets using the following tvmc command:

.. code-block:: bash

   # Save model binaries for MLIP target
   export MRVL_SAVE_MODEL_BIN=1

.. code-block:: bash

   # Compile model for MLIP (Simulator / Hardware) + LLVM (x86_64 / AArch64) target
   export TVM_CONFIGS_JSON_DIR=${INSTALL_PREFIX_HOST}/share/tvm/configs

   python -m tvm.driver.tvmc compile \
       --target=<target> \
       --cross-compiler=<cross_compiler> \
       --target-llvm-<llvm_options> \
       --target-mrvl-<mrvl_options> \
       --<other_options> \
       model_file.onnx

TVM generic attributes needed to compile the model for the MLIP target:

- ``--target=`` : Target architecture for the model compilation. To compile using a hybrid of
  the mrvl and llvm architectures, for the Hardware + LLVM target it should be set to
  ``mrvl, llvm -mtriple=aarch64-linux-gnu -mcpu=neoverse-n2``, whereas for the Simulator + LLVM
  target it should be set to ``mrvl, llvm``. To compile using only the LLVM backend, use
  ``llvm -mtriple=aarch64-linux-gnu -mcpu=neoverse-n2`` for hardware and ``llvm`` for the simulator.

- ``--cross-compiler`` : Compiling the ONNX model for HW mode requires a cross compiler. This
  option is not required for simulator mode.

MLIP-specific command line options:

- ``--target-mrvl-mattr=`` : Attributes specific to the Marvell ML Compiler. This option is used
  to set different compilation options for the Marvell backend compiler (MMLC). The supported
  values for ``--target-mrvl-mattr`` are:

  - ``hw`` / ``sim`` : Target run mode. Supported values are hw and sim. hw is for the hardware
    target and sim is for the simulator (x86_64) target. (Default: sim)
  - ``-arch=`` : Target run architecture. Supported values are cn10ka and cnf10kb. (Default: cn10ka)
  - ``-quantize=`` : Quantization mode. Supported value is fp16. (Default: fp16)
  - ``-wb_pin_ocm=`` : Weight/bias pinning to OCM. Supported values are 0 and 1. (Default: 1)
    For large weights and biases, pinning to OCM is not possible; in such cases, set this
    option to 0.

- ``--target-mrvl-num_tiles=`` : Number of tiles. Supported values are 1, 2, 4, 8. (Default: 8)

.. note:: The ``-wb_pin_ocm`` option in ``--target-mrvl-mattr`` is used to pin the weights and
   biases to OCM. To enable this option, set the environment variable ``MRVL_ENABLE_WB_PIN_OCM=1``.

FP16 Compilation Flow
---------------------

Non-quantized model files utilize 32-bit floating point (FP32) representations for network
parameters, ensuring high numerical precision. In contrast, the FP16 compilation flow performs
all computations using 16-bit floating point (FP16) precision. This approach offers significant
advantages in both computational speed and memory efficiency, effectively halving resource usage
compared to FP32. Moreover, FP16 compilation does not require a separate profiling step,
streamlining the deployment process.

The FP16 flow supports two compilation scenarios: the MRVL-only FP16 compilation flow and the
Hybrid FP16 compilation flow. In the MRVL-only flow, the entire model graph is compiled to run
exclusively on the MLIP hardware accelerator; all layers are offloaded to MRVL (MLIP) and no CPU
execution is involved. In the Hybrid flow, by contrast, only MLIP-compatible layers are executed
on hardware, while the remaining layers are compiled using LLVM and run on the CPU. During
compilation, MLIP-compatible and LLVM-only layers are identified, partitioned, and executed on
the appropriate backend. If the model contains only MLIP-supported layers, it is executed
entirely on MLIP; likewise, if only LLVM-supported layers are present, the model is executed
fully on the CPU.

When compiling LLVM layers for hardware execution, a suitable cross-compiler must be available
on the host machine to generate AArch64-compatible binaries.

.. code-block:: bash

   export TARGET_TRIPLET=aarch64-linux-gnu

Use the ``-quantize=fp16`` option in ``--target-mrvl-mattr`` during compilation to enable the
FP16 compilation flow.

Examples:

.. code-block:: bash

   # Compile model for cn10ka Hardware + LLVM AArch64 target, with fp16 quantization
   export MRVL_SAVE_MODEL_BIN=1
   export TVM_CONFIGS_JSON_DIR=${INSTALL_PREFIX_HOST}/share/tvm/configs
   export MRVL_ENABLE_WB_PIN_OCM=1

   python -m tvm.driver.tvmc compile \
       --target="mrvl, llvm -mtriple=${TARGET_TRIPLET} -mcpu=neoverse-n2" \
       --cross-compiler="${TARGET_TRIPLET}-gcc" \
       --target-mrvl-mattr='hw -arch=cn10ka -quantize=fp16 -wb_pin_ocm=1' \
       --target-mrvl-num_tiles=4 \
       --output model.tar \
       model.onnx

.. code-block:: bash

   # Compile model for cn10ka Simulator + LLVM x86_64 target, with fp16 quantization
   export MRVL_SAVE_MODEL_BIN=1
   export TVM_CONFIGS_JSON_DIR=${INSTALL_PREFIX_HOST}/share/tvm/configs
   export MRVL_ENABLE_WB_PIN_OCM=1

   python -m tvm.driver.tvmc compile \
       --target="mrvl, llvm" \
       --target-mrvl-mattr='sim -arch=cn10ka -quantize=fp16 -wb_pin_ocm=1' \
       --target-mrvl-num_tiles=4 \
       --output model.tar \
       model.onnx

Compilation Flow for CPU Execution
----------------------------------

This compilation flow is used to run the model entirely on the CPU, without using MLIP hardware
acceleration. It uses the LLVM backend and supports both x86 and AArch64 architectures. x86 is
used for running simulator-based inferences, while AArch64 is used for running inference on the
target hardware.

When compiling for AArch64, a suitable cross-compiler must be available on the host machine to
generate AArch64-compatible binaries.

.. code-block:: bash

   export TARGET_TRIPLET=aarch64-linux-gnu

Use ``--target="llvm"`` to enable this compilation flow and use only LLVM-related options during
compilation.

Examples:

.. code-block:: bash

   # Compile model using Native LLVM AArch64 Compilation Flow for Hardware Execution
   export MRVL_SAVE_MODEL_BIN=1
   export TVM_CONFIGS_JSON_DIR=${INSTALL_PREFIX_HOST}/share/tvm/configs
   export MRVL_ENABLE_WB_PIN_OCM=1

   python -m tvm.driver.tvmc compile \
       --target="llvm -mtriple=${TARGET_TRIPLET} -mcpu=neoverse-n2" \
       --cross-compiler="${TARGET_TRIPLET}-gcc" \
       --output model.tar \
       model.onnx

.. code-block:: bash

   # Compile model using Native LLVM x86_64 Compilation Flow for Simulator Execution
   export MRVL_SAVE_MODEL_BIN=1
   export TVM_CONFIGS_JSON_DIR=${INSTALL_PREFIX_HOST}/share/tvm/configs
   export MRVL_ENABLE_WB_PIN_OCM=1

   python -m tvm.driver.tvmc compile \
       --target="llvm" \
       --output model.tar \
       model.onnx

The compiler generates the following artifacts:

.. code-block:: bash

   ├── bin_tvmgen_mrvl_main_0
   │   └── tvmgen_mrvl_main_0.bin
   ├── model.tar

The ``model.tar`` file can be used to run inference on the Marvell ML hardware associated with
Octeon 10 or via the MLIP software simulator. If the compiled model is MRVL-only, inference can
also be performed using ``tvmgen_mrvl_main_0.bin``.

.. note:: Do NOT use ``tvmgen_mrvl_main_0.bin`` for LLVM-only or Hybrid models.

Pre-processing and Post-processing Steps
----------------------------------------

The Marvell ML Compiler (MMLC) backend requires ONNX models to have a static shape format. In
ONNX, dynamic shapes refer to models that can process input tensors with variable dimensions
(e.g., varying batch sizes or sequence lengths). While dynamic shapes offer flexibility during
model execution, they are not supported by the MMLC backend, which requires fixed,
pre-determined dimensions for all inputs and outputs. Additionally, running inference with MMLC
involves preprocessing input data into ``.npz`` and ``.bin`` formats.

The models, as well as the scripts to perform these preprocessing steps, are available in the
``ml-models`` branch of the MarvellMLTools GitHub repository, under the ``models`` and ``utils``
folders respectively.

.. code-block:: bash

   cd ${ML_TOOLS_DIR}
   git checkout ml-models

To convert a dynamic shape model into a static shape model, you can use the
``convert_shape_d2s.py`` script, which ensures the model's input and output shapes are
explicitly defined, enabling compatibility with the MMLC backend.

.. code-block:: bash

   python ${ML_TOOLS_DIR}/utils/convert_shape_d2s.py \
       --input_onnx input_model.onnx \
       --output_onnx model.onnx

Running inference on the MLIP simulator requires input data in NPZ format. You can use the
``generate_npz.py`` script to convert ONNX model inputs from a JSON file into an NPZ file
suitable for inference.

.. code-block:: bash

   python ${ML_TOOLS_DIR}/utils/generate_npz.py \
       --model_onnx model.onnx \
       --input_json_file input.json \
       --input_npz_file input.npz

A binary input file can be generated using the ``convert.py`` script.

.. code-block:: bash

   python ${ML_TOOLS_DIR}/utils/convert.py \
       json2bin \
       --model_onnx model.onnx \
       --io_type "input" \
       --json_file input.json \
       --bin_file input.bin

Running Inference
=================

There are two primary execution modes for running inference on machine learning models using
Marvell's platform: Simulator Mode and DPDK ML Test Application Mode.

Simulator Mode
--------------

The ML software simulator can run models in a simulated environment on an x86 host. This allows
the user to test and tune a model as it would run on supported hardware, without the need for a
full chip simulation or target hardware. This mode requires the model to be compiled for the
simulator.

The ``MRVL_ML_ARCH`` environment variable can be used to select the hardware architecture to
simulate.

TVMC run command line options:

- ``--inputs`` : Input file in NPZ format.
- ``--outputs`` : Output file in NPZ format.
- ``--number`` : Number of inferences to run.
- ``--print-time`` : Print the time taken for each inference.
- ``model.tar`` : Compiled model for the MLIP simulator.

Example:

.. code-block:: bash

   # Run inference on MLIP simulator for cn10ka target
   MRVL_ML_ARCH="cn10ka" \
   python -m tvm.driver.tvmc run \
       --inputs input.npz \
       --outputs output.npz \
       --number=1 \
       model.tar

DPDK ML Test Application Mode
-----------------------------

DPDK ML Test Application Mode is designed for running inference on actual hardware. Once the
model is compiled, it is packaged into a ``model.tar`` archive. This archive can then be deployed
using the DPDK ML test application, enabling inference on Marvell's hardware platforms.

.. code-block:: bash

   # Enable hugepages
   mkdir -p /mnt/huge
   mount -t hugetlbfs -o pagesize=2M nodev /mnt/huge
   echo 4096 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

   # Bind ML device
   dpdk-devbind.py -b vfio-pci 0000:00:10.0

   # Run inferences with dpdk-test-mldev application
   dpdk-test-mldev --lcores=4-23 -a 0000:00:10.0,fw_path=/lib/firmware/mlip-fw.bin -- \
       --test inference_ordered \
       --filelist model.tar,input.bin,output.bin,reference.bin \
       --tolerance 5 \
       --stats \
       --repetitions 1000

For models generated by TVM that have a single MRVL region and no LLVM layers, the
``tvmgen_mrvl_main_0.bin`` file generated during the compilation stage can also be used to run
inferences with the DPDK test application.

.. code-block:: bash

   # Enable hugepages
   mkdir -p /mnt/huge
   mount -t hugetlbfs -o pagesize=2M nodev /mnt/huge
   echo 4096 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

   # Bind ML device
   dpdk-devbind.py -b vfio-pci 0000:00:10.0

   # Run inferences with dpdk-test-mldev application
   dpdk-test-mldev --lcores=4-23 -a 0000:00:10.0,fw_path=/lib/firmware/mlip-fw.bin -- \
       --test inference_ordered \
       --filelist tvmgen_mrvl_main_0.bin,input.bin,output.bin,reference.bin \
       --tolerance 5 \
       --stats \
       --repetitions 1000

Output Validation
-----------------

The ``compare_json.py`` script can be used to compare the output generated by TVM models with
the reference output and to check whether the outputs match within specified tolerance levels.
The script supports the following options:

- ``test_json_file`` : Output generated by the TVM model.
- ``base_json_file`` : Reference output.
- ``quantize`` : Quantization mode. Supported value is fp16. The default value is fp16.
- ``fudge_factor`` : Tolerance level for floating point comparison. The default value is 0.03 (3% tolerance).
- ``print_level`` : Print level, controls the verbosity of dumps from the script. ``diff`` dumps
  the differences between the real and expected outputs, ``full`` dumps the entire contents of
  the real and expected outputs, and ``None`` is a quieter option where no dumps are provided.

The comparison script works on formatted JSON outputs; the ``convert.py`` script can be used to
convert output generated in binary format to JSON.
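
A minimal sketch of that conversion step is shown below. It assumes ``convert.py`` provides a
``bin2json`` mode that mirrors the ``json2bin`` invocation used earlier for inputs; the exact
sub-command and flag names are assumptions and may differ in the released script.

.. code-block:: bash

   # Convert the binary output to JSON before comparison
   # (bin2json mode and flags assumed by analogy with the json2bin example above)
   python ${ML_TOOLS_DIR}/utils/convert.py \
       bin2json \
       --model_onnx model.onnx \
       --io_type "output" \
       --json_file output.json \
       --bin_file output.bin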

Example:

.. code-block:: bash

   # Compare output generated by TVM model with fp16 quantization
   python ${ML_TOOLS_DIR}/utils/compare_json.py \
       --test_json_file output.json \
       --base_json_file golden_output.json \
       --quantize fp16 \
       --fudge_factor 0.03 \
       --print_level diff

Example Models/Usecases
=======================

Resnet50 Model Compilation using Jupyter Notebook
-------------------------------------------------

ResNet50 is a deep learning model used for image classification. It uses residual blocks to
improve training efficiency and accuracy. This model is widely used for recognizing and
categorizing objects in images.

Compilation and Inference on Software Simulator
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This section describes how to compile the ResNet50 ONNX model and run inference using a Jupyter
notebook workflow. Begin by checking out the ``ml-models`` branch in the MarvellMLTools
repository. Then, open the ``Resnet50.ipynb`` notebook located in the ``notebooks`` directory.
Make sure to update any file paths in the notebook to match your local environment.

Launch the Jupyter notebook:

.. code-block:: bash

   jupyter notebook Resnet50.ipynb

.. note:: Ensure that TVM is compiled and ready before running the notebook. Refer to the
   `Setting Up The TVM Compiler Framework`_ section for detailed setup instructions. To avoid
   path-related issues during execution, copy the notebook to the base directory that contains
   tvm, MarvellMLTools, and other related dependencies, and launch it from there.

Preprocessing of Input
~~~~~~~~~~~~~~~~~~~~~~

This step converts the input image into a binary format that the model can accept as input. It
is done using the ``image2bin.py`` Python script.

.. code-block:: bash

   # Convert input in Image format to binary format
   python image2bin.py \
       --image_file input.jpeg \
       --bin_file output.bin

Model Execution on Hardware
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The preprocessed binary is given as an input to the model. The model processes the input, runs
the inference operation, and generates the output in binary format.

.. code-block:: bash

   # Run inferences with dpdk-test-mldev application
   dpdk-test-mldev --lcores=4-23 -a 0000:00:10.0,fw_path=/lib/firmware/mlip-fw.bin -- \
       --test inference_ordered \
       --filelist model.tar,input.bin,output.bin,reference.bin \
       --tolerance 5 \
       --stats \
       --repetitions 1000

Postprocessing of Output
~~~~~~~~~~~~~~~~~~~~~~~~

The binary output generated by the model is converted into a JSON file for easier interpretation
and analysis. This is done using the ``bin2json.py`` Python script.

.. code-block:: bash

   # Convert output in binary format to JSON format
   python bin2json.py \
       --bin_file output.bin \
       --json_file output.json
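
Optionally, the raw binary output can also be mapped straight to a class label, in the same way
as the other classification examples later in this chapter. The sketch below assumes the
deployed ResNet50 is the standard 1000-class ImageNet variant producing FP32 scores; adjust it
if a differently trained model is used.

.. code-block:: bash

   # Map the highest-scoring output index to an ImageNet class name
   # (assumes a standard 1000-class ImageNet ResNet50 with FP32 output values)
   wget https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt -O imagenet_classes.txt

   python -c "import numpy as np; labels=[l.strip() for l in open('imagenet_classes.txt')]; output=np.fromfile('output.bin', dtype=np.float32); print('Predicted:', labels[np.argmax(output)])"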

LUCID
-----

Introduction
~~~~~~~~~~~~

LUCID (Lightweight, Usable CNN in DDoS Detection) is a deep learning framework designed to
detect DDoS attacks. It utilizes Convolutional Neural Networks (CNNs) to effectively distinguish
between malicious and benign traffic flows.

The release includes int8 (``10t-10n-lucid_int8_t08_b01``) and fp16 (``10t-10n-lucid_fp16_t08_b01``)
quantized versions of the LUCID model, which are optimized for running inference operations.

Training Model
~~~~~~~~~~~~~~

The models were trained on the portion of the CIC-DDoS-2019 dataset available in the
``sample-dataset`` folder of the LUCID repository and were compiled using the TVM compiler with
``INT8`` and ``FP16`` quantization to generate model binaries for the MLIP target architecture,
as part of the DAO release. There are two model binaries available for running inference
operations.

**Hyperparameters used for training:**

* **Maximum number of packets/sample (n)**: 10
* **Time window (t)**: 10 seconds

.. note:: For further information on training, please refer to the References section below.

Run Script
~~~~~~~~~~

As part of the DAO release, we provide a ``lucid_run.py`` script to test the models with any
pcap dataset. The script processes the pcap file, runs inference using the specified model, and
generates the output.

**Command Line Options:**

To run inference using the ``lucid_run.py`` script with a sample dataset, use the following
command line options.

.. code-block:: bash

   python lucid_run.py [-h, --help]
                       -pl PCAP_FILE, --pcap_file PCAP_FILE
                       -m MODEL, --model MODEL
                       [-y DATASET_TYPE, --dataset_type DATASET_TYPE]

**Descriptions:**

* ``-h, --help:`` Display this help message and exit.
* ``-pl PCAP_FILE, --pcap_file PCAP_FILE:`` Perform a prediction on a pcap file. Follow this
  option with a pcap file path (e.g., /path/to/traffic_dataset.pcap).
* ``-m MODEL, --model MODEL:`` Specify the model file for prediction. The model should be a
  trained model in binary format.
* ``-y DATASET_TYPE, --dataset_type DATASET_TYPE:`` Choose the dataset type. Options are
  DOS2017, DOS2018, DOS2019, SYN2020. This is used to generate classification statistics
  (e.g., accuracy, F1 score) by comparing the ground truth labels with LUCID's output.

The confusion matrix is printed in the following format:

.. list-table::
   :widths: 10 10
   :header-rows: 1

   * - TP
     - FN
   * - FP
     - TN

**Example Run:**

This example demonstrates how to predict network traffic from the ``CIC-DDoS-2019-DNS.pcap``
file using the ``10t-10n-lucid_fp16_t08_b01.bin`` model:

.. code-block:: bash

   python lucid_run.py \
       --pcap_file CIC-DDoS-2019-DNS.pcap \
       --model 10t-10n-lucid_fp16_t08_b01.bin

Mnist-12
--------

The MNIST model is a convolutional neural network (CNN) trained to recognize handwritten digits
using grayscale images resized to 28x28 pixels, with white digits on a black background and
pixel values normalized to the [0.0, 1.0] range.

Download the Mnist-12 ONNX model from the ONNX models GitHub repository.

.. code-block:: bash

   wget https://github.com/onnx/models/raw/main/validated/vision/classification/mnist/model/mnist-12.onnx

Use the ``convert_shape_d2s.py`` script from the ``ml-models`` branch of the MarvellMLTools
repository to ensure the model has fixed input/output dimensions.

.. code-block:: bash

   cd ${ML_TOOLS_DIR}
   git checkout ml-models

   python ${ML_TOOLS_DIR}/utils/convert_shape_d2s.py \
       --input_onnx mnist-12.onnx \
       --output_onnx model.onnx

Any valid .jpg, .jpeg, or .png image with three color channels (RGB) can be used as input. The
image will be converted to grayscale during preprocessing. The image can be of any resolution,
as it will be resized to 32x32 pixels. The image should contain a clearly visible and centered
subject (digit), without excessive background or noise. An example input image is shown below.

.. figure:: ./img/input.jpg
   :width: 80px
   :align: center

Preprocess the input image to match the MNIST model's expected input format and convert it to
binary format for running inference on hardware.

.. code-block:: bash

   pip install opencv-python

   python -c "import numpy as np, cv2; gray=cv2.cvtColor(cv2.imread('input.jpg'), cv2.COLOR_BGR2GRAY); gray=cv2.resize(gray,(32,32)).astype(np.float32)/255; input=np.reshape(gray,(1,1,32,32)); input.tofile('input.bin')"
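
As a quick, optional sanity check (not part of the original flow), the generated ``input.bin``
should contain 1 x 1 x 32 x 32 FP32 values, i.e. 4096 bytes:

.. code-block:: bash

   # input.bin holds 1 * 1 * 32 * 32 float32 values = 4096 bytes
   stat -c %s input.bin   # expected: 4096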

Set the compilation environment variables and compile the model.

.. code-block:: bash

   export MRVL_SAVE_MODEL_BIN=1
   export TVM_CONFIGS_JSON_DIR=${INSTALL_PREFIX_HOST}/share/tvm/configs
   export MRVL_ENABLE_WB_PIN_OCM=1

   python -m tvm.driver.tvmc compile \
       --target="mrvl, llvm -mtriple=${TARGET_TRIPLET} -mcpu=neoverse-n2" \
       --cross-compiler="${TARGET_TRIPLET}-gcc" \
       --target-mrvl-mattr='hw -arch=cn10ka -quantize=fp16 -wb_pin_ocm=1' \
       --target-mrvl-num_tiles=4 \
       --output model.tar \
       model.onnx

Use the dpdk-test-mldev application to run inference with the compiled model and preprocessed
input.

.. code-block:: bash

   mkdir -p /mnt/huge
   mount -t hugetlbfs -o pagesize=2M nodev /mnt/huge
   echo 4096 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

   # Bind ML device
   dpdk-devbind.py -b vfio-pci 0000:00:10.0

   # Run inferences with dpdk-test-mldev application
   dpdk-test-mldev --lcores=4-23 -a 0000:00:10.0,fw_path=/lib/firmware/mlip-fw.bin -- \
       --test inference_ordered \
       --filelist bin_tvmgen_mrvl_main_0/tvmgen_mrvl_main_0.bin,input.bin,output.bin \
       --tolerance 5 \
       --stats \
       --repetitions 1000

Use the following Python snippet to interpret the predicted digit:

.. code-block:: bash

   python3 -c "import numpy as np; output = np.fromfile('output.bin', dtype=np.float32); print('Predicted digit:', np.argmax(output))"

The following is an example output generated from the input image provided earlier:

.. code-block:: bash

   Predicted digit: 4

GoogLeNet
---------

GoogLeNet is a deep convolutional neural network architecture developed by Google for efficient
and accurate image classification tasks. It introduced the Inception module, which enables the
network to perform multiple convolution and pooling operations in parallel. This allows the
model to capture features at various scales while maintaining computational efficiency.

Download the GoogLeNet model from the ONNX models GitHub repository and convert its dynamic
shape into a static shape to ensure compatibility with the MMLC backend.

.. code-block:: bash

   wget https://github.com/onnx/models/raw/main/Computer_Vision/googlenet_Opset16_torch_hub/googlenet_Opset16.onnx -O googlenet.onnx

   cd ${ML_TOOLS_DIR}
   git checkout ml-models

   python ${ML_TOOLS_DIR}/utils/convert_shape_d2s.py \
       --input_onnx googlenet.onnx \
       --output_onnx model.onnx

An example image is provided below to demonstrate the GoogLeNet model's image classification
capability.

.. figure:: ./img/input2.jpg
   :width: 250px
   :align: center

Preprocess the image to match the model's expected input format. The script below resizes the
image to 224x224, normalizes the pixel values, and saves the result as ``input.bin``.

.. code-block:: bash

   python -c "import numpy as np, cv2; img=cv2.resize(cv2.imread('input.jpg'), (224,224)); img=img.astype(np.float32)/255; img=np.transpose(img, (2,0,1)); input=np.expand_dims(img, axis=0); input.tofile('input.bin')"

Set the compilation environment variables and compile the model with tvmc.

.. code-block:: bash

   export MRVL_SAVE_MODEL_BIN=1
   export TVM_CONFIGS_JSON_DIR=${INSTALL_PREFIX_HOST}/share/tvm/configs
   export MRVL_ENABLE_WB_PIN_OCM=1

   python -m tvm.driver.tvmc compile \
       --target="mrvl, llvm -mtriple=${TARGET_TRIPLET} -mcpu=neoverse-n2" \
       --cross-compiler="${TARGET_TRIPLET}-gcc" \
       --target-mrvl-mattr='hw -arch=cn10ka -quantize=fp16 -wb_pin_ocm=0' \
       --target-mrvl-num_tiles=8 \
       --output model.tar \
       model.onnx

Use the dpdk-test-mldev application to run inference with the compiled model and preprocessed
input.

.. code-block:: bash

   mkdir -p /mnt/huge
   mount -t hugetlbfs -o pagesize=2M nodev /mnt/huge
   echo 4096 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

   # Bind ML device
   dpdk-devbind.py -b vfio-pci 0000:00:10.0

   # Run inferences with dpdk-test-mldev application
   dpdk-test-mldev --lcores=4-23 -a 0000:00:10.0,fw_path=/lib/firmware/mlip-fw.bin -- \
       --test inference_ordered \
       --filelist bin_tvmgen_mrvl_main_0/tvmgen_mrvl_main_0.bin,input.bin,output.bin \
       --tolerance 5 \
       --stats \
       --repetitions 1000

To interpret the output, first download the ImageNet class labels and use the given script. The
labels map the 1000 numeric indices in the output vector to human-readable class names (e.g.,
'goldfish', 'tabby cat'), allowing you to identify the predicted object.

.. code-block:: bash

   wget https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt -O imagenet_classes.txt

   python -c "import numpy as np; labels=[l.strip() for l in open('imagenet_classes.txt')]; output=np.fromfile('output.bin', dtype=np.float32); print('Predicted:', labels[np.argmax(output)])"

For the example input image, we get the following output after interpretation.

.. code-block:: bash

   Predicted: goldfish

MobileNetv2-12
--------------

MobileNet models are image classification models trained on the ImageNet dataset, which contains
images from 1000 classes. MobileNet models are very efficient in terms of speed and size and are
therefore ideal for embedded and mobile applications.

Download the Mobilenetv2-12 model from the ONNX models GitHub repository and convert it to a
static shape to ensure compatibility with the MMLC backend.

.. code-block:: bash

   wget https://github.com/onnx/models/raw/main/validated/vision/classification/mobilenet/model/mobilenetv2-12.onnx

   cd ${ML_TOOLS_DIR}
   git checkout ml-models

   python ${ML_TOOLS_DIR}/utils/convert_shape_d2s.py \
       --input_onnx mobilenetv2-12.onnx \
       --output_onnx model.onnx

An example image is provided below to demonstrate the MobileNet model's image classification
capability.

.. figure:: ./img/input3.jpg
   :width: 250px
   :align: center

Run the script below to format the image for inference: it resizes and normalizes the image and
saves it as ``input.bin``.

.. code-block:: bash

   python -c "import numpy as np, cv2; img=cv2.resize(cv2.imread('input.jpg'), (224,224)); img=img.astype(np.float32)/255; img=np.transpose(img, (2,0,1)); input=np.expand_dims(img, axis=0); input.tofile('input.bin')"

Compile the model for execution on hardware.

.. code-block:: bash

   export MRVL_SAVE_MODEL_BIN=1
   export TVM_CONFIGS_JSON_DIR=${INSTALL_PREFIX_HOST}/share/tvm/configs
   export MRVL_ENABLE_WB_PIN_OCM=1

   python -m tvm.driver.tvmc compile \
       --target="mrvl, llvm -mtriple=${TARGET_TRIPLET} -mcpu=neoverse-n2" \
       --cross-compiler="${TARGET_TRIPLET}-gcc" \
       --target-mrvl-mattr='hw -arch=cn10ka -quantize=fp16 -wb_pin_ocm=0' \
       --target-mrvl-num_tiles=8 \
       --output model.tar \
       model.onnx

Run inference on hardware using dpdk-test-mldev with the compiled model and input.

.. code-block:: bash

   mkdir -p /mnt/huge
   mount -t hugetlbfs -o pagesize=2M nodev /mnt/huge
   echo 4096 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

   # Bind ML device
   dpdk-devbind.py -b vfio-pci 0000:00:10.0

   # Run inferences with dpdk-test-mldev application
   dpdk-test-mldev --lcores=4-23 -a 0000:00:10.0,fw_path=/lib/firmware/mlip-fw.bin -- \
       --test inference_ordered \
       --filelist bin_tvmgen_mrvl_main_0/tvmgen_mrvl_main_0.bin,input.bin,output.bin \
       --tolerance 5 \
       --stats \
       --repetitions 1000

To interpret the output, first download the ImageNet class labels and use the given script.

.. code-block:: bash

   wget https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt -O imagenet_classes.txt

   python -c "import numpy as np; labels=[l.strip() for l in open('imagenet_classes.txt')]; output=np.fromfile('output.bin', dtype=np.float32); print('Predicted:', labels[np.argmax(output)])"

For the example input image, we get the following output after interpretation.

.. code-block:: bash

   Predicted: starfish

VGG11
-----

VGG11 is an 11-layer deep convolutional neural network architecture from the VGGNet family. It
is used for image classification tasks and is trained on the ImageNet dataset. VGG models
provide very high accuracy, but at the cost of larger model sizes. They are ideal for cases
where high classification accuracy is essential and model size is not a major constraint.

Download the VGG11 model from the ONNX models GitHub repository and convert it to a static shape
for compatibility.

.. code-block:: bash

   wget https://github.com/onnx/models/raw/main/Computer_Vision/vgg11_Opset16_timm/vgg11_Opset16.onnx -O vgg11.onnx

   cd ${ML_TOOLS_DIR}
   git checkout ml-models

   python ${ML_TOOLS_DIR}/utils/convert_shape_d2s.py \
       --input_onnx vgg11.onnx \
       --output_onnx model.onnx

Any valid .jpg, .jpeg, or .png image with three color channels (RGB) can be used as input. The
image below is used as an example to show the working of the model.

.. figure:: ./img/input4.jpg
   :width: 250px
   :align: center

Resize the image to 224x224, normalize it, and save it as ``input.bin`` using the script below.

.. code-block:: bash

   python -c "import numpy as np, cv2; img=cv2.resize(cv2.imread('input.jpg'), (224,224)); img=img.astype(np.float32)/255; img=np.transpose(img, (2,0,1)); input=np.expand_dims(img, axis=0); input.tofile('input.bin')"

Configure the environment and compile the model with tvmc.

.. code-block:: bash

   export MRVL_SAVE_MODEL_BIN=1
   export TVM_CONFIGS_JSON_DIR=${INSTALL_PREFIX_HOST}/share/tvm/configs
   export MRVL_ENABLE_WB_PIN_OCM=1

   python -m tvm.driver.tvmc compile \
       --target="mrvl, llvm -mtriple=${TARGET_TRIPLET} -mcpu=neoverse-n2" \
       --cross-compiler="${TARGET_TRIPLET}-gcc" \
       --target-mrvl-mattr='hw -arch=cn10ka -quantize=fp16 -wb_pin_ocm=0' \
       --target-mrvl-num_tiles=8 \
       --output model.tar \
       model.onnx

Run inference on hardware using dpdk-test-mldev with the compiled model and input.

.. code-block:: bash

   mkdir -p /mnt/huge
   mount -t hugetlbfs -o pagesize=2M nodev /mnt/huge
   echo 4096 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

   # Bind ML device
   dpdk-devbind.py -b vfio-pci 0000:00:10.0

   # Run inferences with dpdk-test-mldev application
   dpdk-test-mldev --lcores=4-23 -a 0000:00:10.0,fw_path=/lib/firmware/mlip-fw.bin -- \
       --test inference_ordered \
       --filelist bin_tvmgen_mrvl_main_0/tvmgen_mrvl_main_0.bin,input.bin,output.bin \
       --tolerance 5 \
       --stats \
       --repetitions 1000

Download the ImageNet label file and run the script to map numeric predictions to class names.

.. code-block:: bash

   wget https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt -O imagenet_classes.txt

   python -c "import numpy as np; labels=[l.strip() for l in open('imagenet_classes.txt')]; output=np.fromfile('output.bin', dtype=np.float32); print('Predicted:', labels[np.argmax(output)])"

For the example input image, we get the following output after interpretation.

.. code-block:: bash

   Predicted: lesser panda

References
----------

[1] LUCID repository on `GitHub `_

[2] R. Doriguzzi-Corin, S. Millar, S. Scott-Hayward, J. Martínez-del-Rincón, and D. Siracusa,
"Lucid: A Practical, Lightweight Deep Learning Solution for DDoS Attack Detection," IEEE
Transactions on Network and Service Management, vol. 17, no. 2, pp. 876-889, June 2020.
doi: 10.1109/TNSM.2020.2971776. Available: `IEEE Xplore `_

[3] `CIC-DDoS-2019 dataset `_