10. VirtIO Net Library

The VirtIO-net library is the virtualization solution used for networking on CN10K. It emulates smart NICs that are presented to VMs and driven by the front-end virtio network driver.

10.1. Features

Currently, the VirtIO emulation device supports the VirtIO 1.2 specification and offers the features listed below.

10.1.1. VirtIO Common Feature Bits:

  • VIRTIO_F_RING_PACKED

  • VIRTIO_F_VERSION_1

  • VIRTIO_F_ANY_LAYOUT

  • VIRTIO_F_IN_ORDER

  • VIRTIO_F_ORDER_PLATFORM

  • VIRTIO_F_NOTIFICATION_DATA

10.1.2. VirtIO-net Feature Bits:

  • VIRTIO_NET_F_CTRL_VQ

  • VIRTIO_NET_F_MQ

  • VIRTIO_NET_F_RSS

  • VIRTIO_NET_F_CTRL_RX

  • VIRTIO_NET_F_STATUS

  • VIRTIO_NET_F_MAC

  • VIRTIO_NET_F_MRG_RXBUF

  • VIRTIO_NET_F_CTRL_MAC_ADDR

  • VIRTIO_NET_F_CTRL_VLAN

  • VIRTIO_NET_F_CSUM

  • VIRTIO_NET_F_GUEST_CSUM

  • VIRTIO_NET_F_MTU

  • VIRTIO_NET_F_HASH_REPORT

  • VIRTIO_NET_F_HOST_TSO4

  • VIRTIO_NET_F_HOST_TSO6

  • VIRTIO_NET_F_GUEST_HDRLEN

Here are some notes about VirtIO-net features:

  • Modern devices are supported; legacy devices are not.

  • Only packed virtqueues (VIRTIO_F_RING_PACKED) are supported.

  • Buffers are used in the same order in which they were made available (VIRTIO_F_IN_ORDER).

  • The device expects extra data (besides the virtqueue identifier) in device notifications (VIRTIO_F_NOTIFICATION_DATA). Enabling this feature on the Host/Guest is mandatory.

  • VIRTIO_F_ORDER_PLATFORM is mandatory for proper functioning of the smart NIC, as it ensures memory ordering between the Host and the Octeon DPU.
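As an illustration of the notes above, the following minimal sketch checks a negotiated feature bitmap for the features this section describes as required. The bit positions are those defined by the VirtIO specification. FEATURE_BIT, MANDATORY_FEATURES and missing_mandatory_features() are hypothetical names introduced for this example and are not part of the library. Treating VIRTIO_F_VERSION_1 and VIRTIO_F_RING_PACKED as required is an interpretation of the "modern devices only" and "packed virtqueues only" notes.

```c
#include <stdint.h>

/* Feature bit positions as defined by the VirtIO specification. */
#define VIRTIO_F_VERSION_1         32
#define VIRTIO_F_RING_PACKED       34
#define VIRTIO_F_ORDER_PLATFORM    36
#define VIRTIO_F_NOTIFICATION_DATA 38

/* Hypothetical helper macro for this sketch, not a library definition. */
#define FEATURE_BIT(n) (1ULL << (n))

/* Assumption: the set of features the notes above describe as required. */
#define MANDATORY_FEATURES                      \
	(FEATURE_BIT(VIRTIO_F_VERSION_1) |      \
	 FEATURE_BIT(VIRTIO_F_RING_PACKED) |    \
	 FEATURE_BIT(VIRTIO_F_ORDER_PLATFORM) | \
	 FEATURE_BIT(VIRTIO_F_NOTIFICATION_DATA))

/* Return the mandatory feature bits absent from 'negotiated' (0 if none). */
static inline uint64_t
missing_mandatory_features(uint64_t negotiated)
{
	return MANDATORY_FEATURES & ~negotiated;
}
```

A non-zero return value identifies exactly which required bits the Host/Guest did not enable, which is easier to report than a plain pass/fail result.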

10.2. VirtIO Emulation Architecture Overview

[Figure: VirtIO emulation architecture (../_images/Virtio_arch.png)]

The design scales by assigning the emulation software cores one of two roles: service core or worker core.

The service core acts as the overseer, managing the virtio queue descriptors exchanged between the host and FW. Its responsibilities include determining the queue depth, tracking head and tail pointers, and marking buffers as available or used. The worker cores rely on the services provided by the service core to move packets between the host and FW.

[Figure: virtio-net library internal architecture (../_images/Virtio_net_internal.png)]

The figure above depicts the internal architecture of the virtio-net library. Each portion of the descriptor area passes through several processing stages, one after the other. Processing is triggered when a notification data update moves the tail/avail index to the end of the available descriptors. The service core then calls dao_virtio_net_desc_manage() to initiate a DMA fetch of the descriptor data. As depicted in the figure, q->sd_desc_off follows notify_offset; in a dequeue queue, q->sd_mbuf_off follows q->sd_desc_off, while in an enqueue queue, q->last_off follows q->sd_desc_off, and so on.

Once the descriptor data is fetched, the service core updates q->sd_desc_off so that worker cores can process the descriptors and then initiate packet data transfer between the Host packet buffers and Octeon DPDK mbuf memory.

Once the worker cores have consumed descriptor data, they advance q->last_off for a dequeue queue (Host Tx queue) or q->sd_mbuf_off for an enqueue queue (Host Rx queue) in the local data structure. The service core, via dao_virtio_net_desc_manage(), picks up that information and pushes descriptor updates from shadow memory back to the Host descriptor ring. The number of descriptors to mark complete is the distance between q->compl_off and the next offset, i.e. q->last_off or q->sd_mbuf_off.
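The "distance" between two offsets can be sketched as a wrap-around subtraction. This is a minimal illustration, assuming the offsets are descriptor indices in the range [0, q_sz) that wrap at the queue size. The helper name desc_off_distance() and the parameter q_sz are hypothetical; the library's internal arithmetic may differ.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of descriptors between 'from' (e.g. a completion offset) and
 * 'to' (e.g. the next stage's offset) in a ring of q_sz entries.
 * Assumption: both offsets are already reduced modulo q_sz.
 */
static inline uint16_t
desc_off_distance(uint16_t from, uint16_t to, uint16_t q_sz)
{
	assert(from < q_sz && to < q_sz);
	return (uint16_t)(to >= from ? to - from : q_sz - from + to);
}
```

With a 256-entry ring, a completion offset of 250 and a next offset of 4 give a distance of 10 descriptors. Equal offsets are treated as "nothing to complete" in this sketch.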

10.3. VirtIO-net device identification

Each virtio net device is identified by a unique device index, starting from 0, in all functions. The library currently supports a maximum of 64 virtio net devices, each mapped one-to-one to a PEM VF. VirtIO devid[11:0] indicates the VF number, while devid[15:12] indicates the PF number that the VF belongs to. A virtio device is always connected to a Host VF; the Host PF has no virtio device representation in the library.
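The bit layout described above can be decoded with simple masks and shifts. The sketch below only illustrates the field positions given in the text; the macro and function names are hypothetical and are not library APIs.

```c
#include <stdint.h>

/* Field layout taken from the text: devid[11:0] = VF, devid[15:12] = PF. */
#define DEVID_VF_MASK  0x0FFFu
#define DEVID_PF_SHIFT 12
#define DEVID_PF_MASK  0xFu

/* Extract the VF number from a device ID. */
static inline uint16_t
devid_to_vf(uint16_t devid)
{
	return devid & DEVID_VF_MASK;
}

/* Extract the PF number from a device ID. */
static inline uint16_t
devid_to_pf(uint16_t devid)
{
	return (devid >> DEVID_PF_SHIFT) & DEVID_PF_MASK;
}

/* Compose a device ID from PF and VF numbers. */
static inline uint16_t
devid_make(uint16_t pf, uint16_t vf)
{
	return (uint16_t)(((pf & DEVID_PF_MASK) << DEVID_PF_SHIFT) |
			  (vf & DEVID_VF_MASK));
}
```

For example, a device ID of 0x1003 decodes to PF 1 and VF 3 under this layout.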

10.4. Device initialization

The initialization of each virtio net device includes the following operations:

  • Initialize the base virtio device using the virtio_dev_init() API, which populates the virtio capabilities made available to the host.

  • Populate struct virtio_net_config with default values.

The dao_virtio_netdev_init() API is used to initialize a VirtIO net device.

int dao_virtio_netdev_init(uint16_t devid, struct dao_virtio_netdev_conf *conf)

The dao_virtio_netdev_conf structure is used to pass the configuration parameters shown below.

struct dao_virtio_netdev_conf {
	/** PEM device ID */
	uint16_t pem_devid;
	/** Config flags */
#define DAO_VIRTIO_NETDEV_EXTBUF DAO_BIT_ULL(0)
	uint16_t flags;
	union {
		struct {
			/** Default dequeue mempool */
			struct rte_mempool *pool;
		};
		/** Valid when DAO_VIRTIO_NETDEV_EXTBUF is set in flags */
		struct {
			uint16_t dataroom_size;
		};
	};
	/** Vchan to use for this virtio dev */
	uint16_t dma_vchan;
	/** Max virt_queue pairs limit */
	uint16_t max_virt_qps_limit;
	/** Auto free enabled/disabled */
	bool auto_free_en;
	/** RETA size supported */
	uint16_t reta_size;
	/** HASH key size supported */
	uint16_t hash_key_size;
	/** Default MTU */
	uint16_t mtu;
	/** Default MAC address */
	uint8_t mac[VIRTIO_NET_ETHER_ADDR_LEN];
	/** Link info */
	struct dao_virtio_netdev_link_info link_info;
	/** Enable/disable checksum offload feature */
	bool csum_en;
};

The virtio-l2fwd application is a sample that shows how to use the virtio net library.

Sample code to set dao_virtio_netdev_conf parameters:

/* Populate netdev conf */
memset(&netdev_conf, 0, sizeof(netdev_conf));
netdev_conf.auto_free_en = virtio_netdev_autofree;
netdev_conf.pem_devid = pem_devid;
netdev_conf.pool = per_port_pool ? v_pktmbuf_pool[virtio_devid] : v_pktmbuf_pool[0];
netdev_conf.dma_vchan = virtio_netdev_dma_vchans[virtio_devid];
netdev_conf.mtu = 0;
if (virtio_map[virtio_devid].type == ETHDEV_NEXT) {
        struct rte_eth_link eth_link;

        portid = virtio_map[virtio_devid].id;
        netdev_conf.reta_size = eth_dev_info[portid].reta_size;
        netdev_conf.hash_key_size = eth_dev_info[portid].hash_key_size;
        overhd = eth_dev_get_overhead_len(eth_dev_info[portid].max_rx_pktlen,
                                          eth_dev_info[portid].max_mtu);
        rte_eth_link_get(portid, &eth_link);
        netdev_conf.link_info.status = eth_link.link_status;
        netdev_conf.link_info.speed = eth_link.link_speed;
        netdev_conf.link_info.duplex = eth_link.link_duplex;
        /* Register link status change interrupt callback */
        rte_eth_dev_callback_register(portid, RTE_ETH_EVENT_INTR_LSC,
                                      lsc_event_callback,
                                      (void *)(uint64_t)virtio_devid);

        /* Populate default mac address */
        rte_eth_macaddr_get(portid, (struct rte_ether_addr *)netdev_conf.mac);
} else {
        netdev_conf.reta_size = 128;
        netdev_conf.hash_key_size = 48;
        /* Link status always UP */
        netdev_conf.link_info.status = 0x1;
        netdev_conf.link_info.speed = RTE_ETH_SPEED_NUM_UNKNOWN;
        netdev_conf.link_info.duplex = 0xFF;
}

if (max_pkt_len)
        netdev_conf.mtu = (max_pkt_len - overhd);
netdev_conf.auto_free_en = virtio_netdev_autofree;

/* Save reta size for future use */
virtio_netdev_reta_sz[virtio_devid] = netdev_conf.reta_size;

/* Initialize virtio net device */
rc = dao_virtio_netdev_init(virtio_devid, &netdev_conf);
if (rc)
        rte_exit(EXIT_FAILURE, "Failed to init virtio device\n");
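The sample above uses the mempool member of the configuration union. Because pool and dataroom_size share storage, only one of them is meaningful at a time, selected by the EXTBUF flag. The sketch below illustrates that choice on a trimmed, hypothetical stand-in structure (sketch_netdev_conf, SKETCH_NETDEV_EXTBUF) so that it compiles without the library headers. It assumes the flag macro expands to bit 0, as the structure definition suggests. The extbuf_get and extbuf_put callbacks are presumably the counterpart of this mode, although the source does not spell that out.

```c
#include <stdint.h>
#include <string.h>

struct rte_mempool; /* opaque here; the real type comes from DPDK */

/* Hypothetical stand-ins mirroring only the relevant fields. */
#define SKETCH_NETDEV_EXTBUF (1ULL << 0)

struct sketch_netdev_conf {
	uint16_t flags;
	union {
		struct {
			struct rte_mempool *pool; /* mempool mode */
		};
		struct {
			uint16_t dataroom_size;   /* extbuf mode */
		};
	};
};

/* Mempool mode: flag cleared, pool is the valid union member. */
static void
conf_use_mempool(struct sketch_netdev_conf *conf, struct rte_mempool *pool)
{
	memset(conf, 0, sizeof(*conf));
	conf->pool = pool;
}

/* Extbuf mode: flag set, dataroom_size is the valid union member. */
static void
conf_use_extbuf(struct sketch_netdev_conf *conf, uint16_t dataroom_size)
{
	memset(conf, 0, sizeof(*conf));
	conf->flags |= (uint16_t)SKETCH_NETDEV_EXTBUF;
	conf->dataroom_size = dataroom_size;
}
```

Clearing the structure before selecting a mode avoids leaving stale bytes of the other union member behind.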

10.5. User callback APIs

The application is expected to register callbacks that take the appropriate action for each control command. The VirtIO net library invokes the corresponding callback when it receives a control command.

The dao_virtio_netdev_cb_register API is used to register the user callbacks.

void dao_virtio_netdev_cb_register(struct dao_virtio_netdev_cbs);

The dao_virtio_netdev_cbs structure is used to pass the callbacks. The following callbacks can currently be registered:

struct dao_virtio_netdev_cbs {
	/** Device status callback */
	dao_virtio_netdev_status_cb_t status_cb;
	/** RSS setup callback */
	dao_virtio_netdev_rss_cb_t rss_cb;
	/** Promisc mode callback */
	dao_virtio_netdev_promisc_cb_t promisc_cb;
	/** All multi callback */
	dao_virtio_netdev_allmulti_cb_t allmulti_cb;
	/** Mac set callback */
	dao_virtio_netdev_mac_set_cb_t mac_set;
	/** Mac filter callback */
	dao_virtio_netdev_mac_add_cb_t mac_add;
	/** Multi queue configure callback */
	dao_virtio_netdev_mq_cfg_t mq_configure;
	/** VLAN filter add callback */
	dao_virtio_netdev_vlan_t vlan_add;
	/** VLAN filter del callback */
	dao_virtio_netdev_vlan_t vlan_del;
	/** Alloc extbuf */
	dao_virtio_netdev_extbuf_get extbuf_get;
	/** Free extbuf */
	dao_virtio_netdev_extbuf_put extbuf_put;
};

10.6. Queue count

The application is expected to get the active virt queue count using dao_virtio_netdev_queue_count and distribute the Rx and Tx queues equally among all the subscribed lcores.
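One simple way to spread queues evenly is round-robin assignment. The sketch below is an illustration only: assign_queues() and the owner array are hypothetical, and the source does not prescribe a particular distribution scheme beyond "equally".

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assign each of nb_queues queue indices to one of nb_lcores workers in
 * round-robin order. owner[q] receives the worker slot (0-based) that
 * should poll queue q. No worker gets more than one queue above any other.
 */
static void
assign_queues(uint16_t nb_queues, uint16_t nb_lcores, uint16_t *owner)
{
	assert(nb_lcores != 0);
	for (uint16_t q = 0; q < nb_queues; q++)
		owner[q] = q % nb_lcores;
}
```

With 8 queues and 3 workers, workers 0 and 1 each receive three queues and worker 2 receives two. An application may prefer to assign whole queue pairs instead so that the Rx and Tx queue of a pair stay on the same lcore.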

10.8. VirtIO descriptors Management API

The virtio net library provides an API for managing virtio descriptors. It performs the following operations:

  • Determine the number of descriptors available by polling the virt queue notification address.

  • Issue DMA using the DPDK DMA library to copy the descriptors to shadow queues.

  • Pre-allocate mbufs for actual packet data. Worker cores check the shadow queue for available descriptors and issue DMA for the actual packet data using these mbufs.

  • Fetch all DMA completions.

  • Mark consumed virtio descriptors as used in Host descriptor memory.

The dao_virtio_net_desc_manage() API is used to manage the virtio descriptors. The application is expected to call it from a service core as frequently as possible to shadow descriptors between Host and Octeon memory.

dao_virtio_net_desc_manage(uint16_t dev_id, uint16_t qp_count);

The qp_count parameter specifies the active virt queue pair count. The sample code below shows how to obtain qp_count.

virt_q_count = dao_virtio_netdev_queue_count(virtio_devid);
qp_count = virt_q_count/2;
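Putting the two calls together, a service core loop might look like the sketch below. To keep the example compilable without the library, the two DAO calls are replaced by local stand-ins (stub_queue_count() and stub_desc_manage()); their return types and behaviour are assumptions made for illustration. Re-reading the queue count on every iteration is a design choice of this sketch, not something the source requires.

```c
#include <stdint.h>

/* Stand-in for dao_virtio_netdev_queue_count(); always reports 8 queues. */
static uint16_t
stub_queue_count(uint16_t devid)
{
	(void)devid;
	return 8;
}

/* Bookkeeping so the sketch's behaviour can be observed. */
static unsigned int stub_manage_calls;
static uint16_t stub_last_qp_count;

/* Stand-in for dao_virtio_net_desc_manage(); only records its arguments. */
static void
stub_desc_manage(uint16_t devid, uint16_t qp_count)
{
	(void)devid;
	stub_manage_calls++;
	stub_last_qp_count = qp_count;
}

/*
 * Service core body: derive the queue pair count and run descriptor
 * management. A real service core would loop until asked to stop; the
 * iteration limit here only keeps the sketch finite.
 */
static void
service_loop(uint16_t devid, unsigned int iterations)
{
	for (unsigned int i = 0; i < iterations; i++) {
		uint16_t qp_count = stub_queue_count(devid) / 2;

		stub_desc_manage(devid, qp_count);
	}
}
```

Calling service_loop(0, 3) invokes the descriptor management stand-in three times with a queue pair count of 4.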

10.9. Enqueue Burst API

The enqueue burst API enqueues packet buffers to the host. It takes the virtio net device identifier and the host Rx virt queue identifier on which to schedule the processing. The nb_mbufs parameter is the number of packets to enqueue, supplied in the mbufs array of rte_mbuf pointers. The function returns the number of packets it actually enqueued for processing. This API is expected to be called from worker cores.

uint16_t dao_virtio_net_enqueue_burst(uint16_t devid, uint16_t qid,
                                      struct rte_mbuf **mbufs, uint16_t nb_mbufs)

Enqueueing includes the following operations:

  • Add virtio_net_hdr to the supplied mbuf’s packet data.

  • Prepare the descriptors and DMA them to the host Rx queue.
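Because the enqueue function may accept fewer packets than requested, a worker needs a policy for the remainder. The sketch below retries the unsent tail a bounded number of times and then drops what is left. The enqueue and free calls are local stand-ins (stub_enqueue_burst(), stub_free()) so the example compiles on its own. Who owns the mbufs that were not enqueued is not stated in the source; the sketch assumes they remain with the caller.

```c
#include <stdint.h>

struct rte_mbuf; /* opaque here; the real type comes from DPDK */

/* Stand-in for the enqueue burst API: accepts at most 2 packets per call. */
static uint16_t
stub_enqueue_burst(uint16_t devid, uint16_t qid, struct rte_mbuf **mbufs,
		   uint16_t nb_mbufs)
{
	(void)devid;
	(void)qid;
	(void)mbufs;
	return nb_mbufs < 2 ? nb_mbufs : 2;
}

/* Stand-in for freeing a packet; only counts the drops. */
static unsigned int stub_freed;

static void
stub_free(struct rte_mbuf *m)
{
	(void)m;
	stub_freed++;
}

/*
 * Enqueue a burst, retrying the unsent tail up to max_retries extra
 * times, then drop whatever is still unsent. Returns the number sent.
 */
static uint16_t
send_burst(uint16_t devid, uint16_t qid, struct rte_mbuf **mbufs,
	   uint16_t nb_mbufs, unsigned int max_retries)
{
	uint16_t sent = 0;

	for (unsigned int attempt = 0;
	     attempt <= max_retries && sent < nb_mbufs; attempt++)
		sent += stub_enqueue_burst(devid, qid, mbufs + sent,
					   (uint16_t)(nb_mbufs - sent));

	for (uint16_t i = sent; i < nb_mbufs; i++)
		stub_free(mbufs[i]);

	return sent;
}
```

Bounding the retries keeps a worker core from stalling on a queue the host is not draining, at the cost of dropping packets under sustained back-pressure.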

10.10. Dequeue Burst API

The dequeue burst API dequeues packet buffers from the host. It takes the same parameters as the enqueue API, but nb_mbufs and mbufs now specify the maximum number of packets the caller wishes to retrieve and the location in which to store them. The call returns the number of packets actually retrieved, which can never be larger than nb_mbufs.

uint16_t dao_virtio_net_dequeue_burst(uint16_t devid, uint16_t qid,
                                      struct rte_mbuf **mbufs, uint16_t nb_mbufs)

Dequeueing includes the following operations:

  • Issue DMAs for mbufs.

  • Fetch DMA status and update the shadow mbuf offset, so that the service core can mark the descriptors as used based on the shadow mbuf offset.