assertEqual(IntendedState, ActualState)

December 19, 2023

Network Unit Testing System (NUTS) allows you to write explicit network tests. I’m going to dynamically generate NUTS tests to check intended state vs actual state!

Problem Statement

Let’s say you have a large network and a traditional network monitoring system that’s checking for things like:

Interface Status
Interface Tx/Rx Statistics
Device Status
Routing Adjacencies

These kinds of checks are typically configured by enabling some pollers on a tool and then letting the tool discover your inventory and grab the data points. This approach is implicit: there is no explicitly defined network state to check the actual state against. We can augment these checks with explicit intended-vs-actual tests.

For example, in my intended network state, I define the following:

Network Topology
Devices
BGP State (router-id, neighbors, etc)
OSPF State
VLANs
Interface State

And for tests, I want to verify that my actual state matches my intended state:

Are interfaces configured correctly, connected to the correct devices, and in the correct state?
Are BGP parameters configured correctly and do I have the correct BGP neighbors? Are the adjacencies in their intended state?
Are OSPF parameters configured correctly and do I have the correct OSPF neighbors? Are the adjacencies in their intended state?
Do the VLANs configured match the intended VLANs? None missing, none extra?
Are network endpoints reachable?
Do correct VRFs exist?

NUTS can provide these tests and many more.
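
To make the "none missing, none extra" idea concrete, here is a minimal Python sketch of an intended-vs-actual comparison for VLANs. It is not how NUTS implements its checks; the function name `compare_vlans` and the sample VLAN IDs are made up for illustration.

```python
def compare_vlans(intended, actual):
    """Return VLAN IDs that are missing from, or extra on, the device."""
    intended_set, actual_set = set(intended), set(actual)
    return {
        "missing": sorted(intended_set - actual_set),
        "extra": sorted(actual_set - intended_set),
    }


# Hypothetical sample data: intended VLANs from the source of truth,
# actual VLANs as collected from a device.
intended_vlans = [10, 20, 30]
actual_vlans = [10, 30, 40]

diff = compare_vlans(intended_vlans, actual_vlans)
print(diff)  # {'missing': [20], 'extra': [40]}
```

An empty result in both lists means the actual state matches the intended state for that check.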

Network Unit Testing System (NUTS)

NUTS is a custom pytest plugin that uses nornir to interact with devices and run operational network state tests. There are plenty of test bundles provided, or you can create your own.

A Practical Workflow

I’m using Nautobot as my Source of Truth and intended state database for this exercise. I can build a workflow with Nautobot and some apps from Nautobot’s ecosystem. Something like this:

Run a Nautobot Job that builds pre-validated network designs - allocate IPs, create devices, VLANs, cables, etc.
Run custom validations with the Nautobot Data Validation Engine for extra assurance that my intended network design is sane.
Get config backups, generate intended configs, and generate configuration compliance status with the Golden Config App.
Using Golden Config, remediate any non-compliant configurations.
Dynamically generate and run NUTS network tests based on my intended state data.
Investigate/remediate any failed tests.
Repeat the workflow until all configs are compliant and the NUTS tests pass.

It would be amazing to fit this workflow into a CI pipeline. Perhaps an exercise for another day.

Running the Tests

In my development environment I have a script that dynamically generates my tests from the intended state.

Here are snippets of the final test file:

- test_class: TestNapalmInterfaces
  test_data:
    - host: dc1-leaf-1
      name: Ethernet1
      is_enabled: true
      is_up: true
      mtu: 1500
    - host: dc1-leaf-1
      name: Ethernet2
      is_enabled: true
      is_up: true
      mtu: 1500
...
- test_class: TestNapalmLldpNeighbors
  test_data:
    - host: dc1-leaf-1
      local_port: Ethernet1
      remote_host: dc1-spine-1
      remote_port: Ethernet1
    - host: dc1-leaf-1
      local_port: Ethernet2
      remote_host: dc1-spine-2
      remote_port: Ethernet1
...
- test_class: TestNapalmLldpNeighborsCount
  test_data:
    - host: dc1-leaf-1
      neighbor_count: 3
    - host: dc1-leaf-2
      neighbor_count: 3
...
- test_class: TestNetmikoOspfNeighbors
  test_data:
    - host: dc1-leaf-1
      neighbor_id: 10.0.0.1
      state: FULL/BDR
    - host: dc1-leaf-1
      neighbor_id: 10.0.0.2
      state: FULL/BDR
...
- test_class: TestNapalmBgpNeighbors
  test_data:
    - host: dc1-leaf-1
      local_id: 10.0.0.5
      local_as: 65000
      peer: 10.0.0.1
      remote_as: 65000
      remote_id: 10.0.0.1
      is_enabled: true
      is_up: true
    - host: dc1-leaf-1
      local_id: 10.0.0.5
      local_as: 65000
      peer: 10.0.0.2
      remote_as: 65000
      remote_id: 10.0.0.2
      is_enabled: true
      is_up: true
...
- test_class: TestNapalmBgpNeighborsCount
  test_data:
    - host: dc1-leaf-1
      neighbor_count: 2
    - host: dc1-leaf-2
      neighbor_count: 2
...

The final test count is over 160 tests! The test_data in each test class dictates which tests will run. See the docs for test construction details.

At a high level, I’m running several tests related to interfaces, BGP, and OSPF state. My intended state was translated into NUTS tests, and the tests will validate that the actual state matches my intended state.

There are many more tests available!

Finally, here are the test results:

root@1ad25e8b548f:/working_dir# python generate_tests.py ; pytest tests/test-definition-dynamic.yaml
============================================================================================== test session starts ==============================================================================================
platform linux -- Python 3.9.18, pytest-7.4.3, pluggy-1.3.0
rootdir: /working_dir
plugins: anyio-4.2.0, nuts-3.2.0
collected 214 items                                                                                                                                                                                             

tests/test-definition-dynamic.yaml ........................ssssssssssss............ssssssssssss.........................................ssssssssssssssssssssssss......................................... [ 77%]
................................................                                                                                                                                                          [100%]

======================================================================================= 166 passed, 48 skipped in 24.94s ========================================================================================

Some tests were skipped because I didn’t configure them, but all 166 tests that ran were successful!

Good network testing is hard, but NUTS helps make it easier. Out of the box I can translate a solid majority of my intended state into actual state tests. In a prod deployment, I could add more intended state context to fine-tune the tests. For example, tests generated for an L2 site would be VLAN/interface oriented rather than BGP oriented. That info would be derived from intended state context (site type, device roles, etc). As for presentation, one could explore collecting the test results and presenting them via Grafana. Maybe I’ll do that in another post.
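
As a rough sketch of that fine-tuning idea, the generator could pick test classes from a site type before rendering anything. The site type labels (`l2`, `l3`), the mapping, and the function name are hypothetical. `TestNapalmVlans` is an assumed class name that should be checked against the NUTS docs; the other class names come from the generated test file shown earlier.

```python
# Hypothetical mapping from a site type (taken from intended state context)
# to the NUTS test classes worth generating for that site.
# "TestNapalmVlans" is an assumed name; confirm it in the NUTS docs.
TEST_CLASSES_BY_SITE_TYPE = {
    "l2": [
        "TestNapalmInterfaces",
        "TestNapalmLldpNeighbors",
        "TestNapalmVlans",
    ],
    "l3": [
        "TestNapalmInterfaces",
        "TestNapalmLldpNeighbors",
        "TestNetmikoOspfNeighbors",
        "TestNapalmBgpNeighbors",
    ],
}

# Fallback for site types the mapping does not know about.
DEFAULT_TEST_CLASSES = ["TestNapalmInterfaces"]


def select_test_classes(site_type):
    """Return the test classes to generate for a given site type."""
    return TEST_CLASSES_BY_SITE_TYPE.get(site_type.lower(), DEFAULT_TEST_CLASSES)


print(select_test_classes("L2"))
```

The template could then wrap each test class block in a condition on this list, so an L2 site never gets BGP tests it cannot pass.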

NUTS Development Environment

This section gives some insight into the development environment I used to generate and run the NUTS tests. Enjoy!

In my development environment I have a containerized instance of Nautobot, Containerlab for the virtual network (running cEOS images), and a container to run my NUTS tooling. I’m working on open-sourcing my dev environment, but for now I’ll just describe it to you.

Containerlab Config

I’m using cEOS images for my containerlab topology. Here’s my topology file:

---

name: leafspine

mgmt:
  network: custom_mgmt
  ipv4-subnet: 172.100.100.0/24 

topology:
  kinds:
    ceos:
      image: ceos:4.28.8M
  nodes:
    dc1-spine-1:
      kind: ceos
      mgmt-ipv4: 172.100.100.2      
    dc1-spine-2:
      kind: ceos
      mgmt-ipv4: 172.100.100.3     
    dc1-leaf-1:
      kind: ceos
      mgmt-ipv4: 172.100.100.10
    dc1-leaf-2:
      kind: ceos
      mgmt-ipv4: 172.100.100.11
    dc1-leaf-3:
      kind: ceos
      mgmt-ipv4: 172.100.100.12

  links:
    - endpoints: ["dc1-spine-1:eth1", "dc1-leaf-1:eth1"]
    - endpoints: ["dc1-spine-1:eth2", "dc1-leaf-2:eth1"]
    - endpoints: ["dc1-spine-1:eth3", "dc1-leaf-3:eth1"]
    - endpoints: ["dc1-spine-2:eth1", "dc1-leaf-1:eth2"]
    - endpoints: ["dc1-spine-2:eth2", "dc1-leaf-2:eth2"]
    - endpoints: ["dc1-spine-2:eth3", "dc1-leaf-3:eth2"]

Notice the custom subnet settings. I like to attach all my dev containers to this subnet and statically assign IPs.
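
With static assignments it is easy to mistype an address or reuse one. A small sketch of a sanity check using Python's standard `ipaddress` module follows. The node addresses are copied from the topology file; the `nautobot` and `nuts` entries use the addresses that appear in the Dockerfile and run script in this post. The function name is made up for illustration.

```python
import ipaddress

MGMT_SUBNET = ipaddress.ip_network("172.100.100.0/24")

# Static management IPs used in this post.
STATIC_IPS = {
    "dc1-spine-1": "172.100.100.2",
    "dc1-spine-2": "172.100.100.3",
    "dc1-leaf-1": "172.100.100.10",
    "dc1-leaf-2": "172.100.100.11",
    "dc1-leaf-3": "172.100.100.12",
    "nautobot": "172.100.100.101",  # from NAUTOBOT_URL in the Dockerfile
    "nuts": "172.100.100.225",  # from the docker run script
}


def find_address_problems(static_ips, subnet):
    """Report addresses outside the subnet and addresses assigned twice."""
    problems = []
    owners = {}
    for name, ip in static_ips.items():
        address = ipaddress.ip_address(ip)
        if address not in subnet:
            problems.append(f"{name}: {ip} is outside {subnet}")
        if address in owners:
            problems.append(f"{name}: {ip} is already assigned to {owners[address]}")
        else:
            owners[address] = name
    return problems


print(find_address_problems(STATIC_IPS, MGMT_SUBNET))  # []
```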

Nautobot Config

My Nautobot instance uses core data models for a five-node leaf/spine network. The routing configurations are stored in config contexts and are based on device roles. Sample config context for spine switches:

router_bgp:
  cluster_id: 10.0.0.1
  local_asn: '65000'
  underlay_neighbors:
  - address: 10.0.0.5
    description: dc1-leaf-1
  - address: 10.0.0.6
    description: dc1-leaf-2
  - address: 10.0.0.7
    description: dc1-leaf-3
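
To show how a config context like this turns into test data, here is a plain-Python sketch that mirrors what the Jinja2 template later in this post does for `TestNapalmBgpNeighbors`. The function name is made up, and the spine's loopback address (`10.0.0.1`) is an assumption for the example; in the real workflow it is read from the device's Loopback0 interface.

```python
# Spine config context, copied from the sample above.
spine_context = {
    "router_bgp": {
        "cluster_id": "10.0.0.1",
        "local_asn": "65000",
        "underlay_neighbors": [
            {"address": "10.0.0.5", "description": "dc1-leaf-1"},
            {"address": "10.0.0.6", "description": "dc1-leaf-2"},
            {"address": "10.0.0.7", "description": "dc1-leaf-3"},
        ],
    }
}


def bgp_test_data(host, loopback_ip, config_context):
    """Build one test_data entry per intended BGP neighbor."""
    bgp = config_context["router_bgp"]
    local_as = int(bgp["local_asn"])
    return [
        {
            "host": host,
            "local_id": loopback_ip,
            "local_as": local_as,
            "peer": neighbor["address"],
            # Same AS on both sides, as in the generated tests shown earlier.
            "remote_as": local_as,
            "remote_id": neighbor["address"],
            "is_enabled": True,
            "is_up": True,
        }
        for neighbor in bgp["underlay_neighbors"]
    ]


# The loopback IP passed here is assumed for illustration.
entries = bgp_test_data("dc1-spine-1", "10.0.0.1", spine_context)
print(len(entries))  # 3
```

The length of `underlay_neighbors` is also what feeds the `neighbor_count` value for `TestNapalmBgpNeighborsCount`.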

NUTS Container

My dev environment is 100% containerized. Josh V makes a good case for this strategy.

Here’s my NUTS Dockerfile:

FROM python:3.9

RUN apt-get update && apt-get install -y \
    software-properties-common \
    iputils-ping \
    openssh-client

ENV NAUTOBOT_URL="http://172.100.100.101:8080"
ENV NAUTOBOT_TOKEN="0123456789abcdef0123456789abcdef01234567"
ENV NORNIR_USERNAME="admin"
ENV NORNIR_PASSWORD="admin"

RUN pip install \
    nuts \
    nornir-nautobot \
    ntc-templates \
    pynautobot \
    ipython

This is the script I use to start the container:

#!/bin/bash

WORKDIR=$(pwd)/nuts/working_dir/

docker run -it --rm --privileged \
    --network custom_mgmt \
    --ip 172.100.100.225 \
    -w "/working_dir" \
    -v "$WORKDIR":/working_dir \
    nuts bash

Since NUTS uses a standard nornir config file, we can use the NautobotInventory plugin to dynamically grab our device inventory:

inventory:
  plugin: NautobotInventory
  options:
    # ENV vars will be loaded for token/url
    nautobot_url:
    nautobot_token:
    ssl_verify: false
  # ENV vars will be loaded for username/password
  transform_function: "load_credentials"

runner:
  plugin: threaded
  options:
    num_workers: 20

I’m using a simple pynautobot script to query Nautobot’s GraphQL endpoint for all the intended state data I need to generate the NUTS tests. The query structure can be seen in the script. The script then renders the tests dynamically using the Jinja2 template test-template.j2.

"""Simple module that dynamically generates network tests."""
import os

from jinja2 import Environment, FileSystemLoader
from pynautobot import api

url = os.environ["NAUTOBOT_URL"]
token = os.environ["NAUTOBOT_TOKEN"]
query = """
{
  devices {
    location {
      name
      vlans {
        vid
      }
    }
    config_context
    hostname: name
    role {
      name
    }
    interfaces {
      description
      enabled
      name
      mode
      mtu
      type
      cable_peer_interface {
        name
        device {
          name
        }
      }
      lag {
        name
      }
      ip_addresses {
        address
        ip_version
      }
      tagged_vlans {
        name
        vid
      }
      untagged_vlan {
        name
        vid
      }
      cable {
        termination_a_type
        status {
          name
        }
        color
      }
      tags {
        name
      }
    }
  }
}
"""

nautobot = api(url=url, token=token)
graphql_response = nautobot.graphql.query(query=query)

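# Load templates from the current working directory.
env = Environment(loader=FileSystemLoader("."), trim_blocks=True, lstrip_blocks=True)
template = env.get_template("test-template.j2")

# graphql_response.json is a dict with a top-level "data" key, which the template iterates over.
output = template.render(graphql_response.json)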

with open("tests/test-definition-dynamic.yaml", "w") as f:
    f.write(output)
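
The template indexes straight into each device's `config_context['router_bgp']['underlay_neighbors']`, so a device without that context would likely break the render. A small pre-render guard might look like the sketch below. The function name and the sample devices are hypothetical; the response shape (`data` → `devices` → `hostname`, `config_context`) follows the query and template in this post.

```python
def devices_missing_bgp_context(graphql_data):
    """Return hostnames whose config context lacks router_bgp.underlay_neighbors."""
    missing = []
    for device in graphql_data["data"]["devices"]:
        context = device.get("config_context") or {}
        router_bgp = context.get("router_bgp") or {}
        if "underlay_neighbors" not in router_bgp:
            missing.append(device["hostname"])
    return missing


# Hypothetical response: one device with BGP context, one without.
sample_response = {
    "data": {
        "devices": [
            {
                "hostname": "dc1-leaf-1",
                "config_context": {
                    "router_bgp": {
                        "local_asn": "65000",
                        "underlay_neighbors": [{"address": "10.0.0.1"}],
                    }
                },
            },
            {"hostname": "mgmt-switch-1", "config_context": {}},
        ]
    }
}

print(devices_missing_bgp_context(sample_response))  # ['mgmt-switch-1']
```

Running a check like this on `graphql_response.json` before rendering gives a clear list of devices to fix in Nautobot rather than a template error.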

test-template.j2:

---
- test_class: TestNapalmInterfaces
  test_data:
{% for device in data['devices'] %}
  {% for intf in device['interfaces'] %}
    {% if intf['cable_peer_interface'] %}
    - host: {{ device['hostname'] }}
      name: {{ intf['name'] }}
      is_enabled: true
      is_up: true
      mtu: 1500
    {% endif %}
  {% endfor %}
{% endfor %}

- test_class: TestNapalmLldpNeighbors
  test_data:
{% for device in data['devices'] %}
  {% for intf in device['interfaces'] %}
    {% if intf['cable_peer_interface'] %}
    - host: {{ device['hostname'] }}
      local_port: {{ intf['name'] }}
      remote_host: {{ intf['cable_peer_interface']['device']['name'] }}
      remote_port: {{ intf['cable_peer_interface']['name'] }}
    {% endif %}
  {% endfor %}
{% endfor %}

- test_class: TestNapalmLldpNeighborsCount
  test_data:
{% for device in data['devices'] %}
    - host: {{ device['hostname'] }}
      neighbor_count: {{ device['interfaces'] | selectattr('cable_peer_interface') | list | length + 1 }}
{% endfor %}

- test_class: TestNetmikoOspfNeighbors
  test_data:
{% for device in data['devices'] %}
  {% for neighbor in device['config_context']['router_bgp']['underlay_neighbors'] %}
    - host: {{ device['hostname'] }}
      neighbor_id: {{ neighbor['address'] }}
    {% if 'spine' in device['hostname'] %}
      state: FULL/DR
    {% else %}
      state: FULL/BDR
    {% endif %}
  {% endfor %}
{% endfor %}

- test_class: TestNapalmBgpNeighbors
  test_data:
{% for device in data['devices'] %}
  {% for neighbor in device['config_context']['router_bgp']['underlay_neighbors'] %}
    - host: {{ device['hostname'] }}
      local_id: {{ device['interfaces'] | selectattr('name', 'equalto', 'Loopback0') | map(attribute='ip_addresses') | first | map(attribute='address') | first | replace('/32', '') }}
      local_as: {{ device['config_context']['router_bgp']['local_asn'] }}
      peer: {{ neighbor['address'] }}
      remote_as: {{ device['config_context']['router_bgp']['local_asn'] }}
      remote_id: {{ neighbor['address'] }}
      is_enabled: true
      is_up: true
  {% endfor %}
{% endfor %}

- test_class: TestNapalmBgpNeighborsCount
  test_data:
{% for device in data['devices'] %}
    - host: {{ device['hostname'] }}
      neighbor_count: {{ device['config_context']['router_bgp']['underlay_neighbors'] | length }}
{% endfor %}
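
The `neighbor_count` expression in the `TestNapalmLldpNeighborsCount` block is the least obvious part of the template, so here is the same logic as a plain-Python sketch. It counts interfaces that have a cable peer and adds one. The post does not say what the extra neighbor is; my reading is that it accounts for a neighbor seen on the management interface, and that is an assumption. The interface data below is illustrative, with the cabled peers taken from the generated LLDP tests.

```python
def lldp_neighbor_count(interfaces, extra_neighbors=1):
    """Count cabled interfaces, plus any neighbors not modeled as cables."""
    cabled = [intf for intf in interfaces if intf.get("cable_peer_interface")]
    return len(cabled) + extra_neighbors


# Illustrative interface data for dc1-leaf-1.
leaf1_interfaces = [
    {
        "name": "Ethernet1",
        "cable_peer_interface": {"name": "Ethernet1", "device": {"name": "dc1-spine-1"}},
    },
    {
        "name": "Ethernet2",
        "cable_peer_interface": {"name": "Ethernet1", "device": {"name": "dc1-spine-2"}},
    },
    {"name": "Loopback0", "cable_peer_interface": None},
    {"name": "Management0", "cable_peer_interface": None},
]

print(lldp_neighbor_count(leaf1_interfaces))  # 3
```

Two cabled uplinks plus one gives 3, which matches the `neighbor_count: 3` generated for dc1-leaf-1.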

And finally, the resulting tests.