assertEqual(IntendedState, ActualState)
Network Unit Testing System (NUTS) allows you to write explicit network tests. I’m going to dynamically generate NUTS tests to check intended state vs actual state!
Problem Statement
Let’s say you have a large network and a traditional network monitoring system that’s checking for things like:
- Interface Status
- Interface Tx/Rx Statistics
- Device Status
- Routing Adjacencies
These kinds of checks are typically configured by enabling some pollers on a tool and then letting the tool discover your inventory and grab the data points. This method is implicit - there’s no explicitly defined network state to check the actual state against. We can augment these checks with explicit intended-vs-actual tests.
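In other words, the check becomes an assertion against an explicitly declared value. Here is a minimal conceptual sketch in plain Python (NUTS expresses these checks in YAML, and the interface values below are made up for illustration):

# Conceptual sketch only -- NUTS expresses these checks in YAML, not Python.
# 'actual' would normally be pulled from the device (e.g. via a NAPALM getter);
# here it is hard-coded with a deliberate mismatch on is_enabled.
intended = {"Ethernet1": {"is_up": True, "is_enabled": True, "mtu": 1500}}
actual = {"Ethernet1": {"is_up": True, "is_enabled": False, "mtu": 1500}}

for interface, intended_state in intended.items():
    for key, value in intended_state.items():
        assert actual[interface][key] == value, (
            f"{interface}: expected {key}={value}, got {actual[interface][key]}"
        )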
For example, in my intended network state, I define the following:
- Network Topology
- Devices
- BGP State (router-id, neighbors, etc)
- OSPF State
- VLANs
- Interface State
And for tests, I want to verify that my actual state matches my intended state:
- Are interfaces configured correctly, connected to the correct devices, and in the correct state?
- Are BGP parameters configured correctly and do I have the correct BGP neighbors? Are the adjacencies in their intended state?
- Are OSPF parameters configured correctly and do I have the correct OSPF neighbors? Are the adjacencies in their intended state?
- Do the VLANs configured match the intended VLANs? None missing, none extra?
- Are network endpoints reachable?
- Do correct VRFs exist?
NUTS can provide these tests and many more.
Network Unit Testing System (NUTS)
NUTS is a custom pytest plugin that uses nornir to interact with devices and run operational network state tests. There are plenty of test bundles provided, or you can create your own.
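Conceptually, a NUTS test bundle saves you from hand-rolling pytest/nornir code like the sketch below. This is not how NUTS is implemented internally; it just shows the kind of check a bundle replaces, assuming a nornir config file named nr-config.yaml and the nornir_napalm plugin:

# Hand-rolled equivalent of a NUTS interface check -- a sketch only.
# Assumes nr-config.yaml exists and nornir_napalm is installed.
from nornir import InitNornir
from nornir_napalm.plugins.tasks import napalm_get

nr = InitNornir(config_file="nr-config.yaml")
results = nr.run(task=napalm_get, getters=["interfaces"])


def test_leaf1_ethernet1_is_up():
    interfaces = results["dc1-leaf-1"][0].result["interfaces"]
    assert interfaces["Ethernet1"]["is_up"]
    assert interfaces["Ethernet1"]["is_enabled"]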
A Practical Workflow
I’m using Nautobot as my Source of Truth and intended state database for this exercise. I can build a workflow with Nautobot and some apps from Nautobot’s ecosystem. Something like this:
- Run a Nautobot Job that builds pre-validated network designs - allocate IPs, create devices, VLANs, cables, etc.
- Run custom validations with the Nautobot Data Validation Engine for extra assurance that my intended network design is sane.
- Get config backups, generate intended configs, and generate configuration compliance status with the Golden Config App.
- Using Golden Config, remediate any non-compliant configurations.
- Dynamically generate and run NUTS network tests based on my intended state data.
- Investigate/remediate any failed tests.
- Repeat the workflow until all configs are compliant and the NUTS tests pass.
It would be amazing to fit this workflow into a CI pipeline. Perhaps an exercise for another day.
Running the Tests
In my development environment I have a script that dynamically generates my tests based on the intended state.
Here are snippets of the final test file:
- test_class: TestNapalmInterfaces
  test_data:
    - host: dc1-leaf-1
      name: Ethernet1
      is_enabled: true
      is_up: true
      mtu: 1500
    - host: dc1-leaf-1
      name: Ethernet2
      is_enabled: true
      is_up: true
      mtu: 1500
    ...
- test_class: TestNapalmLldpNeighbors
  test_data:
    - host: dc1-leaf-1
      local_port: Ethernet1
      remote_host: dc1-spine-1
      remote_port: Ethernet1
    - host: dc1-leaf-1
      local_port: Ethernet2
      remote_host: dc1-spine-2
      remote_port: Ethernet1
    ...
- test_class: TestNapalmLldpNeighborsCount
  test_data:
    - host: dc1-leaf-1
      neighbor_count: 3
    - host: dc1-leaf-2
      neighbor_count: 3
    ...
- test_class: TestNetmikoOspfNeighbors
  test_data:
    - host: dc1-leaf-1
      neighbor_id: 10.0.0.1
      state: FULL/BDR
    - host: dc1-leaf-1
      neighbor_id: 10.0.0.2
      state: FULL/BDR
    ...
- test_class: TestNapalmBgpNeighbors
  test_data:
    - host: dc1-leaf-1
      local_id: 10.0.0.5
      local_as: 65000
      peer: 10.0.0.1
      remote_as: 65000
      remote_id: 10.0.0.1
      is_enabled: true
      is_up: true
    - host: dc1-leaf-1
      local_id: 10.0.0.5
      local_as: 65000
      peer: 10.0.0.2
      remote_as: 65000
      remote_id: 10.0.0.2
      is_enabled: true
      is_up: true
    ...
- test_class: TestNapalmBgpNeighborsCount
  test_data:
    - host: dc1-leaf-1
      neighbor_count: 2
    - host: dc1-leaf-2
      neighbor_count: 2
    ...
The final test count is over 160 tests! The test_data in each test class dictates which tests will run. See the NUTS docs for test construction details.
At a high level, I’m running several tests related to interfaces, BGP, and OSPF state. My intended state was translated into NUTS tests, and the tests will validate that the actual state matches my intended state.
There are many more tests available!
Finally, here are the test results:
root@1ad25e8b548f:/working_dir# python generate_tests.py ; pytest tests/test-definition-dynamic.yaml
============================================================================================== test session starts ==============================================================================================
platform linux -- Python 3.9.18, pytest-7.4.3, pluggy-1.3.0
rootdir: /working_dir
plugins: anyio-4.2.0, nuts-3.2.0
collected 214 items
tests/test-definition-dynamic.yaml ........................ssssssssssss............ssssssssssss.........................................ssssssssssssssssssssssss......................................... [ 77%]
................................................ [100%]
======================================================================================= 166 passed, 48 skipped in 24.94s ========================================================================================
Some tests were skipped because I didn’t configure them, but all 166 tests that ran were successful!
Good network testing is hard, but NUTS helps make it easier. Out of the box I can translate a solid majority of my intended state into actual-state tests. In a production deployment, I could add more intended state context to fine-tune the tests. For example, generating tests for an L2 site would be VLAN/interface oriented rather than BGP oriented; that info would be derived from intended state context (site type, device roles, etc.). As for presentation, one could explore collecting the test results and presenting them via Grafana. Maybe I'll do that in another post.
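For example, the generation script could branch on the device role that the GraphQL query already returns and only emit the relevant test classes. A hedged sketch (the role names are illustrative placeholders, not from my lab data):

# Sketch: choose NUTS test classes per device role. Role names here are
# illustrative placeholders, not taken from my lab data.
ROLE_TEST_CLASSES = {
    "l2-access": ["TestNapalmInterfaces", "TestNapalmLldpNeighbors"],
    "leaf": [
        "TestNapalmInterfaces",
        "TestNapalmLldpNeighbors",
        "TestNetmikoOspfNeighbors",
        "TestNapalmBgpNeighbors",
    ],
}


def test_classes_for(device: dict) -> list[str]:
    """Return the NUTS test classes to generate for a device, keyed by role."""
    role = device["role"]["name"].lower()
    return ROLE_TEST_CLASSES.get(role, ["TestNapalmInterfaces"])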
NUTS Development Environment
This section just gives insight into my development environment used to generate and run the NUTS tests. Enjoy!
In my development environment I have a containerized instance of Nautobot, Containerlab for the virtual network (running cEOS images), and a container to run my NUTS tooling. I’m working on open-sourcing my dev environment, but for now I’ll just describe it to you.
Containerlab Config
I’m using cEOS images for my containerlab topology. Here’s my topology file:
---
name: leafspine
mgmt:
  network: custom_mgmt
  ipv4-subnet: 172.100.100.0/24
topology:
  kinds:
    ceos:
      image: ceos:4.28.8M
  nodes:
    dc1-spine-1:
      kind: ceos
      mgmt-ipv4: 172.100.100.2
    dc1-spine-2:
      kind: ceos
      mgmt-ipv4: 172.100.100.3
    dc1-leaf-1:
      kind: ceos
      mgmt-ipv4: 172.100.100.10
    dc1-leaf-2:
      kind: ceos
      mgmt-ipv4: 172.100.100.11
    dc1-leaf-3:
      kind: ceos
      mgmt-ipv4: 172.100.100.12
  links:
    - endpoints: ["dc1-spine-1:eth1", "dc1-leaf-1:eth1"]
    - endpoints: ["dc1-spine-1:eth2", "dc1-leaf-2:eth1"]
    - endpoints: ["dc1-spine-1:eth3", "dc1-leaf-3:eth1"]
    - endpoints: ["dc1-spine-2:eth1", "dc1-leaf-1:eth2"]
    - endpoints: ["dc1-spine-2:eth2", "dc1-leaf-2:eth2"]
    - endpoints: ["dc1-spine-2:eth3", "dc1-leaf-3:eth2"]
Notice the custom subnet settings. I like to attach all my dev containers to this subnet and statically assign IPs.
Nautobot Config
My Nautobot instance uses core data models for a 5-node leaf/spine network. The routing configurations are stored in config contexts and are based on device roles. Here's a sample config context for the spine switches:
router_bgp:
  cluster_id: 10.0.0.1
  local_asn: '65000'
  underlay_neighbors:
    - address: 10.0.0.5
      description: dc1-leaf-1
    - address: 10.0.0.6
      description: dc1-leaf-2
    - address: 10.0.0.7
      description: dc1-leaf-3
NUTS Container
My dev environment is 100% containerized. Josh V makes a good case for this strategy.
Here’s my NUTS Dockerfile:
FROM python:3.9

RUN apt-get update && apt-get install -y \
    software-properties-common \
    iputils-ping \
    openssh-client

ENV NAUTOBOT_URL="http://172.100.100.101:8080"
ENV NAUTOBOT_TOKEN="0123456789abcdef0123456789abcdef01234567"
ENV NORNIR_USERNAME="admin"
ENV NORNIR_PASSWORD="admin"

RUN pip install \
    nuts \
    nornir-nautobot \
    ntc-templates \
    pynautobot \
    ipython
This is the script I use to start the container:
#!/bin/bash
WORKDIR=$(pwd)/nuts/working_dir/

docker run -it --rm --privileged \
    --network custom_mgmt \
    --ip 172.100.100.225 \
    -w "/working_dir" \
    -v "${WORKDIR}:/working_dir" \
    nuts bash
Since NUTS uses a standard nornir configuration file, we can use the NautobotInventory plugin to dynamically grab our device inventory:
inventory:
  plugin: NautobotInventory
  options:
    # ENV vars will be loaded for token/url
    nautobot_url:
    nautobot_token:
    ssl_verify: false
  # ENV vars will be loaded for username/password
  transform_function: "load_credentials"
runner:
  plugin: threaded
  options:
    num_workers: 20
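Before generating any tests, it's worth confirming that the inventory actually resolves from Nautobot. A quick sketch, assuming the config above is saved as nr-config.yaml (adjust the filename to whatever your setup uses):

# Sanity-check the Nautobot-backed nornir inventory -- assumes the config
# above is saved as nr-config.yaml.
from nornir import InitNornir

nr = InitNornir(config_file="nr-config.yaml")
for name, host in nr.inventory.hosts.items():
    print(name, host.hostname, host.platform)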
I'm using a simple pynautobot script to poll Nautobot's GraphQL endpoint and retrieve all the intended state data I need to generate the NUTS tests. The query structure can be seen in the pynautobot script. The script then renders the tests dynamically using the Jinja2 template test-template.j2.
"""Simple module that dynamically generates network tests."""
import os
from jinja2 import Environment, FileSystemLoader
from pprint import pprint
from pynautobot import api
url = os.environ["NAUTOBOT_URL"]
token = os.environ["NAUTOBOT_TOKEN"]
query = """
{
devices {
location {
name
vlans {
vid
}
}
config_context
hostname: name
role {
name
}
interfaces {
description
enabled
name
mode
mtu
type
cable_peer_interface {
name
device {
name
}
}
lag {
name
}
ip_addresses {
address
ip_version
}
tagged_vlans {
name
vid
}
untagged_vlan {
name
vid
}
cable {
termination_a_type
status {
name
}
color
}
tags {
name
}
}
}
}
"""
nautobot = api(url=url, token=token)
graphql_response = nautobot.graphql.query(query=query)
env = Environment(loader=FileSystemLoader(""), trim_blocks=True, lstrip_blocks=True)
templates = env.get_template("test-template.j2")
output = templates.render(graphql_response.json)
with open("tests/test-definition-dynamic.yaml", "w") as f:
f.write(output)
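One small addition worth considering (it's not in the script above): parse the rendered output before writing it, so a template mistake fails immediately instead of surfacing later as a pytest collection error. A sketch using PyYAML:

# Optional sanity check -- not part of the original script. Requires PyYAML.
import yaml

parsed = yaml.safe_load(output)
assert isinstance(parsed, list), "expected a list of NUTS test class definitions"
print(f"Rendered {len(parsed)} test classes")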
And here's test-template.j2:
---
- test_class: TestNapalmInterfaces
  test_data:
{% for device in data['devices'] %}
{% for intf in device['interfaces'] %}
{% if intf['cable_peer_interface'] %}
    - host: {{ device['hostname'] }}
      name: {{ intf['name'] }}
      is_enabled: true
      is_up: true
      mtu: 1500
{% endif %}
{% endfor %}
{% endfor %}
- test_class: TestNapalmLldpNeighbors
  test_data:
{% for device in data['devices'] %}
{% for intf in device['interfaces'] %}
{% if intf['cable_peer_interface'] %}
    - host: {{ device['hostname'] }}
      local_port: {{ intf['name'] }}
      remote_host: {{ intf['cable_peer_interface']['device']['name'] }}
      remote_port: {{ intf['cable_peer_interface']['name'] }}
{% endif %}
{% endfor %}
{% endfor %}
- test_class: TestNapalmLldpNeighborsCount
  test_data:
{% for device in data['devices'] %}
    - host: {{ device['hostname'] }}
      neighbor_count: {{ device['interfaces'] | selectattr('cable_peer_interface') | list | length + 1 }}
{% endfor %}
- test_class: TestNetmikoOspfNeighbors
  test_data:
{% for device in data['devices'] %}
{% for neighbor in device['config_context']['router_bgp']['underlay_neighbors'] %}
    - host: {{ device['hostname'] }}
      neighbor_id: {{ neighbor['address'] }}
{% if 'spine' in device['hostname'] %}
      state: FULL/DR
{% else %}
      state: FULL/BDR
{% endif %}
{% endfor %}
{% endfor %}
- test_class: TestNapalmBgpNeighbors
  test_data:
{% for device in data['devices'] %}
{% for neighbor in device['config_context']['router_bgp']['underlay_neighbors'] %}
    - host: {{ device['hostname'] }}
      local_id: {{ device['interfaces'] | selectattr('name', 'in', 'Loopback0') | map(attribute='ip_addresses') | first | map(attribute='address') | first | replace('/32', '') }}
      local_as: {{ device['config_context']['router_bgp']['local_asn'] }}
      peer: {{ neighbor['address'] }}
      remote_as: {{ device['config_context']['router_bgp']['local_asn'] }}
      remote_id: {{ neighbor['address'] }}
      is_enabled: true
      is_up: true
{% endfor %}
{% endfor %}
- test_class: TestNapalmBgpNeighborsCount
  test_data:
{% for device in data['devices'] %}
    - host: {{ device['hostname'] }}
      neighbor_count: {{ device['config_context']['router_bgp']['underlay_neighbors'] | length }}
{% endfor %}
And finally, the resulting tests.