Lessons Learned From Working in .edu Networks

A non-comprehensive list of principles that shouldn’t be taken for granted in the campus enterprise networking space.

1. Good Processes Matter

Having defined processes to drive outcomes is essential to a quality final product. For example, building a new site without checklists or verification steps can lead to all sorts of unexpected outcomes. Construct a template for every routine process and ensure it’s being followed. You will see more consistent outcomes and fewer surprise tasks for team members.

2. Accurate Device Inventory is Important

Have you ever automated a device upgrade across your entire inventory only to discover later that you missed a fifth of your intended devices? Do you have 5 different systems that allegedly hold accurate inventory, yet each reports a different device count? Accurate device counts tie into good process. Your process should include an intended state definition and an actual state monitor.

  • Intended State
    • With a DCIM tool (Nautobot, NetBox) you can pre-define which devices will be installed. You can also easily pre-define configurations for devices, rack elevations, cable runs, site information, etc. This method is nice because the documentation drives the installation and you always know what’s supposed to exist. Schedule a recurring check that compares what your SNMP/telemetry tooling sees against your DCIM for validation.
    • Network diagrams (for-construction, as-built) can be tremendously helpful in providing context in an environment, but are basically impossible to use as inventory. Diagrams should be driven by intended state data (I’ve never gotten this far. One day!)
  • Actual State
    • Auto-Discovery (SNMP/telemetry)
      • Decently automatic. Input a supernet, implement auto-grouping, and you’re good to go.
      • This method only works if you know what should be discovered and can routinely verify all intended devices were discovered.
    • Manual Entry (SNMP/telemetry)
      • With good process, this method can be highly reliable. Arduous, yes, but you’ll have accurate inventory most of the time.
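
The intended-versus-actual check described above can be sketched as a simple set comparison. This is a minimal illustration, not tied to any particular DCIM or monitoring product: the function name `reconcile` and the sample hostnames are hypothetical, and in practice the two lists would come from a DCIM export and from your SNMP/telemetry tool.

```python
def reconcile(intended, discovered):
    """Compare intended-state hostnames against discovered hostnames.

    Hostnames are normalized (trimmed, lowercased) so that cosmetic
    differences between systems do not show up as false mismatches.
    """
    intended_set = {name.strip().lower() for name in intended}
    discovered_set = {name.strip().lower() for name in discovered}
    return {
        # In the DCIM but never seen by monitoring: possibly not installed,
        # unreachable, or missed by auto-discovery.
        "missing": sorted(intended_set - discovered_set),
        # Seen by monitoring but absent from the DCIM: undocumented devices.
        "unexpected": sorted(discovered_set - intended_set),
    }


# Hypothetical sample data for illustration only.
intended = ["core-sw-01", "lib-sw-01", "lib-sw-02", "dorm-sw-01"]
discovered = ["CORE-SW-01", "lib-sw-01", "dorm-sw-01", "lab-sw-99"]

report = reconcile(intended, discovered)
print("Missing from monitoring:", report["missing"])
print("Not in DCIM:", report["unexpected"])
```

Running a comparison like this on a schedule surfaces both kinds of drift: devices that should exist but were never discovered, and devices that exist but were never documented.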

3. Simplicity is Your Friend

The last couple of network refresh projects I participated in originally had legacy topologies that were difficult to support and required institutional knowledge/reverse engineering for routine maintenance. When possible, you should build networks with well-known, stable, and predictable topologies. Some minimum viable configuration (MVC) strategy bits:

  • Don’t configure unnecessary options
  • Reduce number of inputs required for change
  • Build templates/automate repeatable tasks like network expansion or service addition
  • Keep modularity/flexibility in mind, but don’t go crazy. Again, reduce options!
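
As one way to picture "reduce number of inputs required for change", the sketch below derives every per-site value from a single site ID. The addressing plan (one 10.<site>.0.0/16 block per site), the hostname format, and the function name `site_plan` are all assumptions made up for this illustration; substitute your own conventions.

```python
import ipaddress


def site_plan(site_id: int) -> dict:
    """Derive per-site values from one input, the site ID.

    Assumes a hypothetical plan where each site owns 10.<site_id>.0.0/16,
    the first /24 is management, and the eleventh /24 is the user VLAN.
    """
    if not 1 <= site_id <= 255:
        raise ValueError("site_id must be between 1 and 255")
    site_block = ipaddress.ip_network(f"10.{site_id}.0.0/16")
    subnets = list(site_block.subnets(new_prefix=24))
    mgmt = subnets[0]
    return {
        "hostname": f"site{site_id:03d}-sw-01",
        "mgmt_subnet": str(mgmt),
        "mgmt_gateway": str(mgmt.network_address + 1),
        "user_subnet": str(subnets[10]),
    }


plan = site_plan(12)
for key, value in plan.items():
    print(f"{key}: {value}")
```

With a scheme like this, adding a site means choosing one number rather than filling in a dozen fields by hand, which leaves fewer places for typos to hide.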

4. The Right Way to Sell Automation

A great way to sell automation internally is to automate a menial, time-consuming task. I always turn to configuration generation. You can start by looking at Jinja2, YAML, and Python. Showing leadership that configs can be generated at scale, quickly, and consistently may be the demo you need to gain buy-in. Reducing a day-long process down to 30 minutes or less is huge! At this point your leadership may allocate time/resources to the automation efforts and your ops team should be supportive. From here, look at a more robust framework like Ansible or Nornir. Tie it into a DCIM. Integrate with your ticketing system. The world is your oyster!
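
To give a feel for what a first config generation demo might look like, here is a minimal sketch. To stay dependency-free it uses `string.Template` from the Python standard library as a stand-in for Jinja2, and an inline list of dicts in place of a YAML file; a real workflow would more likely load the data from YAML and use Jinja2 for loops and conditionals. The device data and the generic, vendor-neutral config lines are hypothetical.

```python
from string import Template

# Hypothetical, vendor-neutral template. With Jinja2 this would be a
# separate template file using {{ variable }} placeholders.
SWITCH_TEMPLATE = Template(
    "hostname $hostname\n"
    "interface $uplink\n"
    " description uplink to $upstream\n"
    "vlan $vlan_id\n"
    " name $vlan_name\n"
)

# Hypothetical device data; typically kept in YAML or pulled from a DCIM.
devices = [
    {"hostname": "lib-sw-01", "uplink": "Te1/1/1",
     "upstream": "core-sw-01", "vlan_id": 110, "vlan_name": "LIBRARY-USERS"},
    {"hostname": "dorm-sw-01", "uplink": "Te1/1/1",
     "upstream": "core-sw-01", "vlan_id": 120, "vlan_name": "DORM-USERS"},
]


def render_configs(device_list):
    """Return a dict mapping each hostname to its rendered config text.

    Template.substitute raises KeyError when a variable is missing, so
    incomplete device data fails loudly instead of producing a bad config.
    """
    return {d["hostname"]: SWITCH_TEMPLATE.substitute(d) for d in device_list}


configs = render_configs(devices)
for hostname, text in configs.items():
    print(f"--- {hostname} ---")
    print(text)
```

The same pattern scales from two devices to two thousand: the template stays fixed and only the data grows, which is the point worth demonstrating to leadership.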

5. Test Your Documentation

Do you think your documentation is effective? Have someone totally unfamiliar with your documented process run through it. If they make it to the end having successfully done the thing, great! If not, your documentation needs work. Docs should be easy to follow and have adequate information required to complete the task.

6. Maintain a Lab

Don’t test in production. To the extent practical, keep a hardware lab where you can test and validate changes without disrupting service on your production network.

7. Periodically Evaluate Other Solutions

It can be good to periodically take a step back from your tools and solutions to examine their effectiveness. Perhaps a tool that was once great is now a liability. Don’t be prideful in tool and solution selection. Continuously evolve your environment for the better.

8. Tool Sprawl Sucks

There aren’t many things more frustrating than 5 different tools with overlapping functions and inaccurate device inventories. Identify the functions you need, pick the tool that does each one best (or well enough), and retire the rest.