One month ago, we were migrating our DC core from older Cisco Catalyst models to Nexus models in two different DCs connected over WDM. This was a long-standing plan to provide a better foundation for networking services in our environment (plus OTV, MACsec, etc.). One of the plan’s main ideas was to separate user and DC networks and services, something that had not been done when the DC was created and did not seem simple to accomplish at the start. The first step was to create the new core, integrate it seamlessly into the old one and then kick the old core out. That was the first part of the migration; the second was the migration of the physical end hosts that were connected to the old access switches with copper links.
While troubleshooting an application problem after the end of the application tests, we started a SPAN session and then this happened:
https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvi74050/?reffering_site=dumpcr
We were blind for a couple of hours, losing the DCI links because of the bug, and it took us two days to restore services (the whole weekend; I managed to get to the airport to collect my wife 20 minutes after we were done). All in all, it took us 15 days to sort it out, perform a TAC-recommended upgrade on the NX switches and get on with the project.
During the past week we did the live migration of the remaining end systems. The old environment was 15 years old (the first DC core was even older) and things had been deployed incrementally, so they were not fully documented in databases or tools, or described on the switch ports themselves. To prepare for the migration, we needed a discovery process, data collection and automatic transfer of configuration. I came up with a few scripts using Python, Netmiko, ntc-templates and Excel to help myself with this migration:
- Record the location of systems and write the output to an Excel file and a txt file
- Create a master config Excel file where we could choose the final location for each system, and then
- Automatically produce the largest part of the configuration commands in a text file, to be cut-and-pasted into the CLI (for better control of the workflow)
As it turns out, fellow engineers on the project find it difficult to trust scripts from Hell that they don’t understand, so in the end we did most of the configuration work by hand at a slow pace, only falling back to the script output for checking when something couldn’t be found or went wrong. I am still content that I keep on learning, and I know there is a large area of expertise yet to conquer, whether that is Nornir and other automation frameworks, gluing monitoring tools together to provide automatic provisioning for network performance monitoring, or new ways to graph and report data.
So below are the scripts, sanitized to protect sensitive data.
1. The first script connects to the core switch that holds the ARP table for the hosts and retrieves it (“show ip arp”, parsed with ntc-templates into a list of structured data). It then reads from an Excel file the list of switches where the hosts are connected, connects to each switch, and checks which interfaces are in “connected” state (“show interfaces status”, again parsed with ntc-templates) and are not uplink ports. For each such interface it checks which MAC addresses are learned through that port (“show mac address-table”, also parsed with ntc-templates) and looks up each MAC in the ARP data to find the corresponding IP address. The data are then saved to both an Excel file and a text file (for easier visibility). If no MAC address is known through an interface, only the port VLAN or mode is written to the output (the access VLAN number or the value “trunk”). The script creates the Excel file, starting with the columns and column titles, and then appends rows to it; in parallel it writes the same output to the txt file. If the migration is done correctly, the only rows in the results should be in non-DC VLANs; in other words, no servers meant to be in the DC network should remain on the access switches. That is easily determined by filtering out the user access VLANs in the Excel file. It could also be done inside the code, of course (just another condition). Another thing to note is that we need to know the platform of each switch beforehand, in case a switch doesn’t support SSH at all. If they all did, Kirk Byers has published code that automatically detects the Cisco platform upon connection.
from netmiko import ConnectHandler, ssh_exception
from paramiko.ssh_exception import SSHException
import sys
import openpyxl
from getpass import getpass

# Core switch that holds the ARP table for the hosts
dc1core1 = {
    'device_type': "cisco_nxos",
    'ip': "x.y.z.w",
    'username': "xxxxxx",
    'password': "yyyyyy",
}

# The Netmiko exceptions derive from paramiko's SSHException,
# so they are caught first and the generic handler comes last.
try:
    net_connect = ConnectHandler(**dc1core1)
except ssh_exception.NetMikoTimeoutException:
    print("SSH Timed out")
    sys.exit(1)
except ssh_exception.NetMikoAuthenticationException:
    print("Invalid Credentials: ", "x.y.z.w")
    sys.exit(1)
except SSHException:
    print("can't connect to last device")
    sys.exit(1)

output = net_connect.find_prompt()
print(output)
# Structured ARP table: a list of dicts parsed by ntc-templates
arplist = net_connect.send_command("show ip arp", use_textfsm=True)
net_connect.disconnect()

# Read the access switch list (column 1: name, column 2: address)
wbswitches = openpyxl.load_workbook("switchesdc1.xlsx")
ws = wbswitches.active
max_row = ws.max_row
switches = []
for row in range(1, max_row + 1):
    switch = dict()
    switch['address'] = ws.cell(row=row, column=2).value.strip()
    switch['name'] = ws.cell(row=row, column=1).value
    switches.append(switch)
wbswitches.close()

user = input('username:')
passwd = getpass()

# Create (or truncate) the text results file
file1 = open("dc1spresults.txt", "w+")
file1.close()

# Create the results workbook and its column titles
wbwr = openpyxl.Workbook()
ws = wbwr.active
ws.title = "DC1 System Results"
ws["A1"] = "System Name"
ws["B1"] = "System Port Desc"
ws["C1"] = "System IP"
ws["D1"] = "System Mac"
ws["E1"] = "System Vlan"
ws["F1"] = "SwitchName"
ws["G1"] = "SwitchAddress"
ws["H1"] = "PortName"
ws["I1"] = "PortVlan"

x = 1  # last worksheet row written (row 1 holds the titles)
for switch in switches:
    switchname = switch['name']
    switchaddress = switch['address']
    print(switchname, switchaddress)
    # The 2950 and 3550 models are reached over telnet and use the older command syntax
    if ("2950" not in switchname) and ("3550" not in switchname):
        contype = "cisco_ios_ssh"
        cmdstringmac = "sh mac address-table"
    else:
        contype = "cisco_ios_telnet"
        cmdstringmac = "sh mac-address-table"
    swcon = {
        'device_type': contype,
        'ip': switchaddress,
        'username': user,
        'password': passwd,
    }
    try:
        net_connect = ConnectHandler(**swcon)
    except ssh_exception.NetMikoTimeoutException:
        print("SSH Timed out")
        sys.exit(1)
    except ssh_exception.NetMikoAuthenticationException:
        print("Invalid Credentials: ", switchaddress)
        sys.exit(1)
    except SSHException:
        print("can't connect to last device")
        sys.exit(1)

    file1 = open("dc1spresults.txt", "a+")
    file1.write(switchname + " " + switchaddress + "\n")
    output = net_connect.find_prompt()
    print(output)
    file1.write(output + "\n")
    file1.close()

    maclist = net_connect.send_command(cmdstringmac, use_textfsm=True)
    intstatlist = net_connect.send_command("show interfaces status", use_textfsm=True)
    net_connect.disconnect()

    # Same results file as above (the original opened "dc1presults.txt" here by mistake)
    file1 = open("dc1spresults.txt", "a+")
    for item in intstatlist:
        # Connected ports only, skipping uplinks (descriptions naming the 6509 or N7K cores)
        if ("connected" in item['status']) and ("6509" not in item['name']) and ("N7K" not in item['name']):
            x = x + 1
            nomac = True
            for macitem in maclist:
                if macitem['destination_port'] == item['port']:
                    for arpitem in arplist:
                        if macitem['destination_address'] == arpitem['mac']:
                            nomac = False
                            print(switchname, switchaddress, item['port'], item['name'], arpitem['address'], macitem['vlan'], macitem['destination_address'])
                            file1.write(switchname + " " + switchaddress + " " + item['port'] + " " + item['name'] + " " + arpitem['address'] + " " + macitem['vlan'] + " " + macitem['destination_address'] + "\n")
                            ws.cell(row=x, column=2, value=item['name'])
                            ws.cell(row=x, column=3, value=arpitem['address'])
                            ws.cell(row=x, column=4, value=macitem['destination_address'])
                            ws.cell(row=x, column=5, value=macitem['vlan'])
                            ws.cell(row=x, column=6, value=switchname)
                            ws.cell(row=x, column=7, value=switchaddress)
                            ws.cell(row=x, column=8, value=item['port'])
                            ws.cell(row=x, column=9, value=item['vlan'])
            # No MAC with a matching ARP entry: record the port details only
            if nomac:
                print(switchname, switchaddress, item['port'], item['name'])
                file1.write(switchname + " " + switchaddress + " " + item['port'] + " " + item['name'] + "\n")
                ws.cell(row=x, column=2, value=item['name'])
                ws.cell(row=x, column=3, value="None")
                ws.cell(row=x, column=4, value="None")
                ws.cell(row=x, column=5, value="None")
                ws.cell(row=x, column=6, value=switchname)
                ws.cell(row=x, column=7, value=switchaddress)
                ws.cell(row=x, column=8, value=item['port'])
                ws.cell(row=x, column=9, value=item['vlan'])
    file1.close()

wbwr.save('dc1spresults.xlsx')
print("Done")
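The MAC-to-IP lookup in the script above walks the whole ARP list for every MAC entry. As a minimal sketch (not the code we ran), the same join could be done by indexing the ARP data by MAC address once. The helper names and the sample records below are invented for illustration; only the key names (mac, address, destination_address, destination_port, vlan) are taken from the script.

```python
def build_arp_index(arplist):
    """Map each MAC address to its IP address for direct lookups."""
    return {entry['mac']: entry['address'] for entry in arplist}


def hosts_on_port(port, maclist, arp_index):
    """Return (ip, mac, vlan) tuples for the MACs learned on the given port.

    The IP is None when the MAC has no ARP entry.
    """
    hosts = []
    for macitem in maclist:
        if macitem['destination_port'] == port:
            mac = macitem['destination_address']
            hosts.append((arp_index.get(mac), mac, macitem['vlan']))
    return hosts


# Invented sample data, shaped like the parsed command output
sample_arp = [
    {'address': '10.0.0.10', 'mac': '0011.2233.4455'},
    {'address': '10.0.0.11', 'mac': '0011.2233.4466'},
]
sample_macs = [
    {'destination_address': '0011.2233.4455', 'destination_port': 'Gi0/1', 'vlan': '10'},
    {'destination_address': '0011.2233.4477', 'destination_port': 'Gi0/1', 'vlan': '10'},
    {'destination_address': '0011.2233.4466', 'destination_port': 'Gi0/2', 'vlan': '20'},
]

arp_index = build_arp_index(sample_arp)
print(hosts_on_port('Gi0/1', sample_macs, arp_index))
```

Unlike the script, this sketch also reports MACs that have no ARP entry (with None as the IP), which may or may not be what you want in the results.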
2. The second script does a similar scan but doesn’t get the ARP data and doesn’t care about MAC addresses. It just records the ports in connected state on the switches (skipping the uplink ports), the trunk state or access VLAN, and the descriptions, and then creates the master config Excel file, which we can use to record where we want each port to migrate in the new environment before we run the third script to create the command list.
from netmiko import ConnectHandler, ssh_exception
from paramiko.ssh_exception import SSHException
import sys
import openpyxl
from getpass import getpass

# Read the access switch list (column 1: name, column 2: address)
wbswitches = openpyxl.load_workbook("switchesdc1.xlsx")
ws = wbswitches.active
max_row = ws.max_row
switches = []
for row in range(1, max_row + 1):
    switch = dict()
    switch['address'] = ws.cell(row=row, column=2).value.strip()
    switch['name'] = ws.cell(row=row, column=1).value
    switches.append(switch)
wbswitches.close()

user = input('username:')
passwd = getpass()

# Create the master config workbook and its column titles
wbwr = openpyxl.Workbook()
ws = wbwr.active
ws.title = "DC1 Master Config"
ws["A1"] = "System Name"
ws["B1"] = "System Port Desc"
ws["C1"] = "SwitchName"
ws["D1"] = "SwitchAddress"
ws["E1"] = "PortName"
ws["F1"] = "Vlan-Trunk"
ws["G1"] = "Migrate"
ws["H1"] = "Fex"
ws["I1"] = "PortNumber"
ws["J1"] = "ChanGroup"
ws["K1"] = "Comments"

x = 1  # last worksheet row written (row 1 holds the titles)
for switch in switches:
    switchname = switch['name']
    switchaddress = switch['address']
    print(switchname, switchaddress)
    # The 2950 and 3550 models are reached over telnet
    if ("2950" not in switchname) and ("3550" not in switchname):
        contype = "cisco_ios_ssh"
    else:
        contype = "cisco_ios_telnet"
    swcon = {
        'device_type': contype,
        'ip': switchaddress,
        'username': user,
        'password': passwd,
    }
    # Netmiko exceptions derive from paramiko's SSHException, so catch them first
    try:
        net_connect = ConnectHandler(**swcon)
    except ssh_exception.NetMikoTimeoutException:
        print("SSH Timed out")
        sys.exit(1)
    except ssh_exception.NetMikoAuthenticationException:
        print("Invalid Credentials: ", switchaddress)
        sys.exit(1)
    except SSHException:
        print("can't connect to last device")
        sys.exit(1)

    intstatlist = net_connect.send_command("show interfaces status", use_textfsm=True)
    net_connect.disconnect()

    for item in intstatlist:
        # Connected ports only, skipping uplinks (descriptions naming the 6509 or N7K cores)
        if ("connected" in item['status']) and ("6509" not in item['name']) and ("N7K" not in item['name']):
            x = x + 1
            print(switchname, switchaddress, item['port'], item['name'])
            ws.cell(row=x, column=2, value=item['name'])
            ws.cell(row=x, column=3, value=switchname)
            ws.cell(row=x, column=4, value=switchaddress)
            ws.cell(row=x, column=5, value=item['port'])
            ws.cell(row=x, column=6, value=item['vlan'])
            # Defaults to be edited by hand in the master config file
            ws.cell(row=x, column=7, value="yes")
            ws.cell(row=x, column=8, value="None")
            ws.cell(row=x, column=9, value="None")
            ws.cell(row=x, column=10, value="None")

wbwr.save('dc1masterconfig.xlsx')
print("Done")
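Both scans use the same test to pick the ports worth recording: the port is connected and its description does not name one of the core switches. A small sketch of that test as a reusable function could look like the following. The function name, the UPLINK_MARKERS constant and the sample records are hypothetical; the field names and the two marker strings come from the scripts above.

```python
# Substrings that identify an uplink in the port description (taken from the scripts)
UPLINK_MARKERS = ("6509", "N7K")


def is_host_port(item, uplink_markers=UPLINK_MARKERS):
    """Return True for a connected port whose description names no core switch."""
    if "connected" not in item['status']:
        return False
    return not any(marker in item['name'] for marker in uplink_markers)


# Invented sample records, shaped like the parsed "show interfaces status" output
sample_ports = [
    {'port': 'Gi0/1', 'name': 'srv01', 'status': 'connected', 'vlan': '30'},
    {'port': 'Gi0/2', 'name': 'spare', 'status': 'notconnect', 'vlan': '1'},
    {'port': 'Gi0/24', 'name': 'to N7K-1', 'status': 'connected', 'vlan': 'trunk'},
]

host_ports = [item['port'] for item in sample_ports if is_host_port(item)]
print(host_ports)
```

Keeping the markers in one place would make it easier to adapt the scan to a site with different uplink naming.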
3. The third script doesn’t connect anywhere. It just reads the master config Excel file and creates the command output. One thing to note: for the script to run correctly and the ‘\’ character to appear in the config commands, the backslash must be escaped in the Python string, as in “\\”.
import openpyxl

wbwr = openpyxl.load_workbook("dc1masterconfig.xlsx")
ws = wbwr.active
max_row = ws.max_row
portlist = []
# Row 1 holds the column titles, so the data starts at row 2
for row in range(2, max_row + 1):
    commandlist = []
    # An empty description cell is read back as None, so fall back to ""
    portdescr = ws.cell(row=row, column=2).value or ""
    portmigrate = ws.cell(row=row, column=7).value
    fex = ws.cell(row=row, column=8).value
    portnumber = ws.cell(row=row, column=9).value
    portvlan = ws.cell(row=row, column=6).value
    portchgroup = ws.cell(row=row, column=10).value
    # Skip ports that are not migrating or have no target FEX assigned yet
    if (portmigrate == "no") or (fex == "None"):
        continue
    # r'\1' and '\\' both put a literal backslash in the command
    commandlist.append("\ninterface eth" + str(fex) + r'\1' + '\\' + str(portnumber) + "\n")
    commandlist.append("des " + portdescr + "\n")
    commandlist.append("switchport\n")
    if portvlan == "trunk":
        porttype = "trunk"
        commandlist.append("switchport mode trunk\n")
        commandlist.append("spanning-tree port type edge trunk\n")
    else:
        porttype = "access"
        commandlist.append("switchport mode access\n")
        commandlist.append("switchport access vlan " + str(portvlan) + "\n")
        commandlist.append("spanning-tree port type edge\n")
    if portchgroup == "None":
        # Standalone port
        commandlist.append("vpc orphan-port suspend\n")
        commandlist.append("no shutdown\n")
    else:
        # Port-channel member: also open the Po interface for later completion
        commandlist.append("channel-group " + str(portchgroup) + "\n")
        commandlist.append("no shutdown\n")
        commandlist.append("interface Po" + str(portchgroup) + "\n")
        commandlist.append("des ** to be completed\n")
    portlist.append(commandlist)
wbwr.close()

file1 = open("dc1script.txt", "w+")
for item in portlist:
    for line in item:
        file1.write(line)
file1.close()
print(portlist)
print("Done")
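The per-row logic of the third script can also be sketched as a single function that returns the command lines for one port, which makes it easy to try out without an Excel file. This is an illustration, not the code we ran: the function name and its parameters are hypothetical, and the sample values are invented. The separator is a parameter that defaults to the backslash the script emits; NX-OS itself normally displays FEX host interfaces with forward slashes (for example Ethernet101/1/5), so you may want to pass "/" instead.

```python
def build_port_commands(fex, portnumber, description, vlan, chgroup=None, separator="\\"):
    """Return the config lines for one migrated port, mirroring the third script."""
    lines = [
        f"interface eth{fex}{separator}1{separator}{portnumber}",
        f"des {description}",
        "switchport",
    ]
    if vlan == "trunk":
        lines += ["switchport mode trunk", "spanning-tree port type edge trunk"]
    else:
        lines += [
            "switchport mode access",
            f"switchport access vlan {vlan}",
            "spanning-tree port type edge",
        ]
    if chgroup is None:
        # Standalone port
        lines += ["vpc orphan-port suspend", "no shutdown"]
    else:
        # Port-channel member: also open the Po interface for later completion
        lines += [
            f"channel-group {chgroup}",
            "no shutdown",
            f"interface Po{chgroup}",
            "des ** to be completed",
        ]
    return lines


# Invented sample values
access_port = build_port_commands(101, 5, "srv01", "30")
trunk_member = build_port_commands(102, 7, "esx02", "trunk", chgroup=12, separator="/")
print("\n".join(access_port))
print("\n".join(trunk_member))
```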
So what do Netmiko and ntc-templates bring to the scripts above? We use Netmiko to drive our SSH sessions to the Cisco equipment in an intelligent and abstract manner, instead of doing pattern matching and a bunch of other things on our own. Kirk Byers has created an excellent platform with Netmiko that lets us automate jobs on the network, simple or complex, depending on the additional code we write. We use ntc-templates to parse the command output and automatically create lists of dictionaries containing the information we need as structured data, easily manipulated with loops and checks based on the fields each template provides. Without them, we would need to write regex filters and match statements ourselves (see some of my earlier posts) to create and store the data we need. With ntc-templates much less work is needed, and the code is far more readable and reusable. The templates themselves are easily downloaded and installed through git. You can open one up and see for yourself how easy it is to understand what is available. By navigating to the ntc-templates/templates directory we can see which commands are supported for parsing. If we open the show interfaces status template, we get the content below:
Value PORT (\S+)
Value NAME (.+?)
Value STATUS (err-disabled|disabled|connected|notconnect|inactive)
Value VLAN (\S+)
Value DUPLEX (\S+)
Value SPEED (\S+)
Value TYPE (.*)

Start
  ^Port -> Begin

Begin
  ^(?=\s{0,9}${PORT}).{9}\s{20}${STATUS}\s+${VLAN}\s+${DUPLEX}\s+${SPEED}\s*${TYPE}$$ -> Record
  ^(?=\s{0,9}${PORT}).{9}\s${NAME}\s+${STATUS}\s+${VLAN}\s+${DUPLEX}\s+${SPEED}\s*${TYPE}$$ -> Record
So it’s very easy to understand that a group of records is created and the fields of each record are the names that follow the word Value:
- PORT
- NAME
- STATUS
- VLAN
- DUPLEX
- SPEED
- TYPE
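To see what fields a template offers without reading it by eye, the Value lines can be pulled out with a few lines of Python. This is only a sketch with a hypothetical helper name: it handles the plain “Value NAME (regex)” form shown above plus an optional option list before the name, and it lowercases the names to match the keys used in the scripts.

```python
import re

# Value lines copied from the template shown above
TEMPLATE_HEADER = r"""Value PORT (\S+)
Value NAME (.+?)
Value STATUS (err-disabled|disabled|connected|notconnect|inactive)
Value VLAN (\S+)
Value DUPLEX (\S+)
Value SPEED (\S+)
Value TYPE (.*)
"""

# "Value", an optional comma-separated option list, then the field name and its regex
VALUE_LINE = re.compile(r"^Value\s+(?:[\w,]+\s+)?(\w+)\s+\(")


def template_fields(template_text):
    """Return the lowercased field names declared by the Value lines."""
    fields = []
    for line in template_text.splitlines():
        match = VALUE_LINE.match(line)
        if match:
            fields.append(match.group(1).lower())
    return fields


print(template_fields(TEMPLATE_HEADER))
```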
With the statement “intstatlist = net_connect.send_command("show interfaces status", use_textfsm=True)” we create a list of dictionaries with the above field names (lowercased) as keys, and store it in the intstatlist variable. Later we iterate through that list in our for statement (“for item in intstatlist:”) and access the dictionary values by key, using references like “item['status']” or “item['vlan']”. This makes it easy to express our own processing logic in code: search through the values, match what we want and produce results.
It’s possible, of course, to do more in the same scripts, like issuing the config commands directly on the core switches. However, the master configuration file will always split a fully automated process in two, because a human must decide what goes where. Depending on the case, one might want to go further with automation or not. The important thing is to try, learn and be able to use these new tools, either to make your life easier or to standardize processes.
For us, the benefit was that we could run the first script as many times as we wanted during the migration process, to check our progress and choose our next target, and then finally check that nothing was left. If you take this further, let me know on Twitter: @mythryll
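To make the shape of that structured data concrete, here is a small sketch that runs without a switch. The records are invented sample values, written by hand in the form the scripts rely on (one dictionary per interface, keyed by the template’s field names); the grouping by VLAN is just an example of the kind of processing that becomes simple.

```python
# Invented sample data, in the shape of the parsed "show interfaces status" output
intstatlist = [
    {'port': 'Gi0/1', 'name': 'srv01', 'status': 'connected', 'vlan': '30',
     'duplex': 'a-full', 'speed': 'a-1000', 'type': '10/100/1000BaseTX'},
    {'port': 'Gi0/2', 'name': '', 'status': 'notconnect', 'vlan': '1',
     'duplex': 'auto', 'speed': 'auto', 'type': '10/100/1000BaseTX'},
    {'port': 'Gi0/24', 'name': 'uplink', 'status': 'connected', 'vlan': 'trunk',
     'duplex': 'a-full', 'speed': 'a-1000', 'type': '10/100/1000BaseTX'},
]

# Group the connected ports by VLAN (or "trunk")
connected_by_vlan = {}
for item in intstatlist:
    if item['status'] == 'connected':
        connected_by_vlan.setdefault(item['vlan'], []).append(item['port'])

print(connected_by_vlan)
```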
Best of luck! Links follow: