From my point of view, Cisco Nexus 5000 switches (5010 and 5020) are not really suitable for the data center distribution layer because of the limited number of VLANs they support (something like 512). They have enough switching power and maybe even enough port density, but the VLAN limit is just too restrictive. They also support 1G SFPs only in the first few ports, for migration in a legacy environment.
With the Nexus 5500 models (5548 and 5596) the situation is different: all 4094 VLANs are supported at the same time, the port density is greater, and 1G SFPs are supported on every port. All of this makes them an alternative for the distribution layer, with some considerations. I will ignore the Nexus 5010/5020 altogether here and just talk about the Nexus 5500 models.
Note that I will not say anything here about fabric extenders (FEXes)! With FEXes you generally don’t face these problems; they only arise when connecting legacy access switches to Nexus 5500 switches.
Connecting access switches to Nexus 5500
As said, the Nexus 5500 supports 1G SFPs on every port. This means that you can connect normal access switches (like the Catalyst 3560 and others) with 1G uplinks to the Nexus 5500 during the migration. Of course, you will mostly connect your fabric extenders and 10G servers to these new Nexus switches anyway; the access switch connections are just for the legacy hardware that will be removed at some point. The old 802.1D STP is not supported on NX-OS, but who uses it in data centers anyway? 802.1w (RSTP) and 802.1s (MSTP) are supported. If you have a pair of Nexus 5500s in a vPC configuration, you can dual-home your access switches to the pair with a port channel (preferably using LACP) and thus have your access switches forwarding on both uplinks (instead of RSTP/MSTP blocking one of them). This should work fine, until you decide to upgrade NX-OS on the Nexus 5500. More about that below.
Using LACP fast hello
When connecting switches with LACP you may be tempted to use LACP fast hello to detect link problems faster. The normal LACP hello interval is 30 seconds, which means that the LACP neighbor is declared dead after 90 seconds of silence. Fast hello means a hello interval of 1 second (a dead peer is detected in 3 seconds). This should work fine as well, until you upgrade NX-OS on the Nexus 5500…
Using Bridge Assurance
Bridge Assurance (BA) is a feature you can use to tell the switches which links lead to other switches. The switches then expect to receive BPDUs from the adjacent switches on all of those ports. If the BPDUs stop, the switches move those links to blocking because there seems to be some problem on the physical link. It sounds nice because it prevents traffic blackholing, at least in some unidirectional link situations. Upgrading your NX-OS? See below.
So what’s the point
The Nexus 5500 supports ISSU (In-Service Software Upgrade) for NX-OS upgrades. As there is only one supervisor component in the Nexus 5500, it needs a special method to upgrade the software without traffic interruptions. The special method is that the forwarding plane continues to work (so the traffic keeps flowing) while the control plane (supervisor board) reboots to load the new software. The reboot takes something like 80 seconds to complete. While the control plane reboots, the switch is unable to send or receive any control-plane-related frames. And while this is the case, it may or may not have some effect on the neighboring devices.
If the conditions are not right, NX-OS refuses to do ISSU and falls back to the old-style “just reboot everything” method, which obviously works but causes some traffic interruption.
Each of the cases I mentioned above creates an exception for ISSU.
Connecting access switches: As the Nexus switch is unable to send BPDUs normally during the ISSU, it is explicitly specified that if you have any STP designated ports on the switch (other than the vPC peer-link) then ISSU is not supported. And that would be the case if you connect access switches to your Nexus 5500 and your STP root is somewhere in your Nexus 5500s or upstream from them. Access switches are rarely STP roots, are they?
Using LACP fast hellos: Since an LACPDU is expected by the other party in 3 seconds at most, using LACP fast hellos is not compatible with ISSU.
Using BA: Again, since the other party expects BPDUs at regular intervals, using BA is not compatible with ISSU. BA can be used on the vPC peer-link, however, because the vPC switches communicate with each other about the ISSU.
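The three exceptions above can be summarized as a small check. The following Python sketch is only a toy model of the rules as described in this post, not the actual NX-OS pre-upgrade logic; the `Port` class, its field names and the `issu_blockers` function are hypothetical names made up for illustration. It also assumes that the vPC peer-link exemption applies only to the STP designated port and BA rules, since those are the only exemptions mentioned here.

```python
from dataclasses import dataclass


@dataclass
class Port:
    """Hypothetical, simplified view of one switch port or port channel."""
    name: str
    stp_designated: bool = False
    vpc_peer_link: bool = False
    lacp_fast: bool = False
    bridge_assurance: bool = False


def issu_blockers(ports):
    """Return the reasons why ISSU would be refused, per the rules in this post."""
    reasons = []
    for port in ports:
        # STP designated ports block ISSU, except on the vPC peer-link.
        if port.stp_designated and not port.vpc_peer_link:
            reasons.append(f"{port.name}: STP designated port")
        # With fast hellos the neighbor expects an LACPDU within 3 seconds.
        if port.lacp_fast:
            reasons.append(f"{port.name}: LACP fast hello")
        # BA neighbors expect regular BPDUs; the vPC peer-link is exempt.
        if port.bridge_assurance and not port.vpc_peer_link:
            reasons.append(f"{port.name}: Bridge Assurance")
    return reasons


if __name__ == "__main__":
    ports = [
        Port("peer-link", stp_designated=True, vpc_peer_link=True,
             bridge_assurance=True),
        Port("uplink-to-core"),  # root port toward an upstream STP root
        Port("to-access-switch", stp_designated=True),
    ]
    for reason in issu_blockers(ports):
        print(reason)
```

An empty list means that none of the three conditions discussed here applies; the real upgrade guide lists further ISSU limitations that this sketch does not model.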
What can you do?
I will be connecting access switches to Nexus 5500s in vPC for sure, and I would like to use ISSU as well, so I have some ideas.
For connecting the access switches with LACP, the rule for using LACP fast hellos is quite clear: don’t. Just use the normal timers and, as far as LACP is concerned, ISSU will be possible, because the Nexus 5500 is back online in 80 seconds (or so), before the 90-second LACP timeout expires on the neighboring devices.
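The timer arithmetic behind this advice can be sketched in a few lines of Python. This is only an illustration: the function names are made up, the 80-second outage is the rough estimate from this post rather than a guaranteed figure, and the check covers the LACP condition alone.

```python
# A peer is declared dead after three missed LACPDUs:
# 3 x 30 s = 90 s with normal timers, 3 x 1 s = 3 s with fast hello.
LACP_TIMEOUT_MULTIPLIER = 3


def lacp_timeout(hello_interval_s):
    """Seconds of silence after which the LACP neighbor gives up on the peer."""
    return LACP_TIMEOUT_MULTIPLIER * hello_interval_s


def survives_outage(hello_interval_s, outage_s):
    """True if the control plane is back before the neighbor's timeout expires."""
    return outage_s < lacp_timeout(hello_interval_s)


if __name__ == "__main__":
    outage_s = 80  # rough control-plane reboot time during ISSU (estimate)
    for label, interval_s in (("normal", 30), ("fast", 1)):
        verdict = "survives" if survives_outage(interval_s, outage_s) else "times out"
        print(f"{label} hello: timeout {lacp_timeout(interval_s)} s, neighbor {verdict}")
```

With normal timers the margin is only about 10 seconds, so a control-plane reboot that takes noticeably longer than estimated would still bring the port channel down.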
As for BA, it is also better not to use it on access switch connections at all. (I guess not all access switches even support it.)
About the STP designated port limitation: You could always disable STP altogether (because there is only a single logical link (port channel) between the switches), but I really don’t recommend that. RSTP/MSTP in the background provides a safety net against incorrect physical connections, so let’s keep that running. [Updated 09-Jan-2012: Now this is totally incorrect information starting here] Instead, before starting the ISSU, just go and shut down the ports on the Nexus 5500 that connect to the downstream access switches. This assumes that all the access switches are properly connected with port channels to two Nexus 5500s at the same time! While the uplink port is shut down, the access switches will continue to use the other member of the port channel uplink and no harm is done. [end-of-bullshit] That is not the case. Even if you shut down one uplink port, the Nexus pair still knows that there is an access switch downstream breathing through the other vPC peer switch, and it refuses to execute ISSU.
I don’t know of any workarounds for this. If you have even just one switch connected downstream of the Nexus 5500, you cannot execute ISSU. Nicely done, Cisco. You really did say the Nexus 5500 is for the access layer, didn’t you? Yes… Note that if you have everything dual-homed to the Nexus 5500 vPC pair, you can just do a normal system upgrade, one switch at a time, that relies on port channel convergence instead of ISSU.
Bottom line
The bottom line is that even though the Nexus 5500 is just a “small” switch with no redundant control plane, it is possible to implement a network that stays usable even during software upgrades.
I hope this post motivated you to think about your Nexus environment before implementing it, and thus maybe saved you from some unwanted surprise later. Surprises are great on birthdays and such, but in an operating network they are usually not needed. Feel free to comment on these thoughts either below or by contacting me directly.
Before deploying anything it is a good idea to read the documentation:
Nexus 5000 Series Install and Upgrade Guides: http://www.cisco.com/en/US/products/ps9670/prod_installation_guides_list.html (they list all the limitations for ISSU)
Nexus 5000 Series Release Notes: http://www.cisco.com/en/US/products/ps9670/prod_release_notes_list.html
Nexus 5000 Series Design Guides: http://www.cisco.com/en/US/products/ps9670/products_implementation_design_guides_list.html
My suggestion is to use the Juniper EX Series Virtual Chassis switches. They can do pretty much everything Nexus does, with the exception of FCoE.
Unlike with Cisco, Venus does not have to be in a certain part of the sky before an update. Also, it has the full Junos feature set.