Why do we need STP (Spanning Tree Protocol)?
A good network design typically includes a number of redundant links so that, should any issues arise during normal operations, the network (and its users) are protected from any impact. Redundancy is the method of removing single points of failure from infrastructure. For example, having a single link and a single switch between the edge router and the LAN is a single point of failure. To mitigate this, we add more switches and duplicate the links to increase the reliability of the network.
This however does bring challenges which we will discuss below.
Challenge 1 - Broadcast Storms
We know from earlier articles that broadcast traffic is forwarded out of all ports on a switch (except those on different VLANs or the originating port). Layer 2 frames also don't have a TTL (Time to Live) function like layer 3 packets. Broadcast traffic is generated all the time, as it's used for everything from ARP to DHCP, so there is a lot of broadcast traffic on a typical LAN. It's one of the key reasons to deploy VLANs on a large network. Let's look at the below diagram.
There are 3 switches connected together to build the LAN. All link lights are green, indicating the ports are up and forwarding traffic. Now, PC3 wants to send some traffic to PC1, but before it can do that it needs to learn the MAC address of the device, so it sends a broadcast ARP request into the network with a destination MAC of ffff.ffff.ffff.
The frame will arrive at SW3 and be forwarded to SW1 and SW2. Then, SW1 will forward the frame to SW2, and vice versa. SW1 will forward the frame back to SW3 which will then forward to SW2, then to SW1, then to SW3, then to SW2, then to SW1. This also happens in reverse. Can you see what has happened? Look at the below diagram.
A layer 2 loop is present in the network, so broadcast frames will continue to be forwarded indefinitely until a link is unplugged or a switch fails. On normal networks with multiple users, frames can very quickly build up in the network and cause total failure as it becomes full of broadcast frames. This is called a broadcast storm.
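The loop described above can be reproduced with a small simulation. The following Python sketch is illustrative only: the `flood` function, the hop-by-hop model and the `blocked` parameter are assumptions made for this example, not how a real switch is implemented. It counts how many copies of a single broadcast are travelling between switches after each hop.

```python
def flood(links, start, blocked=frozenset(), max_hops=6):
    """Return the number of inter-switch frames in flight after each hop.

    links   : iterable of (switch_a, switch_b) pairs
    start   : switch where the broadcast first arrives from a host
    blocked : set of frozenset({a, b}) links that do not forward (hypothetical)
    """
    neighbours = {}
    for a, b in links:
        if frozenset((a, b)) in blocked:
            continue
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Each frame is (switch it has arrived at, switch it came from).
    in_flight = [(start, None)]
    history = []
    for _ in range(max_hops):
        next_hop = []
        for here, came_from in in_flight:
            # Flood out of every port except the one the frame arrived on.
            for neighbour in sorted(neighbours.get(here, ())):
                if neighbour != came_from:
                    next_hop.append((neighbour, here))
        history.append(len(next_hop))
        in_flight = next_hop
        if not in_flight:
            break  # the broadcast has died out
    return history


triangle = [("SW1", "SW2"), ("SW1", "SW3"), ("SW2", "SW3")]

looped = flood(triangle, "SW3")
loop_free = flood(triangle, "SW3", blocked={frozenset(("SW1", "SW2"))})

print(looped)     # copies never die out while the loop exists
print(loop_free)  # copies stop once one link no longer forwards
```

With the full triangle, two copies of the frame (one in each direction) keep circulating for as long as the simulation runs, and that is from a single broadcast. With one link taken out of forwarding, the frame reaches every switch and then stops.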
Challenge 2 - MAC Address Flapping
When a layer 2 loop like the one above is formed, it causes the switches to continuously update their MAC address tables, which are used to track which frames should be sent out of which ports. This instability of MAC address tables is called MAC address flapping. As the frame is looped around the network, switches will install the source MAC from the arriving frame against that port. If a source MAC which is already in the table is seen arriving on a different interface, the switch updates its MAC address table to reflect this change. This causes frames to be forwarded back out of the wrong ports, further increasing congestion and worsening the broadcast storm.
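As a rough illustration of the flapping behaviour, the Python sketch below models a MAC address table as a dictionary. The `learn` function, the MAC address used for PC3 and the port names are hypothetical and chosen only for this example.

```python
def learn(mac_table, source_mac, port):
    """Record source_mac against port; return True if an existing entry moved."""
    previous_port = mac_table.get(source_mac)
    mac_table[source_mac] = port
    return previous_port is not None and previous_port != port


mac_table = {}
pc3_mac = "0000.0000.0003"  # hypothetical MAC address for PC3

# The same looped frame keeps arriving on alternating ports.
arrivals = ["Gi0/0", "Gi0/1", "Gi0/0", "Gi0/1"]
flaps = sum(learn(mac_table, pc3_mac, port) for port in arrivals)

print(flaps)      # number of times the entry moved to a different port
print(mac_table)  # the entry ends up on whichever port saw the frame last
```

Every arrival after the first one moves the entry, so the table never settles while the loop exists.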
Solving the Layer 2 Loops with STP (PVST+)
STP (Spanning Tree Protocol) prevents layer 2 loops by placing one or more ports into a blocking state. By blocking traffic on a port which would introduce a loop, STP mitigates the broadcast storms and MAC address flapping which we learnt about earlier. Note that root bridge and root switch are terms used interchangeably. Please also note that Cisco has its own implementation of STP, called PVST+ (Per-VLAN Spanning Tree Plus). PVST+ builds an STP topology for each VLAN, allowing for load-balancing across switches. The IEEE STP (802.1D) version does not.
STP uses the following three processes to prevent loops:
- Elect a root switch
- Identify the root ports on non-root switches
- Identify the designated ports (and if there is a switch on the other side of that link, the blocking ports will be implemented)
Electing the root switch
The root switch is the start of the spanning tree topology. All ports on the root switch are placed into a designated state, as traffic is being forwarded away from the root switch. To elect the root, switches send special frames called BPDUs into the network. The BPDU (Bridge Protocol Data Unit) contains the unique bridge ID, which comprises the bridge priority, the extended system ID and the switch's MAC address.
These BPDUs are sent into the network every 2 seconds (BPDU Hello), and switches use the bridge ID to negotiate the root switch. The switch with the lowest bridge ID becomes the root. The BID (Bridge ID) includes the bridge priority and extended system ID (which is just the VLAN ID). The default priority shown for VLAN 1 will therefore be 32769 (32768 + 1).
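The arithmetic behind that priority value can be sketched in Python. The function name `bridge_priority` is hypothetical; the range checks reflect the fact that, with the extended system ID in use, the configurable priority moves in steps of 4096.

```python
def bridge_priority(configured_priority=32768, vlan_id=1):
    """Return the priority carried in the BID: configured priority + VLAN ID."""
    if configured_priority % 4096 != 0 or not 0 <= configured_priority <= 61440:
        raise ValueError("priority must be a multiple of 4096 between 0 and 61440")
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be between 1 and 4094")
    return configured_priority + vlan_id


print(bridge_priority())          # default priority for VLAN 1
print(bridge_priority(4096, 10))  # a lowered priority for VLAN 10
```

The first call reproduces the 32768 + 1 calculation, and the second shows how the same switch would present a different priority in another VLAN.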
When switches come online, they advertise themselves as the root switch, using the BID as seen above. The election process follows the below steps:
- Lowest Bridge Priority
- Lowest MAC Address
So, if three switches come online, all with the same Bridge Priority, the switch with the lowest MAC address will become the root switch. In the below diagram, which switch will become the root?
All of the switches have the same priority, so the next metric that STP will use to elect the root switch will be the lowest MAC address. SW1 has a MAC address of 00ab.53ff.5362, SW2 has a MAC address of 00ab.99ab.4436 and SW3 has a MAC address of 00ab.1004.1101. MAC addresses are in hexadecimal, so the value for f is 15 as an example. The first difference is in the third byte (53, 99 and 10), so the switch with the lowest MAC address in this topology is SW3, followed by SW1 then SW2. SW3 becomes the root switch and all ports on SW3 are placed into a designated state.
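The comparison can be checked with a few lines of Python. This is a minimal sketch: the `bid_key` helper is a hypothetical name, and it simply treats the BID as a (priority, MAC) pair where the lower value wins.

```python
def bid_key(priority, mac):
    """Turn a BID into a tuple that compares priority first, then MAC address."""
    return (priority, int(mac.replace(".", ""), 16))


switches = {
    "SW1": (32769, "00ab.53ff.5362"),
    "SW2": (32769, "00ab.99ab.4436"),
    "SW3": (32769, "00ab.1004.1101"),
}

# The switch with the lowest (priority, MAC) pair wins the election.
root = min(switches, key=lambda name: bid_key(*switches[name]))

print(root)  # SW3
```

Because all three priorities are equal, the tuple comparison falls through to the MAC address, which is exactly the tie-breaker order listed above.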
All switches agree by including the root ID and their own BID within the BPDU. So each BPDU will look like this from SW1 as an example:
Bridge ID:
Priority: 32769
MAC Address: 00ab.53ff.5362
Root ID:
Priority: 32769
MAC Address: 00ab.1004.1101
Identify the root ports on non-root switches
Now that the root switch has been identified, the next stage in the STP process is to identify the best path to the root. The ports on these paths are known as root ports, and every switch (except for the root switch) will have one root port leading back to the root switch. There are four metrics used during this specific process, listed below in order of precedence:
- Lowest path cost
- Lowest neighbour BID
- Lowest port priority
- Lowest port ID (of sending port)
To continue with this process we need to know the path costs used by STP to calculate the root ports. The default costs are 100 for a 10 Mbps link, 19 for 100 Mbps, 4 for 1 Gbps and 2 for 10 Gbps.
Let's see how the path cost system works below:
SW3 is the root switch as per the previous root switch election process. All ports on SW3 are placed into a designated (forwarding) state. BPDUs are sent out by the root switch with a path cost of 0 (as they are originating from the root). The BPDU arrives at SW1's Gi0/0 port, and because it is a Gigabit link, SW1 adds a cost of 4, giving a root path cost of 4 via Gi0/0. Any path to the root that goes through the other switch crosses a second Gigabit link, which adds another 4 and makes the root path cost 8. Because 4 is lower than 8, port Gi0/0 on SW1 becomes the root port. The same happens for SW2, which I'll simplify below:
BPDU with cost 0 sent from root to Gi0/1 on SW2 > BPDU arrives at Gi0/1 on SW2, 0 + 4 = 4 > BPDU continues out of Gi0/0 on SW2 and crosses another Gigabit link, 4 + 4 = 8 > Gi0/1 has the lowest cost and becomes the root port.
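The cost arithmetic can be written out as a short Python sketch. The `PORT_COST` table holds the commonly used default 802.1D costs (the Gigabit value of 4 matches the example above); the function name and the way a path is expressed as a list of link speeds are assumptions made for illustration.

```python
# Default 802.1D port costs, keyed by link speed in Mbps.
PORT_COST = {10: 100, 100: 19, 1000: 4, 10000: 2}


def root_path_cost(link_speeds_mbps):
    """Sum the cost of each link crossed on the way from the root."""
    return sum(PORT_COST[speed] for speed in link_speeds_mbps)


direct = root_path_cost([1000])               # root -> this switch
via_neighbour = root_path_cost([1000, 1000])  # root -> neighbour -> this switch

print(direct, via_neighbour)  # 4 8
```

The port that receives the lower total (4 in this case) is the one chosen as the root port.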
However, what happens if there are duplicate links? Let's take a look at a more sophisticated example:
This diagram is a lot busier but shows three of the deciding factors used during the root port election process. SW1 is the root, so all of its ports are in a designated state. SW2 and SW3 have ports leading back to SW1 as their root ports, as the cost to reach the root is lower than going via SW4. Now, SW4 has to decide which of four ports will be its root port. Remember the criteria: the costs are equal, so the next tie-breaker is the lowest neighbour BID, which in this case belongs to SW2.
SW5 now needs to choose its root port, and again both costs to the root are an equal 16. We also can't use the neighbour BID, as it's the same on both links. So, the next tie-breaker is the port priority. Here SW4's Gi0/2 interface has a manually set priority, making it the winner in this case. Again, STP uses the neighbour's port priority, not its own. A port priority looks like this:
128.1
The value before the full stop is the priority, and the value after the full stop is the interface ID. For example, Gi0/1 would be 128.1, Gi0/4 would be 128.4 and so on. We can manually configure the priority value to manipulate STP.
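The four tie-breakers lend themselves to a tuple comparison, which the Python sketch below uses. Everything here is hypothetical and for illustration only: the `Candidate` class, the field names, the neighbour BID and the lowered priority of 64 are assumptions, since the diagram's actual values are not given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    local_port: str               # port on this switch that received the BPDU
    root_path_cost: int           # cost to the root through this port
    neighbour_bid: tuple          # (priority, MAC as int) of the sending switch
    neighbour_port_priority: int  # priority of the sending port
    neighbour_port_id: int        # interface ID of the sending port


def pick_root_port(candidates):
    """Apply the four tie-breakers in order; the lowest value wins each one."""
    best = min(
        candidates,
        key=lambda c: (
            c.root_path_cost,
            c.neighbour_bid,
            c.neighbour_port_priority,
            c.neighbour_port_id,
        ),
    )
    return best.local_port


# Assumed values: equal cost and the same neighbour on both links,
# so the neighbour's port priority decides.
neighbour = (32769, 0x00AB10041101)
candidates = [
    Candidate("Gi0/0", 16, neighbour, 128, 3),
    Candidate("Gi0/1", 16, neighbour, 64, 2),
]

print(pick_root_port(candidates))  # Gi0/1
```

Python compares tuples element by element, so a later field is only consulted when every earlier field is equal, which mirrors how the tie-breakers are applied.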
Identify the designated ports (and blocked ports)
The final stage of the Spanning Tree Protocol process is to place ports into a designated state, which means that these are ports that forward away from the root switch. There can only be one designated port on a single segment (where two switches connect together), so one of the ports is placed into a blocking state. The decision to place ports into a blocking state has a few deciding tie-breakers similar to the root port election process. The election tie-breakers are as follows:
- Switch with the lowest cost to the root (lowest path cost)
- Switch with the lowest BID (Bridge ID)
- (Rarely seen) Switch with the lowest port priority
- (Rarely seen) Switch with the lowest port ID
Let's use the same diagram where we learnt about the root port election process:
Firstly, at SW1, all ports are placed into a designated state as it is the root switch. On the segment between SW2 and SW4, because SW4's interface is set as the root port, Gi0/0 on SW2 becomes the designated port. Now, let's look at the segment between SW3 and SW4. Neither of the interfaces on this link is a root port, so the election process for designated ports takes place. Because Gi0/1 on SW3 has the lowest path cost to the root switch, it becomes the designated port. On SW4, the interface on the other end of the segment (Gi0/0) is placed into a blocking state.
Finally, on the segment between SW4 and SW5, because Gi0/1 on SW5 is the root port, Gi0/2 on SW4 is placed into a designated state. Gi0/3 on SW4 has a lower path cost to the root so that port becomes designated, while Gi0/0 on SW5 is placed into a blocking state. See below designated and blocking ports.
Note that if we had end user devices plugged into these switches, those ports would be placed into a designated (forwarding) state as end user devices don't generate BPDUs.
Now that the root switch, root ports, designated ports and blocking ports have been identified, STP is considered converged, with any layer 2 loops avoided!
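The per-segment decision between a designated and a blocking port can be sketched in the same way as the root port election. In this Python example the `designated_end` function, the tuple layout, the costs and the BIDs are all assumed values for the SW3 to SW4 segment, not figures taken from the diagram.

```python
def designated_end(end_a, end_b):
    """Return (designated, blocking) for a segment where neither port is a root port.

    Each end is (switch, port, root_path_cost, bid, port_priority, port_id).
    """
    def rank(end):
        _, _, cost, bid, port_priority, port_id = end
        return (cost, bid, port_priority, port_id)

    designated, blocking = sorted([end_a, end_b], key=rank)
    return designated, blocking


# Assumed costs and BIDs: SW3 is one Gigabit hop from the root, SW4 is two.
sw3 = ("SW3", "Gi0/1", 4, (32769, 0x00AB00000003), 128, 2)
sw4 = ("SW4", "Gi0/0", 8, (32769, 0x00AB00000004), 128, 1)

designated, blocking = designated_end(sw4, sw3)

print(designated[:2])  # ('SW3', 'Gi0/1')
print(blocking[:2])    # ('SW4', 'Gi0/0')
```

The lower root path cost settles this segment at the first tie-breaker, so the BID and port values are never consulted.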
Port Roles, States and Timers
As you have already read, there are two port roles:
- Root port (Forwarding in the direction of the root switch)
- Designated port (Forwarding away from the root switch)
However, we haven't yet covered the port states for original STP. These are blocking, listening, learning, forwarding and disabled.
The next key bit to understand is the timers. In the original STP it can take a long time for interfaces to enter a forwarding state, typically up to 50 seconds. This is called STP convergence, and there are a few timed processes we need to know:
- Hello - This is the interval at which BPDUs are sent by the switches (by the root switch in a converged network). The default interval is 2 seconds
- Forward Delay - This is the delay for each of the transitory states of listening and learning. Each delay is 15 seconds, so a total of 30 seconds is seen for original STP
- MaxAge - This is how long a switch will wait without receiving a BPDU from the root before it reacts, which is usually 10x the Hello timer, so 20 seconds
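The 50 second figure follows directly from these defaults, as the short Python sketch below shows. The function name and the `wait_for_max_age` flag are hypothetical; the flag simply separates the case where a switch first has to time out the root's BPDUs from the case where it only has to step through listening and learning.

```python
HELLO = 2           # seconds between BPDUs
FORWARD_DELAY = 15  # seconds spent in each of listening and learning
MAX_AGE = 20        # seconds before stored BPDU information expires


def convergence_time(wait_for_max_age=True):
    """Worst-case seconds before a blocked port forwards in original STP."""
    listening_and_learning = 2 * FORWARD_DELAY
    return (MAX_AGE if wait_for_max_age else 0) + listening_and_learning


print(convergence_time())       # 50
print(convergence_time(False))  # 30
print(MAX_AGE // HELLO)         # 10, the "10x the Hello timer" relationship
```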
Example Output
When looking at the original STP (PVST+) output of a Cisco device, you'll notice that the port role column can show "Altn". The alternate role belongs to Rapid STP, which we will review shortly. When original STP (PVST+) is in use, ignore this and treat it as a non-designated (blocking) port. Let's look at an example output below. (The topology for this output is two PCs linked to Fa0/1 & Fa0/2, with Gi0/1 & Gi0/2 linked to other switches.)
Switch3#sh spanning-tree
VLAN0001
  Spanning tree enabled protocol ieee
  Root ID    Priority    32769
             Address     0002.1687.8C70
             Cost        4
             Port        25(GigabitEthernet0/1)
             Hello Time  2 sec  Max Age 20 sec  Forward Delay 15 sec
  Bridge ID  Priority    32769  (priority 32768 sys-id-ext 1)
             Address     00D0.D3E6.DDBA
             Hello Time  2 sec  Max Age 20 sec  Forward Delay 15 sec
             Aging Time  20
Interface        Role Sts Cost      Prio.Nbr Type
---------------- ---- --- --------- -------- --------------------------------
Fa0/1            Desg FWD 19        128.1    P2p
Fa0/2            Desg FWD 19        128.2    P2p
Gi0/1            Root FWD 4         128.25   P2p
Gi0/2            Altn BLK 4         128.26   P2p
Note that on Cisco devices the role of the blocking port is labelled with the RSTP term "Altn". Under original STP it should be read as a non-designated (blocking) port.
Looking at the output, we can see that this switch isn't the root switch, and port Gi0/2 is in a blocking state. Gi0/1 is listed as a root port, while Fa0/1 and Fa0/2 are in a designated forwarding state. You can see how the output lists the Root ID, then the switch's own Bridge ID, with information such as the timers, the bridge priority and the burnt-in MAC address. Remember that the extended system ID is the VLAN number, which in this case is VLAN 1.
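If you want to pick the roles and states out of this kind of output programmatically, the interface rows split cleanly on whitespace. The Python sketch below is a minimal, assumed approach (the function and field names are hypothetical), applied to the four interface rows shown above.

```python
def parse_interface_rows(rows):
    """Split each interface row into named fields."""
    fields = ("interface", "role", "state", "cost", "prio_nbr", "type")
    parsed = []
    for row in rows:
        # The type column can contain spaces (for example "P2p Edge"),
        # so only the first five columns are split off.
        values = row.split(maxsplit=5)
        parsed.append(dict(zip(fields, values)))
    return parsed


rows = [
    "Fa0/1 Desg FWD 19 128.1 P2p",
    "Fa0/2 Desg FWD 19 128.2 P2p",
    "Gi0/1 Root FWD 4 128.25 P2p",
    "Gi0/2 Altn BLK 4 128.26 P2p",
]

ports = parse_interface_rows(rows)
root_ports = [p["interface"] for p in ports if p["role"] == "Root"]
blocked = [p["interface"] for p in ports if p["state"] == "BLK"]

print(root_ports)  # ['Gi0/1']
print(blocked)     # ['Gi0/2']
```

A non-root switch should report exactly one root port, which is a quick sanity check to run against the parsed result.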
Optional STP Features
There are some important features that we need to understand that can improve the stability of the network and further prevent loops. These are:
- PortFast - this can be used on edge ports (where the device connected to the switch is, for example, a PC, a phone or a server). Configuring a switchport as PortFast tells the switch to place the port directly into a forwarding state, bypassing the listening and learning states
- BPDU Guard - this can be used on edge ports alongside PortFast to further secure the network. When BPDU Guard is enabled, BPDUs can still be transmitted; however, should a BPDU be received, the port is shut down until it is manually re-enabled. This protects against users, malicious or otherwise, introducing a switch with a lower priority than the legitimate root into the topology. Typically, you would configure BPDU Guard wherever PortFast is enabled. You would not want to configure this on ports linking switches together.
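The behaviour of the two features can be summarised as a pair of small decision functions. This Python sketch is a simplified model with hypothetical names; it describes the outcomes stated above rather than any real switch software.

```python
def initial_state(portfast):
    """State a port enters when it comes up."""
    # Without PortFast the port still has to pass through listening
    # and learning before it reaches forwarding.
    return "forwarding" if portfast else "listening"


def on_bpdu_received(state, bpdu_guard):
    """Return the port state after a BPDU arrives on the port."""
    if bpdu_guard:
        return "shut down"  # stays down until it is manually re-enabled
    return state


edge_port = initial_state(portfast=True)

print(edge_port)                                     # forwarding
print(on_bpdu_received(edge_port, bpdu_guard=True))  # shut down
```

The combination matters: PortFast removes the delay for genuine end devices, while BPDU Guard covers the case where something other than an end device is plugged into that port.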
The problem with original STP (PVST+)
STP (or PVST+) takes a relatively long time to converge following a network change. That wasn't an issue when it was first implemented, but in modern networks it is unacceptable and should be avoided where possible. The default values of the Hello, MaxAge and Forward Delay timers cause a convergence delay of up to 50 seconds. Once a link goes down, the MaxAge countdown begins from the last time a BPDU frame was received. 20 seconds later the port is put into a listening state for 15 seconds, then into a learning state for 15 seconds. Once these stages are completed the network will have reconverged; however, given how critical modern networks are, that is too long. So, how was the delay issue fixed? See below:
- Removing some port states (Disabled and blocking combined into Discarding, and Listening state removed)
- Adding some port roles (alternate and backup) which allows for immediate interface transition in the event of a link failure
- All switches now send their own BPDUs, rather than just the root switch
- MaxAge timer much shorter (typically 3x hello time)
The above changes are used in what's known as 802.1w RSTP (Rapid Spanning Tree Protocol). Again, Cisco has its own implementation called Rapid PVST+. In fact, most network devices implemented today use RSTP (Rapid PVST+) by default.
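The state changes and the shorter timeout listed above can be captured in a small lookup table. In this Python sketch the dictionary and function names are hypothetical; listening is mapped to discarding because a port in that state does not forward traffic or learn MAC addresses.

```python
# Original STP port state -> RSTP port state.
STP_TO_RSTP_STATE = {
    "disabled": "discarding",
    "blocking": "discarding",
    "listening": "discarding",
    "learning": "learning",
    "forwarding": "forwarding",
}


def rstp_max_age(hello=2, missed_hellos=3):
    """Seconds of silence before a neighbour is considered lost."""
    return hello * missed_hellos


rstp_states = sorted(set(STP_TO_RSTP_STATE.values()))

print(rstp_states)     # ['discarding', 'forwarding', 'learning']
print(rstp_max_age())  # 6
```

Five states collapse into three, and the 20 second MaxAge wait drops to 6 seconds with the default Hello timer.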
RSTP (Rapid Spanning Tree)
RSTP (Rapid PVST+) Concepts
RSTP is very similar to STP (so similar they can actually be used on the same network). The only real differences are the port states, roles and timers. The process for RSTP is the same as STP:
- Elect a root switch
- Assign root ports
- Calculate designated and blocking ports
There is one key difference though, and that is the implementation of the alternate port. This port is an alternate path to the root and will be in a blocking state while the main root port is up and active. There is also the addition of a backup port, which is used when the switch is connected to a hub. Because hubs are rarely seen in modern networks, it's unlikely you'll see a backup port in the RSTP topology.
Let's go through the stages again on how STP converges the topology, and then how the new RSTP alternate role fits into this process.
Looking at the above diagram, we can see that SW1 has been elected as the root switch, as it has the lowest BID (Bridge ID) within this network. All ports on the root switch are placed into a designated state. Second comes the root port calculation process, of which the following tie-breakers are used:
- Lowest path cost
- Lowest neighbour BID
- Lowest port priority
- Lowest port ID (of sending port)
Now, in STP the next step would be to place any remaining ports which are on a segment (two switches connected to each other) into a blocking state while calculating the designated ports. RSTP instead gives a blocked port that offers the next best path to the root the 'Alternate' role. The switch on the segment that wins the tie-breakers (for example, by having the lowest BID) will place its port into a designated forwarding state, while the losing switch will place its port into an alternate, blocking state. This process uses the following tie-breakers:
- Switch with the lowest cost to the root (lowest path cost)
- Switch with the lowest BID (Bridge ID)
- (Rarely seen) Switch with the lowest port priority
- (Rarely seen) Switch with the lowest port ID
The final step is the backup port, which again is very rarely seen as hubs are no longer used (note that the diagram doesn't depict any hubs, so a backup port won't be seen). If a hub is in use and the switch is plugged into it on two ports, the switch will receive its own BPDUs and will place the losing port into a backup, blocking state. The tie-breakers for the backup port are as below:
- Switch with the lowest port priority
- Switch with the lowest port ID
So, now that the RSTP topology is converged, we have a network where, if a segment goes offline, RSTP can immediately give an alternate port the root port role and bring it into a forwarding state, reducing disruption to the network. Let's take a look at how the topology changes if a segment goes down:
Here we can see that Gi0/1 on SW4 and Gi0/0 on SW2, which were originally being used to actively forward traffic, have entered a down/down state. Because RSTP is in use, SW4 had allocated Gi0/0 as its alternate port. Once Gi0/1 went down, the switch instantly changed Gi0/0 from alternate port to root port, allowing traffic to continue to be forwarded. SW4 will flush the MAC addresses learnt via Gi0/1 from its MAC address table so that frames can be sent via Gi0/0 (frames are flooded until the addresses are relearnt).
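The failover just described can be modelled in a few lines of Python. This is a simplified sketch with hypothetical names and an assumed third port (Gi0/2): it promotes the first alternate port it finds, whereas a real switch with several alternates would choose the best one using the root port tie-breakers.

```python
def handle_link_down(ports, failed_port):
    """Mark failed_port as down and return the new root port, if any.

    ports maps a port name to {"role": ..., "state": ...}.
    """
    was_root = ports[failed_port]["role"] == "root"
    ports[failed_port] = {"role": "disabled", "state": "discarding"}
    if not was_root:
        return None
    for name, port in ports.items():
        if port["role"] == "alternate":
            # Promote the alternate immediately, with no listening or learning delay.
            ports[name] = {"role": "root", "state": "forwarding"}
            return name
    return None


sw4 = {
    "Gi0/1": {"role": "root", "state": "forwarding"},
    "Gi0/0": {"role": "alternate", "state": "discarding"},
    "Gi0/2": {"role": "designated", "state": "forwarding"},
}

new_root = handle_link_down(sw4, "Gi0/1")

print(new_root)      # Gi0/0
print(sw4["Gi0/0"])  # {'role': 'root', 'state': 'forwarding'}
```

The important point is what is missing: there is no timer between the failure and the alternate port forwarding, which is where RSTP saves its time compared with original STP.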
RSTP (Rapid PVST+) Port Types
In RSTP there are 3 different port types:
- P2p - used for ports linked between switches
- P2p Edge - used for ports that are connected to end-user devices (PCs, phones, servers etc)
- Shared - used for ports connected to Ethernet hubs; these ports operate in half-duplex
As an example, look at the below section from a "sh spanning-tree" command:
Interface           Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------------
Fa1/0/1             Desg FWD 19        128.3    P2p
Fa1/0/2             Desg FWD 19        128.4    P2p
Fa1/0/13            Desg FWD 19        128.15   P2p Edge
Fa1/0/14            Desg FWD 19        128.16   P2p Edge
Fa1/0/15            Desg FWD 19        128.17   P2p Edge
Ports Fa1/0/13-15 are connected to PCs, while Fa1/0/1-2 are connected to further switches.
RSTP (Rapid PVST+) Port Roles and States (with comparison to STP)
The below table outlines the port roles and states for RSTP and STP. Any differences are highlighted in red.
** Note that PVST+ (802.1D flavour) does include alternate and backup roles by name only.
Configuring PVST+/Rapid PVST+
There are a few commands that we need to be aware of for the CCNA, including how to manipulate STP/RSTP to make a switch of your choice the root switch. The following table outlines the key commands needed for the CCNA: