Routing Loop, Failure by Design

I have spent some time studying the CCDE materials. One broken design example that has come up involves route reflector clients that don’t align with the physical topology. This article examines that example and some solutions to the problem.

To illustrate this example, I built the topology below. I used loopback addresses 1.1.1.1 through 6.6.6.6 (matching the csr1000v-x names). The router at the top (csr1000v-1) is an eBGP neighbor of csr1000v-2 and csr1000v-3. The four routers forming a square in the center have an initial configuration of OSPF and BGP (iBGP as shown). Both Route Reflectors are peered with both clients.

Route Reflector Initial Configuration

//csr1000v-2 shown, csr1000v-3 similar

router ospf 1
router-id 2.2.2.2
passive-interface GigabitEthernet2
network 2.2.2.2 0.0.0.0 area 0
network 10.0.0.0 0.255.255.255 area 0

router bgp 64513
bgp router-id 2.2.2.2
bgp log-neighbor-changes
neighbor 3.3.3.3 remote-as 64513
neighbor 3.3.3.3 update-source Loopback0
neighbor 4.4.4.4 remote-as 64513
neighbor 4.4.4.4 update-source Loopback0
neighbor 4.4.4.4 route-reflector-client
neighbor 5.5.5.5 remote-as 64513
neighbor 5.5.5.5 update-source Loopback0
neighbor 5.5.5.5 route-reflector-client
neighbor 10.1.2.1 remote-as 64512

RR Client Configuration

//csr1000v-4 shown, csr1000v-5 similar
router ospf 1
router-id 4.4.4.4
passive-interface GigabitEthernet2
network 4.4.4.4 0.0.0.0 area 0
network 10.0.0.0 0.255.255.255 area 0

router bgp 64513
bgp router-id 4.4.4.4
bgp log-neighbor-changes
neighbor 2.2.2.2 remote-as 64513
neighbor 2.2.2.2 update-source Loopback0
neighbor 3.3.3.3 remote-as 64513
neighbor 3.3.3.3 update-source Loopback0

Base Configuration Validation

//csr1000v-6
csr1000v-6#ping 172.16.1.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.1.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/3 ms
csr1000v-6#show run | inc route
ip route 0.0.0.0 0.0.0.0 10.5.6.5
ip route 0.0.0.0 0.0.0.0 10.4.6.4

//csr1000v-4
csr1000v-4#show ip route 172.16.1.1
Routing entry for 172.16.1.1/32
Known via "bgp 64513", distance 200, metric 0
Tag 64512, type internal
Last update from 10.1.2.1 00:09:08 ago
Routing Descriptor Blocks:
* 10.1.2.1, from 2.2.2.2, 00:09:08 ago
Route metric is 0, traffic share count is 1
AS Hops 1
Route tag 64512
MPLS label: none
csr1000v-4#show ip route 10.1.2.1
Routing entry for 10.1.2.0/24
Known via "ospf 1", distance 110, metric 2, type intra area
Last update from 10.2.4.2 on GigabitEthernet4, 03:37:18 ago
Routing Descriptor Blocks:
* 10.2.4.2, from 2.2.2.2, 03:37:18 ago, via GigabitEthernet4
Route metric is 2, traffic share count is 1

//csr1000v-5
csr1000v-5#show ip route 172.16.1.1
Routing entry for 172.16.1.1/32
Known via "bgp 64513", distance 200, metric 0
Tag 64512, type internal
Last update from 10.1.2.1 03:23:25 ago
Routing Descriptor Blocks:
* 10.1.2.1, from 2.2.2.2, 03:23:25 ago
Route metric is 0, traffic share count is 1
AS Hops 1
Route tag 64512
MPLS label: none

csr1000v-5#show ip route 10.1.2.1
Routing entry for 10.1.2.0/24
Known via "ospf 1", distance 110, metric 2, type intra area
Last update from 10.2.5.2 on GigabitEthernet5, 00:14:54 ago
Routing Descriptor Blocks:
* 10.2.5.2, from 2.2.2.2, 00:14:54 ago, via GigabitEthernet5
Route metric is 2, traffic share count is 1

Following through to the edge

//in this example, path selection takes 172.16.1.1 out the router on the left
csr1000v-2#show ip route 172.16.1.1
Routing entry for 172.16.1.1/32
Known via "bgp 64513", distance 20, metric 0
Tag 64512, type external
Last update from 10.1.2.1 03:52:25 ago
Routing Descriptor Blocks:
* 10.1.2.1, from 10.1.2.1, 03:52:25 ago
Route metric is 0, traffic share count is 1
AS Hops 1
Route tag 64512
MPLS label: none
csr1000v-2#show ip route 10.1.2.1
Routing entry for 10.1.2.0/24
Known via "connected", distance 0, metric 0 (connected, via interface)
Routing Descriptor Blocks:
* directly connected, via GigabitEthernet2
Route metric is 0, traffic share count is 1

Now we are going to break this design. I am going to disable the iBGP sessions between csr1000v-2 and csr1000v-4 and between csr1000v-3 and csr1000v-5. I am also going to shut down GigabitEthernet5 on both route reflectors, which takes away each one’s direct link to its remaining client. This leaves us with a physical topology that is not well represented in our BGP route reflector design.

//csr1000v-2
csr1000v-2(config)#int gig 5
csr1000v-2(config-if)#shutdown

csr1000v-2(config-if)#router bgp 64513
csr1000v-2(config-router)#neighbor 4.4.4.4 shutdown

//csr1000v-3
csr1000v-3(config)#int gig 5
csr1000v-3(config-if)#shut

csr1000v-3(config-if)#router bgp 64513
csr1000v-3(config-router)#neighbor 5.5.5.5 shutdown

At this point, we have at least two physical paths and at least two logical paths through the network (from csr1000v-6 to csr1000v-1). However, 172.16.1.1 is NOT reachable from csr1000v-6.

csr1000v-6#ping 172.16.1.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.1.1, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)

Traceroute produces some interesting results

csr1000v-6#trace 172.16.1.1
Type escape sequence to abort.
Tracing the route to 172.16.1.1
VRF info: (vrf in name/id, vrf out name/id)
1 10.4.6.4 2 msec
10.5.6.5 2 msec
10.4.6.4 3 msec
2 10.4.5.4 2 msec
10.4.5.5 2 msec
10.4.5.4 1 msec
3 10.4.5.4 2 msec
10.4.5.5 2 msec
10.4.5.4 2 msec
<— snip for brevity—>

29 10.4.5.4 5 msec
10.4.5.5 16 msec
10.4.5.4 8 msec
30 10.4.5.4 7 msec
10.4.5.5 6 msec
10.4.5.4 4 msec
csr1000v-6#

To see what is going on, I am going to look at csr1000v-4 and csr1000v-5. Notice that each has a BGP route learned from its one remaining route reflector.

csr1000v-4#show ip route 172.16.1.1
Routing entry for 172.16.1.1/32
Known via "bgp 64513", distance 200, metric 0
Tag 64512, type internal
Last update from 10.1.3.1 00:05:58 ago
Routing Descriptor Blocks:
* 10.1.3.1, from 3.3.3.3, 00:05:58 ago
Route metric is 0, traffic share count is 1
AS Hops 1
Route tag 64512
MPLS label: none

---

csr1000v-5#show ip route 172.16.1.1
Routing entry for 172.16.1.1/32
Known via "bgp 64513", distance 200, metric 0
Tag 64512, type internal
Last update from 10.1.2.1 00:01:15 ago
Routing Descriptor Blocks:
* 10.1.2.1, from 2.2.2.2, 00:01:15 ago
Route metric is 0, traffic share count is 1
AS Hops 1
Route tag 64512
MPLS label: none

At first glance, it looks like this should work. But it is necessary to look deeper and recurse to the next hop IP address for each of these routes.

csr1000v-4#show ip route 172.16.1.1
Routing entry for 172.16.1.1/32
Known via "bgp 64513", distance 200, metric 0
Tag 64512, type internal
Last update from 10.1.3.1 00:33:46 ago
Routing Descriptor Blocks:
* 10.1.3.1, from 3.3.3.3, 00:33:46 ago
Route metric is 0, traffic share count is 1
AS Hops 1
Route tag 64512
MPLS label: none
csr1000v-4#show ip route 10.1.3.1
Routing entry for 10.1.3.0/24
Known via "ospf 1", distance 110, metric 3, type intra area
Last update from 10.2.4.2 on GigabitEthernet4, 00:15:13 ago
Routing Descriptor Blocks:
* 10.4.5.5, from 3.3.3.3, 00:33:10 ago, via GigabitEthernet3
Route metric is 3, traffic share count is 1
10.2.4.2, from 3.3.3.3, 00:15:13 ago, via GigabitEthernet4
Route metric is 3, traffic share count is 1

---

csr1000v-5#show ip route 172.16.1.1
Routing entry for 172.16.1.1/32
Known via "bgp 64513", distance 200, metric 0
Tag 64512, type internal
Last update from 10.1.2.1 00:33:30 ago
Routing Descriptor Blocks:
* 10.1.2.1, from 2.2.2.2, 00:33:30 ago
Route metric is 0, traffic share count is 1
AS Hops 1
Route tag 64512
MPLS label: none
csr1000v-5#show ip route 10.1.2.1
Routing entry for 10.1.2.0/24
Known via "ospf 1", distance 110, metric 3, type intra area
Last update from 10.3.5.3 on GigabitEthernet4, 00:15:57 ago
Routing Descriptor Blocks:
* 10.4.5.4, from 2.2.2.2, 00:35:34 ago, via GigabitEthernet3
Route metric is 3, traffic share count is 1
10.3.5.3, from 2.2.2.2, 00:15:57 ago, via GigabitEthernet4
Route metric is 3, traffic share count is 1

As can be seen from this output, these routers are load balancing the next hop from BGP over two links. Gigabit 4 is the northbound link out of each of the bottom two routers. Gigabit 3 is the crosslink between them. Based on this, there is a possibility that some traffic will just happen to flow north (as one might expect for traffic destined to 172.16.1.1). However, what we have witnessed is traffic looping between these two routers.
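To make the "some traffic will just happen to flow north" point concrete, here is a minimal Python sketch that enumerates which link each of the two lower routers could pick for a flow entering at csr1000v-4. It is only a model under my assumptions: each router sticks to one link for a given flow, and a packet that reaches csr1000v-2 or csr1000v-3 is delivered over the eBGP route. The names `CHOICES` and `outcome` are made up for illustration and say nothing about how CEF actually hashes flows.

```python
from itertools import product

# Equal-cost choices taken from the show ip route output above, keyed by
# egress interface. csr1000v-2 and csr1000v-3 are assumed to deliver the
# packet because they hold the eBGP route toward csr1000v-1.
CHOICES = {
    "csr1000v-4": {"Gig3": "csr1000v-5", "Gig4": "csr1000v-2"},
    "csr1000v-5": {"Gig3": "csr1000v-4", "Gig4": "csr1000v-3"},
}


def outcome(ingress, pick):
    """Follow one flow, assuming each router always picks the same link."""
    router = ingress
    visited = set()
    while router in CHOICES:
        if router in visited:
            return "loop"
        visited.add(router)
        router = CHOICES[router][pick[router]]
    return "delivered via " + router


# Try every combination of link choices on csr1000v-4 and csr1000v-5.
results = {}
for p4, p5 in product(["Gig3", "Gig4"], repeat=2):
    pick = {"csr1000v-4": p4, "csr1000v-5": p5}
    results[(p4, p5)] = outcome("csr1000v-4", pick)

for combo, result in results.items():
    print(combo, result)
```

In this model, only the combination where both routers choose Gig3 loops. Every other combination reaches one of the top routers, which is why the failure can look intermittent while both equal-cost paths are present.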

I’m actually lucky that this failed the way that it did. Had the planets been in alignment, the traffic could have landed on Gig4 and been delivered. I will further break this to make a point. I’m going to shut down the cross-link between csr1000v-2 and csr1000v-3. This will prevent the two routers on the top from having a low-cost path to the other router’s connection to csr1000v-1.

csr1000v-3(config)#int gig 3
csr1000v-3(config-if)#shut

Now I will examine the route tables from the two lower routers again.

csr1000v-4#show ip route 172.16.1.1
Routing entry for 172.16.1.1/32
Known via "bgp 64513", distance 200, metric 0
Tag 64512, type internal
Last update from 10.1.3.1 00:56:44 ago
Routing Descriptor Blocks:
* 10.1.3.1, from 3.3.3.3, 00:56:44 ago
Route metric is 0, traffic share count is 1
AS Hops 1
Route tag 64512
MPLS label: none
csr1000v-4#show ip route 10.1.3.1
Routing entry for 10.1.3.0/24
Known via "ospf 1", distance 110, metric 3, type intra area
Last update from 10.4.5.5 on GigabitEthernet3, 00:38:11 ago
Routing Descriptor Blocks:
* 10.4.5.5, from 3.3.3.3, 00:56:08 ago, via GigabitEthernet3
Route metric is 3, traffic share count is 1

---

csr1000v-5# show ip route 172.16.1.1
Routing entry for 172.16.1.1/32
Known via "bgp 64513", distance 200, metric 0
Tag 64512, type internal
Last update from 10.1.2.1 00:56:24 ago
Routing Descriptor Blocks:
* 10.1.2.1, from 2.2.2.2, 00:56:24 ago
Route metric is 0, traffic share count is 1
AS Hops 1
Route tag 64512
MPLS label: none
csr1000v-5# show ip route 10.1.2.1
Routing entry for 10.1.2.0/24
Known via "ospf 1", distance 110, metric 3, type intra area
Last update from 10.4.5.4 on GigabitEthernet3, 00:38:49 ago
Routing Descriptor Blocks:
* 10.4.5.4, from 2.2.2.2, 00:58:26 ago, via GigabitEthernet3
Route metric is 3, traffic share count is 1

This is reliably broken and csr1000v-4 and csr1000v-5 have a single route toward each other for 172.16.1.1.

csr1000v-4 is 172.16.1.1->10.1.3.1 (via 10.4.5.5 on Gig3)
csr1000v-5 is 172.16.1.1->10.1.2.1 (via 10.4.5.4 on Gig3)

This creates a permanent routing loop between these two routers.

There is nothing wrong with BGP. There is nothing wrong with OSPF. The problem is the design. BGP carries a next hop address and the IGP is used to determine proper egress for that next hop. It is the lack of design in the interaction of the IGP and EGP that has caused our issue.
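The recursive lookup described here can be sketched in a few lines of Python. This is a toy model, not how IOS implements it: the `RIB` dictionary simply copies the entries from the show ip route output above (after the cross-link between the top routers was shut), and `longest_match` and `resolve` are hypothetical helper names.

```python
import ipaddress

# RIB entries copied from the show ip route output above. The dictionary
# layout is illustrative only.
RIB = {
    "csr1000v-4": {
        "172.16.1.1/32": {"proto": "bgp", "next_hop": "10.1.3.1"},
        "10.1.3.0/24": {"proto": "ospf", "next_hop": "10.4.5.5", "interface": "Gig3"},
    },
    "csr1000v-5": {
        "172.16.1.1/32": {"proto": "bgp", "next_hop": "10.1.2.1"},
        "10.1.2.0/24": {"proto": "ospf", "next_hop": "10.4.5.4", "interface": "Gig3"},
    },
}


def longest_match(table, address):
    """Return the most specific route covering address (raises if none)."""
    addr = ipaddress.ip_address(address)
    matches = [ipaddress.ip_network(p) for p in table if addr in ipaddress.ip_network(p)]
    best = max(matches, key=lambda net: net.prefixlen)
    return table[str(best)]


def resolve(router, destination):
    """Recurse on the next hop until a route names an egress interface."""
    route = longest_match(RIB[router], destination)
    while "interface" not in route:
        route = longest_match(RIB[router], route["next_hop"])
    return route["next_hop"], route["interface"]


for name in RIB:
    print(name, resolve(name, "172.16.1.1"))
```

csr1000v-4 resolves 172.16.1.1 to 10.4.5.5 on Gig3 and csr1000v-5 resolves it to 10.4.5.4 on Gig3. Each router is healthy on its own, but the two results point at each other.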

At this point, I think it is important to ask ourselves the following questions.

  1. Do we fully understand why csr1000v-4 uses Gig3 as an egress for 10.1.3.1?
  2. Do we fully understand why csr1000v-5 uses Gig3 as an egress for 10.1.2.1?
  3. Do we understand why the BGP process in each of these routers selects a different next hop address?
  4. How might we solve this?

The first three of those questions are fundamental to understanding the problem. The fourth question is what is most interesting to me. Without re-enabling any interfaces, let’s take a look at a couple of ways to make this work.

The first solution would be to align the Route Reflectors and their clients with the physical topology. I can demonstrate this by flipping the clients to the opposing route reflector.

csr1000v-2(config)#router bgp 64513
csr1000v-2(config-router)#no neighbor 4.4.4.4 shutdown
csr1000v-2(config-router)#neighbor 5.5.5.5 shutdown

---

csr1000v-3(config)#router bgp 64513
csr1000v-3(config-router)#no neighbor 5.5.5.5 shutdown
csr1000v-3(config-router)# neighbor 4.4.4.4 shutdown

Now let’s take a look at csr1000v-4 and csr1000v-5.

csr1000v-4#show ip route 172.16.1.1
Routing entry for 172.16.1.1/32
Known via "bgp 64513", distance 200, metric 0
Tag 64512, type internal
Last update from 10.1.2.1 00:02:42 ago
Routing Descriptor Blocks:
* 10.1.2.1, from 2.2.2.2, 00:02:42 ago
Route metric is 0, traffic share count is 1
AS Hops 1
Route tag 64512
MPLS label: none
csr1000v-4#show ip route 10.1.2.1
Routing entry for 10.1.2.0/24
Known via "ospf 1", distance 110, metric 2, type intra area
Last update from 10.2.4.2 on GigabitEthernet4, 05:07:33 ago
Routing Descriptor Blocks:
* 10.2.4.2, from 2.2.2.2, 05:07:33 ago, via GigabitEthernet4
Route metric is 2, traffic share count is 1

---

csr1000v-5#show ip route 172.16.1.1
Routing entry for 172.16.1.1/32
Known via "bgp 64513", distance 200, metric 0
Tag 64512, type internal
Last update from 10.1.3.1 00:01:13 ago
Routing Descriptor Blocks:
* 10.1.3.1, from 3.3.3.3, 00:01:13 ago
Route metric is 0, traffic share count is 1
AS Hops 1
Route tag 64512
MPLS label: none
csr1000v-5#show ip route 10.1.3.1
Routing entry for 10.1.3.0/24
Known via "ospf 1", distance 110, metric 2, type intra area
Last update from 10.3.5.3 on GigabitEthernet4, 05:07:43 ago
Routing Descriptor Blocks:
* 10.3.5.3, from 3.3.3.3, 05:07:43 ago, via GigabitEthernet4
Route metric is 2, traffic share count is 1

As can be seen from the output, traffic consistently routes toward the top routers.

csr1000v-4 is 172.16.1.1->10.1.2.1 (via 10.2.4.2 on Gig4)
csr1000v-5 is 172.16.1.1->10.1.3.1 (via 10.3.5.3 on Gig4)

This is confirmed to be functional by sending ICMP echo requests from csr1000v-6.

csr1000v-6#ping 172.16.1.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.1.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 2/4/9 ms

At this point, it is clear that aligning the physical and logical topologies will solve the immediate problem. Next, we will try solving this problem using MPLS. To do so, let’s first make the topology broken again.

//undo of RR/client topology alignment
csr1000v-2(config)#router bgp 64513
csr1000v-2(config-router)#no neighbor 5.5.5.5 shutdown
csr1000v-2(config-router)# neighbor 4.4.4.4 shutdown

---

csr1000v-3(config)#router bgp 64513
csr1000v-3(config-router)#no neighbor 4.4.4.4 shutdown
csr1000v-3(config-router)#neighbor 5.5.5.5 shutdown

At this point, the topology is consistently broken once again.

The theory around MPLS is to force csr1000v-4 or csr1000v-5 to do label switching (as opposed to IP based forwarding) for traffic received on the Gig3 link. This could prevent the forwarding loop that we saw based on the integration of BGP and the recursive lookup.

//csr1000v-2,csr1000v-3,csr1000v-4,csr1000v-5
interface gig 2
mpls ip
interface gig 3
mpls ip
interface gig 4
mpls ip

At this point, it does look like it works.

csr1000v-6#ping 172.16.1.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.1.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 2/2/4 ms

The question we need to answer is why. Before we get into the hop by hop analysis, I want to also get a traceroute.

csr1000v-6#trace 172.16.1.1
Type escape sequence to abort.
Tracing the route to 172.16.1.1
VRF info: (vrf in name/id, vrf out name/id)
1 10.4.6.4 2 msec
10.5.6.5 2 msec
10.4.6.4 2 msec
2 10.4.5.4 [MPLS: Label 19 Exp 0] 4 msec
10.4.5.5 [MPLS: Label 20 Exp 0] 3 msec
10.4.5.4 [MPLS: Label 19 Exp 0] 3 msec
3 10.3.5.3 3 msec
10.2.4.2 2 msec
10.3.5.3 2 msec
4 10.1.2.1 6 msec
10.1.3.1 4 msec *

We can see that there is some path diversity being used in the above output. I will also take a look at the two routers in the lower part of the square.

From a next hop standpoint, we can see that they still point to one another (RIB).

csr1000v-4#show ip route 172.16.1.1
Routing entry for 172.16.1.1/32
Known via "bgp 64513", distance 200, metric 0
Tag 64512, type internal
Last update from 10.1.3.1 00:56:44 ago
Routing Descriptor Blocks:
* 10.1.3.1, from 3.3.3.3, 00:56:44 ago
Route metric is 0, traffic share count is 1
AS Hops 1
Route tag 64512
MPLS label: none

csr1000v-4#show ip route 10.1.3.1
Routing entry for 10.1.3.0/24
Known via "ospf 1", distance 110, metric 3, type intra area
Last update from 10.4.5.5 on GigabitEthernet3, 01:56:49 ago
Routing Descriptor Blocks:
* 10.4.5.5, from 3.3.3.3, 02:14:46 ago, via GigabitEthernet3
Route metric is 3, traffic share count is 1

---

csr1000v-5#show ip route 172.16.1.1
Routing entry for 172.16.1.1/32
Known via "bgp 64513", distance 200, metric 0
Tag 64512, type internal
Last update from 10.1.2.1 00:56:26 ago
Routing Descriptor Blocks:
* 10.1.2.1, from 2.2.2.2, 00:56:26 ago
Route metric is 0, traffic share count is 1
AS Hops 1
Route tag 64512
MPLS label: none

csr1000v-5#show ip route 10.1.2.1
Routing entry for 10.1.2.0/24
Known via "ospf 1", distance 110, metric 3, type intra area
Last update from 10.4.5.4 on GigabitEthernet3, 01:57:12 ago
Routing Descriptor Blocks:
* 10.4.5.4, from 2.2.2.2, 02:16:49 ago, via GigabitEthernet3
Route metric is 3, traffic share count is 1

Since we are using MPLS, we must look one level deeper into CEF.

csr1000v-4#show ip cef 10.1.3.1
10.1.3.1/32
nexthop 10.4.5.5 GigabitEthernet3 label 20-(local:20)

---

csr1000v-5#show ip cef 10.1.2.1
10.1.2.1/32
nexthop 10.4.5.4 GigabitEthernet3 label 19-(local:19)

This is the magic that we need. We can see that each of these routers will perform MPLS label imposition, so the next hop will forward based on the label rather than on its own IP lookup. In the above output, we can see that csr1000v-4 would impose label 20 as the packet leaves Gig3 (toward csr1000v-5). So let’s see what csr1000v-5 will do with that.

csr1000v-5#show mpls forwarding-table labels 20
Local Outgoing Prefix Bytes Label Outgoing Next Hop
Label Label or Tunnel Id Switched interface
20 Pop Label 10.1.3.0/24 126 Gi4 10.3.5.3

According to this output, we see that csr1000v-5 will pop (remove) label 20 and forward the packet out Gi4 (toward 10.3.5.3).

Similar output can be seen if we follow this back the other direction.

csr1000v-5#show ip cef 10.1.2.1
10.1.2.1/32
nexthop 10.4.5.4 GigabitEthernet3 label 19-(local:19)
csr1000v-5#

---

csr1000v-4#show mpls forwarding-table labels 19
Local Outgoing Prefix Bytes Label Outgoing Next Hop
Label Label or Tunnel Id Switched interface
19 Pop Label 10.1.2.0/24 696 Gi4 10.2.4.2
csr1000v-4#
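The hop by hop behavior can be summarized with a small Python sketch. It is a toy model under my assumptions, not a description of the actual forwarding code: the label values and neighbors come from the show ip cef and show mpls forwarding-table output above, while the `CEF` and `LFIB` dictionaries and the `forward` function are names I made up for illustration.

```python
# Ingress lookup after BGP next-hop recursion:
# router -> (label imposed, neighbor the packet is sent to)
CEF = {
    "csr1000v-4": (20, "csr1000v-5"),
    "csr1000v-5": (19, "csr1000v-4"),
}

# Label forwarding table: (router, incoming label) -> (action, next router)
LFIB = {
    ("csr1000v-5", 20): ("pop", "csr1000v-3"),
    ("csr1000v-4", 19): ("pop", "csr1000v-2"),
}


def forward(ingress):
    """Trace a packet for 172.16.1.1 that enters at one of the lower routers."""
    label, transit = CEF[ingress]
    # The transit router switches on the label only. It never consults its
    # own BGP route for 172.16.1.1, which is what used to send it back.
    action, egress = LFIB[(transit, label)]
    return [
        (ingress, f"impose {label}"),
        (transit, f"{action} {label}"),
        (egress, "ip lookup"),
    ]


for start in CEF:
    print(forward(start))
```

The point of the model is the middle step: the second router forwards on the label the first router imposed, so the conflicting BGP next hop on that router never comes into play.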

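Now we have two reliable paths to 172.16.1.1 from csr1000v-6. It is worth noting these are still not optimal paths, because the packets flow toward each client’s route reflector on the way out of the network and cross the link between csr1000v-4 and csr1000v-5 first, instead of taking the directly attached northbound link.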

I hope these examples have helped you better understand these technologies. If you have questions or feedback, please comment below.

Disclaimer: This article includes the independent thoughts, opinions, commentary or technical detail of Paul Stewart. This may or may not reflect the position of past, present or future employers.

About Paul Stewart, CCIE 26009 (Security)

Paul is a Network and Security Engineer, Trainer and Blogger who enjoys understanding how things really work. With over 15 years of experience in the technology industry, Paul has helped many organizations build, maintain and secure their networks and systems.