Periodically, I get a message from someone asking for troubleshooting help. The most recent one went something like this (paraphrased):
I have the following routers, R1 through R5, and I cannot ping R5 from R1. Please tell me what the problem is.
In these cases, I could review the configurations or load them into my lab, and that would likely solve the problem for the individual. However, it doesn’t really help them solve problems in the future. I prefer to help others think through the problem and reach the solution on their own.
Given the symptom that R1 cannot ping R5, what could that mean? My initial thoughts are:
- R1 isn’t producing packets destined to R5
- R5 isn’t producing packets destined to R1
- One of the routers between R1 and R5 doesn’t know how to reach R5
- One of the routers between R5 and R1 doesn’t know how to reach R1
- Traffic is being filtered somewhere along the way
The first step in troubleshooting this is to recognize that a successful ping involves two flows. The first is a series of ICMP echo requests from R1 to R5; the second is a series of echo replies from R5 back to R1. The destination of the first flow is the IP address used in the ping command. Its source can be set explicitly with an extended ping; if it isn’t, the source address is taken from the egress interface of the router (R1).
The second flow is initiated by R5 and only occurs if R5 receives packets from the first flow. Its destination address is the source address of the first flow.
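To make the two flows concrete, here is a minimal Python sketch. The `Flow` type, the `ping_flows` helper and all addresses are hypothetical, and it assumes the echo reply is sourced from the address that was pinged, which is the usual behavior.

```python
from typing import NamedTuple, Optional, Tuple


class Flow(NamedTuple):
    kind: str  # "echo-request" or "echo-reply"
    src: str
    dst: str


def ping_flows(target: str, egress_ip: str,
               extended_source: Optional[str] = None) -> Tuple[Flow, Flow]:
    """Return the (request, reply) flows a ping should produce (hypothetical helper)."""
    # An extended ping can set the source; otherwise R1 uses its egress interface address.
    src = extended_source if extended_source is not None else egress_ip
    request = Flow("echo-request", src, target)
    # R5 replies to whatever source it saw, sourcing the reply from the pinged address.
    reply = Flow("echo-reply", target, src)
    return request, reply


# Plain ping from R1: the source is the egress interface (hypothetical 10.0.12.1).
print(ping_flows("10.255.0.5", egress_ip="10.0.12.1"))
# Extended ping sourced from R1's loopback (hypothetical 10.255.0.1).
print(ping_flows("10.255.0.5", egress_ip="10.0.12.1", extended_source="10.255.0.1"))
```

The second call shows why the source matters: changing the source of the requests changes the destination of the replies, so the routers on R5’s side need a route back to a different address.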
So when I troubleshoot something like this, I form a mental picture of the source and destination of each flow. The most likely root cause of the first four possibilities above is a missing route. With that in mind, I would look at each router’s routing table (show ip route). I want to confirm a loop-free path from R1 to the IP address being pinged on R5, as well as a route back to the source of the request. These are the paths that support the two flows identified earlier.
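The routing-table check can be sketched in Python as a hop-by-hop walk that does a longest-prefix match on each router and stops on delivery, a missing route, or a loop. The table format, router names and loopback addresses are hypothetical, and the sketch ignores interface state and filtering. In this example, R4 deliberately lacks a route back to R1.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple

# Hypothetical format: each entry is (prefix, next-hop router name or "connected").
RoutingTable = List[Tuple[str, str]]


def lookup(table: RoutingTable, dest: str) -> Optional[str]:
    """Longest-prefix match; returns the next hop, or None if no route matches."""
    addr = ipaddress.ip_address(dest)
    best_hop, best_len = None, -1
    for prefix, next_hop in table:
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_hop, best_len = next_hop, net.prefixlen
    return best_hop


def walk(tables: Dict[str, RoutingTable], start: str, dest: str) -> Tuple[List[str], str]:
    """Follow next hops from start toward dest; report where and why the walk ends."""
    path = [start]
    current = start
    while True:
        next_hop = lookup(tables[current], dest)
        if next_hop is None:
            return path, f"no route to {dest} on {current}"
        if next_hop == "connected":
            return path, "delivered"
        path.append(next_hop)
        if path.count(next_hop) > 1:
            return path, "routing loop"
        current = next_hop


# Hypothetical R1-R2-R3-R4-R5 chain using /32 loopbacks; R4 is missing a route to R1.
tables = {
    "R1": [("10.255.0.1/32", "connected"), ("0.0.0.0/0", "R2")],
    "R2": [("10.255.0.1/32", "R1"), ("10.255.0.5/32", "R3")],
    "R3": [("10.255.0.1/32", "R2"), ("10.255.0.5/32", "R4")],
    "R4": [("10.255.0.5/32", "R5")],
    "R5": [("10.255.0.5/32", "connected"), ("0.0.0.0/0", "R4")],
}

print(walk(tables, "R1", "10.255.0.5"))  # (['R1', 'R2', 'R3', 'R4', 'R5'], 'delivered')
print(walk(tables, "R5", "10.255.0.1"))  # (['R5', 'R4'], 'no route to 10.255.0.1 on R4')
```

The forward walk succeeds while the return walk stops at R4. That matches the case where R5 receives the echo requests but its replies never make it back to R1.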
If these paths don’t exist, the possible culprits include:
- Misconfigured static route(s)
- Improper dynamic routing protocol configuration
- One or more interfaces are not UP
If all of the routes exist to and from R5, it is time to start looking in less obvious places:
- Are there ACLs on any of the interfaces?
- Is there anything strange going on with policy-based routing?
- Are crypto policies affecting the traffic?
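Filtering is easy to overlook because an ACL can permit one flow and silently drop the other. The Python sketch below models first-match evaluation with the implicit deny at the end of a Cisco ACL. The `Ace` type and `acl_permits` helper are hypothetical simplifications (matching only source, destination and ICMP type), and the addresses are the same hypothetical loopbacks used for R1 and R5.

```python
import ipaddress
from typing import List, NamedTuple


class Ace(NamedTuple):
    """One access-control entry (hypothetical, simplified to ICMP matching)."""
    action: str     # "permit" or "deny"
    src: str        # source prefix
    dst: str        # destination prefix
    icmp_type: str  # "echo", "echo-reply", or "any"


def acl_permits(acl: List[Ace], src: str, dst: str, icmp_type: str) -> bool:
    """First matching entry wins; anything unmatched hits the implicit deny."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    for ace in acl:
        if (s in ipaddress.ip_network(ace.src)
                and d in ipaddress.ip_network(ace.dst)
                and ace.icmp_type in ("any", icmp_type)):
            return ace.action == "permit"
    return False  # implicit "deny any" at the end of the ACL


# An ACL written with only the echo-request flow in mind.
acl = [Ace("permit", "10.255.0.1/32", "10.255.0.5/32", "echo")]

print(acl_permits(acl, "10.255.0.1", "10.255.0.5", "echo"))        # True
print(acl_permits(acl, "10.255.0.5", "10.255.0.1", "echo-reply"))  # False
```

If an ACL like this is applied somewhere the replies also pass through it, the requests reach R5 but the replies are dropped, and the ping fails even though every route is in place.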
A good way to find the general vicinity of the problem is to do a traceroute from R1 to R5, followed by a traceroute from R5 to R1. The results, combined with a topology diagram, should give a better idea of where the problem(s) may be.
With that output and an understanding of the topology, effort can be focused on the routers in a given area of the network. The technologies used there can then be assessed and revalidated.
In some cases, the root cause may be a simple misconfiguration. One example is failing to include an interface in the configuration for a routing protocol. Other cases may be the result of a design issue. For example, RIP or EIGRP might not behave as expected when two subnets of the same classful major network are connected through a different major network, because automatic summarization can cause both sides to advertise the same major network. This is known as a discontiguous network.
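A rough way to spot this design issue is to compare classful major networks. The Python sketch below assumes summarization at the classful boundary (always the case for RIPv1, and the default for RIPv2 and EIGRP on older IOS releases). The helper names and prefixes are hypothetical, and it only considers a single transit network between the two subnets.

```python
import ipaddress


def classful_network(prefix: str) -> ipaddress.IPv4Network:
    """Return the classful (A/B/C) major network that contains an IPv4 prefix."""
    net = ipaddress.ip_network(prefix)
    first_octet = net.network_address.packed[0]
    if first_octet < 128:
        length = 8    # class A
    elif first_octet < 192:
        length = 16   # class B
    elif first_octet < 224:
        length = 24   # class C
    else:
        raise ValueError(f"{prefix} is class D/E and has no classful major network")
    return ipaddress.ip_network(f"{net.network_address}/{length}", strict=False)


def is_discontiguous(subnet_a: str, subnet_b: str, transit: str) -> bool:
    """True if two subnets share a major network but are joined through a different one."""
    major = classful_network(subnet_a)
    return classful_network(subnet_b) == major and classful_network(transit) != major


# Hypothetical: 172.16.x.0/24 LANs behind R1 and R5, joined across a 10.0.0.0/8 link.
print(classful_network("172.16.1.0/24"))                                     # 172.16.0.0/16
print(is_discontiguous("172.16.1.0/24", "172.16.5.0/24", "10.0.12.0/24"))    # True
print(is_discontiguous("172.16.1.0/24", "172.16.5.0/24", "172.16.99.0/24"))  # False
```

When the check is true, both R1 and R5 would advertise 172.16.0.0/16 toward the transit network, so the routers in between cannot tell which side owns which subnet.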
The end result of proper troubleshooting is that we can narrow in on the problem area. This makes troubleshooting tools and internet searches much more useful. Ultimately, the goal is to understand and resolve the problem. These skills are worth practicing as we build lab scenarios. Because troubleshooting production networks can be expensive, stressful and challenging, the ability to troubleshoot should be learned early and exercised often.
Disclaimer: This article includes the independent thoughts, opinions, commentary or technical detail of Paul Stewart. This may or may not reflect the position of past, present or future employers.