Failure Analysis: An Interesting Way to Break CAPWAP

I recently stumbled into what I think is a very interesting failure scenario with a Cisco Wireless solution. This was a traditional controller-based solution that leveraged a CAPWAP data and control plane. The symptoms were fairly consistent and strange.

Symptoms:

  • When issues are occurring, all uploads reduce to about 1.5Mb/s
  • Installing a new AP seems to solve the issue
  • Issue re-occurs in a few minutes
  • Issues only occur for one specific site
  • Wireless is configured consistently across 5 sites
  • RF is not an issue

Topology:

When I got involved with this, a few people had reviewed the configuration and TAC had been involved for some time. While on-site, I took a look at RF and channel utilization (expecting to find it to be ugly since I knew it was heavily dependent on 2.4GHz). My first order of business was to spin up a test AP in its own group and advertise a test SSID on a 5GHz channel. Upon doing so, both iPerf and Speedtest were >50Mb/s. My initial thought was that the density needed to be increased and the radios tweaked to get more clients on 5GHz. However, a few minutes into my testing, my upload also dropped to the same slow speeds (<1.5Mb/s).

My next step was to configure FlexConnect on the test AP and to drop the traffic into a local VLAN. This should remove anything to do with CAPWAP as a possible culprit. After doing so, testing showed that there were no issues. Even after an extended period of time, we saw no performance degradation. This reaffirmed that there were no issues with RF and we were likely looking at something impacting CAPWAP throughput.

Having a very busy schedule, I asked the customer to engage the Metro-E service provider and see what they could tell us about the CAPWAP traffic (UDP/5246-5247) for that location. Since there was no additional overlay protocol (DMVPN, MACsec, etc.), I thought it would be interesting to see if the provider was seeing anything abnormal.

About a day later, I received a very interesting email. The service provider had analyzed the CAPWAP traffic from the AP to the WLC and from the WLC to the AP. The traffic from the WLC to AP seemed normal. However, the traffic from the AP to WLC was being sent to a MAC address that was NOT known in the service provider network. I also found that the traffic was being rate-limited to 2Mb/s by a BUM (Broadcast, Unknown unicast, Multicast) policy.

It was at this point that I knew we could solve the problem. With this information in mind, I proceeded to do a packet walk from the AP to the controller. Here is what I found.


For discussion, I will attach pseudo MAC addresses to this topology.

So when we apply packet and frame forwarding logic, we have the following:

  • AP uses its local L3 switch as a GW, which routes the packet for 3.3.3.3
  • 3.3.3.3 (WLC) is in VLAN200, which is directly connected at L3, so the switch connected to the AP routes the packet into VLAN200.
  • The switch ARPs for 3.3.3.3 on VLAN200. The response to this is received by the switch on the left. The ARP response populates the MAC address tables from right to left.
  • AP forwards CAPWAP to WLC using the following L3 path (AP->LeftSwitch->WLC)
  • WLC forwards CAPWAP to AP using the following L3 path (WLC->RightSwitch->AP)
  • The only time the Metro-E service sees MAC BA as a source is when the WLC responds to an ARP request
  • From a Metro-E perspective, AP to WLC communication uses the following MAC addresses–SRC:AC, DST:BA
  • From a Metro-E perspective, WLC to AP communication uses the following MAC addresses–SRC:BC, DST:AD (due to the IGP adjacency)

The Workaround

We did a temporary workaround by creating a static route on the LeftSwitch for 3.3.3.3/32. Setting the next hop to 4.4.4.1 forced an outbound destination to a MAC address (BC) that wasn’t being flushed out of the Metro-E provider’s tables. Ultimately, the goal is to prune VLAN200 out of the remote location and remove the static route.

Analysis and Conclusion

ARP entries often default to four hours. MAC table entries often age out after 5 minutes. Booting up a new AP forced an ARP exchange between the LeftSwitch’s VLAN200 interface and the WLC. The resulting ARP entry would have remained for four hours. The ARP response that created this entry would also have populated the MAC tables in the Metro-E service, but that entry timed out after a given period of time. Based on testing, I would guess this was about 5 minutes (which is a common default). Once the entry timed out, the AP to WLC traffic would become unknown unicast and be rate limited by the provider’s BUM policy.
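The timer mismatch can be illustrated with a toy model. This is only a sketch under stated assumptions: the 300-second MAC aging and 14400-second ARP timeout are the common defaults mentioned above, not values confirmed by the provider, and the `MacTable` class and `simulate` function are hypothetical names invented for this illustration. The MAC labels (AC, BA, BC) are the pseudo addresses from the packet walk.

```python
from dataclasses import dataclass, field

MAC_AGING_SECONDS = 300      # assumed provider MAC aging timer
ARP_TIMEOUT_SECONDS = 14400  # assumed ARP timeout on the LeftSwitch


@dataclass
class MacTable:
    """Toy Layer 2 table that learns source MACs and ages them out."""
    aging: int = MAC_AGING_SECONDS
    last_seen: dict = field(default_factory=dict)

    def learn(self, src_mac: str, now: int) -> None:
        # A switch only learns from the source address of a frame.
        self.last_seen[src_mac] = now

    def is_known(self, dst_mac: str, now: int) -> bool:
        seen = self.last_seen.get(dst_mac)
        return seen is not None and now - seen < self.aging


def simulate(duration: int, step: int = 60) -> list:
    """Return (time, known) pairs for destination MAC 'BA' in the provider table."""
    provider = MacTable()
    results = []
    for now in range(0, duration + 1, step):
        # The WLC only sources frames from BA when it answers an ARP request,
        # which happens at AP boot and then once per ARP timeout.
        if now % ARP_TIMEOUT_SECONDS == 0:
            provider.learn("BA", now)
        # Steady-state CAPWAP traffic refreshes AC and BC, but never BA.
        provider.learn("AC", now)  # AP -> WLC frames: SRC AC, DST BA
        provider.learn("BC", now)  # WLC -> AP frames: SRC BC, DST AD
        results.append((now, provider.is_known("BA", now)))
    return results


if __name__ == "__main__":
    for now, known in simulate(600):
        state = "known unicast" if known else "unknown unicast (BUM rate limit)"
        print(f"t={now:>4}s  DST BA is {state}")
```

In this model BA stays known for the first five minutes and then flips to unknown at the 300-second mark, where it remains until the next ARP exchange nearly four hours later. That lines up with the "fixed for a few minutes after a new AP boots" symptom.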

If you have any feedback or comments, please share below.

Disclaimer: This article includes the independent thoughts, opinions, commentary or technical detail of Paul Stewart. This may or may not reflect the position of past, present or future employers.

Cisco WLC for Wired to Wireless mDNS and Bonjour

Bonjour and mDNS are discovery mechanisms that generally work effortlessly within a single VLAN. Those attempting to implement these protocols in a multi-subnet environment often run into some significant challenges. The typical use of CAPWAP in an enterprise wireless network adds to the segmentation between wired and wireless domains and requires special attention with devices like Apple TVs and Bonjour-based printers. In this article, I will address the use case of allowing a wired Apple TV to be seen and used by a wireless client. We will also do some basic filtering to contain those advertisements to a single building.

The Starting Topology

Traceroute through Firepower Threat Defense

Nearly eight years ago, I wrote an article about configuring the ASA to permit Traceroute and how to make the device show up in the output. That article is still relevant and gets quite a few hits every day. I wanted to put together a similar How-To article for those using Firepower Threat Defense.

This article examines the configuration required to allow proper traceroute functionality in an FTD environment. The examples shown here leverage Firepower Management Center to manage Firepower Threat Defense. As with any configuration, please assess the security impact and applicability to your environment before implementing.

Before we get started, it is important to understand that there are two basic types of Traceroute implementations. I am using OSX for testing and it defaults to using UDP packets for the test. However, I can also test with ICMP using the -I option. I am already permitting all outbound traffic, so this is not a problem of allowing the UDP or ICMP toward the destination.

Connecting Postman to Firepower Management Center API

A few months back, I wrote an article about my Initial Observation on the Firepower FMC API. Today’s article takes this one step further with a step-by-step guide to connecting Postman to the FMC API. It is worth noting that this process is not directly useful on its own, but it should be expanded upon to achieve any objective that is better served by an API. Use cases might include bulk changes or integration with other security applications.

The Official REST API Guide can be found at the following URL.

Firepower REST API Quick Start Guide

It is also worth mentioning that the online API documentation can be found at https://<FMC-IP>/api-explorer on the FMC installation.

The general flow of the process we will be following is:

  • Connect to FMC using basic authentication
  • View the response to obtain the X-auth-access-token and DOMAIN-UUID
  • Leverage the X-auth-access-token and DOMAIN-UUID in a request for access control policies
  • Leverage the token, domain and policy ID to obtain a list of rules in that policy
  • Leverage the token, domain, policy ID and rule ID to obtain rule details

Throughout this process, we will not store any variables and the process will be completely manual for comprehensive understanding. We will leverage Postman as a REST client.
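For readers who prefer a script to Postman, the same flow can be sketched in Python using only the standard library. Treat this as a rough sketch rather than a reference client: the URL paths (`/api/fmc_platform/v1/auth/generatetoken` and `/api/fmc_config/v1/domain/...`) and the `DOMAIN_UUID` header spelling are assumptions that should be checked against the API explorer on your own FMC, and the function names are hypothetical helpers made up for this example.

```python
import base64
import json
import urllib.request

# Assumed REST paths; confirm them in the API explorer on your own FMC.
TOKEN_PATH = "/api/fmc_platform/v1/auth/generatetoken"
CONFIG_BASE = "/api/fmc_config/v1/domain"


def token_request(host, username, password):
    """Build the POST that trades basic-auth credentials for a token."""
    credentials = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        f"https://{host}{TOKEN_PATH}",
        method="POST",
        headers={"Authorization": f"Basic {credentials}"},
    )


def policy_url(host, domain_uuid, policy_id=None, rule_id=None):
    """Build the URL for the policy list, one policy's rules, or one rule."""
    url = f"https://{host}{CONFIG_BASE}/{domain_uuid}/policy/accesspolicies"
    if policy_id is not None:
        url += f"/{policy_id}/accessrules"
        if rule_id is not None:
            url += f"/{rule_id}"
    return url


def login(host, username, password, context=None):
    """Return (token, domain_uuid) read from the response headers."""
    request = token_request(host, username, password)
    with urllib.request.urlopen(request, context=context) as response:
        headers = response.headers
        # Header name assumed; the text above writes it as DOMAIN-UUID.
        domain = headers.get("DOMAIN_UUID") or headers.get("DOMAIN-UUID")
        return headers.get("X-auth-access-token"), domain


def get_json(url, token, context=None):
    """GET a URL with the access token and decode the JSON body."""
    request = urllib.request.Request(url, headers={"X-auth-access-token": token})
    with urllib.request.urlopen(request, context=context) as response:
        return json.load(response)
```

The calls would be chained in the same order as the list above: `login(...)` first, then `get_json(policy_url(host, domain), token)` for the policies, then the same call with a policy ID for its rules, and finally with a rule ID for the rule details. An FMC with a self-signed certificate will fail verification unless a suitable `ssl.SSLContext` is passed as `context`.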

MPLS and VRFs – Filling the Gaps

A few years ago, I took an SE role covering Higher Education accounts. I quickly realized one of the deficits Cisco has in the CCNA program as it pertains to networks with a certain set of requirements. While the program is jam-packed with great information, there are a few concepts that an administrator may have to deal with that catch them by surprise. Three related topics that aren’t covered in CCNA Routing and Switching are shown below.

This article is meant to serve as a starting point for those who may be very strong with routing and switching but lack the exposure to VRFs, Layer 3 Segmentation, and MPLS. It is a good starting point for new employees that might face this challenge and it will certainly help them gain perspective on these topics.

MPLS Intro Series – Route Reflectors

This is the final article in the MPLS Intro Series and will quickly mention the need for route reflectors. This need is driven by the iBGP requirement for a full mesh of peers. This means that a network with only 4 PE nodes would have 6 iBGP peering sessions. This is calculated as n(n-1)/2 where n is the number of PE nodes required for a given topology.

As the scale grows, the need for a centralized peering point becomes obvious. For example, a network with 10 PE nodes would have 45 iBGP sessions to meet the full mesh requirement. Route reflectors overcome this rule by becoming a central point that can advertise routes between iBGP “route reflector clients”. The diagram below actually has more peering sessions than the one above (without RR). However, as a network continues to grow, the full mesh becomes quite challenging.
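The session arithmetic is easy to check with a few lines of Python. This is a sketch under one assumption that does not come from the diagrams: the route reflectors are modeled as dedicated nodes (two by default) that peer with every PE and with each other. The function names are made up for this example.

```python
def full_mesh_sessions(pe_nodes: int) -> int:
    """iBGP sessions needed for a full mesh of PE nodes: n(n-1)/2."""
    return pe_nodes * (pe_nodes - 1) // 2


def route_reflector_sessions(pe_nodes: int, reflectors: int = 2) -> int:
    """Sessions when every PE peers only with each (assumed dedicated) RR."""
    client_sessions = pe_nodes * reflectors
    reflector_mesh = reflectors * (reflectors - 1) // 2
    return client_sessions + reflector_mesh


if __name__ == "__main__":
    for n in (4, 10, 50):
        print(f"{n} PEs: full mesh={full_mesh_sessions(n)}, "
              f"with 2 RRs={route_reflector_sessions(n)}")
```

With 4 PE nodes, the full mesh needs 6 sessions while two route reflectors need 9, so the reflector design can look worse on a small topology. At 10 PE nodes the comparison flips to 45 versus 21, and the gap keeps widening because the full mesh grows quadratically while the reflector design grows linearly.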
