January 3, 2011 8:25 AM
Too clever for its own good?
Sometimes it’s the most basic things that catch you out. We were recently doing some Proof of Concept testing for a large (non-UK) bank that wanted to model part of its international MPLS WAN with a view to providing Layer 2 and Layer 3 connectivity between sites, with some very complex QoS and extremely tight reconvergence requirements.
Serious bits of kit too—a mix of Cisco Nexus 7000s (I’ve always liked them since I saw the blue LEDs) and 7600s, with some Catalyst 6500s and a few miscellaneous ASRs. Quite an impressive test bed.
As you’d expect (though it would be nice, if naïve, to think this wouldn’t happen), the different platforms do things slightly differently, and with some QoS features being not just platform-dependent but line-card-dependent, it wasn’t the most straightforward exercise to get it all working.
So we were geared up for it to be complicated, and a fair bit of time was spent looking at analyser traces at different points of the network, trying to figure out where packets were being marked and where different VLAN tags were being added and removed as we tried to build transparent Layer 2 environments across the various pieces of hardware.
There was this one scenario that was really puzzling. Everything looked as if it was set up completely fine. But the traffic just wasn’t being marked and policed as expected. The debug information from the kit itself wasn’t really helping, so we stuck a couple of analysers on either side of the traffic flows and settled down to some serious packet analysis.
There was something very odd happening. We could see traffic marked with the customer’s VLAN IDs going in at one end, but when it popped out at the other side: no VLAN tags. Something was stripping out the VLAN information. This should have been passed transparently through the network (okay, with a lot of other information added), but nothing should have been tampering with the original frame structure.
We spent ages checking through all the configurations, and just couldn’t see why this could be happening. We just about ripped the network apart trying to figure out what was going on. Eventually we got it all back together again and had another look, not really expecting to see anything different.
Except we did. The VLAN information was now clearly showing up at the destination side of the network. We could see it on the Sniffer. And at the source end—hang on: no VLAN information. That’s impossible—the network can’t be adding the correct VLAN information on its own. Are we going mad?
It was one of those cartoon lightbulb moments. We had two analysers in the lab. One a “normal” purpose-built Sniffer; the other analyser software on a high-end laptop. Guess what—on the Sniffer, we saw VLAN information wherever we plugged it in.
On the laptop we didn’t. You see, even set to promiscuous mode to capture all data, the laptop NIC was clever enough to know that PCs didn’t normally expect to see VLAN information, so was stripping it off and sending a pure Ethernet frame up the stack to the analyser software, which never got a chance to interpret or even display it.
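For anyone wanting to check whether their own capture setup is doing the same thing, here is a minimal Python sketch of what "seeing the VLAN information" means at the byte level. It assumes a standard 802.1Q tag (TPID 0x8100 straight after the two MAC addresses) and a raw Ethernet frame already in hand as bytes. The names `parse_vlan_tag` and `VlanTag` are made up for illustration and are not from any analyser or library we used in the lab.

```python
import struct
from typing import NamedTuple, Optional

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q VLAN tag


class VlanTag(NamedTuple):
    pcp: int              # priority code point (3 bits), used for QoS marking
    dei: int              # drop eligible indicator (1 bit)
    vid: int              # VLAN ID (12 bits)
    inner_ethertype: int  # EtherType of the payload behind the tag


def parse_vlan_tag(frame: bytes) -> Optional[VlanTag]:
    """Return the 802.1Q tag of a raw Ethernet frame, or None if untagged."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    # Bytes 0-5: destination MAC, 6-11: source MAC, 12-13: EtherType or TPID.
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != TPID_8021Q:
        return None  # untagged, or the tag was stripped before we saw it
    if len(frame) < 18:
        raise ValueError("truncated 802.1Q tag")
    # Bytes 14-15: tag control information, 16-17: the real EtherType.
    tci, inner = struct.unpack_from("!HH", frame, 14)
    return VlanTag(pcp=tci >> 13, dei=(tci >> 12) & 1,
                   vid=tci & 0x0FFF, inner_ethertype=inner)


# Hypothetical frames: same MAC addresses, one tagged (priority 5, VLAN 100)
# and one as a tag-stripping NIC would hand it up the stack.
macs = bytes.fromhex("ffffffffffff" "020000000001")
tagged = macs + struct.pack("!HHH", TPID_8021Q, (5 << 13) | 100, 0x0800)
untagged = macs + struct.pack("!H", 0x0800)

print(parse_vlan_tag(tagged))
print(parse_vlan_tag(untagged))
```

If a frame you know was sent tagged comes back as `None` here, the tag went missing somewhere between the wire and your capture software, which is exactly the trap we fell into.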
This never used to happen—NICs weren’t that sophisticated. A lot of them (including the one on my own laptop) still aren’t. Some of the latest, greatest ones obviously are. So that’s progress for you. We had to hack registry settings to make it stop doing this, as there was no option called “just show us the data you receive, you horrible machine”.
It still didn’t fix the problem—turns out we had to tweak a couple of QoS settings on a 7600, but once we’d stopped spending the better part of a day trying to fix a problem that didn’t exist, it wasn’t actually all that difficult to figure out.
And there was us all assuming that this issue we were seeing was a network problem. Doh!