Network Architecture for Packet Capture

Posted by fmadio | 100G Ethernet

Packet capture is good and all, but how exactly do you capture packets on a network? There's a range of approaches, from Layer 1 optical TAPs to smart SPAN protocols and everything in between. We dig into the various pros and cons of each approach.


Network Topology for Packet Capture

Packet capture is great, but how do you actually set up your network topology to get the packets into a 10G, 40G or 100G packet sniffer appliance? It turns out there are about 5 different options, each with its own pros and cons.

Switch SPAN / MIRROR Port

This is probably the easiest way to get packet capture up and running. It requires the switch to be configured in what's called "SPAN" or "MIRROR" mode, depending on the switch vendor's terminology. The idea is simple: make a copy of every ingress packet from Port 0, 1, 2, ... N and forward it to the SPAN port, in addition to forwarding the packet to its correct output port. It's simple to set up and does not require any new hardware. The network topology is shown in the picture below.


Figure: packet capture via a switch SPAN port
Pros:
  • - Simple to set up
  • - Use existing hardware
  • - Capture all NxN packets

Cons:
  • - Uses switch on-chip buffer space
  • - Subject to SPAN port queuing delay
  • - Timestamp accuracy not great


It's a good basic way to start packet capture, but it will impact your network performance and the timestamp accuracy is not great. This approach is best used on edge switches for security / IDS packet analysis workloads.
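To put rough numbers on the buffer and queuing cons, here is a back-of-the-envelope Python sketch. It assumes all mirrored ingress traffic is funnelled into a single SPAN port, and the 12 MB buffer figure is a made-up placeholder, not the spec of any particular switch; real switches share and carve up on-chip buffer in vendor-specific ways.

```python
# Back-of-the-envelope SPAN port oversubscription (illustrative only).
# Assumption: all mirrored ingress traffic goes to one SPAN port and the
# excess waits in a single shared on-chip buffer until it overflows.


def span_oversubscription(port_rates_gbps: list[float], span_rate_gbps: float) -> float:
    """Ratio of mirrored traffic offered to the SPAN port vs. what it can send."""
    return sum(port_rates_gbps) / span_rate_gbps


def time_to_fill_buffer_ms(port_rates_gbps: list[float], span_rate_gbps: float,
                           buffer_bytes: int) -> float:
    """How long a sustained burst can last before the buffer is full, in ms."""
    excess_gbps = sum(port_rates_gbps) - span_rate_gbps
    if excess_gbps <= 0:
        return float("inf")  # SPAN port keeps up, the buffer never fills
    return buffer_bytes * 8 / (excess_gbps * 1e9) * 1e3


if __name__ == "__main__":
    ports = [10.0] * 4          # four 10G ports running at line rate
    span = 10.0                 # one 10G SPAN port
    buffer_bytes = 12_000_000   # hypothetical 12 MB available to the SPAN queue
    print(f"oversubscription: {span_oversubscription(ports, span):.1f}x")
    print(f"buffer full after: {time_to_fill_buffer_ms(ports, span, buffer_bytes):.1f} ms")
```

With four fully loaded 10G ports mirrored into one 10G SPAN port, the sketch reports 4.0x oversubscription and the hypothetical buffer filling in about 3.2 ms. Every packet that waits in that queue reaches the sniffer later than it crossed the wire, which is exactly the timestamp accuracy problem.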


Inline Packet Capture

Inline packet capture is even simpler to set up but has a major downside: if the packet capture device becomes inactive (e.g. during a system reboot), the entire link goes down. This is a major problem if the link you're capturing requires high uptime, as even 1 minute of downtime for a system reboot or power cycle can have a large impact on the entire network. The other problem with this approach is that it introduces latency and jitter onto the link. Depending on the hardware you are using, the added latency and jitter could be tens of nanoseconds or tens of milliseconds.
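One floor on the added latency is easy to compute: a device that stores and forwards each frame cannot start transmitting it until the whole frame has arrived, so it adds at least one frame's serialization time, plus whatever its own processing and queuing cost. A minimal Python sketch of that lower bound; frame sizes here exclude preamble, cut-through hardware can do better, and real devices add their own overheads on top.

```python
def serialization_delay_ns(frame_bytes: int, rate_gbps: float) -> float:
    """Time to clock one frame onto (or off) the wire; Gbps == bits per nanosecond."""
    return frame_bytes * 8 / rate_gbps


if __name__ == "__main__":
    for rate in (10.0, 40.0, 100.0):
        for size in (64, 1500):
            print(f"{rate:>5.0f}G {size:>5}B frame: {serialization_delay_ns(size, rate):8.1f} ns")
```

At 10G a 1500 byte frame takes 1.2 microseconds to receive, so a store-and-forward inline device adds at least that much to every full-size packet before forwarding it.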


Figure: inline packet capture
Pros:
  • - Simple and easy
  • - No additional hardware

Cons:
  • - Link Downtime
  • - Latency and Jitter


There aren't many reasons to use this setup, yet in practice it gets deployed due to cost and simplicity. The primary reason not to do this is link downtime: avoiding it requires specially designed capture cards with a "bypass mode" that keeps forwarding traffic when power is lost. A better approach is to use passive TAPs on the link, which decouples the capture system from link uptime.


Layer 1 TAP Packet Capture

Layer 1 TAPs are excellent for zero-impact network packet capture. Optical TAPs work by splitting the light from one fiber into 2 separate fibers, which are then fed into the appropriate transceivers. Splits are typically 50/50 or 90/10, depending on the Layer 1 link type. For example, on a long-range single-mode fiber link (e.g. 10 km LR), you probably want 90% of the signal going to the network line card, which minimizes the risk of link downtime. For a short-range multi-mode fiber, a 50/50 split works well in practice.
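The split ratio translates directly into optical power loss on each leg. For an ideal splitter, a leg carrying fraction f of the light loses -10·log10(f) dB, so a 90/10 split costs the network leg only about 0.46 dB while the monitor leg loses 10 dB, and a 50/50 split costs both legs about 3 dB. A small Python helper for checking this against a link budget; it assumes an ideal splitter, and real TAPs add some excess insertion loss on top, so check the vendor datasheet.

```python
import math


def split_loss_db(fraction: float) -> float:
    """Ideal loss (dB) of a splitter leg that receives `fraction` of the input power.

    Ignores the coupler's excess insertion loss and any connector losses.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return -10.0 * math.log10(fraction)


if __name__ == "__main__":
    for label, network_share in (("50/50", 0.5), ("90/10", 0.9)):
        network_db = split_loss_db(network_share)
        monitor_db = split_loss_db(1.0 - network_share)
        print(f"{label}: network leg {network_db:.2f} dB, monitor leg {monitor_db:.2f} dB")
```

On a long LR span the network leg is the one with little margin to spare, which is why the 90% share goes there; the capture device on the 10% leg needs a receiver that still works 10 dB down.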

Using Layer 1 TAPs and feeding the tapped fiber into the capture device is one of the best approaches to capturing packets. Not only is there near-zero impact on the network, it's also extremely accurate for timing purposes, as there are no intermediate devices involved at all.


Figure: packet capture with a Layer 1 TAP
Pros:
  • - No link downtime
  • - Excellent time accuracy
  • - Zero impact on the network

Cons:
  • - Additional Hardware & Cabling


This is the recommended way to capture all packets on high-uptime links. One drawback is that the cabling can get a bit messy. The other is cost when the optics/transceivers are extremely expensive, e.g. 100G LR4 links. In that case both the endpoint and the packet capture device require the same transceivers, which for 100G LR4 adds significant cost ($10K+ USD for 100G LR4 in 2015). With SPAN or inline capture you only need one LR4 transceiver, as the local port can use SR.

However, for latency-sensitive analysis, Layer 1 TAPs are the only way to go. They give you the best accuracy and have zero impact on the real link: the best of both worlds.


Layer 1 TAP + Switch Packet Capture

One of the problems with Layer 1 TAPs is that they scale poorly. For each TAP you need 2x 10G capture ports (Rx and Tx lines), so if you have 16 10G duplex lines to tap, that results in 32 receive-only 10G ports. If your packet capture device has 2x 10G ports, that translates to purchasing 16 packet sniffers. Even with our 1U 10G packet sniffer, that's a sizeable percentage of an entire rack!
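The scaling arithmetic above is simple enough to write down. A small Python sketch, assuming every tapped duplex link produces two receive-only feeds and every capture appliance has the same fixed number of capture ports:

```python
def capture_ports_needed(duplex_links: int) -> int:
    """Each tapped duplex link yields two receive-only feeds (one per direction)."""
    return 2 * duplex_links


def appliances_needed(duplex_links: int, ports_per_appliance: int) -> int:
    """Number of capture appliances needed, rounded up to whole boxes."""
    ports = capture_ports_needed(duplex_links)
    return -(-ports // ports_per_appliance)  # integer ceiling division


if __name__ == "__main__":
    links = 16
    print(capture_ports_needed(links), "capture ports")
    print(appliances_needed(links, ports_per_appliance=2), "appliances with 2x 10G ports each")
```

For the 16-link example this gives 32 capture ports and 16 appliances, and the count grows linearly with every link you add.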

Enter the aggregation layer. The idea is to plug all the TAP outputs into a regular switch or a dedicated 10G aggregation switch, and then SPAN the aggregated traffic to the packet capture device, as shown in the diagram below.


Figure: packet capture with Layer 1 TAPs and switch aggregation
Pros:
  • - No link downtime
  • - Near-zero impact on the network
  • - Cost efficient / Power efficient / Space efficient

Cons:
  • - Timing accuracy not the best


This setup is good, as it's completely decoupled from the real network thanks to the passive TAPs. However, things get murky at the aggregation layer. If you're using a regular run-of-the-mill 10G switch, the timing accuracy is quite frankly going to suck, because the SPAN port will suffer from queuing delays. With a fancy cut-through switch the accuracy will be better, but any packet that requires queuing (e.g. packets arriving from different TAPs at the same time) will have hundreds of nanoseconds of timing error.

The problem with using a switch and SPAN port is that the packet's timestamp is set by the packet sniffer (highlighted in red), which sits behind the Nx1 MUX. So for the ultimate in passive packet capture timing accuracy, we need a different plan.
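To see where those hundreds of nanoseconds come from, here is a toy Python model of the Nx1 MUX: packets from several TAP feeds share one 10G egress (SPAN) port in FIFO order, and the sniffer timestamps each packet when it starts arriving. It assumes a constant switch latency (which could be calibrated out, so it is left out of the model) and 20 bytes of preamble plus minimum inter-frame gap per frame; real switch queuing is more complicated.

```python
from dataclasses import dataclass

LINE_RATE_GBPS = 10.0      # egress (SPAN) port speed; Gbps == bits per ns
WIRE_OVERHEAD_BYTES = 20   # preamble + SFD (8) + minimum inter-frame gap (12)


@dataclass
class Packet:
    port: int            # TAP feed / switch ingress port
    arrival_ns: float    # true time the packet hit the switch ingress port
    size_bytes: int      # Ethernet frame length including FCS


def wire_time_ns(size_bytes: int, rate_gbps: float = LINE_RATE_GBPS) -> float:
    """Time the frame occupies the egress link, including per-frame overhead."""
    return (size_bytes + WIRE_OVERHEAD_BYTES) * 8 / rate_gbps


def span_timestamp_errors(packets: list[Packet]) -> list[tuple[Packet, float]]:
    """Queuing delay added by the Nx1 MUX, i.e. the sniffer's timestamp error.

    Packets are served FIFO by arrival time (ties keep input order).
    """
    egress_free_ns = 0.0
    results = []
    for pkt in sorted(packets, key=lambda p: p.arrival_ns):
        start_ns = max(pkt.arrival_ns, egress_free_ns)  # wait if the port is busy
        egress_free_ns = start_ns + wire_time_ns(pkt.size_bytes)
        results.append((pkt, start_ns - pkt.arrival_ns))
    return results


if __name__ == "__main__":
    burst = [Packet(0, 1000.0, 1500), Packet(1, 1000.0, 1500), Packet(2, 1000.0, 64)]
    for pkt, err in span_timestamp_errors(burst):
        print(f"port {pkt.port}: timestamp error {err:.1f} ns")
```

For three frames hitting the switch at the same instant (two 1500 byte frames and one 64 byte frame), the model gives errors of 0, 1216 and 2432 ns: each queued packet inherits the wire time of everything ahead of it, no matter how accurate the sniffer's own clock is.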


Layer 1 TAP + Fancy SPAN Packet Capture

... and finally we arrive at the ultimate network capture setup: an array of Layer 1 TAPs, a fancy pants SPAN session and a 10G line-rate packet sniffer. This is for applications that require ultimate time accuracy, meaning +/- 10 nanoseconds.

The setup is almost the same as the one above, except it uses "fancy pants SPAN" sessions with ingress switch timestamping. This is available with the latest Arista (EOS SPAN) and Cisco (ERSPAN) protocols. The key difference is that the fancy SPAN adds metadata to the packet as it transits through the switch and, you guessed it, that includes a hardware timestamp taken when the packet was first received on the switch ingress port, not the egress port (highlighted in red). The packet capture device then replaces its own timestamps with the metadata timestamp the aggregation switch added to the packet.
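Conceptually, the capture side then just swaps in the switch's ingress timestamp. The Python sketch below uses a made-up `CaptureRecord` type with a hypothetical `switch_ingress_ts_ns` field standing in for the decoded metadata; it does not parse the actual Arista or Cisco ERSPAN wire formats, which have their own headers and timestamp granularities.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class CaptureRecord:
    capture_ts_ns: int                    # stamped by the sniffer, behind the Nx1 MUX
    switch_ingress_ts_ns: Optional[int]   # hypothetical: ingress timestamp from SPAN metadata
    payload: bytes


def apply_switch_timestamps(records: List[CaptureRecord]) -> List[CaptureRecord]:
    """Replace sniffer timestamps with switch ingress timestamps where metadata is present."""
    out = []
    for rec in records:
        if rec.switch_ingress_ts_ns is not None:
            rec = replace(rec, capture_ts_ns=rec.switch_ingress_ts_ns)
        out.append(rec)
    # Egress order through the MUX need not match ingress order across ports,
    # so re-sort on the corrected timestamps (stable sort keeps ties in capture order).
    out.sort(key=lambda r: r.capture_ts_ns)
    return out


if __name__ == "__main__":
    captured = [
        CaptureRecord(capture_ts_ns=1000, switch_ingress_ts_ns=1000, payload=b"a"),
        CaptureRecord(capture_ts_ns=2216, switch_ingress_ts_ns=1000, payload=b"b"),
        CaptureRecord(capture_ts_ns=3432, switch_ingress_ts_ns=995, payload=b"c"),
    ]
    for rec in apply_switch_timestamps(captured):
        print(rec.capture_ts_ns, rec.payload)
```

The queuing delay behind the MUX still happens, it just no longer shows up in the timestamps, because the time that matters was recorded before the packet ever reached the queue.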


Figure: packet capture with Layer 1 TAPs and fancy SPAN (ingress timestamping)
Pros:
  • - No link Downtime
  • - Zero impact on the network
  • - Extreme time accuracy
  • - Power efficient / Space efficient

Cons:
  • - Expensive

This is the best money can buy right now, and it's pretty damn good too: zero network impact and real 1 ns accurate timestamps. There are similar alternatives for timestamp aggregation from VSS, Gigamon, MetaMako, Exablaze and others. These are dedicated to aggregation and timestamping, but their price is not that different from a switch, so you're better off getting a real switch.

Some aggregation switch vendor info.

Cisco ERSPAN
Cisco ERSPAN RFC
Arista EOS Timestamp
Arista EOS Timestamp FAQ

Summary

Packet capture is never a standalone system; it only works when embedded in a well-designed and well-deployed network architecture. I hope this gives you some ideas on how to architect your network for maximum cost efficiency or maximum time precision.

If there's an approach to sourcing packets that we have left out, please contact us!