
Tunnels experiencing unexpected performance degradation #30

Open
opened 2025-09-10 14:47:25 +00:00 by Nic Elliott · 0 comments
Owner

Summary

In rare cases aggregated connections see high CPU usage and reduced throughput. This is not always due to the underlying ISP links that make up the connection itself (although link faults can still occur), but rather how the tunnels are processed and aggregated by the overall platform. Our team is investigating the underlying conditions and testing both mitigations and alternative methods to ensure stable long-term performance. We remain committed to providing high bandwidth, high performance, multi-link connectivity for internet, WAN and cloud, and this is being given the highest priority within the development team.

Technical detail

We have identified cases where OpenVPN TAP (Layer 2) tunnels are not performing as expected. Tunnels that previously performed reliably for years are now, in some rare circumstances, showing very high CPU usage and reduced bandwidth, even though the underlying links and hardware are capable of more.

Symptoms

  • High CPU load on the EVX or NFR endpoints when traffic is sent through TAPs as part of Tunnel Aggregation Groups (TAGs).
  • Lower than expected throughput, despite sufficient link speed and capacity.
  • The problem is intermittent, occurring only in certain environments or under certain workloads.

Scope

  • This issue affects Layer 2 tunnelling used as transport for TAGs.
  • Most frequently observed with OpenVPN TAP interfaces.
  • We have also seen similar reductions in performance with some GRETAP tunnels which, until now, have been used where OpenVPN has had performance issues.

Background

How OpenVPN TAP consumes CPU

OpenVPN TAP tunnels run entirely in userspace, which means every Ethernet frame has to be copied from the kernel into the OpenVPN process, encrypted, and then copied back again. This repeated copying and context switching adds significant overhead, especially for high packet-per-second workloads. CPU usage scales with the packet rate rather than the byte rate, so throughput can suffer even when plenty of link bandwidth is available.
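To illustrate the packet-rate sensitivity, a back-of-envelope calculation (the 1 Gbit/s figure is an assumed offered load for illustration, not a measurement from an affected site):

```shell
# Packets per second needed to carry a fixed bit rate at various
# frame sizes; the per-packet copy/encrypt cost applies to each one.
rate_bits=1000000000   # assumed 1 Gbit/s offered load
for frame in 64 512 1500; do
  pps=$(( rate_bits / (frame * 8) ))
  echo "${frame}-byte frames: ${pps} pps"
done
```

At 64 bytes the endpoint must process roughly 23 times as many frames as at 1500 bytes for the same bit rate, which is why small-packet workloads dominate CPU long before the links are full.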

Under normal circumstances our networking stack prevents CPU saturation with various queuing and calibration mechanisms, avoiding the increased latency, packet loss and reduced throughput that saturation would otherwise cause.

How GRETAP consumes CPU

GRETAP tunnels operate differently: they are implemented inside the kernel networking stack, so frames are encapsulated in GRE and forwarded without userspace involvement. This avoids the copy overhead of OpenVPN, but places more emphasis on the kernel’s packet processing pipeline. Performance is therefore more sensitive to:

  • Interrupt load on the NIC (each packet hitting an IRQ handler).
  • SoftIRQ scheduling (how the kernel handles bursts of packets).
  • Offload settings (GRO, TSO, checksum offload), which can reduce or increase CPU consumption depending on workload.

The result is that GRETAP may handle large packets more efficiently than OpenVPN TAP, but can still hit CPU limits with high rates of small packets or when interrupt moderation is not optimal.
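The kernel-side factors above can be inspected directly on a Linux endpoint. This is a generic sketch: `eth0` is a placeholder for whichever NIC carries the tunnel underlay, and `ethtool` may not be present on every image, so it is probed first.

```shell
IFACE=eth0   # placeholder; substitute the underlay NIC
# Offload settings that move work between the NIC and the CPU
command -v ethtool >/dev/null && \
  ethtool -k "$IFACE" 2>/dev/null | grep -E 'segmentation-offload|receive-offload|checksumming'
# Per-CPU softirq counters; one NET_RX column growing much faster
# than the rest suggests a single core is absorbing the packet load
grep -E 'NET_RX|NET_TX' /proc/softirqs
```

Sampling `/proc/softirqs` twice a few seconds apart shows the growth rate per core, which is more telling than the absolute counts.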

In practice this means that while both TAP and GRETAP can exhibit high CPU usage, the nature of the load is different.

Why TAP/GRETAP are sensitive here

Although we are not carrying a full Ethernet broadcast domain across sites in most circumstances, TAP and GRETAP still encapsulate and process traffic as raw Ethernet frames. This means:

  • Per-packet overhead is higher than with Layer 3 tunnels: every frame carries full Ethernet encapsulation, and in the OpenVPN case must also be copied and encrypted in userspace.
  • Aggregation adds complexity: the Tunnel Aggregation Group distributes traffic across multiple TAP tunnels, so packet ordering, MTU alignment, and rebalancing behaviour can all influence throughput.
  • Small-packet efficiency matters: even if we aren’t carrying broadcast storms, many internet applications generate large numbers of small packets (ACKs, control messages, VoIP, etc.), which may disproportionately stress TAP-style tunnels.
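As a worked example of the MTU-alignment point: GRETAP over IPv4 adds 38 bytes of encapsulation overhead (20-byte outer IP header, 4-byte GRE header without options, 14-byte inner Ethernet header), so on an assumed 1500-byte underlay path the inner interface must be sized to match. The arithmetic:

```shell
underlay_mtu=1500               # assumed underlay path MTU
overhead=$(( 20 + 4 + 14 ))     # outer IPv4 + basic GRE + inner Ethernet
inner_mtu=$(( underlay_mtu - overhead ))
echo "largest non-fragmenting inner MTU: ${inner_mtu} bytes"
# Applying it (requires CAP_NET_ADMIN; gt0 is a placeholder name):
#   ip link set gt0 mtu 1462
```

A mismatch here forces fragmentation or drops, both of which add per-packet CPU work across every tunnel in the aggregation group.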

Possible contributing factors

While the root cause is still under investigation, factors that may play a role include:

  • Kernel and driver changes affecting how TAP/GRETAP frames are queued and processed, especially when aggregated.
  • CPU overhead of userspace encryption, worsened by recent kernel security mitigations that add syscall/context-switch cost.
  • Crypto acceleration regressions (e.g. AES-NI not being engaged in some environments).
  • Path MTU/fragmentation behaviour between TAP tunnels and the aggregated interface.
  • Changes in traffic mix — modern applications may generate higher sustained packet-per-second loads than in the past.
  • GRETAP tunnels consume CPU in a different way from OpenVPN TAPs.

These factors may combine to create a situation where TAP and GRETAP tunnels require significant CPU for modest throughput, even when the same designs previously performed well.
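Some of these factors can be ruled in or out quickly. For the crypto-acceleration question, for example, two checks on a Linux endpoint show whether AES-NI is advertised at all and which kernel AES implementation won registration (paths are standard procfs, no site-specific names assumed):

```shell
# Does the CPU advertise hardware AES support at all?
grep -qw aes /proc/cpuinfo \
  && echo "AES-NI advertised" \
  || echo "AES-NI not advertised (software crypto path)"
# Which AES implementations the kernel has registered; an aesni-prefixed
# driver with the highest priority is the accelerated path
grep -E -A1 ': (cbc|gcm)\(aes\)' /proc/crypto | head -n 8
```

If the flag is absent or a generic driver is selected, userspace and kernel encryption both fall back to a much slower software path, consistent with the regression factor above.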

Current status

  • Root cause has not yet been determined.
  • The behaviour only appears in a subset of environments and under certain conditions.
  • Our development team is actively investigating, with a focus on TAP/GRETAP performance as part of an aggregated connection.
  • We are also testing alternative Layer 2 tunnelling methods to evaluate whether they are more efficient in this role as a network underlay tunnel.

Next steps

  • Collect further diagnostics (packet captures, CPU profiles, MTU traces) from affected deployments.
  • Benchmark different kernel versions, cipher suites, and aggregation modes.
  • Validate whether fragmentation, reordering, or scheduling across tunnels is a contributing factor.
  • Test and evaluate alternative tunnelling technologies for long-term robustness.
  • Provide recommended mitigations and configuration guidance once findings are confirmed.
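A minimal sketch of the first step, collecting diagnostics from an affected deployment. Interface name, far-end address, and output path are all placeholders, and each tool is probed before use since not every endpoint image ships all of them:

```shell
IFACE=eth0                      # placeholder underlay NIC
REMOTE=198.51.100.1             # placeholder far-end address
OUT=/tmp/tag-diag-$(date +%s)
mkdir -p "$OUT"
# Per-CPU utilisation; watch the %soft (softirq) column
command -v mpstat >/dev/null && mpstat -P ALL 1 3 > "$OUT/mpstat.txt"
# Header-only capture on the underlay (needs root)
command -v tcpdump >/dev/null && \
  timeout 5 tcpdump -i "$IFACE" -s 96 -c 200 -w "$OUT/underlay.pcap" 2>/dev/null
# Effective path MTU towards the far endpoint
command -v tracepath >/dev/null && \
  timeout 5 tracepath -n "$REMOTE" > "$OUT/pmtu.txt" 2>&1
ls "$OUT"
```

Capturing headers only (`-s 96`) keeps the bundle small while still exposing fragmentation and reordering across the member tunnels.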
Nic Elliott added the Network software label 2025-09-10 14:47:25 +00:00