Sometimes two useful features don’t work well together. That is the case with the WAN Link Load Balancing (WAN LLB) function and the Asymmetric Routing function. If you are under the impression that this combination is too unlikely to ever occur, note that this article was inspired by an actual TAC issue and is being highlighted so that others can avoid the situation.
The expected behavior is that, with measure-based load balancing, a session once created should keep using the same link, and there should be no switching between ISPs in the middle of a session. The symptom that presented itself was that sessions were being disconnected and automatically reconnected. It turns out that in an asymmetric routing environment, one of the mechanisms of the measure-based load balancing method causes an unintended side effect.
In order for WAN LLB to work optimally, session-level persistence of the connection is required. In an asymmetric environment, when using the Bandwidth (GUI) or measure-based (CLI) load balancing method, the routing information can change in the middle of a session.
When asymmetric routing is turned off (asymroute disabled), SNAT is typically enabled on the policy allowing traffic out of the Virtual-WAN-Link or WAN LLB to one of many ISP links. If a session falls into a dirty state and SNAT is enabled, the route lookup for that session will use the existing interface as a match condition when going through the routing table. This means that unless the change was due to the link actually going down, the outgoing interface will not change and the routing path will remain the same.
When asymmetric routing is turned on (asymroute enabled), the session revalidation behavior is different. Even if SNAT is enabled, the existing interface is not available as a match condition in the lookup. When the routing table is scanned, it will just pick the best route regardless of what the previous interface was. Due to the measure-based Virtual WAN LLB algorithm, there is a high probability that the best route is going to be different from the existing one.
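The difference between the two revalidation behaviors can be sketched in a few lines of Python. This is a simplified model for illustration only, not the actual implementation: the `Route` class, the `load` field (standing in for whatever the measure-based algorithm measures), and the `revalidate` function are all hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Route:
    interface: str
    link_up: bool
    load: float  # hypothetical measured utilisation; lower is treated as better


def best_route(routes: List[Route]) -> Optional[Route]:
    """Pick the least-loaded route among links that are up."""
    candidates = [r for r in routes if r.link_up]
    return min(candidates, key=lambda r: r.load) if candidates else None


def revalidate(current_if: str, routes: List[Route],
               asymroute: bool, snat: bool) -> Optional[str]:
    """Return the outgoing interface chosen when a dirty session is re-looked-up."""
    if not asymroute and snat:
        # Asymmetric routing off + SNAT: the existing interface is a match
        # condition, so the session stays put unless that link is down.
        for r in routes:
            if r.interface == current_if and r.link_up:
                return r.interface
    # Asymmetric routing on (or no usable existing link): the previous
    # interface is ignored and the best route simply wins.
    chosen = best_route(routes)
    return chosen.interface if chosen else None


# wan1 carries the session but is currently the busier link.
routes = [Route("wan1", True, 0.80), Route("wan2", True, 0.20)]
pinned = revalidate("wan1", routes, asymroute=False, snat=True)
floating = revalidate("wan1", routes, asymroute=True, snat=True)
print(pinned, floating)
```

In this model the session stays on its original link when asymmetric routing is off, and moves to whichever link currently measures best when it is on, which is the mid-session switch described above.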
Result of the combination
When you combine WAN LLB, which works best with a persistent connection, with asymmetric routing, which takes no measures to preserve a persistent connection, there is a high probability of dropped sessions. An example could be something like this:
You’re running a connection, such as RDP. Suddenly, the outgoing interface changes. Because revalidation does not try to preserve the existing interface, even with Source NAT in place, the session is dropped. Now, as per asymmetric routing behavior, packets not matching a session are simply matched against a route and sent out the interface, but they are not NATed. The end result is that the RDP server at the other end does not recognize the packets as being part of the same session and rejects them. The RDP connection is lost.
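The RDP example can be illustrated with a small sketch of what the remote server sees before and after the switch. Everything here is an assumption made for illustration: the addresses come from documentation ranges, the `SNAT_IP` table and `egress` function are hypothetical, and port translation is ignored to keep the model small.

```python
from typing import Dict, Optional, Tuple

Packet = Tuple[str, int]  # (source IP, source port) as seen by the server

# Hypothetical SNAT address used on each ISP link.
SNAT_IP: Dict[str, str] = {"wan1": "198.51.100.10", "wan2": "203.0.113.10"}


def egress(client_ip: str, client_port: int, out_if: str,
           session_if: Optional[str]) -> Packet:
    """Source address the server sees for a packet leaving on out_if.

    NAT is applied only while the packet still matches a session on its
    original interface; otherwise the packet is routed out un-NATed.
    """
    if session_if is not None and out_if == session_if:
        return (SNAT_IP[out_if], client_port)
    return (client_ip, client_port)


client = ("10.0.0.5", 50000)  # hypothetical internal RDP client

# Session established over wan1: the server learns the SNATed source.
first = egress(*client, out_if="wan1", session_if="wan1")
server_sessions = {first}

# Interface changes mid-session and the session is dropped, so later
# packets no longer match a session and leave wan2 without NAT.
later = egress(*client, out_if="wan2", session_if=None)

print(first, later, later in server_sessions)
```

The later packet arrives with a different source address than the one the server associated with the session, so the server has no session to match it against.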
What this means is that it is not impossible to run WAN LLB and asymmetric routing in combination, but due to the potential impact it is recommended that you avoid it if at all possible. On the other hand, you may have circumstances that require this combination, and the impact of dropped connections may be acceptable to the stakeholders. In the end, it is your network and you know it best, so it is your decision to make, but it’s best to make sure your decisions are informed ones.