IT support: Redundant Link Graceful Internet Load Balance/Failover

10/04/2011 04:44:00 PM

There are a lot of questions in this forum about load balancing Internet access across two ISPs. The idea is to move the connection to the alternate (2nd) ISP when the main (1st) ISP connection is down, bouncing, or slow.

It might sound simple, but it is not simple to implement. There are several factors to consider, as follows:

1. NAT (Network Address Translation) between private and public IP address
2. IP Address Reachability
3. Telco Local Loop
4. Power Outage or Mother Nature

NAT between private and public IP address

Let's say you have two independent ISPs, and you receive a different subnet from each of them. You decide to use the 1st ISP as your main connection to the Internet and the 2nd ISP as backup. You have a private network (e.g. 10.x.x.x, 172.16.x.x to 172.31.x.x, or 192.168.x.x) that is NATed to the public IP addresses of both ISPs.

Even with OER (Cisco Optimized Edge Routing), and even for simple Internet browsing, the connection might not switch gracefully from the 1st ISP to the 2nd when a link goes down. The same applies when combining OER with static routes. The reason is that some applications (including simple Internet browsing) are sensitive to a change of public IP address, even when the 2nd ISP's public IP address is NATed to the same physical internal device.
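To make the address change concrete, here is a minimal sketch of how NAT toward two ISPs is commonly configured on a Cisco router. The interface names, the inside network 192.168.1.0/24, the outside addresses (taken from documentation ranges), and the route-map names are all assumptions for illustration and do not come from any particular setup.

```
! Inside LAN (assumed addressing)
interface FastEthernet0
 ip address 192.168.1.1 255.255.255.0
 ip nat inside
!
! Link to the 1st ISP (assumed documentation addresses)
interface Serial0
 ip address 192.0.2.2 255.255.255.252
 ip nat outside
!
! Link to the 2nd ISP (assumed documentation addresses)
interface Ethernet0
 ip address 198.51.100.2 255.255.255.252
 ip nat outside
!
! Inside hosts that are allowed to be translated
access-list 10 permit 192.168.1.0 0.0.0.255
!
! Each route-map matches inside traffic that is routed out one particular link
route-map NAT-ISP1 permit 10
 match ip address 10
 match interface Serial0
!
route-map NAT-ISP2 permit 10
 match ip address 10
 match interface Ethernet0
!
! Translate to the address of whichever link the packet leaves on
ip nat inside source route-map NAT-ISP1 interface Serial0 overload
ip nat inside source route-map NAT-ISP2 interface Ethernet0 overload
```

With a setup like this, a session that started over Serial0 appears to the remote server as coming from 192.0.2.2. After a failover the same inside host leaves as 198.51.100.2, which the server treats as a different client, so the established session does not survive the switch.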

IP Address Reachability

As mentioned above, some applications (including simple Internet browsing over HTTP or HTTPS) are sensitive to the public IP address switching from ISP #1's address to ISP #2's. This is especially true for TCP connections (e.g. HTTP, HTTPS, FTP, mail). A TCP connection is identified by its source and destination IP addresses and ports, so an established session needs to keep the same IP address for its whole lifetime.

Therefore, when the main ISP connection is down, the 2nd ISP must know how to reach the main ISP's public IP address in order to keep current connections working. This reachability requirement applies both to traffic entering the router from the Internet and to traffic leaving the inside LAN for the Internet.

If you are a SOHO (Small Office or Home Office) user who only has broadband connections (DSL or cable Internet), then most of the time your two ISPs do not share the knowledge of how to reach each other's IP addresses. As a result, when a link goes down, the 2nd ISP never learns how to reach the main ISP's public IP address, or vice versa.

Telco Local Loop

Have you ever noticed how the physical cable from your site reaches the ISP? If you are a SOHO user who only has broadband connections, then most likely the physical cables from your site share the same cable bundle to the same CO (Telco Central Office). If that cable bundle is cut (e.g. by a falling tree), the connections to both ISPs go down together.

Power Outage or Mother Nature

Power outages and Mother Nature haunt everybody, even large corporations. Keep in mind that if there is a power outage in your area, the connections to both ISPs might go down as well. Mother Nature (e.g. tornado, lightning, earthquake, fire) can have the same effect.

Solutions

There are several network designs that load balance gracefully between two redundant links.

1. Have a multilink connection to the same ISP over different POPs (Points of Presence)
2. Have a multilink connection to the same ISP using two different SLAs (Service Level Agreements) or different link technologies
3. Have a "virtual multilink connection" to the same 3rd ISP over two ISPs
4. Have multiple links to two different ISPs

Multilink over different POP (POP Diversity)

This is the traditional, established way to provide load balancing. Usually the ISP requires you to have redundant T1/E1 Frame Relay or point-to-point links (leased or dedicated lines) from your site to their nearest POPs, in the form of bonded T1/E1 circuits.

From a physical cable redundancy perspective, each link should terminate at a different POP. This ensures that you still have a connection if one of the POPs fails.

In addition, you need to discuss with your ISP where these POPs terminate. Ideally each POP terminates at a different ISP network, or at least at a different CO. When both POPs terminate at the same CO, that CO is a single point of failure.

With bonded T1/E1 circuits, you do not assign a separate IP address to each link. Instead you bond both links into one larger logical link and assign a single IP address to it. Physically your data might travel over the 1st or the 2nd link, but logically (from the IP perspective) it travels over the same link.

Since there are at least two physical circuits, when one circuit goes down the 2nd circuit automatically takes over all data from the 1st. Likewise, traffic that overloads one circuit spills over to the 2nd. These mechanisms are handled at layer 1 and layer 2 and are transparent from the IP perspective. Therefore there is no need for fancy router configuration (no OER, BGP, or similar), since from the IP perspective the link is still up and the router keeps passing data as usual.

Usually your ISP only requires a static route over the bonded link. There is no need to run BGP, as mentioned above (unless you ask the ISP to do so).

When one circuit is down and the 2nd one is up, you might experience higher latency, which makes sense because you are running on reduced bandwidth. However, your crucial applications still work, which is the good news. To get rid of the latency, contact your circuit provider (telco or ISP) to have the failed circuit repaired so that both circuits are up again.
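As a rough sketch of what "one IP address on the larger link" looks like on your router, the following assumes the bundle is built with Multilink PPP. That is only one common bonding method and is an assumption here, since the method actually used is up to the ISP. The interface names, the bundle number, and the addresses (from a documentation range) are placeholders.

```
! Logical bundle: the only interface that carries an IP address
interface Multilink1
 description Bonded T1 bundle to ISP
 ip address 192.0.2.2 255.255.255.252
 ppp multilink
 ppp multilink group 1
!
! 1st physical circuit, member of bundle 1
interface Serial0/0
 no ip address
 encapsulation ppp
 ppp multilink
 ppp multilink group 1
!
! 2nd physical circuit, member of bundle 1
interface Serial0/1
 no ip address
 encapsulation ppp
 ppp multilink
 ppp multilink group 1
!
! A single static default route over the bundle, no BGP needed
ip route 0.0.0.0 0.0.0.0 192.0.2.1
```

If one member circuit fails, the Multilink1 interface stays up on the remaining circuit, so the default route and the IP address do not change.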

Multilink using two different SLA or different link technology

If you or your company cannot yet afford bonded T1/E1 (or you simply choose not to use it), you might consider two links with different technologies, e.g. Frame Relay and DSL. DSL carries a lower SLA than Frame Relay, so this pair of links is more affordable than bonded T1/E1. The usual arrangement is that Frame Relay is the main connection to the ISP and DSL is the backup in a failover design.

To maintain physical cable redundancy, each link should terminate at a different ISP network or a different CO, as in the previous multilink scenario.

Two independent links to the Internet bring back the IP Address Reachability requirement described earlier. For that reason, this design usually requires both links to connect to the same telco or the same ISP.

Whatever link technology you use for your Internet connection, your ISP provides you with a subnet. For Frame Relay or T1/E1, you might receive two subnets: the 1st for the WAN side (assigned to the Serial interfaces) and the 2nd for the LAN side (assigned to your Ethernet LAN interface). For DSL, you will probably receive only a single subnet. It is technically possible, however, for your ISP to assign two subnets to the DSL link as well for a load balance or failover design, to match the Frame Relay or T1/E1 setup.

With that in mind, there are two possible designs for this kind of connection setup:

1. The Frame Relay and DSL LAN sides are in the same subnet (in the same IP block)
2. The Frame Relay and DSL LAN sides are in different subnets (each LAN has its own IP block or subnet)

Both LAN sides are in the same subnet

Your telco or ISP needs to set up their end to direct all traffic for the subnet over Frame Relay as the primary link and over DSL as the secondary or backup link. The router at your location needs to match that setup.

Since both links serve the same subnet, you usually need only one router at your location, where both links terminate. You can optionally add a failover router in case the main one has a problem such as loss of power or a hardware fault.

The downside of this design is that the secondary or backup link is never used until the main link goes down. You also need to test the backup link periodically (e.g. every four months) to make sure it is ready to use whenever the main link fails.
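On your own router, the primary and backup roles can be expressed with a floating static route. This is a sketch only. The next-hop addresses are borrowed from the illustration later in this article (1.0.0.1 on the Frame Relay link, 1.0.0.5 on the DSL link), and the administrative distance of 250 is an arbitrary choice; any value higher than the primary route's distance would do.

```
! Primary default route via the Frame Relay link
! (static routes default to administrative distance 1)
ip route 0.0.0.0 0.0.0.0 1.0.0.1
!
! Backup default route via the DSL link; the higher distance keeps it
! out of the routing table while the primary route is present
ip route 0.0.0.0 0.0.0.0 1.0.0.5 250
```

The ISP needs the mirror image of this on their side for the return traffic, which is why the router at your location has to match their setup.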

Each LAN side has its own subnet

When it is not possible to have the same IP block for both the Frame Relay and DSL LAN sides (or you simply choose not to), you can use the following design. Have the telco or ISP route the Frame Relay LAN subnet via the Frame Relay link as the primary route and via the DSL link as the secondary route. Similarly, the telco or ISP also needs to route the DSL LAN subnet via the DSL link as the primary route and via the Frame Relay link as the secondary route.

This kind of design usually requires two routers facing the telco or ISP: one for the Frame Relay link and another for the DSL link. To interconnect the two subnets, you also need another router sitting behind the Frame Relay and DSL routers. This 3rd router does the failover routing between the two LAN subnets, to match the telco or ISP routing design.

The advantage of this setup is that you can use the DSL for less critical applications (such as Internet browsing) and reserve the Frame Relay bandwidth for the most critical applications.

You can also choose to give each of the three routers its own failover router, or to have a failover router only for the 3rd router that does the failover routing.
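The primary and secondary routes on the telco or ISP side could look roughly like the sketch below. It is an assumption about how an ISP might do this with static routes; your ISP may well use a different mechanism. The Frame Relay LAN subnet 1.0.1.0/24 and the next hops 1.0.0.2 and 1.0.0.6 are borrowed from the illustration later in this article, while the DSL LAN subnet 1.0.2.0/24 and the distance value 250 are made up for this example.

```
! Frame Relay LAN subnet: primary via Frame Relay, secondary via DSL
ip route 1.0.1.0 255.255.255.0 1.0.0.2
ip route 1.0.1.0 255.255.255.0 1.0.0.6 250
!
! DSL LAN subnet (assumed 1.0.2.0/24): primary via DSL, secondary via Frame Relay
ip route 1.0.2.0 255.255.255.0 1.0.0.6
ip route 1.0.2.0 255.255.255.0 1.0.0.2 250
```

Each subnet normally uses its own link, and only falls back to the other link when its primary route disappears.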

Side Note

As mentioned, the 2nd design usually requires at least three routers at your location. It is technically possible, however, to use just one router for both links and for the failover routing.

The most important point is that whichever design you choose should be written into your SLA with the telco or ISP, so that you can be confident the failover mechanism will work smoothly, at least on the telco or ISP side.

Illustration:

You need to load balance your traffic between the Frame Relay and the DSL links. For simplicity, only the necessary information is shown.

Keep in mind that this illustration only shows the idea of how the network is set up. It might not match the actual implementation, since conditions vary from one ISP to another. Please discuss the actual implementation with your ISP.

You receive the following subnets from your ISP:

Frame Relay
Serial: 1.0.0.0/30
Ethernet: 1.0.1.0/24

DSL: 1.0.0.4/30

Following is the ISP router setup

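! ISP side: one /30 subnet per WAN link
interface Serial0
description Frame Relay
ip address 1.0.0.1 255.255.255.252

interface Ethernet0
description DSL link
ip address 1.0.0.5 255.255.255.252

! Two equal static routes to your LAN subnet, one per link,
! so traffic toward you is shared across Frame Relay and DSL
ip route 1.0.1.0 255.255.255.0 1.0.0.2
ip route 1.0.1.0 255.255.255.0 1.0.0.6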

Following is your router setup

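! Your side: the other end of each /30 subnet
interface Serial0
description Frame Relay
ip address 1.0.0.2 255.255.255.252

interface Ethernet0
description DSL link
ip address 1.0.0.6 255.255.255.252

! LAN interface in the subnet the ISP routes to you
interface FastEthernet0
description LAN
ip address 1.0.1.254 255.255.255.0

! Two equal default routes, so outbound traffic is shared across both links
ip route 0.0.0.0 0.0.0.0 1.0.0.1
ip route 0.0.0.0 0.0.0.0 1.0.0.5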
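One caveat with static routes like these: the router removes a static route only when the next hop becomes unreachable from its own point of view, typically because the interface goes down. An Ethernet port facing a DSL modem can stay up while the DSL line itself is dead. A common remedy on Cisco routers is to tie the route to a reachability probe. The sketch below assumes a reasonably recent IOS release (older releases spell these commands differently), and the probe number, track number, and probe frequency are arbitrary choices.

```
! Probe the ISP end of the DSL link every 10 seconds
ip sla 1
 icmp-echo 1.0.0.5 source-interface Ethernet0
 frequency 10
ip sla schedule 1 life forever start-time now
!
! Track object 1 is up only while the probe succeeds
track 1 ip sla 1 reachability
!
! This default route is installed only while track object 1 is up
ip route 0.0.0.0 0.0.0.0 1.0.0.5 track 1
```

The tracked route would take the place of the plain DSL default route. The Frame Relay route can be protected the same way with a second probe and track object.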

Virtual Multilink to the same 3rd ISP

This can be seen as a newer approach compared to the previous choices. You keep your existing two ISP connections, but you do not go to the Internet directly over either ISP. Instead your data goes to a 3rd ISP, which then forwards it to the Internet.

A possible setup is to use a VPN tunnel (IPSec with ISAKMP) over each ISP connection to the 3rd ISP. The 3rd ISP provides you with IP addresses that are known in its network and on the Internet. Note that:

* You would have a site-to-site VPN between your site and the 3rd ISP
* The 3rd ISP's IP addresses would be completely different from the public IP addresses of your two existing ISPs
* The public IP addresses from the two existing ISPs would only serve as VPN peer addresses between your site and the 3rd ISP
* The IP addresses you receive from the 3rd ISP would be the actual addresses you use to go out to the Internet

The VPN tunnels over the two ISPs between your site and the 3rd ISP form the virtual multilink, which can be cost-effective compared to the traditional multilink. The implementation is similar to the 2nd design choice: you can have two different LAN subnets (one for each peer) or the same IP block for both peers.

As an illustration, consider the following setup. Let's say you currently have two Internet connections, one served by a DSL ISP and the other by a cable Internet ISP. The ISPs are independent of each other (not under the same group, company, or umbrella).

You plan to have load balancing or automatic failover using the existing Internet connections with minimal changes. You then use a 3rd ISP to establish two separate IPSec VPN tunnels: one tunnel goes over the 1st ISP and the other rides over the 2nd ISP.

In the implementation, there is a VPN device on your side and another on the 3rd ISP side. These devices maintain redundant IPSec tunnels to provide the load balancing and/or automatic failover.

Note that since IPSec VPN tunnels can ride over any circuit type, you don't need to deploy special circuits. Any circuit, including broadband (DSL and cable Internet), will work.
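To give a feel for the two-tunnel idea, here is a hedged sketch of your side of the setup using IPSec tunnel interfaces on a Cisco router. Everything specific in it is an assumption made for illustration: the 3rd ISP is assumed to offer two peer addresses (203.0.113.1 and 203.0.113.129), your existing ISP links are assumed to use 192.0.2.0/30 and 198.51.100.0/30, the tunnel addresses, key, names, and algorithm choices are placeholders, and a real 3rd ISP will dictate its own parameters.

```
! Phase 1 (ISAKMP) policy; algorithm choices are placeholders
crypto isakmp policy 10
 encryption aes
 hash sha
 authentication pre-share
 group 2
!
! One pre-shared key per 3rd ISP peer address (placeholder key)
crypto isakmp key EXAMPLE-KEY address 203.0.113.1
crypto isakmp key EXAMPLE-KEY address 203.0.113.129
!
crypto ipsec transform-set TS-AES esp-aes esp-sha-hmac
!
crypto ipsec profile TO-ISP3
 set transform-set TS-AES
!
! Pin each peer address to one existing ISP so the tunnels take different paths
ip route 203.0.113.1 255.255.255.255 192.0.2.1
ip route 203.0.113.129 255.255.255.255 198.51.100.1
!
! Tunnel 1 rides over the 1st ISP
interface Tunnel1
 ip address 10.255.1.2 255.255.255.252
 tunnel source 192.0.2.2
 tunnel destination 203.0.113.1
 tunnel mode ipsec ipv4
 tunnel protection ipsec profile TO-ISP3
!
! Tunnel 2 rides over the 2nd ISP
interface Tunnel2
 ip address 10.255.2.2 255.255.255.252
 tunnel source 198.51.100.2
 tunnel destination 203.0.113.129
 tunnel mode ipsec ipv4
 tunnel protection ipsec profile TO-ISP3
!
! Internet traffic prefers tunnel 1 and falls back to tunnel 2
ip route 0.0.0.0 0.0.0.0 Tunnel1
ip route 0.0.0.0 0.0.0.0 Tunnel2 250
```

The public addresses from the two existing ISPs appear only as tunnel endpoints. The addresses the 3rd ISP assigns to you would live on your LAN side and stay the same no matter which tunnel carries the traffic.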

Keep in mind that since the VPN tunnels cross the Internet rather than actual dedicated links between your site and the 3rd ISP, there are connection stability factors you need to understand. These factors become more apparent when there are one or more intermediate ISPs (backbone ISPs) between your site and the 3rd ISP.

In a site-to-site VPN, the networks that come into play are your site network, the networks of your two current ISPs, the 3rd ISP network, and the intermediate or backbone ISP networks. In the previous two network designs (the physical connections), there are only two networks: yours and the ISP's. From a stability perspective, fewer interconnected networks means more stability.

The following situations might affect the stability of a site-to-site VPN.

Let's say traffic at a backbone ISP (or at either of your first two ISPs) hits a bottleneck caused by overutilized bandwidth, maintenance, a routing problem, or simply an administration error. This affects your connection to the 3rd ISP, in the form of latency or even a dropped tunnel.

In addition, a site-to-site VPN can go down "for no apparent reason" even when the tunnel has been up and stable for months. Assuming you and the 3rd ISP use reliable VPN devices and the VPN traffic is never blocked between the two sites, this is quite rare, but it does sometimes happen.

For reliability (and security), a site-to-site VPN requires static IP addresses, meaning addresses that never change for any reason. Your ISP might provide you with a static IP address; however, the address might still change after a power outage or a lightning strike at the ISP or on your equipment. Once your IP address changes, the VPN tunnel goes down.

Multiple links to two different ISPs: Introduction to BGP Multihoming

This is considered the ideal setup for full redundancy. If one ISP fails, you still have the other as backup. When the links to both ISPs are up, you can load balance or load share between the two links.

Setting up this connection to both ISPs involves the following considerations.

* You are required to run BGP with both ISPs (BGP Multihoming)
* Usually you are required to have at least a full T1/E1 circuit on each link
* The BGP session with each ISP should ride over a circuit to a different POP (dispersed POPs)
* You are required to have a Public AS (Autonomous System) number
* You are required to have a Public Subnet within the Public AS number
* A dedicated router for each link or each ISP is suggested
* The router must meet certain hardware specifications, such as sufficient CPU power and available RAM
* You are required to understand BGP routing concepts, which are considered an advanced networking topic

Keep in mind that in a multiple-ISP scenario you still need to consider basic physical connection redundancy, just as in a single-ISP scenario. This includes connecting to different COs or different backbone networks. When both ISPs terminate at the same backbone network, that backbone network is a single point of failure.

The need to run BGP with both ISPs

BGP is what an ISP uses to exchange routes with other ISPs and, through them, with the whole Internet. When you plan redundant connections over multiple ISPs, you are treated as an ISP even though your network does not look like one. This is why you need to run BGP with both ISPs.

The need to have at least full T1/E1 circuit to each ISP

Redundancy involving BGP calls for a "real" data network, one designed to carry and support Internet data. Broadband connections such as cable Internet and DSL are better seen as an "extension" of an existing non-data network. The cable network was originally designed to broadcast TV programs. The DSL network was originally designed for voice communication (POTS). Neither network was originally designed to carry and support Internet data.

Although some ISPs might support BGP over DSL, the DSL technology used is most likely SDSL rather than ADSL. Even so, BGP over DSL is uncommon.

On the other hand, a T1/E1 circuit is a dedicated, symmetric circuit that ISPs routinely provision for Internet data, including BGP support. As a note, T1/E1 falls into the same "real" data network category as larger-bandwidth circuit technologies such as DS3, OC-x, ATM, and Gigabit Ethernet.

That is why most ISPs require you to have a T1/E1 circuit or larger before they will do BGP peering with you.

Each BGP relationship with each ISP runs over a different POP termination (Dispersed POP)

This follows the same dispersed POP concept as the multilink (bonded) circuit design. Note that BGP Multihoming by itself provides only logical separation and redundancy, and does not necessarily mean physical separation and redundancy. You can't have full redundancy without both.

The need to have your own Public AS number

When an ISP runs BGP with another ISP, each ISP needs its own Public AS number. The AS number distinguishes one ISP's network from another's.

Since you are treated as an ISP when running BGP to multiple ISPs, you are also required to have your own BGP AS number. If you don't have one yet, one of the ISPs can provide you with one.

Keep in mind that you need to inform both ISPs beforehand that you will run redundancy over multiple ISPs. This ensures that all parties involved understand what setup is required. The key is to make sure that your would-be Public AS number will be recognized by all ISPs as a valid, Internet-routable Public AS number (in other words, that it will be seen by any ISP and the rest of the Internet).

If you request an AS number from one of the ISPs without explaining your purpose, the ISP might give you a Private AS number (for 2-byte AS numbers, the range 64512 to 65534), that is, an AS number seen only by that single ISP and unknown to other ISPs and the rest of the Internet.

The need to have Public Subnet within the Public AS number

Along with your own Public AS number, you need your own Public Subnet. This Public Subnet usually has the following properties:

* It is at least the size of a full Class C (/24 in CIDR notation), e.g. 31.45.81.0/24
* The subnet must be routable on the Internet, therefore it can't be within 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16 (it can't be a Private Subnet)
* The Public Subnet is statically assigned to you, and only you, at all times (it never changes)

To communicate with the Internet using this Public Subnet, consider the following:

* Each ISP must have direct access to the Subnet over the dedicated circuit you have with it
* Subnet traffic to and from the Internet must go through one of the two ISPs and no other
* Both ISPs must have direct BGP peering with each other, so that one ISP can reach the Subnet indirectly via the other ISP, which provides the redundancy (the IP Address Reachability requirement)
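As a minimal sketch of what announcing your own Public Subnet to two ISPs can look like on a single Cisco router, consider the following. All numbers are placeholders from documentation ranges: AS 64496 stands in for your Public AS number, 64500 and 64501 for the two ISPs, 203.0.113.0/24 for your Public Subnet, and 192.0.2.1 and 198.51.100.1 for the ISP routers. The list and filter names are made up.

```
! Announce only your own /24
ip prefix-list MY-SUBNET seq 5 permit 203.0.113.0/24
!
! Match only routes that originate in your own AS (empty AS path), so you
! never pass one ISP's routes to the other and become a transit network
ip as-path access-list 1 permit ^$
!
! The network statement below needs an exact match in the routing table;
! a static route to Null0 is a common way to provide it
ip route 203.0.113.0 255.255.255.0 Null0
!
! One eBGP session per ISP, both announcing the same Public Subnet
router bgp 64496
 network 203.0.113.0 mask 255.255.255.0
 neighbor 192.0.2.1 remote-as 64500
 neighbor 192.0.2.1 description ISP-1
 neighbor 192.0.2.1 prefix-list MY-SUBNET out
 neighbor 192.0.2.1 filter-list 1 out
 neighbor 198.51.100.1 remote-as 64501
 neighbor 198.51.100.1 description ISP-2
 neighbor 198.51.100.1 prefix-list MY-SUBNET out
 neighbor 198.51.100.1 filter-list 1 out
```

Because the same subnet is announced through both ISPs, the rest of the Internet can still reach it through the remaining ISP when one session goes down, and your addresses do not change.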

Router Hardware Specification

When you run BGP to your ISP, you need equipment that is capable of BGP routing. In addition, the equipment needs sufficient CPU power and available RAM (memory).

For a Cisco router, the usual suggestion is at least a Cisco 2821 model, although the "standard" is a 7206 or 7600 series model. For a Cisco Layer-3 switch, the suggestion is at least a Catalyst 4500 or 6500 series model. The suggested memory is at least 512 MB.

Since running BGP to your ISP requires a lot of CPU power and memory, a dedicated router or Layer-3 switch on each link is suggested. If you have only a single device to terminate both links, make sure it is powerful enough to take the load.

You might be able to run BGP on less powerful equipment or with less memory. However, your equipment could be severely impaired, especially if your ISP decides (without your knowing) to send the full BGP table instead of a partial table or just a default route.
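One way to guard a small router against an unexpected full table is to filter what you accept from the ISP. The sketch below accepts only a default route. The AS numbers and neighbor address are placeholders from documentation ranges, and the prefix-list name is made up.

```
! Permit the default route only; everything else is implicitly denied
ip prefix-list DEFAULT-ONLY seq 5 permit 0.0.0.0/0
!
! Apply the filter to routes received from the ISP
router bgp 64496
 neighbor 192.0.2.1 remote-as 64500
 neighbor 192.0.2.1 prefix-list DEFAULT-ONLY in
```

This keeps your routing table small even if the ISP sends more than agreed. It is a safety net only, since the router still has to receive and process whatever the ISP sends, so asking the ISP to send just a default route remains the better fix.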

Ability to Understand BGP Routing

In networking, BGP routing is considered an advanced topic. To understand BGP, you first need to understand interior routing: static routes and IGPs such as RIP and OSPF. This interior routing is also required to support iBGP or to provide load balancing (load sharing) between two ISPs.

BGP Peering even when you only have one ISP: BGP Singlehoming

Note that it is still feasible to run BGP peering even when you have only one ISP. Typical situations that call for this setup are the following.

* You have multiple T1/E1 or larger circuits across multiple geographical locations, all terminating at the same ISP
* You need more independent routing path decisions (rather than a mere static default route), compared to basic bonded T1/E1 circuits
* "True" BGP Multihoming is not yet an option
* You only have a subnet smaller than /24 to announce via BGP to your ISP's AS

If you don't yet have your own AS number and you plan to request one from your ISP, confirm with your ISP whether the AS number you receive is Private (seen only by your ISP and unknown to the rest of the Internet) or Public (recognized across the Internet).
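A singlehomed setup might look roughly like the sketch below, shown for the simple case of one router with two circuits to the same ISP. Everything specific is an assumption: 65001 stands in for a Private AS number handed out by the ISP, 64500 for the ISP's AS, 203.0.113.0/25 for a subnet smaller than /24, and 192.0.2.1 and 192.0.2.5 for the ISP routers on the two circuits.

```
! Make the subnet present in the routing table so BGP can announce it
ip route 203.0.113.0 255.255.255.128 Null0
!
! Two eBGP sessions to the same ISP AS, one per circuit
router bgp 65001
 network 203.0.113.0 mask 255.255.255.128
 neighbor 192.0.2.1 remote-as 64500
 neighbor 192.0.2.5 remote-as 64500
 ! Install up to two equal BGP paths so both circuits carry traffic
 maximum-paths 2
```

With circuits spread across several locations you would typically have one router per site instead, but the idea of peering with the same ISP AS over every circuit stays the same.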

More info on BGP

»BGP baby
»BGP, Hardware, etc...
»[Info] BGP Design

Following is a BGP sample configuration.

»Cisco Forum FAQ »BGP Design

Conclusion

A site-to-site VPN to a 3rd ISP might sound cost-effective compared to traditional bonded dedicated circuits, the two-different-SLA design, or BGP peering with the ISPs. However, the savings come at a price, since there are more connection reliability factors to consider. If you have a critical application that cannot tolerate a down link at any time, choose the bonded circuit option, the two-different-SLA option, or the BGP peering option. If you can tolerate occasional Internet downtime, the site-to-site VPN to a 3rd ISP is an option.

List of ISPs Providing Both Physical and Virtual Multilink for Small and Medium Businesses

USA

Perimeter

Headquarters
440 Wheelers Farms Road
Suite 202
Milford, CT 06460
Phone: 800.234.2175
Website: www.perimeterusa.com

Refer to http://www.dslreports.com/faq/14086
