Scaling your services with ZXTM Global Load Balancer

"It is impossible for ideas to compete in the marketplace if no forum for
their presentation is provided or available." Thomas Mann, 1896

Contributed by: Zeus Technology, Inc.

Introduction

"The average multinational corporation loses more than 1 million hours of productivity because of applications failure. Depending on the industry, each hour of downtime can cost businesses £3 million or more." [1]

"Each hour of application downtime costs Fortune 1000 companies in excess of $300,000, according to nearly one-third of respondents at companies that track the business cost and impact." [2]

However you measure it, the cost of application downtime can be very high for many organizations. For organizations that provide applications and services over the Internet, the probability of downtime is even higher.

There are two commonly used techniques to minimize the chance of a failure causing downtime in network-based applications. These are Server Load Balancing and Global Server Load Balancing.

Server Load Balancing within a Datacenter

Techniques like server load balancing and clustering are often used within a datacenter to build clusters of fault-tolerant, scalable applications. These clusters are resilient to isolated failures - for example, a server machine developing a hardware fault - and they allow the administrator to add more capacity to the application when required.

However, a clustered, fault-tolerant application running in a single datacenter is still vulnerable to downtime:

The application may fail because of a single, critical point of failure such as a database or SAN, or it may fail because of administrator error.
The datacenter may be disrupted due to a catastrophic natural or man-made disaster - power failure because of rolling blackouts, maintenance errors or even terrorist attack.
The datacenter may become unavailable because of a denial-of-service attack mounted against a different service running in that datacenter, or because of a failure in its local internet connectivity.

Organizations that wish to protect against these risks often choose to deploy a Global Server Load Balancing solution, which routes application traffic to multiple distinct datacenters and removes the single point of failure.

Global Server Load Balancing between Datacenters

Global Server Load Balancing (GSLB) systems manage how clients are connected to a datacenter, when a service is hosted in multiple distinct datacenters.

[1] Yankee Group, April 2006, "Overcoming Applications Ignorance: New Services to Enable Agility"

[2] mValent Market Survey - Challenges and Priorities for Fortune 1000 companies

In an Active-Passive configuration, one datacenter is nominated the active one for each service. The other datacenters are idle for that service. If the active datacenter becomes unavailable, one of the passive datacenters becomes active and all clients are directed to it.

In an Active-Active configuration, all datacenters are used and clients are load-balanced between them based on datacenter performance and proximity.

The primary purpose of a GSLB system is Business Continuity - to ensure that services are always available, even when one or more service locations (datacenters) become unavailable.

A second purpose of GSLB is Improved Customer Experience - to load-balance each user to the best datacenter from a choice of several. The choice can be based on datacenter performance and proximity, so that clients are directed to the datacenter that is closest and is performing the best. This way, the client gets the best possible level of service.

Who might use a Global Server Load Balancing solution?

A GSLB solution is relevant to any organization that:

1. Provides or depends on an internet-based service, such as a public-facing web site, or a network-based application for internal use.

2. Cannot countenance service failure, whether this results in lost productivity, lost revenue or lost customers.

3. Wishes to establish an advantageous SLA (service level agreement) with its users or customers, providing them with a superior and competitive level of service.

This white paper discusses the implementation details of a DNS-based Global Server Load Balancing solution, with particular reference to Zeus’ ZXTM GLB product.

Examples

Disaster Recovery

A specialist music and book retailer turns over orders in excess of $10,000 per day. Any period where users could not access the online shop would result in significant loss of revenue and reputation.

The retailer hosts their primary website in a hosting facility in New York, and replicates all database transactions to a second backup website in Boston. During normal operation, users are directed to the New York website, but if that website becomes unavailable, a GSLB system directs all users to the backup site in Boston.

When a contractor severed a fiber optic cable in the New York hosting facility, the GSLB device detected that the site was no longer accessible and immediately started directing users to the backup site in Boston instead. Because the database was continually replicated, users were able to continue with their transactions and complete their purchases.

Providing high levels of service

A UK-based publishing company publishes several prestigious scientific journals. Universities and research institutions across the world pay a subscription to access the content of these journals electronically.

A disaster recovery solution is required because the paid subscribers will not tolerate downtime. In addition, many of the subscribers in the US, Far East and Australasia report that the website is slow, and it can take too long to download the PDF content they have paid for.

The publishing company establishes mirror sites in the US and Japan and uses a GSLB device to seamlessly direct each user to the site that is geographically closest to them. Download times for many customers drop by up to 75%.

Upselling services to Hosting Customers

An innovative ISP was seeking additional services they could provide to their hosting customers.

Using data replication to a server platform located in a different datacenter, the ISP was able to synchronize customers’ web content between two locations. With a GSLB device, the ISP was able to direct traffic for some customer sites to the City North datacenter, and other sites to the City South datacenter, and thus control and manage the bandwidth used by each datacenter.

The ISP’s customers’ SLA contracts contained exclusions for major datacenter failure caused by elements outside the ISP’s control. For an additional fee, the ISP was able to upsell a premium hosting package that included a datacenter failover service to minimize the risk of a datacenter failure rendering a customer’s site inaccessible.

How does Global Server Load Balancing work?

DNS-based Global Server Load Balancing

The majority of GSLB devices function by manipulating the DNS (Domain Name System) resolution process.

An application such as a web browser needs to locate a service on the Internet before it can use it. Services are published using a Domain Name, such as www.zeus.com.

Behind the scenes, the application uses a process called ‘DNS Resolution’ to find out the IP Address of the internet server that provides the service with the given domain name. The DNS system is very much like a global internet phone book - you may know an individual by their full name (for example, "Tim Berners-Lee"), but you need to look up their phone number before you can get in touch with them.

Different servers in different locations will have different IP addresses. A GSLB device controls how domain names are resolved to IP addresses, and thus controls which datacenter clients are directed to.

Several users access http://www.zeus.com, but are directed to different datacenters:

When users in the US try to access www.zeus.com, they are directed to IP address 45.6.1.12
Users in other locations are directed to IP address 103.12.253.4
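
The behavior in this example can be sketched in a few lines of Python. This is only an illustration of the idea, not a description of how ZXTM GLB is implemented: the function name and the use of a two-letter country code to identify the client's location are assumptions, and the addresses are the example values quoted above.

```python
# Example addresses taken from the text above.
US_DATACENTER_IP = "45.6.1.12"
OTHER_DATACENTER_IP = "103.12.253.4"


def resolve_for_client(hostname: str, client_country: str) -> str:
    """Return the address a GSLB device might hand to this client.

    client_country is a hypothetical two-letter country code that the
    device has derived from the source address of the DNS request.
    """
    if hostname != "www.zeus.com":
        raise KeyError(f"no GSLB configuration for {hostname}")
    if client_country == "US":
        return US_DATACENTER_IP
    return OTHER_DATACENTER_IP
```

The same domain name therefore resolves to a different address depending on who is asking, which is all that is needed to steer a client to a particular datacenter.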

In order to effectively deploy a GSLB solution, you need a good understanding of how the DNS system functions. For background reading, you may find the Zeus publication "A Layman’s Guide to DNS" useful.

Other GSLB designs

Other techniques are sometimes used to load balance users across several globally-distributed datacenters.

Application Level Redirection

Some application protocols, such as HTTP, allow for ‘redirection’ messages. A user accesses www.zeus.com, but receives a redirect sending him to us.zeus.com, which resolves to just one of the datacenters.

This method is effective at controlling precisely which datacenter a user is sent to, but it does not cater for datacenter failure, and users may bookmark or distribute links to us.zeus.com, bypassing the load-balancing decision.

Generally, this method needs to be combined with a DNS-based GSLB system, to ensure that www.zeus.com is always available, and with a traffic management device, to control how and when users are redirected.
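
A minimal sketch of the redirection decision is shown below, assuming the server already knows which region a request comes from. The function name, the region codes and the decision to serve unlisted regions locally are assumptions made for illustration; only the hostname us.zeus.com comes from the text.

```python
from typing import Dict, Tuple

# Only us.zeus.com is named in the text; the "us" region key is hypothetical.
REGIONAL_HOSTS: Dict[str, str] = {"us": "us.zeus.com"}


def redirect_response(client_region: str, path: str = "/") -> Tuple[int, Dict[str, str]]:
    """Return an (HTTP status, headers) pair for a request to www.zeus.com.

    Clients from a region with its own hostname get a 302 redirect;
    all other clients are served by the site that received the request.
    """
    host = REGIONAL_HOSTS.get(client_region)
    if host is None:
        return 200, {}
    return 302, {"Location": f"http://{host}{path}"}
```

Because the redirect target is an ordinary URL, a user can bookmark it and return to it later without passing through this decision again, which is the weakness described above.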

Triangulation

With triangulation, incoming network traffic is distributed across one or more datacenters using round-robin DNS. When a datacenter receives a request, it determines whether it is best suited to respond to the request, or whether it should forward the request to a different datacenter.

With Layer 4 triangulation, the first datacenter forwards the request to the second datacenter, and the second responds directly to the remote client. The request and response data take three hops across the network. Layer 4 triangulation may not be possible if one of the service providers deploys egress filtering to defeat source-address spoofing (such filtering is often used to prevent spam email).

With Layer 7 triangulation, the first datacenter forwards the request to the second, and the second datacenter replies back to the first. The first datacenter then relays the response back to the client. The request and response data take four hops over the network.

Triangulation can load-balance very compute-intensive application requests, but it generally does not improve response time, it is bandwidth-intensive, and it does not cater for primary datacenter failure.

BGP Routing Control

BGP (Border Gateway Protocol) is the core routing protocol of the Internet. By manipulating BGP routing tables, it is possible to move blocks of IP addresses from one physical network location to another in a very different location.

BGP routing control can be used by an ISP to provide large-scale failover, but it is too expensive and coarse to provide fine-grained load balancing control for an individual service.

Introducing ZXTM Global Load Balancer

ZXTM Global Load Balancer (ZXTM GLB) is a DNS-based global server load balancing system.

Typical deployment procedure

ZXTM GLB can be deployed in a step-by-step, low risk manner with minimal interference or disruption to existing infrastructure.

The ZXTM GLB devices work alongside the existing DNS infrastructure, taking the DNS responses and manipulating them to control where each remote user is directed to. The ZXTM GLB devices do not replace any existing DNS servers, and all DNS information is stored on the DNS servers as before.

Begin with Round-Robin DNS

For example, suppose that the service www.zeus.com is hosted in two different locations, with IP addresses 21.2.12.1 and 45.4.54.5. Without a GSLB device, the DNS server would normally be configured to return both of these IP addresses when queried about www.zeus.com. The IP addresses would be returned in a different order each time using a process called Round-Robin DNS, and clients would connect to one of the datacenters.
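
The rotation performed by round-robin DNS can be illustrated with a short Python sketch. The class name is an assumption made for this example, and real DNS servers may rotate or shuffle the answers in other ways; the addresses are the ones used above.

```python
from collections import deque
from typing import Iterable, List


class RoundRobinRecord:
    """Holds the addresses for one name and rotates them on every query."""

    def __init__(self, addresses: Iterable[str]) -> None:
        self._addresses = deque(addresses)

    def answer(self) -> List[str]:
        """Return all addresses, then rotate so the next answer starts one later."""
        result = list(self._addresses)
        self._addresses.rotate(-1)
        return result


record = RoundRobinRecord(["21.2.12.1", "45.4.54.5"])
first = record.answer()   # starts with 21.2.12.1
second = record.answer()  # starts with 45.4.54.5
```

Clients usually connect to the first address in the answer, so alternating the order spreads them roughly evenly across the two locations, without any regard to availability, load or distance.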

Add in ZXTM GLB

ZXTM GLB builds on this standard configuration by manipulating the round-robin DNS responses:

1. The end user makes a DNS request for www.zeus.com.

2. ZXTM GLB forwards the DNS request to the existing DNS server.

3. The DNS server responds with all IP addresses in a round-robin fashion.

4. ZXTM GLB chooses one IP address and masks out the others from the response.

The key load-balancing decision that ZXTM GLB performs is to decide which IP address(es) should be returned to each remote user. This decision directly controls which datacenter each remote user uses.
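
The four steps above can be sketched as a single function. This is a simplified model rather than the product's implementation: `upstream` stands in for the existing DNS server, `choose` stands in for the load-balancing decision, and both names are assumptions made for this example.

```python
from typing import Callable, List


def handle_query(
    name: str,
    upstream: Callable[[str], List[str]],
    choose: Callable[[List[str]], str],
) -> List[str]:
    """Model of a GSLB device answering one DNS request.

    Step 1: the request for `name` arrives (this function call).
    Steps 2 and 3: forward it to the existing DNS server and receive
    every address for the name.
    Step 4: keep the chosen address and mask out the others.
    """
    addresses = upstream(name)
    if not addresses:
        # Nothing to choose from; pass the empty answer through.
        return []
    chosen = choose(addresses)
    if chosen not in addresses:
        raise ValueError("the chosen address must come from the DNS response")
    return [chosen]


# Stand-in DNS server returning the two example addresses.
def example_dns_server(name: str) -> List[str]:
    return ["21.2.12.1", "45.4.54.5"] if name == "www.zeus.com" else []


answer = handle_query("www.zeus.com", example_dns_server, lambda addrs: addrs[-1])
```

Note that the device only removes addresses from the response; it never invents one, which is why the DNS servers remain the single place where address information is stored.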

Just one change needs to be made to the DNS information so that clients make DNS lookups through the GLB device rather than directly to the DNS servers. This change can be made by altering the NS record for the domain, or by adding a CNAME. Please refer to the ZXTM GLB documentation for more information.

DNS TTLs

DNS information is commonly cached (remembered) by intermediaries across the network. This caching behavior is advantageous because it reduces the amount of DNS traffic, but can impede the operation of a DNS-based Global Server Load Balancing device.

An important element in a DNS response is the TTL (time to live) value. This value informs any intermediaries as to how long the DNS response can be cached for. ZXTM GLB can rewrite TTL values in the DNS responses it has managed, overwriting a long default value with a much shorter one. The effect of the change (increased DNS traffic) can be easily observed using the real-time visualization tools in ZXTM GLB, so you can choose a suitable value that balances traffic rates with responsive failover.
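
As an illustration of TTL rewriting, the sketch below caps the TTL of each record in a response. The record type and function name are assumptions made for this example, and leaving already-short TTLs untouched is a design choice of the sketch rather than documented product behavior.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class ARecord:
    """A simplified DNS address record."""
    name: str
    address: str
    ttl: int  # seconds an intermediary may cache this record


def cap_ttl(records: List[ARecord], max_ttl: int) -> List[ARecord]:
    """Return copies of the records with any TTL above max_ttl lowered to it."""
    return [replace(r, ttl=min(r.ttl, max_ttl)) for r in records]


response = [ARecord("www.zeus.com", "21.2.12.1", ttl=86400)]
rewritten = cap_ttl(response, max_ttl=30)
```

With a 30 second TTL, a well-behaved cache stops handing out the address of a failed datacenter within about 30 seconds, at the cost of asking again at least that often. That is the trade-off between traffic rate and failover responsiveness described above.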

How does ZXTM GLB work in practice?

One or more ZXTM GLB devices are deployed in each datacenter. The ZXTM GLB devices monitor the performance and availability of their own datacenter, and broadcast that information to the other ZXTM GLB devices in the other datacenters.

This way, every ZXTM GLB device knows the availability and performance of every datacenter.

Active-Active load balancing configurations

Any ZXTM GLB device may receive a DNS request for a service running in the datacenters. When the datacenters are running in active-active mode, the ZXTM GLB device chooses which datacenter the user should be directed to. This decision is based on three criteria:

Datacenter Availability: If a datacenter has failed, users are not directed there.

Datacenter Performance: Datacenters with better response times are preferred over slower, more overloaded datacenters.

Geographic Proximity: ZXTM GLB uses a comprehensive database that maps IP addresses to geographic locations, and calculates the geographic distance between the end user and each datacenter.

The decision can be tuned so that it is based purely on load, purely on geographic location, or on a mixture of the two.
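
One way to combine the three criteria is sketched below. The scoring formula, the `geo_weight` parameter and the coordinates are assumptions made for illustration and are not taken from the product; the distance is the standard great-circle (haversine) distance.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Datacenter:
    name: str
    lat: float
    lon: float
    available: bool
    response_ms: float  # recent response time reported by the monitors


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def choose_datacenter(
    datacenters: List[Datacenter], client_lat: float, client_lon: float, geo_weight: float
) -> Optional[Datacenter]:
    """Pick a datacenter; geo_weight=1.0 is purely geographic, 0.0 purely load."""
    # Availability: failed datacenters are never candidates.
    candidates = [d for d in datacenters if d.available]
    if not candidates:
        return None
    dist = {d.name: haversine_km(client_lat, client_lon, d.lat, d.lon) for d in candidates}
    # Normalize both measures to the range 0..1 so they can be mixed.
    max_dist = max(dist.values()) or 1.0
    max_ms = max(d.response_ms for d in candidates) or 1.0

    def score(d: Datacenter) -> float:
        return geo_weight * dist[d.name] / max_dist + (1 - geo_weight) * d.response_ms / max_ms

    return min(candidates, key=score)


# Approximate, illustrative coordinates and response times.
sites = [
    Datacenter("Hudson", 42.39, -71.57, True, 20.0),
    Datacenter("Cambridge", 52.20, 0.12, True, 80.0),
]
by_distance = choose_datacenter(sites, 51.5, -0.13, geo_weight=1.0)  # client near London
by_load = choose_datacenter(sites, 51.5, -0.13, geo_weight=0.0)
```

For a client near London, the purely geographic setting selects Cambridge, while the purely load-based setting selects Hudson because its response time is lower in this made-up data.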

The benefits of an active-active load balancing mode are that it gives better datacenter utilization, that users get the best possible level of service from the closest, best performing datacenter, and that the configuration provides full failover in the event of a datacenter failure.

However, you may not wish to use an active-active configuration if the applications you are balancing cannot be run in multiple datacenters simultaneously - for example, because they depend on a single database or SAN that cannot be continuously replicated over multiple sites. In this case, an active-passive configuration is more appropriate.

Additionally, one side-effect of an active-active load balancing mode is that an end user may spontaneously be redirected from one datacenter to another when his client software makes a fresh DNS request. For example, the datacenter he is accessing may become overloaded and the load-balancing algorithm may assign him to a different datacenter.

If this behavior is undesirable, you can overcome it by several methods. You can use the fully deterministic ‘Geo’ load-balancing method, or you can use application-level redirection to detect a user’s session and forcibly direct him to a particular datacenter when required. Please consult the ‘Multi-site session persistence with ZXTM GLB and ZXTM’ document for a full description of this technique.

Active-Passive load balancing configurations

When the datacenters are running in active-passive mode, the load balancing decision is much simpler. You first specify the order in which the datacenters should be used:

All users are directed to the first datacenter (Hudson in this case) so long as that datacenter is available.

If the first datacenter fails, all users are directed to the second datacenter (Cambridge); you can build arbitrarily long chains of datacenters for multiple levels of failover.

If the first datacenter recovers, you can specify how the service should fail back. If automatic failback is enabled, users will immediately be directed to the first datacenter again. If it is disabled, users continue to use the second datacenter until the administrator manually indicates that the first datacenter is ready to receive traffic again.
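
The failover and failback rules above can be modelled as a small state machine. The class and method names are assumptions made for this sketch; the datacenter names are the ones used in the example.

```python
from typing import List, Optional, Set


class ActivePassive:
    """Directs all users to one datacenter, following a fixed failover order."""

    def __init__(self, order: List[str], auto_failback: bool = True) -> None:
        self.order = list(order)
        self.auto_failback = auto_failback
        self.active = self.order[0]

    def _first_available(self, available: Set[str]) -> Optional[str]:
        for name in self.order:
            if name in available:
                self.active = name
                return name
        return None  # every datacenter in the chain is down

    def select(self, available: Set[str]) -> Optional[str]:
        """Return the datacenter that should receive traffic right now."""
        if not self.auto_failback and self.active in available:
            # Stay where we are until an administrator fails back.
            return self.active
        return self._first_available(available)

    def manual_failback(self, available: Set[str]) -> Optional[str]:
        """Administrator signals that earlier datacenters may take traffic again."""
        return self._first_available(available)


glb = ActivePassive(["Hudson", "Cambridge"], auto_failback=False)
normal = glb.select({"Hudson", "Cambridge"})     # Hudson
failed = glb.select({"Cambridge"})               # Hudson is down: Cambridge
recovered = glb.select({"Hudson", "Cambridge"})  # still Cambridge
restored = glb.manual_failback({"Hudson", "Cambridge"})  # Hudson again
```

With automatic failback enabled, the third call would return Hudson immediately; with it disabled, traffic stays on Cambridge until the administrator acts, which gives time to confirm that data has been resynchronized.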

The benefit of this configuration is that it gives a very deterministic, controllable disaster recovery solution, ideally suited for complex, stateful applications.

Availability and Performance Checking

ZXTM GLB checks the performance and correct operation of the services in the local datacenter using a range of application monitors. These monitors can run simple tests like network pings, or complex tests like HTTP GETs to verify that returned pages match particular criteria.
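
A monitor of the HTTP GET kind could look roughly like the sketch below, which uses only the Python standard library. It is not the product's monitor: the function names, the timeout and the rule that a page is healthy when it returns status 200 and contains an expected string are all assumptions.

```python
import time
import urllib.request
from typing import Callable, Optional, Tuple


def fetch_body(url: str, timeout: float = 5.0) -> Tuple[int, str]:
    """Fetch a URL and return (HTTP status, body text)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")


def http_monitor(
    url: str,
    expected_text: str,
    fetch: Callable[[str], Tuple[int, str]] = fetch_body,
) -> Tuple[bool, Optional[float]]:
    """Return (healthy, response time in seconds or None on failure).

    `fetch` is injectable so the check can be exercised without a network.
    """
    start = time.monotonic()
    try:
        status, body = fetch(url)
    except OSError:
        # Covers connection errors, timeouts and urllib's URLError/HTTPError.
        return False, None
    elapsed = time.monotonic() - start
    healthy = status == 200 and expected_text in body
    return healthy, elapsed
```

The measured response time is returned alongside the health result, so the same check can feed both availability and performance decisions.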

Performance data can optionally be deduced from the response times from selected monitors, or it can be supplied separately using a standards-compliant SOAP interface. This performance data is used to weight how much each datacenter is used when the Load or Adaptive load balancing algorithm is selected.
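
The text does not describe how the Load and Adaptive algorithms turn performance data into weights. As one plausible illustration only, the sketch below weights each datacenter in inverse proportion to its response time and then makes a weighted random choice. The function names and the inverse-proportional rule are assumptions.

```python
import random
from typing import Dict


def load_weights(response_ms: Dict[str, float]) -> Dict[str, float]:
    """Weight each datacenter by the inverse of its response time.

    Datacenters with a non-positive response time are ignored.
    The returned weights sum to 1.0 (or the dict is empty).
    """
    inverse = {name: 1.0 / ms for name, ms in response_ms.items() if ms > 0}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def pick(weights: Dict[str, float], rng=random) -> str:
    """Choose one datacenter at random, in proportion to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


weights = load_weights({"Hudson": 50.0, "Cambridge": 100.0})
```

In this made-up data Hudson responds twice as fast as Cambridge, so it receives two thirds of the weight and, over many requests, roughly two thirds of the users.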

ZXTM GLB can also run an external connectivity monitor to verify that its datacenter has connectivity to an upstream location on the Internet.

ZXTM GLB broadcasts the health and performance data to the other ZXTM GLB devices in the other datacenters. It deduces that other datacenters are available if it hears the health and performance information from the ZXTM GLBs in those datacenters. For this reason, organizations typically operate a pair of ZXTM GLB devices in each datacenter, thus removing a possible single-point-of-failure within each datacenter.
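
The rule that a datacenter is considered available while its reports keep arriving can be sketched as a table of last-heard times. The class name and the timeout value are assumptions made for this example; timestamps are passed in explicitly so the behavior is easy to follow.

```python
from typing import Dict, Set


class PeerTable:
    """Tracks when each remote datacenter was last heard from."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self._last_heard: Dict[str, float] = {}

    def record_report(self, datacenter: str, now: float) -> None:
        """Note that a health report arrived from a device in `datacenter`."""
        self._last_heard[datacenter] = now

    def available(self, now: float) -> Set[str]:
        """Datacenters heard from within the last timeout_s seconds."""
        return {
            name
            for name, heard in self._last_heard.items()
            if now - heard <= self.timeout_s
        }


peers = PeerTable(timeout_s=10.0)
peers.record_report("Hudson", now=0.0)
peers.record_report("Cambridge", now=0.0)
peers.record_report("Cambridge", now=8.0)  # a second report, from either device in the pair
```

At time 12.0 only Cambridge is still considered available, because nothing has been heard from Hudson for more than the timeout. Since a report from any device in a datacenter refreshes its entry, running two devices per site means one device failure does not make the whole datacenter appear to be down.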

Conclusion

ZXTM GLB is a complete DNS-based Global Server Load Balancing solution that provides:

Business Continuity in the event of catastrophic datacenter failure

Improved Customer Experience by routing users to the closest, best performing datacenter

ZXTM GLB is very easy to deploy, with minimal infrastructure changes and very little operational risk.

The rich real-time visualization and reporting in ZXTM GLB gives a clear picture of the effectiveness of the Global Server Load Balancing configuration and the activity of your users globally at any time.

The Global Map view in ZXTM GLB shows real-time site activity. It is ideal for public display in a network operations center or boardroom!

For Further Information

To find out more about ZXTM Global Load Balancer or to arrange a demonstration or product evaluation, please visit http://www.zeus.com/products/zxtmglb/

The ZXTM KnowledgeHub is a key resource for developers and system administrators wishing to learn about ZXTM and Zeus’ Traffic Management solutions. It is located at http://knowledgehub.zeus.com/

The Business Forum
Beverly Hills, California, United States of America
