In this article I will show you how to build an open-source, Linux (CentOS) based load balancer using the Direct Server Return (DSR) method of load balancing with the Linux Virtual Server (LVS) package and the Piranha web GUI. Let me first say that I have a deep knowledge of the Windows OS, but my Linux knowledge is, well, about as deep as a kiddie pool after a day of splashing by my kids! So, for those of you who are Linux gurus, please feel free to provide comments, suggestions, or just laugh! In any event, my goal in building this configuration is to give you a possible solution for your labs or, if you have a solid Linux background and operations team, possibly even a production solution for Exchange deployments. The other main reason I researched this solution is that all too often I see customers looking at Windows Network Load Balancing (NLB) as a high availability (HA) solution for the Client Access Server (CAS) role. If you're considering Windows NLB, I would highly recommend that you don't. A number of fellow Exchange experts have outlined its limitations, so here's a quick technical recap of the downsides of Windows NLB in an Exchange 2010 deployment.
- You cannot run Windows NLB on a DAG member. See http://support.microsoft.com/kb/954420. This means that the number of Exchange servers in your design will increase, since you'll need to break CAS out onto its own OS. Let's quickly do the math: you just added two or more additional Windows, Exchange, Forefront/AV, etc. licenses. You could have put that money toward a hardware/virtual load balancer and combined the Exchange roles, which reduces your operational TCO and gives you a better chance of maintaining your messaging SLAs.
- Windows NLB is only host-aware, not application-layer-aware. This means, for example, that if IIS crashes but the OS doesn't, Windows NLB may continue to direct OWA, EWS, etc. traffic to the failed CAS.
- Windows NLB is not a way to make friends with your network team. In most instances it will cause your switches to flood its traffic out of every port. To get around this you'll need to implement one of the kludge workarounds, which adds complexity for your network team.
- Windows NLB does a poor job of persistence, which is based on source IP only. Although I'm not 100% sure, I suspect Windows NLB hashes the source IP network rather than keeping a true /32 (per-host) table in memory. In short, this means that if your clients all come from the same subnet, it can end up directing an uneven load to one CAS.
- There are known issues using Windows NLB with virtual machines that require special configuration.
Therefore, my first recommendation is to invest in a commercial hardware load balancer. A number of partners are now part of the Microsoft UC Load Balancer Interoperability Qualification program. In addition to the vendors in this program, I've personally had great success with F5 ($$$), Radware ($$), A10 ($$), Foundry ServerIron (Brocade), and both the hardware and virtual machine versions of the Kemp Technologies LoadMaster ($). If your load allows (for example, in an SMB environment), I would definitely test the Kemp Virtual Load Balancer.
With that said, I hope this article helps you better understand how load balancers work and perhaps even offers a very low-cost, open-source option for your Exchange deployment or lab. So let's get into the weeds!
Why DSR? DSR (Direct Server Return, also called Direct Routing) uses the fewest load balancer resources to distribute traffic to your real servers. In short, it works like this:
- The client sends the TCP packet to the Virtual IP (VIP) of the load balancer e.g. 10.10.11.19.
- When the client or local router sends an ARP request for the VIP, the load balancer responds with its own MAC address, e.g. AB-CD-EF-01-23-45.
- The load balancer then takes the packet and 'flips' only the destination MAC address to the MAC of one of the real (CAS) servers, e.g. 12-34-56-78-90-AB.
- The real CAS server then accepts the packet and responds directly to the client. i.e. the packet does not route back through the load balancer.
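For readers who prefer the shell, the steps above correspond to an LVS virtual service in direct-routing ("gatewaying") mode. The sketch below only prints the equivalent ipvsadm commands as a dry run; in this build pulse and nanny create the table from Piranha's configuration, so don't run the output by hand on a director where pulse is active. The helper name dsr_commands and the second real-server address (10.10.11.79) are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: print (do not run) the ipvsadm commands for one LVS
# direct-routing (DSR) virtual service with round-robin and persistence.
# Arguments: VIP, port, persistence timeout in seconds, then real server IPs.
dsr_commands() {
    local vip="$1" port="$2" persist="$3"
    shift 3
    # -A adds the virtual service; -s rr selects round-robin scheduling and
    # -p sets source-IP persistence (default persistence netmask 255.255.255.255).
    echo "ipvsadm -A -t ${vip}:${port} -s rr -p ${persist}"
    local rs
    for rs in "$@"; do
        # -a adds a real server; -g is "gatewaying" (direct routing): the
        # director rewrites only the destination MAC, so each CAS must accept
        # packets still addressed to the VIP (the loopback adapter setup).
        echo "ipvsadm -a -t ${vip}:${port} -r ${rs}:${port} -g -w 1"
    done
}
```

For example, `dsr_commands 10.10.11.19 443 14400 10.10.11.78 10.10.11.79` prints one `-A` line and one `-a ... -g` line per CAS. Note that the real-server port equals the virtual port: because DSR never rewrites the IP or TCP header, it cannot remap ports.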
According to one vendor's documents, this typically runs about 8 times faster than a NAT load balancing method for HTTP traffic, 50 times faster for terminal services, and even faster for streaming media or FTP. The other advantage is that DSR provides client transparency to the real server. This means the real server (CAS) will see and log the source IP address of the actual Outlook or browser client, whereas a load balancer that source-NATs client traffic (common in one-armed deployments) makes every connection appear to come from the load balancer itself.
The downside of this method is that it runs at Layer 4 of the OSI model, which means that you're limited to source IP as the method of providing affinity (persistence) between the client and the real server (see http://technet.microsoft.com/en-us/library/ff625247.aspx#affinity for more information). However, keeping to the scope of this article, if you were considering Windows NLB, you probably don't have the load that would warrant cookie-based affinity, which is only available from a Layer 7 (proxy) load balancer.
The other downside of DSR is that, since it only flips the MAC address of the packet (i.e. the destination IP address remains the VIP of the load balancer), the real server (CAS) will not accept the packet by default. Therefore, we need a way for Windows to accept a packet addressed to an IP that isn't its own. To address this, we need to perform extra configuration on each CAS by installing the Microsoft Loopback Adapter and assigning it the IP address of the load balancer's VIP. We also need to make sure that CAS does not respond to ARP requests for the VIP; that should only be done by the actual load balancer.
The Lab Layout:
Configuring Our Windows Servers
- Prepare our Exchange CAS servers by installing the Microsoft Loopback Adapter. Open Device Manager, right-click the server name, choose Add legacy hardware, and install the Microsoft Loopback Adapter from the Network adapters category.
Rename the loopback adapter to 'loopback' and the 'regular' interface to 'net'.
- Configure the loopback adapter: unbind all protocols/services except IPv4.
- Configure the loopback adapter's IP address as the VIP of the load balancer, with no default gateway.
- Click Advanced and uncheck Automatic metric. Set the interface metric to 254, which stops the interface from responding to ARP requests.
- On the DNS tab, uncheck 'Register this connection's address in DNS'
Now enable weak host receive and weak host send (note that the interface names must match: net and loopback):
- netsh interface ipv4 set interface "net" weakhostreceive=enabled
- netsh interface ipv4 set interface "loopback" weakhostreceive=enabled
- netsh interface ipv4 set interface "loopback" weakhostsend=enabled
Building Our Load Balancer
- Download the CentOS 5.5 ISO files from here: http://isoredirect.centos.org/centos/5/isos/x86_64/
Create a VM in Hyper-V with the following specs:
- 512MB RAM
- Remove all network adapters (Synthetic and Legacy)
- Remove the SCSI adapter
- 10-20GB VHD
- Boot the VM and, when prompted for the install type, enter "linux text" to run a text-mode install.
- I skip the media test but you can run it if you choose.
- Follow the prompts, selecting the languages, keyboard, etc.
- When prompted select yes to initialize the virtual HD.
- Select OK on the partition type (leave default options) unless you really want to customize these.
- Continue through the partition and format warnings, select your time zone, and enter a root password.
Since we want a slim deployment of CentOS, we'll remove all of the GUI options, so at the package selection step, remove the following:
- Desktop – Gnome (all others should be disabled by default)
- Select 'Customize software selection'.
- Click OK.
- Select Base, Development Tools, and Editors; uncheck Dialup Networking Support and Text-based Internet. The critical item here is the addition of the Development Tools, which are required for the Hyper-V Linux Integration Components.
- The install will begin; make sure you have all of the CD or DVD ISO images required for the install.
- While the install is running, download the Hyper-V Linux Integration Services here: http://www.microsoft.com/downloads/en/details.aspx?displaylang=en&FamilyID=eee39325-898b-4522-9b4c-f4b5b9b64551
- After CentOS installs and reboots, the setup agent will run. Run the Firewall configuration tool.
- Set the Security Level to Disabled and SELinux to Disabled.
- Although this is not critical, I run the System Services Tool and disable the Bluetooth, CUPS (Printing Services) and Sendmail services.
- Quit and you will be in the console shell. Insert the Hyper-V Integration Services ISO into the VM.
Enter the following:
- mkdir /media/cdrom
- mount /dev/cdrom /media/cdrom
- mkdir /opt/hypervsvc
- cp -R /media/cdrom/* /opt/hypervsvc
- cd /opt/hypervsvc
- make install
- shutdown -h now
- Once the server has shut down, power off the VM, add a Synthetic Network Adapter to it, select "Enable spoofing of MAC addresses", and then power it back up.
NOTE: I use VLANs in my lab, so the use of VLANs (VLAN 11 in the screen grab below) is not critical and will depend on your environment.
- When CentOS boots, log in as root and confirm that a new Ethernet interface is active and, assuming you're running DHCP on your network, that it has an IP address.
- Now let's give our synthetic Ethernet interface a static IP by running system-config-network which will bring up the network setup assistant. Edit the seth0 interface and give it the static IP – in my lab it's 10.10.11.20/24
- Configure the DNS settings then save & quit.
Restart the network service by entering:
- service network restart
- confirm the ip binding by entering ifconfig
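system-config-network stores these settings in /etc/sysconfig/network-scripts/ifcfg-seth0. If you'd rather script the change, the hypothetical helper below prints a minimal static-address file for one interface. The gateway 10.10.11.1 in the usage example is an assumption for this lab; compare the output with the file the tool generated before replacing it.

```bash
#!/bin/bash
# Hypothetical helper: print a minimal static ifcfg file for one interface.
# Arguments: device, IP address, netmask, default gateway.
make_ifcfg() {
    local dev="$1" ip="$2" mask="$3" gw="$4"
    cat <<EOF
DEVICE=${dev}
BOOTPROTO=static
ONBOOT=yes
IPADDR=${ip}
NETMASK=${mask}
GATEWAY=${gw}
EOF
}
```

As root, `make_ifcfg seth0 10.10.11.20 255.255.255.0 10.10.11.1 > /etc/sysconfig/network-scripts/ifcfg-seth0` followed by `service network restart` would apply it.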
Next we'll install the Linux Virtual Server and Piranha (the Load Balancer GUI) packages. It is critical that you have Internet access at this time so the packages can be downloaded. So let's begin by entering:
- yum install ipvsadm
- yum install piranha
NOTE: You'll be asked to confirm the download of the packages. Type y.
Now we'll configure the installed services to start automatically. Enter:
- chkconfig pulse on
- chkconfig piranha-gui on
- chkconfig httpd on
Next we'll configure a password for Piranha. Enter:
- piranha-passwd
Now let's enable IP forwarding using the command:
- echo 1 > /proc/sys/net/ipv4/ip_forward
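Writing to /proc only lasts until the next reboot. A common way to make the setting persistent on CentOS is the net.ipv4.ip_forward key in /etc/sysctl.conf. The hypothetical helper below sets that key to 1 whether or not the line already exists; it takes the file path as an argument so you can try it on a copy first.

```bash
#!/bin/bash
# Hypothetical helper: make net.ipv4.ip_forward = 1 survive reboots.
# Argument: sysctl file to edit (defaults to /etc/sysctl.conf).
persist_ip_forward() {
    local conf="${1:-/etc/sysctl.conf}"
    local pattern='^[[:space:]]*net\.ipv4\.ip_forward[[:space:]]*='
    if grep -q "$pattern" "$conf" 2>/dev/null; then
        # The key already exists (CentOS ships it as 0): rewrite that line.
        sed -i "s/${pattern}.*/net.ipv4.ip_forward = 1/" "$conf"
    else
        # The key is missing (or only present as a comment): append it.
        echo 'net.ipv4.ip_forward = 1' >> "$conf"
    fi
}
```

Run it as root and then load the file with `sysctl -p` so the running kernel and the saved configuration agree.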
Now let's start the Piranha GUI. Note: during the setup agent we disabled SELinux enforcement. If that wasn't done, you'll get an error like this when starting the service:
Starting piranha-gui: (13)Permission denied: make_sock: could not bind to address [::]:3636
(13)Permission denied: make_sock: could not bind to address 0.0.0.0:3636
No listening sockets available, shutting down
Unable to open logs
So, let's make sure SELinux isn't enforcing and then start the services using the commands:
- setenforce 0
- service httpd start
- service piranha-gui start
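- Now let's open a browser and navigate to the Piranha GUI on port 3636 (log in as piranha with the password you set earlier). In my example, it is http://10.10.11.20:3636
- Now let's configure the load balancer. We're going to use DSR, so click on the Global Settings tab, enter the load balancer's IP (10.10.11.20) as the primary server public IP, and select Direct Routing as the network type.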
- Next we'll create the VIP. Click on Virtual Servers then "Add". Then select the VIP with the radio button and click "Edit".
- Let's define our TCP 443 virtual server. Call this VIP "Exchange-443", and set the application port to TCP 443 along with the VIP and subnet mask. It is important that you change the interface to seth0:1 (synthetic Ethernet). Set the service timeout to 30 seconds, use Round-Robin scheduling, a persistence setting of 4 hours (14400 seconds), and a persistence network mask of 255.255.255.255 (an individual host).
- Now let's define our real servers. Click on the real server link, then click the "ADD" button twice (for 2 CAS servers in our case). Then select the first with the radio button and click edit.
- Enter the real IP address of the CAS1 server, click accept. Then repeat the process for the second CAS server.
- Now click on the Real Server link again and confirm that the real servers are configured properly.
- Next activate the real servers by clicking on each server's radio button then click the "(DE)ACTIVATE" button.
- Click on the Monitoring Scripts link, click the Blank Send and Blank Expect buttons, and then click Accept. A blank send/expect means the health check is just a TCP bind (connect). We'll start with this basic check and define a "smarter" method to monitor OWA later in this article.
- Now let's activate the VIP by clicking on the Virtual Servers tab, selecting our "Exchange-443" VIP and click the "(DE)ACTIVATE" button.
Now go back to the CentOS shell and restart the pulse service. Note: if this is the first time you've configured the load balancer, the pulse service will show as failed when shutting down.
Use the command:
- service pulse restart
- Now let's check the status of the VIP. Go back to the Piranha web GUI and click on Control/Monitoring. If everything is working you'll see the VIP (10.10.11.19) bound to TCP 443 and routing to both of our real servers. If the real servers are not listed then the health check is failing.
- Let's test the VIP by browsing to https://10.10.11.19/owa. The certificate error is due to the name mismatch, but the VIP is working.
- Let's test a real server failure. In Hyper-V, I disconnected my CAS VM from the virtual network, which simulates it dropping off the network. Wait for the health check period to pass and you'll see the real server drop off the routing table.
- Now that we've seen that the health checks work, reconnect our CAS server to the network to bring both real servers back online. Next, let's configure the rest of the TCP ports (virtual servers) that we need to load balance Exchange. In my Exchange configurations, I statically configure TCP port 7575 for the RPC Client Access service and TCP 7576 for the Address Book service. We'll also need TCP 135 for the RPC endpoint mapper and TCP 80 (for the HTTP-to-HTTPS redirect). Set the persistence timeout for 135, 7575, 7576, and 80 to 300 seconds, but keep TCP 443 at 4 hours to accommodate private-computer OWA sessions. When you've finished, the Virtual Servers tab should list all five virtual servers.
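Since the blank send/expect health check is essentially a TCP connect, it can save some troubleshooting to confirm from a Linux host that each CAS actually listens on all five ports. The sketch below uses bash's /dev/tcp redirection and assumes the coreutils timeout command is installed (it is newer than the coreutils shipped with CentOS 5, so you may need to run this from another Linux machine); check_tcp_ports is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical helper: try a TCP connect to each port on one host and print
# "host:port open" or "host:port closed". Returns 1 if any port is closed.
# Assumes bash was built with /dev/tcp support and that coreutils `timeout`
# is installed.
check_tcp_ports() {
    local host="$1"
    shift
    local port rc=0
    for port in "$@"; do
        # A child bash opens the socket on fd 3 and exits, closing it again;
        # timeout caps each attempt at 3 seconds for unreachable hosts.
        if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
            echo "${host}:${port} open"
        else
            echo "${host}:${port} closed"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, `check_tcp_ports 10.10.11.78 80 135 443 7575 7576` should report every port as open on a healthy CAS; repeat it for the second CAS.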
After the config changes are made, restart the pulse service using "service pulse restart", then check the monitoring status. It should show all TCP ports routing and healthy.
Implementing a smarter health check for Outlook Web App.
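The same information is available from the CentOS shell with `ipvsadm -L -n`, which lists each virtual service followed by its real servers (the Forward column shows Route for direct routing). The hypothetical helper below counts the real servers listed under one virtual service so the check can be scripted; it reads the ipvsadm listing on standard input.

```bash
#!/bin/bash
# Hypothetical helper: count the real servers listed under one virtual service
# in `ipvsadm -L -n` output read from standard input.
# Argument: the virtual service as ipvsadm shows it, e.g. 10.10.11.19:443
count_real_servers() {
    local service="$1"
    # Virtual-service lines start with TCP, UDP or FWM; real-server lines
    # start with "->" (the column-header line is skipped explicitly).
    awk -v svc="$service" '
        $1 == "TCP" || $1 == "UDP" || $1 == "FWM" { in_svc = ($2 == svc); next }
        in_svc && $1 == "->" && $2 != "RemoteAddress:Port" { n++ }
        END { print n + 0 }
    '
}
```

As root, `ipvsadm -L -n | count_real_servers 10.10.11.19:443` should print 2 while both CAS servers pass their health checks, and drop to 1 during the failure test.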
In our initial implementation of the VIP that load balances TCP 443 (OWA, ECP, etc.), we only used a simple TCP bind to test whether the server is healthy. However, what happens if there is a problem with an app pool or an underlying .NET issue? In that case IIS will most likely still respond to the TCP bind, but the application (OWA, ECP, etc.) may not be available. So let's add some smarts to our health check and look for something in the HTTP response. For this solution we'll use the wget command to open an HTTPS session and look for the string "Outlook" on the forms-based authentication page. You can enhance this test, but the basic setup should give you the idea of health checking.
- Change directory to /usr/local/bin:
cd /usr/local/bin
Now hold on Windows guys, we're going back to the ANSI BBS days!
Open up the CentOS shell and run vi, the command-line text editor, using the following:
vi lvs_check_owa
- Next, type i (lower case) to enter insert mode.
Enter the following bash script:
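#!/bin/bash
# Print OK if the OWA logon page mentions Outlook, otherwise FAILURE.
LINES=`wget -q -O - --no-check-certificate "https://$1:$2/owa"`
if [[ $LINES == *utlook* ]]; then echo "OK"; else echo "FAILURE"; fi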
- Once you have entered the text, hit the ESC key then ":x" to save the file.
- Next set the permissions on the file so it can be executed using the command:
chmod u+x lvs_check_owa
- Now test the script by calling it and passing the IP of one of our real servers and TCP 443:
lvs_check_owa 10.10.11.78 443
If all goes well we should have an "OK" returned.
- Now let's configure the load balancer to use the script. Open the Piranha web GUI again, select the Exchange-443 VIP's radio button, and click 'Edit'.
- In the sending program field enter:
/usr/local/bin/lvs_check_owa %h %p
Then enter OK in the Expect field and click Accept.
- Now go back to the CentOS shell and restart the pulse service: "service pulse restart"
- Click on the Control/Monitoring link and confirm that the 443 VIP is running and checking health by calling our lvs_check_owa script.
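If you'd like a slightly more defensive health check, the sketch below is one possible variant of lvs_check_owa, not a replacement the article depends on. It adds wget timeouts so a hung CAS fails the check quickly instead of stalling it, and moves the string match into a function that can be tested without a CAS. The file name lvs_check_owa2 is hypothetical; it keeps the same contract (arguments %h %p, prints OK or FAILURE), so it could go in the sending program field the same way.

```bash
#!/bin/bash
# lvs_check_owa2 (hypothetical name): a variant of lvs_check_owa.
# Usage: lvs_check_owa2 <ip> <port>   prints OK or FAILURE for nanny.

# Succeeds if the page body contains "Outlook"/"outlook" (the OWA logon page).
body_looks_healthy() {
    [[ $1 == *utlook* ]]
}

# Fetch /owa over HTTPS (wget follows the redirect to the logon page).
check_owa() {
    local host="$1" port="$2" body
    # -T 5 sets the DNS, connect and read timeouts to 5 seconds; -t 1 allows a
    # single attempt, so a hung CAS fails fast instead of wget retrying.
    body=$(wget -q -T 5 -t 1 -O - --no-check-certificate "https://${host}:${port}/owa")
    if body_looks_healthy "$body"; then
        echo "OK"
    else
        echo "FAILURE"
    fi
}

# Run only when executed directly with two arguments, so the functions can
# also be sourced and tested on their own.
if [[ ${BASH_SOURCE[0]} == "$0" && $# -eq 2 ]]; then
    check_owa "$1" "$2"
fi
```

Save it in /usr/local/bin, make it executable with chmod u+x, and test it exactly as before, e.g. `lvs_check_owa2 10.10.11.78 443`.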
Adding High Availability to the Load Balancing Solution
Built into LVS (through Piranha's pulse service) is the ability to create an HA pair (active/passive). For this article we'll leverage some features of virtualization to deploy it. Note, however, that if you deploy this solution in a virtualized environment, you may simply want to rely on the integrated features of Live Migration, vMotion, and/or VM clustering to provide HA for your load balancer rather than actually deploying an LVS pair. If you wanted to deploy this on physical hardware, building the two LVS servers would follow a process similar to the one I've outlined here for the VM environment. First, let's review how the lab environment will change with the addition of our HA pair: we'll add a second LVS server, which can take ownership of the floating VIP we already created.
- Let's first clone the CentOS LVS virtual machine that we already built. Shut down Linux using the command 'poweroff'. Once the VM is powered off, export the virtual machine.
- Once the export has completed, rename the existing Virtual Machine – in our case we'll simply add "01" to the VM name.
- Next we'll import the exported VM. Check the "Copy the virtual machine (create a new unique ID)" and optionally check "Duplicate all files so the same virtual machine can be imported again"
- After our duplicated VM has imported, we'll rename the import – in my case I'll simply add "02" to the VM name.
- Next, power up CentOS Load Balancer 02, log in as root, enter the command 'system-config-network', and change the IP address. Per our lab diagram, we'll use 10.10.11.21.
- Save and quit the config utility and then restart the network stack using the command 'service network restart' then confirm the IP config by entering 'ifconfig'.
- Now power up the CentOS LVS 01 server and log in as root. Open the Piranha GUI, click on the Redundancy tab, enter the 02 server's IP address, check "Monitor NIC links for failures" and "Syncdaemon", and click the Enable button.
- Now go back to the CentOS shell of the 01 server and copy the config file to the 02 server by entering the command:
scp /etc/sysconfig/ha/lvs.cf root@10.10.11.21:/etc/sysconfig/ha/lvs.cf
If you are prompted to confirm the SSH host key fingerprint, enter yes.
- Now restart the pulse service on the 01 server and then the 02 server using the command "service pulse restart".
NOTE: In my Hyper-V testing I received the error message "IPVS: Error setting outbound mcast interface". I believe this is due to the Hyper-V network stack, but I haven't confirmed that yet. In any event, the HA pair does fail over properly, as you'll see below.
- Now log back into the Piranha interface on both the 01 and 02 servers. If a server is acting as the passive node, its routing table will be empty.
If the routing table is populated (like the lower screen grab), that server is the active node.
- Now that we know that our 02 node is the active node we'll start a running ping to the VIP from a client machine.
- Now let's power off the 02 node (the active node) and watch the running ping: 4 pings are lost (about 20 seconds) while the passive node waits for the active node's heartbeat to time out.
- Now go back to the 01 server's Piranha GUI and click on the 'Control/Monitoring' link. You'll notice that the routing table is now active.
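You can also tell the two nodes apart from the shell: on the active node `ipvsadm -L -n` lists the virtual services, while on the passive node the table is empty. The hypothetical helper below reads that listing on standard input and prints active or passive, which can be handy when watching a failover test from an SSH session.

```bash
#!/bin/bash
# Hypothetical helper: classify this director from `ipvsadm -L -n` output.
# An active node lists at least one virtual service line (TCP, UDP, or FWM);
# a passive node prints only the header lines.
lvs_role() {
    if awk '$1 == "TCP" || $1 == "UDP" || $1 == "FWM" { found = 1 }
            END { exit !found }'; then
        echo "active"
    else
        echo "passive"
    fi
}
```

Run `ipvsadm -L -n | lvs_role` as root on each node before and after powering off the active node.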
We've now built and tested our LVS HA pair!
I hope this article has provided some insight into how load balancers work and how to configure them with Exchange 2010, and that it offers you a possible solution for your Exchange lab and, if you have a good comfort level with Linux, possibly your production environment. However, I would highly recommend consulting a skilled Linux/LVS expert before even considering this as a production solution.