Why use Load Balancing?
This is an article about setting up a load balancer, so if you’re here it’s probably because you want to set one up, and I’ll keep this section relatively brief. A load balancer is a device or piece of software that accepts incoming traffic and passes it through to other servers on the backend. This lets you scale your infrastructure with little downtime, simply by adding more servers behind it. The main benefit of load balancing is that it distributes the load across multiple servers, so the site stays fast even when there is a large volume of concurrent connections. This article covers a software-based solution using HAProxy, which lets you use an ordinary server as a load balancer. That is useful if you’re in a virtual environment such as the Rackspace Cloud, where you can’t have hardware load balancers.
This article assumes you have already set up your Apache nodes and are familiar with SSH and Linux.
Installing HAProxy
You will need make and gcc to build HAProxy from source.
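Before downloading anything, it may be worth confirming that both tools are on the PATH. The following is a minimal sketch; `check_build_tools` is a hypothetical helper name used here for illustration, not something HAProxy provides.

```shell
# Report any build tool that is missing from PATH.
# check_build_tools is a hypothetical helper, not part of HAProxy.
check_build_tools() {
    missing=""
    for tool in "$@"; do
        # command -v succeeds only if the tool can be found
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
}

check_build_tools make gcc || echo "install the missing tools with your package manager"
```

If either tool is reported missing, install it with your distribution’s package manager before continuing.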
At the time of writing, the current version is 1.4.8. Copy the download link from the HAProxy website:
$ wget http://haproxy.1wt.eu/download/1.4/src/haproxy-1.4.8.tar.gz
and extract it:
$ tar -xvf haproxy-1.4.8.tar.gz
Our current kernel is 2.6.xx, so we build with the linux26 target from inside the extracted directory:
# cd haproxy-1.4.8
# make TARGET=linux26
# make install
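The TARGET value depends on the kernel you are running. As a rough sketch, the helper below maps the output of `uname -r` to a target name. `pick_target` is a hypothetical function, and the target names other than linux26 are assumptions based on the HAProxy 1.4 Makefile, so check them against the Makefile in your own source tree.

```shell
# Map a kernel release string (as printed by `uname -r`) to a TARGET value.
# pick_target is a hypothetical helper; target names other than linux26 are
# assumed from the HAProxy 1.4 Makefile and should be verified locally.
pick_target() {
    case "$1" in
        2.6.*) echo "linux26" ;;
        2.4.*) echo "linux24" ;;
        2.2.*) echo "linux22" ;;
        # Anything else: fall back to the portable target (an assumption;
        # newer kernels may work with linux26 as well).
        *)     echo "generic" ;;
    esac
}

# Print the make command for the running kernel without executing it.
echo "make TARGET=$(pick_target "$(uname -r)")"
```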
This compiles and installs HAProxy. Next, we have to configure it.
Configuring HAProxy
Now we make our configuration file:
# vim /etc/haproxy.conf
And here is a sample configuration for basic roundrobin.
global
    maxconn 4096
    user haproxy
    group haproxy
    daemon
    pidfile /var/run/haproxy.pid

defaults
    mode http
    retries 3
    option redispatch
    contimeout 5000
    clitimeout 50000
    srvtimeout 50000

frontend http-in
    bind *:80
    default_backend cluster1

backend cluster1
    balance roundrobin
    option http-server-close
    option forwardfor
    server Server1 10.1.1.1:80
    server Server2 10.1.1.2:80
    server Server3 10.1.1.3:80
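To make the rotation in the cluster1 backend concrete, here is a small simulation of plain round-robin selection. It is only an illustration: `next_server` is a hypothetical helper, and the sketch ignores server weights and health checks, which HAProxy handles internally.

```shell
# Simulate round-robin selection over the three servers in cluster1.
# next_server is a hypothetical helper; HAProxy does this internally.
SERVERS="Server1 Server2 Server3"

# Print the server that would receive request number $1 (counting from 0).
next_server() {
    request=$1
    set -- $SERVERS            # positional parameters now hold the server names
    shift $(( request % $# ))  # skip the servers that are not next in line
    echo "$1"
}

for request in 0 1 2 3 4; do
    echo "request $request -> $(next_server "$request")"
done
```

Requests 0, 1 and 2 go to Server1, Server2 and Server3, and request 3 wraps around to Server1 again.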
With the configuration above, all traffic hits the load balancer, which hands each new request to Server1, then Server2, then Server3 in turn, round-robin style. HAProxy is very flexible. Say you wanted certain servers to serve certain websites:
global
    maxconn 4096
    user haproxy
    group haproxy
    daemon
    pidfile /var/run/haproxy.pid

defaults
    mode http
    retries 3
    option redispatch
    contimeout 5000
    clitimeout 50000
    srvtimeout 50000

frontend http-in
    bind *:80
    acl is_site_1 hdr_end(host) -i example.com
    acl is_site_2 hdr_end(host) -i domain.com
    use_backend cluster1 if is_site_1
    use_backend cluster2 if is_site_2
    default_backend cluster1

backend cluster1
    balance roundrobin
    option httpclose
    option forwardfor
    server Server1 10.1.1.1:80
    server Server2 10.1.1.2:80
    server Server3 10.1.1.3:80

backend cluster2
    balance roundrobin
    option http-server-close
    option forwardfor
    server Server4 10.1.1.4:80
    server Server5 10.1.1.5:80
    server Server6 10.1.1.6:80
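With this setup, all requests for example.com will be served by Server1, Server2 and Server3, and all requests for domain.com will be served by Server4, Server5 and Server6. Because hdr_end(host) matches the end of the Host header, subdomains such as www.example.com are routed the same way.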
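The routing decision can be sketched in a few lines of shell. This only mimics the two ACLs for illustration; `pick_backend` is a hypothetical helper, not an HAProxy command, and it assumes the Host header carries no port number.

```shell
# Mimic the two ACLs: hdr_end(host) -i matches when the Host header
# ends with the given string, ignoring case.
# pick_backend is a hypothetical helper, not an HAProxy command.
pick_backend() {
    # -i in the ACL means the comparison is case-insensitive
    host=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$host" in
        *example.com) echo "cluster1" ;;   # is_site_1
        *domain.com)  echo "cluster2" ;;   # is_site_2
        *)            echo "cluster1" ;;   # default_backend
    esac
}

for host in example.com www.Example.com domain.com shop.domain.com other.org; do
    echo "$host -> $(pick_backend "$host")"
done
```

One consequence of suffix matching is that a host such as notdomain.com would also land on cluster2, so an exact or more specific match may be preferable when that matters.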
You can also specify
hdr(host) -i subdomain.example.com
if you want to send a specific subdomain to certain servers. Read the documentation for all the available options (there are far too many to cover in a single article).
Running the Server, and some last configurations
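Once you are satisfied with your configuration, start HAProxy as root. It needs root to bind port 80, and it then drops to the haproxy user and group named in the configuration, which must already exist:
# haproxy -f /etc/haproxy.conf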
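It can help to validate the file before starting the daemon. HAProxy has a check mode (`-c`) that parses the configuration and exits without starting the proxy. The wrapper below is a sketch; `check_config` is a hypothetical name.

```shell
# check_config is a hypothetical wrapper around HAProxy's check mode (-c),
# which parses the file and exits without starting the proxy.
check_config() {
    if ! command -v haproxy >/dev/null 2>&1; then
        echo "haproxy not found in PATH"
        return 2
    fi
    haproxy -c -f "$1"
}

check_config /etc/haproxy.conf || echo "fix the configuration before starting HAProxy"
```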
Next, the firewall. On the load balancer, you’ll want to open port 80 to the world:
# iptables -I INPUT -p tcp --dport 80 -j ACCEPT
On the node servers running Apache (or another web server), you’ll want to block direct access so that requests have to go through the load balancer:
# iptables -I INPUT -p tcp -s 10.24.56.78 --dport 80 -j ACCEPT
# iptables -A INPUT -j REJECT --reject-with icmp-host-prohibited
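Since the load balancer address differs from setup to setup, a small dry-run helper can print the two commands for a given address so you can review them before running anything as root. `node_rules` is a hypothetical helper written for this sketch; it only prints text and does not touch the firewall.

```shell
# node_rules is a hypothetical helper that prints, without running them,
# the two iptables commands for a given load balancer address.
node_rules() {
    lb_ip=$1
    # Inserted at the top of the chain: accept port 80 from the load balancer.
    echo "iptables -I INPUT -p tcp -s $lb_ip --dport 80 -j ACCEPT"
    # Appended at the end of the chain: reject whatever was not accepted earlier.
    echo "iptables -A INPUT -j REJECT --reject-with icmp-host-prohibited"
}

node_rules 10.24.56.78
```

Once the printed commands look right, they can be run by hand as root on each node.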
Replace 10.24.56.78 with the internal IP address of the load balancer. The nodes will then answer only requests that come from the load balancer and reject everything else, so they can’t be reached directly. Note that the second rule rejects all other incoming traffic, not only port 80, so make sure an earlier rule in the chain still accepts your SSH connection. The problem you’ll face now is that every node must serve exactly the same content, or different clients will see different things. The easiest fix is a clustered file system such as GlusterFS. We’ll get into that topic in the next article.