Why use Load Balancing?

This is an article about setting up a load balancer, so if you’re here you probably want to set one up, and I’ll keep this section brief. A load balancer accepts incoming traffic and passes it through to other servers on the backend. This lets you scale your infrastructure with little downtime simply by adding more servers behind it. The main benefit of load balancing is that the load is distributed across multiple servers, so the site stays fast even when there is a large volume of concurrent connections. Load balancers are often dedicated hardware, but this article sets up a software-based solution using HAProxy, which lets you use an ordinary server as the load balancer. That is useful if you’re in a virtual environment such as the Rackspace Cloud, where you can’t install hardware load balancers.

This article assumes you have already set up your Apache nodes and are familiar with SSH and Linux.

Installing HAProxy

You will need make and gcc to build HAProxy from source.
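A quick way to confirm both tools are present before starting is sketched below. The `missing_tools` helper is a hypothetical name introduced here, and the package-manager command in the message is only an example, since the article does not say which distribution is in use.

```shell
# Hypothetical helper: print the names of any given tools not found on PATH.
missing_tools() {
    missing=""
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    # Strip the leading space left by the concatenation above.
    echo "${missing# }"
}

needed=$(missing_tools make gcc)
if [ -n "$needed" ]; then
    echo "Missing build tools: $needed"
    # Assumption: a yum-based system; substitute your own package manager.
    echo "Install them with your package manager, for example: yum install $needed"
else
    echo "make and gcc are both available."
fi
```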

At the time of writing, the current version is 1.4.8. Copy the download link from the HAProxy website and fetch it:

    $ wget http://haproxy.1wt.eu/download/1.4/src/haproxy-1.4.8.tar.gz

and extract it:

    $ tar -xvf haproxy-1.4.8.tar.gz

Our kernel is 2.6.xx, so we change into the extracted directory and build with the linux26 target:

    $ cd haproxy-1.4.8
    # make TARGET=linux26
    # make install

This compiles and installs HAProxy. Next, we have to configure it.
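If you are not sure which TARGET applies to your machine, the kernel release reported by `uname -r` can be mapped to a target name. The sketch below is only an illustration: `haproxy_target` is a hypothetical helper, and the target names (`linux26`, `linux24`, `generic`) are assumed from the HAProxy 1.4 Makefile, so check the README in the source tree you downloaded for the authoritative list.

```shell
# Hypothetical helper: map a kernel release string to a HAProxy 1.4 build target.
haproxy_target() {
    case "$1" in
        2.6.*) echo "linux26" ;;
        2.4.*) echo "linux24" ;;
        *)     echo "generic" ;;   # conservative fallback for anything else
    esac
}

target=$(haproxy_target "$(uname -r)")
echo "Suggested build command: make TARGET=$target"
```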

Configuring HAProxy

Now we make our configuration file:

    # vim /etc/haproxy.conf

Here is a sample configuration for basic round-robin balancing:

global
     maxconn 4096
     user haproxy
     group haproxy
     daemon
     pidfile /var/run/haproxy.pid

defaults
     mode http
     retries 3
     option redispatch
     contimeout 5000
     clitimeout 50000
     srvtimeout 50000

frontend http-in
     bind *:80
     default_backend cluster1

backend cluster1
     balance roundrobin
     option http-server-close
     option forwardfor
     server Server1 10.1.1.1:80
     server Server2 10.1.1.2:80
     server Server3 10.1.1.3:80
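The rotation that `balance roundrobin` performs over the three servers above can be pictured with a short shell sketch. This is a simplified model rather than HAProxy’s actual scheduler: it assumes equal weights and that all servers are up, and `pick_server` is a hypothetical helper introduced only for illustration.

```shell
# Simplified model of "balance roundrobin" with three equally weighted servers.
# pick_server is a hypothetical helper; $1 is the zero-based request number.
pick_server() {
    case $(( $1 % 3 )) in
        0) echo "Server1 10.1.1.1:80" ;;
        1) echo "Server2 10.1.1.2:80" ;;
        2) echo "Server3 10.1.1.3:80" ;;
    esac
}

# Show where the first six requests would go.
n=0
while [ "$n" -lt 6 ]; do
    echo "request $n -> $(pick_server "$n")"
    n=$(( n + 1 ))
done
```

The output cycles through Server1, Server2 and Server3 and then starts again with Server1.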

With the sample configuration above, all traffic hits the load balancer, and successive clients are directed to Server1, then Server2, then Server3, and then back to Server1, round-robin style. HAProxy is very flexible. Say, for example, you wanted certain servers to serve certain websites:

global
     maxconn 4096
     user haproxy
     group haproxy
     daemon
     pidfile /var/run/haproxy.pid

defaults
     mode http
     retries 3
     option redispatch
     contimeout 5000
     clitimeout 50000
     srvtimeout 50000

frontend http-in
     bind *:80
     acl is_site_1 hdr_end(host) -i example.com
     acl is_site_2 hdr_end(host) -i domain.com

     use_backend cluster1 if is_site_1
     use_backend cluster2 if is_site_2

     default_backend cluster1

backend cluster1
     balance roundrobin
     option httpclose
     option forwardfor
     server Server1 10.1.1.1:80
     server Server2 10.1.1.2:80
     server Server3 10.1.1.3:80

backend cluster2
     balance roundrobin
     option http-server-close
     option forwardfor
     server Server4 10.1.1.4:80
     server Server5 10.1.1.5:80
     server Server6 10.1.1.6:80

With this setup, all requests for example.com are served by Server1, Server2 and Server3, and all requests for domain.com are served by Server4, Server5 and Server6.
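Because `hdr_end(host) -i` is a case-insensitive suffix match on the Host header, it matches more than the bare domain. The sketch below mimics that comparison in plain shell so you can see which host names would fall into which ACL. `host_ends_with` is a hypothetical helper, not part of HAProxy.

```shell
# Hypothetical helper: succeed if host ($1) ends with suffix ($2), ignoring case.
host_ends_with() {
    host=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    suffix=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    case "$host" in
        *"$suffix") return 0 ;;
        *)          return 1 ;;
    esac
}

for h in example.com www.Example.com notexample.com example.org; do
    if host_ends_with "$h" example.com; then
        echo "$h matches is_site_1"
    else
        echo "$h does not match is_site_1"
    fi
done
```

Note that a pure suffix match also accepts notexample.com. If that matters for your sites, an exact match or a suffix that includes the leading dot is a tighter choice.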

You can also use an exact match such as

hdr(host) -i subdomain.example.com

if you want to send a subdomain to certain servers. Read the documentation for all the available options (there are far too many to cover in a single article).
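As a sketch of how that exact match might sit in the frontend from the configuration above: the ACL name `is_subdomain` and the choice of sending the subdomain to cluster2 are assumptions made for illustration only.

```
frontend http-in
     bind *:80
     # Exact match on the Host header (hypothetical ACL name)
     acl is_subdomain hdr(host) -i subdomain.example.com
     # Suffix match, as in the earlier configuration
     acl is_site_1 hdr_end(host) -i example.com

     # use_backend rules are evaluated in order and the first match wins,
     # so the more specific rule is listed first.
     use_backend cluster2 if is_subdomain
     use_backend cluster1 if is_site_1

     default_backend cluster1
```

The ordering matters here because subdomain.example.com also ends in example.com, so it would satisfy is_site_1 as well.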

Running the Server, and some last configurations

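Once you are satisfied with your configuration, you can start HAProxy as root (it needs root to bind port 80, and then drops to the haproxy user named in the configuration):

    # haproxy -f /etc/haproxy.conf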
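Before starting it for real, it can help to check two things: that the haproxy account named in the configuration exists, and that the file parses cleanly. The following is a minimal sketch under those assumptions. `account_exists` is a hypothetical helper, and the `useradd` command in the message is only one possible way to create the account.

```shell
CONFIG=/etc/haproxy.conf

# Hypothetical helper: succeed if the named account exists on this machine.
account_exists() {
    id "$1" >/dev/null 2>&1
}

# The sample configuration drops privileges to user "haproxy".
if ! account_exists haproxy; then
    echo "No haproxy account found; create one before starting (for example: useradd -r haproxy)"
fi

# -c parses the file named by -f and reports errors without starting the proxy.
if command -v haproxy >/dev/null 2>&1; then
    haproxy -c -f "$CONFIG" && echo "Configuration looks valid."
else
    echo "haproxy is not on PATH yet; run make install first."
fi
```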


Now for the firewall. On the first server (the load balancer), you’ll want to open port 80 to the world:

    # iptables -I INPUT -p tcp --dport 80 -j ACCEPT

Now on your node servers that run Apache (or another web server), you’ll want to make sure they can’t be accessed directly and that traffic has to go through the load balancer:

    # iptables -I INPUT -p tcp -s 10.24.56.78 --dport 80 -j ACCEPT
    # iptables -A INPUT -p tcp --dport 80 -j REJECT --reject-with icmp-host-prohibited

Replace 10.24.56.78 with the internal IP address of the load balancer. With these rules, the nodes answer port 80 requests only when they come from the load balancer and reject them from any other source, so the nodes can’t be reached directly. The problem you’ll face now is that every node must serve exactly the same content, or different clients will get different results. The easiest way to handle that is a clustered file system such as GlusterFS. We’ll get into that topic in the next article.
