The thoughts of a SysAdmin

How Stack Exchange gets the most out of HAProxy

with 26 comments

At Stack Exchange we like to do two, well no, three things. One, we love living on the bleeding edge and making use of awesome new features in software. Two, we love configuring the hell out of everything we run, which leads to three – getting the absolute most performance out of the software that we run.

HAProxy is no exception to this. We have been running HAProxy since just about day one. We have been on the 1.5dev branch of code since almost the day it came out.

Of course most people would ask why we would do that. You open yourself up to a whole lot of issues with dev code. The answer, of course, is that there are features we want in this dev code. The less selfish answer is that we want to make the internet a better place for everyone. What better way to do that than running bleeding edge code and finding the issues for you?

I’m going to go through our HAProxy setup and how we are using the features. I would highly recommend reading through the HAProxy documentation for the version you are running. There is a ton of good information in there.


[Diagram: HAProxy Flow - ERD]

This is a high level overview of what our network looks like from the cloud to the web front ends. Yes, there is a lot more to us serving you a request, but this is enough for this post.

The basics are that a request comes into our network from the internet. Once it passes our edge routers it goes on to our load balancers. These are CentOS 6 Linux boxes running HAProxy 1.5dev. The request comes into our HAProxy load balancers and then, depending on what tier it comes into, is processed and sent to a backend. After the packet makes its way through HAProxy it gets routed to one of the web servers in our IIS farm.

One of the reasons that HAProxy is so damn good at what it does is that it is single-minded, as well as (mostly) single threaded. This has led it to scale very, very well for us. One of the nice things about the software being single threaded is that we can buy a decent sized multi-core server, and as things need more resources we just split them out to their own tier, which is another HAProxy instance using a different core.

Things get a bit more interesting with SSL as there is a multi-threaded bit to that to be able to handle the transaction overhead there. Going deeper into the how of the threading of HAProxy is out of the scope of this post though, so I’ll just leave it at that.
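To make that process layout concrete, here is a minimal sketch of how nbproc and bind-process combine to pin work to cores. This is not our actual config; the address and certificate path are made up for illustration:

    # hypothetical sketch: 4 processes; HTTP handling pinned to process 1,
    # SSL termination spread across processes 2-4
    global
        nbproc 4

    frontend http-in
        bind-process 1
        bind

    listen ssl-proxy-1
        bind-process 2 3 4
        bind ssl crt /etc/haproxy/example.pem

Each bind-process line restricts that frontend or listener to the listed process numbers, so the two workloads never fight over the same core.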

Phew, we’ve got the introductory stuff out of the way now. Let’s dive into what our HAProxy config actually looks like!

The first bit is our global defaults, and some setup – users, a bit of tuning, and some other odds and ends. All of these options are very well documented in the HAProxy docs, so I won’t bore you by explaining what each one of them does.

For this post, all but one example (our websocket config) comes out of what we call “Tier 1”. This is our main tier; it’s where we serve the Q&A sites and other critical services.

userlist stats-auth
    group admin users <redacted>
    user supa_dupa_admin insecure-password <redacted>
    group readonly users <redacted>
    user cant_touch_this insecure-password <redacted>

    stats socket /var/run/haproxy-t1.stat level admin
    stats bind-process 1
    maxconn 100000
    pidfile /var/run/
    log local0
    log local0
    tune.bufsize 16384
    tune.maxrewrite 1024
    spread-checks 4
    nbproc 4

    errorfile 503 /etc/haproxy-shared/errors/503.http
    errorfile 502 /etc/haproxy-shared/errors/502.http
    mode http
    timeout connect 15s
    timeout client 60s
    timeout server 150s
    timeout queue 60s
    timeout http-request 15s
    timeout http-keep-alive 15s
    option redispatch
    option dontlognull
    balance source

Nothing all that crazy here: some tweaks for scale, setting up some users, timeouts, logging options and the default balance mode. Generally you want to tune your maxconn and timeout values to your environment and your application. Other than that, the defaults should work for 98% of the people out there.

Now that we have our defaults set up, let’s look a little deeper into the really interesting parts of our configuration. I will point out the things that we use that are only available in 1.5dev as I go.

First, our SSL termination. We used to use Nginx for our SSL termination, but as our deployment of SSL grew, we knew that SSL support was coming to HAProxy, so we waited for it to come out and then went in whole hog.

listen ssl-proxy-1
    bind-process 2 3 4
    bind ssl crt /etc/haproxy-shared/ssl/wc-san.pem
    bind ssl crt /etc/haproxy-shared/ssl/wc-san.pem
    bind ssl crt /etc/haproxy-shared/ssl/wc-san.pem
    bind ssl crt /etc/haproxy-shared/ssl/wc-san.pem
    bind ssl crt /etc/haproxy-shared/ssl/misc.pem
    bind ssl crt /etc/haproxy-shared/ssl/wc-san.pem
    bind ssl crt /etc/haproxy-shared/ssl/wc-san.pem
    bind ssl crt /etc/haproxy-shared/ssl/wc-san.pem
    bind ssl crt /etc/haproxy-shared/ssl/wc-san.pem
    bind ssl crt /etc/haproxy-shared/ssl/misc.pem
    mode tcp
    server http send-proxy
    server http send-proxy
    server http send-proxy
    server http send-proxy
    server http send-proxy

    maxconn 100000

This is a 1.5dev feature.

[Diagram: HAProxy - Core Detail]

Once again, this is a pretty simple setup. The gist of what is going on here is that we set up a listener on port 443. It binds to the specified IP addresses as an SSL port using the specified certificate file in PEM format, specifically the full chain including the private key. This is actually a very clean way to set up SSL since you have just one file to manage, and one config line to write when setting up an SSL endpoint.
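If you are building such a PEM bundle yourself, it is typically just a concatenation of the certificate, any intermediate chain, and the private key. The filenames here are hypothetical:

    # order matters: leaf certificate, then intermediates, then the private key
    cat example_com.crt intermediate.crt example_com.key > wc-san.pem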

The next thing it does is set the target server to itself, on a series of loopback addresses, using the send-proxy directive, which tells the process to use the PROXY protocol so that we don’t lose some of that tasty information when the packet gets shipped to the plain HTTP front end.

Now hold on a second! Why are you using multiple localhost proxy connections?! Ahh, good catch. Most people probably won’t run into this, but it’s because we were running out of source ports when we only used one proxy connection. We ran into something called source port exhaustion. The quick story is that you can only have ~65k ip:port to ip:port connections. This wasn’t an issue before we started using SSL since we never got close to that limit.

What happened when we started using SSL? Well, we started proxying a large amount of traffic via localhost, and, I mean, we do have a feeewww more than 65k connections.

Total: 581558 (kernel 581926)
TCP:   677359 (estab 573996, closed 95478, orphaned 1237, synrecv 0, timewait 95475/0), ports 35043

So the solution here is to simply load balance between a bunch of IPs in the loopback space, giving us ~65k more source ports per entry.
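As a sketch of what that looks like (the loopback addresses here are illustrative, not our real config), each server line points the SSL listener at a different local address, and each distinct destination ip:port buys us another ~65k source ports:

    listen ssl-proxy-1
        mode tcp
        # each distinct destination ip:port gives another ~65k source ports
        server http1 send-proxy
        server http2 send-proxy
        server http3 send-proxy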

The final thing I want to point out about the SSL front end is that we use the bind-process directive to limit the cores that that particular front end is allowed to use. This allows us to have multiple HAProxy instances running without having them stomp all over each other on a multi-core machine.

Our HTTP Frontend

The real meat of our setup is our http frontend. I will go through this piece by piece and at the end of this section you can see the whole thing if you would like.

frontend http-in
    bind name stackexchange
    bind name careers
    bind name
    bind name openid
    bind name misc
    bind name stackexchange
    bind name careers
    bind name
    bind name openid
    bind name misc
    bind accept-proxy name http-in
    bind accept-proxy name http-in
    bind accept-proxy name http-in
    bind accept-proxy name http-in
    bind accept-proxy name http-in
    bind-process 1

Once again, this is just setting up our listeners, nothing all that special or interesting here. This is where you will find the bindings that our SSL front end sends to, marked with the accept-proxy directive. Additionally, we give them names so that they are easier to find in our monitoring solution.

stick-table type ip size 1000k expire $expire_time store gpc0,conn_rate($some_connection_rate)

## Example from HAProxy Documentation (not in our actual config)##
# Keep track of counters of up to 1 million IP addresses over 5 minutes
# and store a general purpose counter and the average connection rate
# computed over a sliding window of 30 seconds.
stick-table type ip size 1m expire 5m store gpc0,conn_rate(30s)

The first interesting piece is the stick-table line. What is going on here is that we are tracking the connection rate of the incoming IPs to this frontend, storing it in a stick table along with gpc0 (General Purpose Counter 0). The example from the HAProxy docs on stick-tables explains this pretty well.

    log global

    capture request header Referer               len 64
    capture request header User-Agent            len 128
    capture request header Host                  len 64
    capture request header X-Forwarded-For       len 64
    capture request header Accept-Encoding       len 64
    capture response header Content-Encoding     len 64
    capture response header X-Page-View          len 1
    capture response header X-Route-Name         len 64
    capture response header X-Account-Id         len 7
    capture response header X-Sql-Count          len 4
    capture response header X-Sql-Duration-Ms    len 7
    capture response header X-AspNet-Duration-Ms len 7
    capture response header X-Application-Id     len 5
    capture response header X-Request-Guid       len 36
    capture response header X-Redis-Count        len 4
    capture response header X-Redis-Duration-Ms  len 7
    capture response header X-Http-Count         len 4
    capture response header X-Http-Duration-Ms   len 7
    capture response header X-TE-Count           len 4
    capture response header X-TE-Duration-Ms     len 7

rspidel ^(X-Page-View|Server|X-Route-Name|X-Account-Id|X-Sql-Count|X-Sql-Duration-Ms|X-AspNet-Duration-Ms|X-Application-Id|X-Request-Guid|X-Redis-Count|X-Redis-Duration-Ms|X-Http-Count|X-Http-Duration-Ms|X-TE-Count|X-TE-Duration-Ms):

We are mostly doing some setup for logging here. As a request comes in or a response goes out, we capture some specific headers using capture request or capture response, depending on the direction. HAProxy then takes those headers and inserts them into the syslog message that is sent to our logging solution. Once we have captured the response headers that we want, we use rspidel to strip them from the response sent to the client. rspidel uses a simple regex to find and remove the headers.
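For reference, the captured values end up in the HTTP log line in two brace-delimited, pipe-separated groups, request headers first and response headers second, in the order the capture lines are declared. A simplified, entirely made-up example line:

    haproxy[1234]: [25/Mar/2014:16:39:00.123] http-in be_others/ny-web01 10/0/30/69/109 200 2750 - - ---- 1/1/1/1/0 0/0 {http://example.com/|Mozilla/5.0|example.com|...|gzip} {gzip|1|home-page|...} "GET / HTTP/1.1"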

The next thing that we do is to setup some ACLs. I’ll just show a few examples here since we have quite a few.

acl source_is_serious_abuse src_conn_rate(http-in) gt $some_number
acl api_only_ips src -f /etc/haproxy-shared/api-only-ips
acl is_internal_api path_beg /api/
acl is_area51 hdr(host) -i
acl is_kindle hdr_sub(user-agent) Silk-Accelerated

I would say that the first ACL here is one of the more important ones we have. Remember that stick-table we set up earlier? Well, this is where we use it. The source_is_serious_abuse ACL matches if your IP’s tracked connection rate is greater than $some_number. I will show you what we do with this shortly when I get to the routing in the config file.

The next few ACLs are just examples of different ways that you can set up ACLs in HAProxy. For example, we check to see if your user agent contains ‘Silk-Accelerated’. If it does, we put you in the is_kindle ACL.

Now that we have those ACLs set up, what exactly do we use them for?

    tcp-request connection reject if source_is_serious_abuse !source_is_google !rate_limit_whitelist
    use_backend be_go-away if source_is_abuser !source_is_google !rate_limit_whitelist

The first thing we do is deal with those connections that make it onto our abuse ACLs. The first line simply denies the connection if you are bad enough to hit our serious abuse ACL – unless you have been whitelisted or are Google. The second is a softer response that throws up a 503 error if you are just a normal abuser – once again, unless you are Google or whitelisted.
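For what it’s worth, a backend like be_go-away can be as simple as a backend with no server lines: with no server available, HAProxy answers with the 503 page configured via errorfile. A minimal sketch, not our exact config:

    backend be_go-away
        mode http
        # no server lines: every request that lands here gets the 503 errorfile
        errorfile 503 /etc/haproxy-shared/errors/503.http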

The next thing we do is some request routing. We send different requests to different server backends.

    use_backend be_so_crawler if is_so is_crawler
    use_backend be_so_crawler if is_so is_crawler_ua
    use_backend be_so if is_so
    use_backend be_stackauth if is_stackauth
    use_backend be_openid if is_openid

    default_backend be_others

What this is doing is matching against the ACLs that were set up above, and sending you to the correct backend. If you don’t match any of the ACLs you get sent to our default backend.

An Example Backend

Phew! That’s a lot of information so far. We really do have a lot configured in our HAProxy instances. Now that we have our defaults, general options, and front ends configured what does one of our backends look like?

Well they are pretty simple beasts. Most of the work is done on the front end.

backend be_others
    mode http
    bind-process 1
    stick-table type ip size 1000k expire 2m store conn_rate($some_time_value)
    acl rate_limit_whitelist src -f /etc/haproxy-shared/whitelist-ips
    tcp-request content track-sc2 src
    acl conn_rate_abuse sc2_conn_rate gt $some_value
    acl mark_as_abuser sc1_inc_gpc0 gt $some_value
    tcp-request content reject if conn_rate_abuse !rate_limit_whitelist mark_as_abuser

    stats enable
    acl AUTH http_auth(stats-auth)
    acl AUTH_ADMIN http_auth_group(stats-auth) $some_user
    stats http-request auth unless AUTH
    stats admin if AUTH_ADMIN
    stats uri /my_stats
    stats refresh 30s

    option httpchk HEAD / HTTP/1.1\r\nUser-Agent:HAProxy\r\

    server ny-web01 check
    server ny-web02 check
    server ny-web03 check
    server ny-web04 check
    server ny-web05 check
    server ny-web06 check
    server ny-web07 check
    server ny-web08 check
    server ny-web09 check

There really isn’t too much to our back ends. We set up some administrative auth at the beginning. The next thing we do is, I think, the most important part: we specify with option httpchk where we want to connect when doing a check on the host to see if it’s up.

In this instance we are just checking ‘/’, but a lot of our back ends have a ‘/ping’ route that gives more information about how the app is performing for our monitoring solutions. To check those routes we simply change ‘HEAD /’ to ‘HEAD /ping’.
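So a backend whose app exposes a /ping route would carry a check line along these lines (the header contents here are illustrative):

    option httpchk HEAD /ping HTTP/1.1\r\nUser-Agent:HAProxy
    server ny-web01 check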

Final Words

Man, that sure was a lot of information to write, and to process. But this setup has given us a very stable, scalable and flexible load balancing solution. We are quite happy with the way it is all set up, and it has been running smoothly for us.

Update 9/21/14: For those curious you can look at the full, sanitized tier one config we use.

Written by George Beech

March 25th, 2014 at 4:39 pm

  • Joseph Daigle

    This is great content. We’re investigating a switch from a pair of proprietary LB appliances to HAProxy. What’s the upside and downside of using CentOS instead of Ubuntu Server? What do you use for high availability and failover between the load balancers?

    • We actually did run our LBs on Ubuntu long ago. We had issues with random kernel panics that we couldn’t solve, and had planned to switch our infrastructure to CentOS for other reasons – most notably much better vendor support due to it being RHEL compatible.

      The kernel panics were on, I think, 12.04 LTS, so they may be resolved in newer versions. Personally I believe CentOS is a better distro in the datacenter than Ubuntu, but go with what you are comfortable with.

      We use keepalived for failover. Pretty simple and works nicely for us.

      • justin

        What about more than just 2x HAProxy servers? How would you scale a farm of them?

        • We actually run all SE properties on just one active load balancer right now. Our current solution is to split the IP range into two /25s, attach each to a load balancer, and then summarize that back into a /24 for announcement.

          That solution doesn’t scale too well, however. Right now we are knocking around the idea of using iBGP to let us horizontally scale the load balancers out. We haven’t hit the limits of a single A/S failover pair yet, so it’s low on the list. Something similar to what Shutterstock did, although just on the LB/router network, not direct to the web tier.

  • Pingback: Linux (Ubuntu) Tutorial for Windows Developers & Power Users - Part I | FullStack - Ofer Zelig’s Blog

  • Hi George,

    You article is very well detailed, I’ll be able to use it as an example to reply to users, thanks !

    I’m just realizing that we recently introduced support for unix connections to servers in order to get rid of the local source port exhaustion issue. And even better, a Linux-specific extension called “abstract namespace sockets” is supported as well; it does not even require file-system access. You may want to give that a try instead of having to bind many 127 addresses. Another benefit is that the connection set up and tear down is cheaper CPU-wise. The syntax is:

    server foo abns@<name> send-proxy …

    and for the listener :

    listen ssl-proxy-1
    bind abns@<name> accept-proxy …

    • Thanks Willy,
      We have deployed this style change and it’s indeed a much cleaner config; we’re using it on websockets now.

      Is there any chance we’ll see something like this for external connections? Currently, for websockets, we’re binding to the same web servers multiple times, one port per server line in order to get around the exhaustion issue.

      Preferably, we’d want to vary the range on the Linux side and have it just bind to 1 port on the destination server. Assuming that’s not in the cards, is there any chance there will be a way around this on the Linux side, or the ability to specify a range cleanly in the server syntax? Something like this:

      server ny-web01 check

      Currently we have this:

      server ny-web01-1 check
      server ny-web01-2 check
      server ny-web01-3 check

  • carlsberg

    there’s some very interesting things here; your setup for SSL is different to how I’ve thought it should be done, and I’ve not done anything with ACLs and stick tables, but I would really like to now.
    you mention using different haproxy instances, but then have bind-process 2 3 4 in the SSL listener. is this not binding those requests to the other cores? if so, why the need for other instances?
    i’d like to see more of exactly what you’ve done with the ACLs, backends and stick-tables. is there any chance of making your full haproxy config files available? obviously with the real IPs altered.

    • The use of multiple instances is very much a “we are running a ton of connections through one machine” solution. We currently have ~850,000 connections in various states on our prod load balancer right now. That is a large number of connections, a number most people don’t have to deal with.

      The instances allow us to spread the load, and have granularity. We can take down one instance without taking everything down. Also the HAProxy http process is single threaded. Spinning up multiples of them allows you to more fully utilize the cores you have in your machine.

      I can make the full configs available (minus our rate limiting config, passwords and IP addresses). It probably won’t be until Monday since I’m on vacation today.

      • carlsberg

        any progress with the config files? i’d really like to have a look through them. I’ve been trying to look at the stick-table stuff in the manuals when I get the time, but I prefer looking at working config files; it makes it easier to see how everything ties in together, and there’s definitely some stuff here I’d like to try out on my test servers.

        • Apologies, it got tied up after my vacation. I’m just having someone double check that passwords / rate limits are removed, and I should be able to post it up shortly.

        • I’ve updated the post with a link to the full config (last sentence of the post)

  • Jim G


    It looks like you only have one frontend in your config. Are you using ACLs on the name field to determine which backend serves the requests? Maybe I’m missing something. Thanks!

    • Hi,

      This is just one of our HAProxy configs. We run multiple HAProxy processes on a single load balancer. I simplified things down a bit for the article since it was quite long already.

      Basically we have two ways that we delineate which back end things go to.

      1. We have multiple tiers, which may or may not serve multiple back ends. Each is a separate HAProxy instance. On the simplest tiers it’s just one front end feeding directly into one back end.

      2. On the more complicated tiers we use header matching, and other ACLs, to decide which back end the request will go to.

      It’s a little buried in the “our front end” section, but simply we would do something like this:

      acl is_area51 hdr(host) -i

      and pair that with a back end declaration like:

      use_backend be_a51 if is_area51

      • Jim G

        Awesome. Thanks so much for posting this article!

    • Jim G

      Nevermind, I think I figured out the ACLs. Could you confirm that you use one frontend for everything though as opposed to one for each site?

  • carlsberg

    just been testing some things out; the configs have been very helpful.
    just have a quick question about the SSL listener. the full config for that is in a different configuration file; do you also include the stick-table, all the ACLs and the header captures in that listener as well? or do you just leave that to be done when it goes through the http-in frontend?
    also, regarding kernel panics on Ubuntu 12.04 mentioned in another response, I’ve been testing this with Ubuntu 14.04 and haven’t had a single crash yet. that said, I’ve not hit your levels of traffic, testing using siege with up to 100000 users and just a single instance of haproxy

    • No; since all HTTPS connections go through the HTTP front end eventually, we don’t need to do double the work.

      We didn’t start seeing the crashes until the box had been under load for about a month or so, serving millions of people a month. We were planning on moving to CentOS for other reasons anyway (much better support for RHEL-based distros from Dell at that time, for example). The kernel panics just sealed the deal.

  • Paweł Zięba

    very interesting post indeed.

    I don’t understand one thing, however. What do you mean by tier here? Is it a software or hardware component?

    • Sorry for the long reply, I didn’t see this comment.

      Each tier is a grouping of like services. For example, all public facing Q&A websites are one tier, Blogs are another tier, and web sockets yet another. Each tier is a separate HAProxy process.

  • Warren Turkal

    Thanks for the info. My understanding of HAProxy is that the different processes don’t communicate with one another. As such, how do you get stats for all the processes and not only process 1?

    • We use Opserver to view stats for all processes.

      We also setup each tier (process) to have a unique stats endpoint – both socket, and http. With each of them being on a unique IP/Port combo we can use our tooling to show us what is going on for each.

      • Warren Turkal

        I see the following config


        stats socket /var/run/haproxy-t1.stat level admin
        stats bind-process 1

        This looks like the stats socket is only available on process 1. What I guess I am not seeing is how the stats socket could provide info about processes 2-4.

        It looks like 2-4 are only being used for TLS termination and that there is no stats socket on them. Is that a correct understanding?

        • Ah, I see what you are asking.

          Correct, 2-4 are only used for TLS termination, with no stats sockets, since they always flow back into the associated HTTP front-end (process 1) and we get stats there.

          I honestly would have to go back and look at the documentation to see what kind of stats are available if you wanted to bind to the ssl processes.

  • Pingback: Haproxy bei Stackexchange | the world needs more puppet!