Creating a Caching Proxy Server with Apache

It’s easy to set up Apache as a reverse proxy server that caches content. Below is a guide detailing how this can be done on an Ubuntu 16.04 host. Following these steps, I was able to stand up a quick caching layer in front of an app server, greatly reducing its load by caching web mapping tiles that were expensive to generate.

1. Enabling the Apache Modules

First enable the required modules in Apache:

sudo a2enmod cache
sudo a2enmod cache_disk
sudo a2enmod headers
sudo a2enmod expires
sudo a2enmod proxy
sudo a2enmod proxy_http
sudo a2enmod ssl # if reverse proxying to https servers

If any of these modules are not available, try installing the apache2-utils package.
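
To confirm the modules are picked up, the output of apache2ctl -M can be compared against the list above. The snippet below is a minimal sketch, not part of the original setup: missing_modules is a hypothetical helper name, and it assumes the usual apache2ctl -M line format of " name_module (shared)". The optional ssl module is left out of the check.

```shell
# Print each required module that is absent from the module list on stdin.
missing_modules() {
    loaded=$(cat)
    for mod in cache cache_disk headers expires proxy proxy_http; do
        # Anchor on the full identifier so "proxy" does not match "proxy_http".
        printf '%s\n' "$loaded" | grep -q "^ *${mod}_module " || echo "$mod"
    done
}

# Usage on a live host (left commented out so the block is safe to paste):
# sudo apache2ctl -M | missing_modules
```

No output means every module in the list was found.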

2. Caching Behavior

Enable Apache’s caching module by specifying a caching provider and the URL path to be cached. The directives below instruct Apache to cache all requests to disk in the CacheRoot directory.

CacheEnable disk /
CacheRoot /var/cache/apache2/mod_cache_disk/routing

Next are several common configuration items such as disabling the CacheQuickHandler, specifying a CacheLock, and enabling a header that marks requests as HITs or MISSes. For more information, please see the mod_cache manual (which is actually incredibly helpful for a man page).

CacheQuickHandler off
CacheLock on
CacheLockPath /tmp/mod_cache-lock
CacheLockMaxAge 5
CacheHeader On

Cache Control Headers

While there are many ways to do it, this example uses Apache’s mod_expires to generate and set the “Cache-Control” and “Expires” headers for proxied requests [1].

ExpiresActive On
ExpiresByType text/html "access plus 1 years"
ExpiresByType image/png "access plus 1 years"
ExpiresByType application/javascript "access plus 1 years"
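
As a rough check of what these rules produce: mod_expires turns the interval into a max-age value in seconds. Assuming it counts a year as 365 days (my understanding of mod_expires, not something stated in this post), the arithmetic can be sketched as follows. max_age_for_years is a hypothetical helper used only for illustration.

```shell
# Seconds in N 365-day years, i.e. the max-age expected from "access plus N years".
max_age_for_years() {
    echo $(( $1 * 365 * 24 * 60 * 60 ))
}

max_age_for_years 1   # prints 31536000
```

Under that assumption, a matching response would be expected to carry "Cache-Control: max-age=31536000".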

Ignoring Upstream Server Caching Headers

Sometimes the upstream server has a caching strategy of its own. Apache honors the “Cache-Control” and “Expires” headers set by the upstream server, so responses that the upstream marks as uncacheable or short-lived will not be cached the way you want. To ensure these objects are cached anyway, strip the upstream caching headers with:

Header unset Expires
Header unset Cache-Control
Header unset Pragma

Apache mod_cache will also pay attention to “Cache-Control” headers supplied by requests. These are used by browsers to force a content refresh (e.g. when developer tools are open). To ignore “Cache-Control” headers on requests, add:

CacheIgnoreCacheControl On

Disabling Caching for Certain Paths

Caching may not be desirable for dynamic resources like a REST API, or for content living under certain paths (e.g. an admin portal). To disallow caching by path, use a directive like:

<LocationMatch "^/api/v1/">
    CacheDisable on
</LocationMatch>
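
Because LocationMatch takes a regular expression, the anchors decide which paths are excluded from caching: a pattern ending in $ matches only that exact path, not the resources beneath it. A quick way to try a pattern before putting it in the config is grep -E. This is only a sketch: path_matches is a hypothetical helper, and grep uses POSIX extended regular expressions while Apache uses PCRE, though the two agree for simple anchored patterns like these.

```shell
# Succeeds when the path ($2) matches the extended regular expression ($1).
path_matches() {
    printf '%s\n' "$2" | grep -Eq "$1"
}

path_matches '^/api/v1/$' '/api/v1/'      && echo "exact path: matched"
path_matches '^/api/v1/$' '/api/v1/users' || echo "sub-path: not matched while the pattern ends in \$"
path_matches '^/api/v1/'  '/api/v1/users' && echo "sub-path: matched without the trailing \$"
```

All three messages are printed, which shows that the trailing $ has to go if everything under /api/v1/ should bypass the cache.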

3. The Reverse Proxy

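Setting up the reverse proxy is simple, and requires adding three directives to the Apache config. These specify which requests will be proxied, and which server they will be proxied to. Note that Apache does not support comments on the same line as a directive, so comments go on their own line.

# forward proxying is not needed for a reverse proxy, so keep it off
ProxyRequests Off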
ProxyPass / http://upstream.server.net/
ProxyPassReverse / http://upstream.server.net/

4. Bringing it all together

The complete Apache vhost file is shown below; each area is optional, and can be configured to more closely match the Apache proxy’s true operating environment.

<VirtualHost *:80>
    ServerName routing.gritto.net
     
    # enable caching for all requests; cache content on local disk
    CacheEnable disk /
    CacheRoot /var/cache/apache2/mod_cache_disk/routing

    # common caching directives
    CacheQuickHandler off
    CacheLock on
    CacheLockPath /tmp/mod_cache-lock
    CacheLockMaxAge 5
    CacheHeader On

    # cache control
    CacheIgnoreNoLastMod On
    CacheIgnoreCacheControl On
    
    # unset headers from upstream server
    Header unset Expires
    Header unset Cache-Control
    Header unset Pragma
   
    # set expiration headers for static content
    ExpiresActive On
    ExpiresByType text/html "access plus 1 years"
    ExpiresByType image/png "access plus 1 years"
    ExpiresByType application/javascript "access plus 1 years"
    
    # do not cache requests to the REST API
    <LocationMatch "^/api/v1/">
        CacheDisable on
    </LocationMatch>  
    
    # reverse proxy requests to upstream server
    # forward proxying is not needed for a reverse proxy, so keep it off
    ProxyRequests Off
    # required when the upstream server is reached over https
    SSLProxyEngine On
    ProxyPass / https://upstream.server.net/
    ProxyPassReverse / https://upstream.server.net/
    
</VirtualHost>
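
Once the vhost file is saved, it still has to be enabled and Apache reloaded. The sketch below is one way to do that on Ubuntu, under a few assumptions: the site name "routing" (a file named routing.conf in /etc/apache2/sites-available) is hypothetical, and run and deploy_site are helper names introduced here. Setting DRY_RUN prints the commands instead of executing them.

```shell
# Run a command, or just print it when DRY_RUN is set to a non-empty value.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Enable the site, validate the configuration, and reload only if it parses.
deploy_site() {
    site=$1
    run sudo a2ensite "$site" &&
    run sudo apache2ctl configtest &&
    run sudo systemctl reload apache2
}

# Preview the commands without touching the system:
# DRY_RUN=1 deploy_site routing
```

Chaining with && means a failed configtest stops the sequence before the reload, so a typo in the vhost does not take the running server down.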

5. Confirming the Caching Rules

With the caching server in place, inspect the headers on a request as it is returned to the browser; seeing an X-Cache header with a value of “HIT” means the content was returned from cache. A “MISS” means the content had to be retrieved from the upstream server.

The first request is almost always a “MISS”, and is what triggers the cache to populate. Also remember that the browser may be sending a “Cache-Control: no-cache” header. If this is the case, make sure the CacheIgnoreCacheControl directive discussed above is enabled.
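
The same check can be done from the command line rather than the browser. The following is a minimal sketch: cache_status is a hypothetical helper that reads response headers on stdin and prints the first word of the X-Cache value, assuming that value begins with the status (for example "HIT from hostname"). The tile URL in the usage comment is made up for illustration.

```shell
# Print the status word of the X-Cache header found on stdin, or NONE if absent.
cache_status() {
    tr -d '\r' | awk '
        tolower($1) == "x-cache:" && !found { print $2; found = 1 }
        END { if (!found) print "NONE" }'
}

# Request the same URL twice; the first is expected to MISS, the second to HIT.
# url=http://routing.gritto.net/tiles/0/0/0.png   # hypothetical path
# curl -s -o /dev/null -D - "$url" | cache_status
# curl -s -o /dev/null -D - "$url" | cache_status
```

The usage lines issue a normal GET and discard the body instead of sending a HEAD request, so the first request has a chance to populate the cache.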

References

  1. mod_expires – Apache HTTP Server Version 2.4
