Creating a Caching Proxy Server with Apache


It’s easy to set up Apache as a reverse proxy server that caches content. Below is a guide detailing how this can be done on an Ubuntu 16.04 host. Following these steps, I was able to stand up a quick caching layer in front of an app server, greatly reducing its load by caching web mapping tiles that were expensive to generate.

1. Enabling the Apache Modules

First, enable the required modules in Apache, then restart it so they are loaded:

sudo a2enmod cache
sudo a2enmod cache_disk
sudo a2enmod headers
sudo a2enmod expires
sudo a2enmod proxy
sudo a2enmod proxy_http
sudo systemctl restart apache2

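These modules ship with Ubuntu’s apache2 package itself, so if any of them are missing, check that apache2 is installed. The separate apache2-utils package provides companion tools such as htcacheclean, which can prune the disk cache.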
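After the restart, it is worth confirming that the modules actually loaded. apache2ctl -M prints one line per loaded module (for example " cache_module (shared)"). The function below is a hypothetical helper of my own, not a standard tool: it reads that listing and prints any of the six modules above that are missing.

```shell
#!/bin/sh
# Hypothetical helper (not part of Apache): print the modules this guide needs
# that are absent from an `apache2ctl -M` listing read on stdin.
missing_modules() {
  # Keep only the first column, e.g. "cache_module" from " cache_module (shared)".
  loaded=$(awk '{ print $1 }')
  for m in cache_module cache_disk_module headers_module expires_module \
           proxy_module proxy_http_module; do
    # -x whole line, -F fixed string, -q quiet: succeed only on an exact name.
    printf '%s\n' "$loaded" | grep -qxF "$m" || echo "$m"
  done
}

# Usage (no output means all six modules are loaded):
#   sudo apache2ctl -M | missing_modules
```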

2. Configuring the Caching Behavior

Enable Apache’s caching module by specifying a caching provider and the URL path to be cached. The directives below tell Apache to cache responses for every URL under / using the disk provider (mod_cache_disk), storing them in the CacheRoot directory.

CacheEnable disk /
CacheRoot /var/cache/apache2/mod_cache_disk/routing
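mod_cache_disk keeps its files under CacheRoot. As far as I can tell it creates the subdirectories it needs but not CacheRoot itself, and a directory made with a plain sudo mkdir ends up owned by root, so it seems safest to create it ahead of time and hand it to the account Apache runs as (www-data on Ubuntu). The function below is a hypothetical helper sketching those two steps.

```shell
#!/bin/sh
# Hypothetical helper: create a CacheRoot directory and hand it to the Apache user.
#   prepare_cache_root DIR OWNER:GROUP
# Run as root when OWNER is another account (e.g. www-data:www-data).
prepare_cache_root() {
  mkdir -p "$1" || return 1   # create DIR and any missing parent directories
  chown "$2" "$1"             # the Apache user must be able to write cache files here
}

# Usage (as root, after sourcing this file):
#   prepare_cache_root /var/cache/apache2/mod_cache_disk/routing www-data:www-data
```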

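Next are several common configuration items. CacheQuickHandler off makes the cache run as a normal handler, so requests go through Apache’s full set of processing phases instead of the quick handler short-circuiting most of them. CacheLock on reduces the chance that many simultaneous requests for the same stale entry all hit the upstream server at once (the "thundering herd" problem); CacheLockPath sets where the lock files live, and CacheLockMaxAge 5 makes Apache ignore any lock older than five seconds. CacheHeader On adds an X-Cache header that marks each response as a HIT or a MISS. For more information, see the mod_cache manual, which is unusually thorough.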

CacheQuickHandler off
CacheLock on
CacheLockPath /tmp/mod_cache-lock
CacheLockMaxAge 5
CacheHeader On

Cache Control

Apache’s mod_cache caches content based on the “Cache-Control” and “Last-Modified” headers supplied by the upstream server. Responses without a “Last-Modified” header are ordinarily not cached; to ignore this requirement, add:

CacheIgnoreNoLastMod On

In some situations, it may be desirable to cache content regardless of whether the upstream server marks it as stale with Cache-Control, Pragma, or Expires headers. To strip these headers from upstream responses, add:

Header unset Expires
Header unset Cache-Control
Header unset Pragma

Apache’s mod_cache also pays attention to “Cache-Control” headers sent by clients in their requests. Browsers use these to force a content refresh (e.g. on a hard reload, or when developer tools are open with caching disabled). To ignore “Cache-Control” headers on requests, add:

CacheIgnoreCacheControl On

Cache Expiration

Apache’s expires module makes it easy to specify a cache duration for each type of content. The directives below give HTML documents, PNG images, and JavaScript files an expiration time one year after access, so they can be served from cache for up to a year after they are first retrieved.

ExpiresActive On
ExpiresByType text/html "access plus 1 years"
ExpiresByType image/png "access plus 1 years"
ExpiresByType application/javascript "access plus 1 years"
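To make the duration concrete: for each matching response, mod_expires takes the request time plus the configured interval and, as I understand it, sends the result both as an Expires date and as a Cache-Control max-age in seconds. The sketch below reproduces that arithmetic under two assumptions: that mod_expires counts a year as 365 days rather than a calendar year, and that GNU date is available (as it is on Ubuntu).

```shell
#!/bin/sh
# Sketch of the arithmetic behind "access plus N years".
# Assumption: mod_expires treats one year as 365 days (no leap-year handling).

# Seconds in N years, i.e. the Cache-Control max-age value.
max_age_for_years() {
  echo $(( $1 * 365 * 24 * 60 * 60 ))
}

# Expires header value for a request made at Unix time ACCESS, N years later.
# Uses GNU date's -d @SECONDS; LC_ALL=C keeps day and month names in English.
expires_header() {  # expires_header ACCESS N
  LC_ALL=C date -u -d "@$(( $1 + $(max_age_for_years "$2") ))" '+%a, %d %b %Y %H:%M:%S GMT'
}

# Usage: expires_header "$(date +%s)" 1
```

Under that assumption, "access plus 1 years" works out to a max-age of 31536000 seconds.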

Disabling Caching for Certain Paths

Caching may not be desirable for dynamic resources like a REST API, or for content living under certain paths (e.g. an admin portal). To disallow caching by path, use a directive like:

<LocationMatch "^/api/v1/">
    CacheDisable on
</LocationMatch>
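Because LocationMatch takes a regular expression, the anchors decide how much of the URL space it covers: a trailing $ limits the match to exactly /api/v1/, while leaving it off also matches every path beneath it. The sketch below checks a few sample paths (made up for illustration) against both forms, using grep -E as a stand-in for Apache’s PCRE engine; the two behave the same for simple anchored patterns like these.

```shell
#!/bin/sh
# Compare an exact-match LocationMatch pattern with a prefix pattern.
# Assumption: grep -E (POSIX ERE) stands in for Apache's PCRE; they agree on
# simple anchored patterns like these. The sample paths are made up.
matches() {  # matches REGEX PATH -> exit status 0 if PATH matches REGEX
  printf '%s\n' "$2" | grep -Eq -- "$1"
}

for path in /api/v1/ /api/v1/users /api/v1/users/42 /tiles/10/512/340.png; do
  exact=no
  prefix=no
  matches '^/api/v1/$' "$path" && exact=yes
  matches '^/api/v1/' "$path" && prefix=yes
  echo "$path  exact=$exact  prefix=$prefix"
done
```

Only /api/v1/ matches the exact pattern; the prefix pattern also catches /api/v1/users and /api/v1/users/42, and neither touches the tile path.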

3. Configuring the Reverse Proxy

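Setting up the reverse proxy is simple and requires adding three lines to the Apache config. ProxyPass forwards requests under the given path to the upstream server, and ProxyPassReverse rewrites the Location, Content-Location, and URI headers of upstream redirects so they point back at the proxy. ProxyRequests controls forward proxying, which a reverse proxy does not need; keep it Off, because turning it On without access restrictions turns the server into an open proxy.

ProxyRequests Off
ProxyPass / http://upstream.server.net/
ProxyPassReverse / http://upstream.server.net/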

4. Bringing it all together

The complete Apache vhost file is shown below; each section is optional and can be adjusted to match the proxy’s actual operating environment.

<VirtualHost *:80>
    ServerName routing.gritto.net

    # enable caching for all requests; cache content on local disk
    CacheEnable disk /
    CacheRoot /var/cache/apache2/mod_cache_disk/routing

    # common caching directives
    CacheQuickHandler off
    CacheLock on
    CacheLockPath /tmp/mod_cache-lock
    CacheLockMaxAge 5
    CacheHeader On

    # cache control
    CacheIgnoreNoLastMod On
    CacheIgnoreCacheControl On

    # unset headers from upstream server
    Header unset Expires
    Header unset Cache-Control
    Header unset Pragma

    # set expiration headers for static content
    ExpiresActive On
    ExpiresByType text/html "access plus 1 years"
    ExpiresByType image/png "access plus 1 years"
    ExpiresByType application/javascript "access plus 1 years"

    # do not cache requests to the REST API
    <LocationMatch "^/api/v1/">
        CacheDisable on
    </LocationMatch>

    # reverse proxy requests to upstream server (forward proxying stays off)
    ProxyRequests Off
    ProxyPass / http://upstream.server.net/
    ProxyPassReverse / http://upstream.server.net/
</VirtualHost>
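On Ubuntu, a vhost like this typically lives in /etc/apache2/sites-available/ and is switched on with a2ensite. Assuming it is saved as routing.conf (a file name chosen here for illustration), the hypothetical helper below enables it, checks the configuration syntax, and reloads Apache only if that check passes.

```shell
#!/bin/sh
# Hypothetical helper: enable a site from /etc/apache2/sites-available, then
# reload Apache only if the configuration passes a syntax check.
#   enable_site NAME   (NAME is the file name without .conf, e.g. "routing")
enable_site() {
  sudo a2ensite "$1" &&
    sudo apache2ctl configtest &&
    sudo systemctl reload apache2
}

# Usage, assuming the vhost was saved as /etc/apache2/sites-available/routing.conf:
#   enable_site routing
```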

5. Confirming the Caching Rules

With the caching server in place, inspect the response headers in the browser’s developer tools. An X-Cache header with a value of “HIT” means the content was served from cache; “MISS” means it had to be retrieved from the upstream server, and “REVALIDATE” means a stale cached copy was revalidated with the upstream server and then served from cache.

The first request for a given URL is almost always a “MISS”, since that is what populates the cache. Also remember that the browser may be sending a “Cache-Control: no-cache” header; if so, make sure the CacheIgnoreCacheControl directive discussed above is enabled.
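The same check works from the command line, and curl does not send a Cache-Control request header unless told to, which sidesteps the browser issue. The hypothetical helper below reads a raw response-header dump and prints the first word of the X-Cache value (Apache may follow the status with the server name); the URL in the usage line is illustrative.

```shell
#!/bin/sh
# Hypothetical helper: print the cache status (HIT, MISS or REVALIDATE) from raw
# HTTP response headers on stdin. Header names are matched case-insensitively.
cache_status() {
  # HTTP header lines end in CRLF; drop the CR so the value compares cleanly.
  tr -d '\r' | awk 'tolower($1) == "x-cache:" { print $2 }'
}

# Usage: fetch the same URL twice; for cacheable content expect MISS, then HIT.
#   curl -s -o /dev/null -D - http://routing.gritto.net/ | cache_status
```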

Screen capture showing the response headers of a web request; the X-Cache header, with a value of "HIT", is highlighted, showing that this request was returned from cache.