Creating a Caching Proxy Server with Apache
It’s easy to set up Apache as a reverse proxy server that caches content. Below is a guide detailing how this can be done on an Ubuntu 16.04 host. Following these steps, I was able to stand up a quick caching layer in front of an app server, greatly reducing its load by caching web mapping tiles that were expensive to generate.
1. Enabling the Apache Modules
First enable the required modules in Apache:
sudo a2enmod cache
sudo a2enmod cache_disk
sudo a2enmod headers
sudo a2enmod expires
sudo a2enmod proxy
sudo a2enmod proxy_http
sudo a2enmod ssl # if reverse proxying to https servers
If any of these modules are not available, try installing the apache2-utils package.
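To confirm the modules really are loaded, the output of `apache2ctl -M` can be checked against the list above. The helper below is only a sketch: `missing_modules` is a made-up name, and it assumes the usual listing format of one module per line, such as ` cache_module (shared)`. The ssl module is left out because it is only needed for https upstreams.

```shell
# Print each required module that is absent from a module listing read on stdin.
# The listing is expected in the format printed by `apache2ctl -M`,
# e.g. " cache_module (shared)".
missing_modules() {
    listing=$(cat)
    for mod in cache cache_disk headers expires proxy proxy_http; do
        if ! printf '%s\n' "$listing" | grep -q "^[[:space:]]*${mod}_module[[:space:]]"; then
            printf '%s\n' "$mod"
        fi
    done
}

# Typical use on the proxy host (prints nothing when everything is loaded):
#   sudo apache2ctl -M | missing_modules
```

Any name it prints can be passed straight back to a2enmod.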
2. Caching Behavior
Enable Apache’s caching module by specifying a caching provider and the URL path to be cached. The directives below instruct Apache to cache all requests to disk in the CacheRoot directory.
CacheEnable disk /
CacheRoot /var/cache/apache2/mod_cache_disk/routing
Next are several common configuration items such as disabling the CacheQuickHandler, specifying a CacheLock, and enabling the X-Cache response header, which marks each response as a HIT or a MISS. For more information, please see the mod_cache manual (which is actually incredibly helpful for a man page).
CacheQuickHandler off
CacheLock on
CacheLockPath /tmp/mod_cache-lock
CacheLockMaxAge 5
CacheHeader On
Cache Control Headers
While there are many ways to do it, this example uses Apache’s mod_expires to generate and set the “Cache-Control” and “Expires” headers for proxied requests.
ExpiresActive On
ExpiresByType text/html "access plus 1 years"
ExpiresByType image/png "access plus 1 years"
ExpiresByType application/javascript "access plus 1 years"
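To get a feel for what these rules produce, the lifetime can be converted to the number of seconds that ends up in “Cache-Control: max-age”. This is a rough sketch, not mod_expires itself: `expires_seconds` is a hypothetical helper, and it assumes a year is counted as 365 days and a month as 30 days.

```shell
# Convert a mod_expires-style "<num> <unit>" pair to seconds.
# Assumption: a year is 365 days and a month is 30 days.
expires_seconds() {
    num=$1
    case $2 in
        year|years)     factor=31536000 ;;
        month|months)   factor=2592000 ;;
        week|weeks)     factor=604800 ;;
        day|days)       factor=86400 ;;
        hour|hours)     factor=3600 ;;
        minute|minutes) factor=60 ;;
        second|seconds) factor=1 ;;
        *) echo "unknown unit: $2" >&2; return 1 ;;
    esac
    echo $((num * factor))
}

# "access plus 1 years" corresponds to:
echo "Cache-Control: max-age=$(expires_seconds 1 years)"
```

A year is a long time to keep text/html; a shorter lifetime may suit content that changes between deployments.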
Ignoring Upstream Server Caching Headers
Sometimes the upstream server will have a caching strategy of its own. Apache honors the “Cache-Control” and “Expires” headers set by the upstream server, and will not cache objects that those headers mark as uncacheable (e.g. “Cache-Control: no-store” or “private”). To ensure these objects are cached anyway, ignore the upstream caching headers with:
Header unset Expires
Header unset Cache-Control
Header unset Pragma
Apache mod_cache will also pay attention to “Cache-Control” headers supplied by requests. These are used by browsers to force a content refresh (e.g. when developer tools are open). To ignore “Cache-Control” headers on requests, add:
CacheIgnoreCacheControl On
Disabling Caching for certain Paths
Caching may not be desirable for dynamic resources like a REST API, or for content living under certain paths (e.g. an admin portal). To disallow caching by path, use a directive like:
<LocationMatch "^/api/v1/">
CacheDisable on
</LocationMatch>
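How the pattern is anchored decides how much of the API is excluded: with a trailing `$` only the exact path `/api/v1/` matches, while without it everything beneath that prefix matches. The sketch below tries both forms against a few sample paths. `path_matches` is a hypothetical helper, the sample paths are made up, and `grep -E` stands in for Apache's regex engine, which should agree for simple anchored patterns like these.

```shell
# Report whether a request path ($2) matches a LocationMatch-style pattern ($1).
# grep -E is used as a stand-in for Apache's regex engine.
path_matches() {
    if printf '%s\n' "$2" | grep -Eq "$1"; then
        echo match
    else
        echo no-match
    fi
}

# Compare the exact-path pattern with the prefix pattern.
for path in /api/v1/ /api/v1/users /tiles/1/2/3.png; do
    printf '%-18s exact:%-9s prefix:%s\n' "$path" \
        "$(path_matches '^/api/v1/$' "$path")" \
        "$(path_matches '^/api/v1/' "$path")"
done
```

Only the prefix form reports a match for `/api/v1/users`, which is usually what is wanted when a whole API should bypass the cache.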
3. The Reverse Proxy
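Setting up the reverse proxy is simple, and requires adding three directives to the Apache config. These specify which requests will be proxied, and what server the requests will be proxied to. Note that Apache does not allow a comment on the same line as a directive, so comments go on their own lines.
# disable forward proxying; only the reverse proxy is needed
ProxyRequests Off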
ProxyPass / http://upstream.server.net/
ProxyPassReverse / http://upstream.server.net/
4. Bringing it all together
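The complete Apache vhost file is shown below; each area is optional, and can be configured to more closely match the Apache proxy’s true operating environment. This version proxies to an https upstream, so it also turns on SSLProxyEngine, and it adds CacheIgnoreNoLastMod so that responses without a “Last-Modified” header are still cached.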
<VirtualHost *:80>
ServerName routing.gritto.net
# enable caching for all requests; cache content on local disk
CacheEnable disk /
CacheRoot /var/cache/apache2/mod_cache_disk/routing
# common caching directives
CacheQuickHandler off
CacheLock on
CacheLockPath /tmp/mod_cache-lock
CacheLockMaxAge 5
CacheHeader On
# cache control
CacheIgnoreNoLastMod On
CacheIgnoreCacheControl On
# unset headers from upstream server
Header unset Expires
Header unset Cache-Control
Header unset Pragma
# set expiration headers for static content
ExpiresActive On
ExpiresByType text/html "access plus 1 years"
ExpiresByType image/png "access plus 1 years"
ExpiresByType application/javascript "access plus 1 years"
# do not cache requests to the REST API
<LocationMatch "^/api/v1/">
CacheDisable on
</LocationMatch>
# reverse proxy requests to upstream server
# disable forward proxying; only the reverse proxy is needed
ProxyRequests Off
# required if proxying to https
SSLProxyEngine On
ProxyPass / https://upstream.server.net/
ProxyPassReverse / https://upstream.server.net/
</VirtualHost>
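Once the vhost file is saved, it still has to be enabled and loaded. The sketch below assumes the file lives at /etc/apache2/sites-available/caching-proxy.conf (a made-up name) and that the host uses systemd, as Ubuntu 16.04 does. `run` and `deploy_site` are hypothetical helpers; with DRY_RUN=1 they only print the commands instead of running them.

```shell
# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Enable a vhost from /etc/apache2/sites-available, check the config
# syntax, then reload Apache. $1 is the site name without ".conf"
# (a hypothetical name in this example).
deploy_site() {
    run sudo a2ensite "$1" &&
    run sudo apache2ctl configtest &&
    run sudo systemctl reload apache2
}

# Preview the commands without touching the system:
DRY_RUN=1 deploy_site caching-proxy
```

Running `deploy_site caching-proxy` without DRY_RUN executes the three steps for real, stopping at the first one that fails, so a syntax error caught by configtest never reaches the reload.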
5. Confirming the Caching Rules
With the caching server in place, inspect the headers on a request as it is returned to the browser; seeing an X-Cache header with a value of “HIT” means the content was returned from cache. A “MISS” means the content had to be retrieved from the upstream server.
The first request is almost always a “MISS”, and is what triggers the cache to populate. Also remember that the browser may be sending a “Cache-Control: no-cache” header. If this is the case, make sure the CacheIgnoreCacheControl directive discussed above is enabled.
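The same check can be done from a terminal instead of the browser. The sketch below pulls the status word out of the X-Cache header; `cache_status` is a hypothetical helper and the URL in the usage comment is a placeholder. It assumes the header has the form “X-Cache: HIT from hostname”.

```shell
# Read HTTP response headers on stdin and print the X-Cache status word
# (e.g. HIT or MISS); print NONE if the header is absent.
cache_status() {
    status=$(tr -d '\r' | awk 'tolower($1) == "x-cache:" { print $2; exit }')
    echo "${status:-NONE}"
}

# Against the proxy (placeholder URL); run it twice and the second
# request should report HIT once the first has populated the cache:
#   curl -s -o /dev/null -D - http://routing.gritto.net/some/tile.png | cache_status
```

A result of NONE suggests that CacheHeader is not enabled or that the request never passed through mod_cache, for example because the path falls under a CacheDisable rule.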