Cache Headers and Enfold Proxy
Introduction
HTTP caching is defined in various specifications used by web servers and clients, particularly RFC2616. As such, the formal definition of how Enfold Proxy (EP) caching works should be considered that RFC. This document attempts to give an overview of how EP works in a less formal way.
At the bottom of this topic is an explanation of the core concepts behind caching and a diagram about how caching works.
Caching Goals
Enfold Proxy handles an HTTP request significantly faster than Zope can when it is able to return a cached version from its own cache. The goal, therefore, is to increase the ratio of requests which EP can serve from its cache to the total number of HTTP requests.
There are three ways to do this (in descending order of importance).
- Increase the number of items which are cachable.
- Increase the length of time that individual items will be considered fresh.
- Increase the number of stale items which can be revalidated from Zope.
To do any of these three things, you need to modify the HTTP headers which Plone sends to Enfold Proxy. There are two ways to do this:
- Configure a cache profile in Plone. You do this after you install and enable the plone.app.caching add-on. (Earlier versions of Plone used another product, CacheFu, for this purpose.)
- Manually set the HTTP headers on your Plone/Zope pages. (This must be done programmatically).
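As a rough illustration of the second option, the sketch below builds a Cache-Control value and sets it on a response. It assumes a Zope-style response object that exposes setHeader(name, value). The FakeResponse class and the apply_cache_headers helper are hypothetical stand-ins so that the example runs outside Zope, and the lifetimes are arbitrary.

```python
class FakeResponse:
    """Hypothetical stand-in for a Zope response; only setHeader is modelled."""

    def __init__(self):
        self.headers = {}

    def setHeader(self, name, value):
        self.headers[name] = value


def apply_cache_headers(response, max_age, s_maxage=None, public=True):
    """Set a Cache-Control header built from the given lifetimes (in seconds)."""
    directives = ["public"] if public else []
    directives.append("max-age=%d" % max_age)
    if s_maxage is not None:
        # s-maxage applies only to shared caches such as Enfold Proxy.
        directives.append("s-maxage=%d" % s_maxage)
    value = ", ".join(directives)
    response.setHeader("Cache-Control", value)
    return value


response = FakeResponse()
apply_cache_headers(response, max_age=300, s_maxage=3600)
print(response.headers["Cache-Control"])
```

In a real Plone view, the response object would typically be something like self.request.response rather than the stand-in used here.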
Important: If you configure your cache settings incorrectly or if you make major modifications to your cache settings, it is best to purge the cache for your proxy definition. (See Purging Cached Content).
Increase the number of items which are cachable
One prerequisite to caching is knowing how much caching is too much and having a way to apply different caching policies to different kinds of Plone content.
The plone.app.caching add-on provides sensible defaults for common content types. For more information about plone.app.caching go to http://plone.org/products/plone.app.caching/. After plone.app.caching is installed, you can tweak your settings in the Plone Site Setup --> Caching menu. After you enable caching and import a caching profile, you can customize the settings even further by selecting the appropriate ruleset to use. To do this, go to Caching Operations and choose the Operation dropdown appropriate for the content type.
The best way to increase the number of cachable items is to make sure a Plone caching product (such as plone.app.caching) is installed and to verify that HTTP requests are generally being cached. (See Verifying that your cache settings are in effect).
Increase the length of time that individual items will be considered fresh.
Increasing the time that a content item is fresh will allow Enfold Proxy to handle the request by itself without having to bother Plone.
You do this in Plone by going to your caching product and increasing the max-age or s-maxage value for a content item or a category of content items. If you have enabled the Headers or Debug log level, you can view the s-maxage or max-age value in the logs as well.
Increase the number of stale items which can be revalidated from Zope
The biggest bang for the buck comes from increasing the length of time items are considered fresh. You can also make minor time savings by maximizing the number of stale items which EP can keep for future revalidation. Successful revalidation occurs when EP receives a 304 (Not Modified) response from Plone with an updated date stamp. Because Plone does not need to send the full content item in this case (only the 304 headers with the date stamp), the overall load on Plone is reduced. Performing that transaction is faster than having Plone process and return a full response; on the other hand, because a revalidation involves two requests (browser to EP, then EP to Plone) instead of one, the second request may cancel out any time saved. The real advantage of revalidation comes not from this one request but from extending the time during which EP's cached copy of the item is fresh for other people requesting it.
Similarly, a revalidation attempt which is not successful will involve waiting for two requests and receiving the full content from Plone.
If a high percentage of revalidation attempts are not succeeding, that could slow down your site significantly. Therefore, when troubleshooting cache settings, a good first step is to verify that your revalidation attempts are actually succeeding. The only way to do this is to monitor traffic between Enfold Proxy and Plone and verify that Plone is sending 304s back to Enfold Proxy. You would need to inspect EP's logs after setting the log level to Headers or Debug. (For more info, see Headers log level).
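To make the 304 exchange more concrete, here is a minimal sketch of the decision a backend makes when a proxy revalidates a stale item. It is not Plone's or EP's actual code. It assumes the proxy sends the standard HTTP conditional request headers If-None-Match and If-Modified-Since; the validate function and the sample ETag are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def validate(request_headers, current_etag, last_modified):
    """Return 304 if the proxy's stored copy is still current, else 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # ETags are opaque strings: they are only compared for equality.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if current_etag in candidates else 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return 200  # unparseable date: send the full item
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        return 304 if last_modified <= since else 200

    return 200  # not a conditional request


modified = datetime(2015, 10, 21, 7, 28, tzinfo=timezone.utc)
etag = '"abc|def|123"'  # hypothetical pipe-separated ETag

print(validate({"If-None-Match": etag}, etag, modified))
print(validate({"If-None-Match": '"stale"'}, etag, modified))
print(validate({"If-Modified-Since": format_datetime(modified, usegmt=True)},
               etag, modified))
```

A high share of 200 results for items that have not changed would suggest that the validators (ETag or Last-Modified) differ between requests, which is the situation the log inspection described above is meant to reveal.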
Concepts
The first time a person makes a request for a content item, here is what typically happens:
At first, Enfold Proxy does not have a cached copy of the item, so it must fetch it directly from Plone.
When an item is received from the backend server, it is examined to see if it is "cacheable" - that is, if it is able to be stored in the cache and used to satisfy future requests for that item.
Unless Plone includes a special header which specifies that the item is not cacheable, Enfold Proxy will keep a copy locally on its machine which it can use for responding to future browser requests.
If the sysadmin purges the cache for the proxy definition, then all cached copies are deleted (and Enfold Proxy has to start again from scratch).
After the initial request for a content item, here is what typically happens:
When a client connects to EP and asks for a resource, EP first checks to see if it is in the cache - that is, if a previous request for the resource determined that it was cacheable, as described above.
If the item is in the cache, the next thing to be determined is if the item is "stale." A stale item means that although the item is in the cache, the parameters for that item indicate that it is no longer allowed to be used directly.
If an item is fresh, then it can be sent back to the client without contacting the backend server. However, if the item is stale, EP can generally "validate" such items. Validating an item consists of connecting to the server and asking the server if the version fetched before is still the same. If the server responds in the affirmative, EP is still able to use the previously stale item, and although contact was made with the backend server, the data itself was not re-transmitted.
It is important to understand the conceptual difference between an item being "cachable" and an item being "fresh". It is possible, and quite common, that a server will send a cachable response, but indicate it is immediately stale (i.e., by including Cache-Control: max-age=0 or an Expires value which is already out-of-date). This means that although EP can store the item in its cache, it is not able to use it in client requests without first validating it with the server. In this case, the end result is that the cache will never serve old items (because they are never fresh and require revalidation), but bandwidth usage between EP and the backend server will be reduced because Plone won't need to send the content item itself, only the 304 message with an updated date stamp.
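The flow described in this section can be summarised in a small sketch. This is only an illustration of the concepts, not EP's implementation; the CacheEntry record and the decide function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheEntry:
    """Hypothetical record of a stored (cachable) response."""
    stored_at: float           # time the response was stored, in seconds
    freshness_lifetime: float  # seconds it may be served without validation


def decide(entry: Optional[CacheEntry], now: float) -> str:
    """Return what the proxy should do for a request at time `now`."""
    if entry is None:
        return "fetch"       # nothing cached: get the item from the backend
    age = now - entry.stored_at
    if age < entry.freshness_lifetime:
        return "serve"       # fresh: answer without contacting the backend
    return "revalidate"      # stale: ask the backend whether it changed


print(decide(None, now=100.0))
print(decide(CacheEntry(stored_at=0.0, freshness_lifetime=60.0), now=30.0))
print(decide(CacheEntry(stored_at=0.0, freshness_lifetime=60.0), now=90.0))
# Cachable but immediately stale (for example max-age=0):
print(decide(CacheEntry(stored_at=0.0, freshness_lifetime=0.0), now=0.0))
```

The last call shows the "cachable but never fresh" case: the item is stored, yet every request leads to a revalidation.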
HTTP Headers Reference
There are a number of HTTP headers used to control caching. Headers indicate if the response is cachable at all, and if so, how long the item should be considered "fresh". Other headers can also dictate certain parameters for the cachability and freshness of the response - e.g., "this item can only be served from a cache if the client has these specific request headers". As a result, caching can get quite complex to understand. Unfortunately, that is just the nature of the beast.
This section attempts to detail the more common HTTP headers used to control caching. For more detailed information, please refer to RFC2616.
A number of headers found in CacheFu are not currently supported in Enfold Proxy. If any of these unsupported headers are present, Enfold Proxy will simply ignore them. Here is a list of headers which are not currently supported by Enfold Proxy: vary (limited support), no-transform, pre-check, post-check, stale-while-revalidate, stale-if-error.
With regard to "vary," EP only caches a single copy of an item, not one copy per variation. Thus, pages with different Vary headers may cause more cache misses than you might otherwise expect.
In general, Cache Control headers are supported. That includes max-age, s-maxage, public, no-cache, no-store, must-revalidate, and proxy-revalidate.
(Read an introduction to Cache Control headers at http://www.mnot.net/cache_docs/#CACHE-CONTROL).
Note: Browser-based tools for viewing HTTP headers don't show the full story. That is because you won't see the HTTP headers which Enfold Proxy and Plone will exchange. The easiest way to view the HTTP headers which EP is sending and receiving (from both the browser and Plone) is to examine the Enfold Proxy logs. Enfold Proxy has a log level called Headers Log Level to permit more user-friendly viewing of the HTTP traffic. But first you will need to enable this log level in your proxy definition.
For informational purposes, if your web browser is making an HTTP request, you may see another header specific to Enfold Proxy:
- If you see X-Cache: HIT, then yes, Enfold Proxy has sent a cached version to the web browser.
- If you see X-Cache: MISS, then no, Enfold Proxy did not send a cached version to the browser (i.e., it had to fetch it from Plone).
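As a small illustration, the sketch below classifies a set of response headers by their X-Cache value and computes the share of cache hits. The helper names and the sample header dictionaries are hypothetical, and the code assumes the header value is exactly HIT or MISS as described above.

```python
def served_from_cache(headers):
    """Return True for X-Cache: HIT, False for MISS, None otherwise."""
    for name, value in headers.items():
        if name.lower() == "x-cache":  # header names are case-insensitive
            value = value.strip().upper()
            if value == "HIT":
                return True
            if value == "MISS":
                return False
    return None


def hit_ratio(responses):
    """Fraction of responses with a known X-Cache value that were hits."""
    known = [r for r in map(served_from_cache, responses) if r is not None]
    if not known:
        return None
    return sum(known) / len(known)


# Hypothetical sample of response headers collected from a browser tool.
sample = [
    {"X-Cache": "HIT"},
    {"X-Cache": "MISS"},
    {"x-cache": "HIT"},
    {"Content-Type": "text/html"},  # no X-Cache header: not counted
]
print(hit_ratio(sample))
```

Tracking this ratio before and after a change to your cache settings gives a quick indication of whether the change had the intended effect.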
ETag: An ETag is a string used to determine whether an item needs to be recalculated or can be served from cache. From EP's point of view, the format of the ETag is not important; EP just compares the strings to see if they are alike or different. In plone.app.caching, the ETag appears as a string of tokens separated by pipe characters.
Last-Modified: The last modified date for the requested object, in the HTTP date format (RFC 1123). This header is used to validate an item with the backend server after it is stale.
Expires: This is an older HTTP 1.0 header which indicates a date when the item should be considered stale. Normally this value should be in the future when served with the content. If the expires date has already arrived, then EP will treat the item as immediately stale and validate it with the server before using it.
Pragma: no-cache: This is an old HTTP 1.0 header which indicates the item should not be considered cachable.
Cache-Control: This is a general-purpose header for controlling various aspects of caching. The value of the header determines what control is being requested. Common values are:
private, no-store: Indicate that the item should not be considered cachable.
no-cache: Despite the name, this only indicates that the item should be immediately considered stale, i.e., it can be stored in the cache, but cannot be served before validation. (Note that a variation of this directive lets you name a specific header that should not be cached, but that is not covered here.)
must-revalidate & proxy-revalidate: Indicate that once the item becomes stale, if there is an error validating the item, the proxy must return an error response rather than serving the cached response (RFC 2616 specifies 504 Gateway Timeout for this case). Suppose the backend Plone server were down. In the absence of both directives, EP will serve stale cached content. Plone would need to include one of them to ensure that an error message is returned to the user.
max-age: The number of seconds for which the item should be considered fresh.
s-maxage: Like max-age, this indicates the number of seconds for which the item should be considered fresh, but it applies only to 'shared caches', of which EP is one. Both directives serve a similar purpose to the 'Expires' header; if 'Expires' and one of these directives are both given, 'Expires' is ignored.
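The precedence between these headers can be sketched as follows. This is a simplified illustration for a shared cache, not EP's actual logic: the function names are hypothetical, the Expires lifetime is measured from the current time rather than from the response's Date header, and a response with no freshness information is treated as stale.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_cache_control(value):
    """Split a Cache-Control value into a {directive: argument-or-None} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def freshness_lifetime(headers, now):
    """Seconds a shared cache may treat the response as fresh (0 = stale)."""
    directives = parse_cache_control(headers.get("Cache-Control", ""))
    if "no-cache" in directives:
        return 0  # storable, but must be validated before every use
    # A shared cache prefers s-maxage, then max-age; Expires is the fallback.
    for name in ("s-maxage", "max-age"):
        arg = directives.get(name)
        if arg is not None and arg.isdigit():
            return int(arg)
    expires = headers.get("Expires")
    if expires:
        try:
            when = parsedate_to_datetime(expires)
        except (TypeError, ValueError):
            return 0  # an unparseable date is treated as already expired
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        return max(0, int((when - now).total_seconds()))
    return 0


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
one_hour_later = "Mon, 01 Jan 2024 13:00:00 GMT"

print(freshness_lifetime(
    {"Cache-Control": "max-age=60, s-maxage=600", "Expires": one_hour_later},
    now))
print(freshness_lifetime({"Cache-Control": "max-age=60"}, now))
print(freshness_lifetime({"Expires": one_hour_later}, now))
```

In the first call all three values are present, and the result follows s-maxage; Expires only matters in the last call, where no Cache-Control lifetime is given.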