How Caching Works
Why Cache?
One primary function of Enfold Proxy (EP) is to cache content for Enfold Server or Plone. Requests to Plone are expensive in terms of time, and caching minimizes the number of direct requests to Plone. In a typical configuration, Enfold Proxy and Internet Information Services (IIS) reside on the same machine, and Enfold Server (or Plone) resides on an entirely different machine. Enfold Proxy keeps cached versions of many items on its machine (typically in C:\Program Files\Enfold Proxy\cache) and serves them directly to the visitor's browser without obtaining them from Plone/Enfold. This has two advantages. First, IIS and EP are faster to begin with, so performance improves. Second, the more you cache, the less load Plone receives (and the lower its RAM/CPU use).
This topic provides a general introduction to what caching is about and how to monitor it. The next topic, Configure EP for Caching, covers typical configurations and how to purge the cache. Another topic, Cache Headers Concepts and Strategies, explains in detail how EP decides what to cache. It also lists all the HTTP headers that EP recognizes when deciding whether to cache something.
Steve Souders wrote in the book High Performance Web Sites (O'Reilly, 2007), "Only 10-20% of the end user response time is spent downloading the HTML document. The other 80-90% is spent downloading all the components in the page." Enfold Proxy is the component that helps deliver that remaining 80-90% more efficiently, and in a way that ensures that stale content is not sent.
Introduction to Caching
First, a more basic question: what is web caching?
There are three different kinds of "cache" in the context of websites:
- A private cache (also called browser-based caching or user-agent cache). If a browser has already fetched a web resource and needs the same resource again, it knows (through HTTP headers) to use a local copy instead of requesting one remotely. When the browser revalidates its copy, the server answers with a 304 (Not Modified) response, which is the signal for the browser to use its privately cached copy. Ultimately, this kind of cache has nothing to do with Enfold or IIS. By default, the HTTP headers from Enfold Proxy give Age=0, and that setting always causes EP to check for a cached version. (Some Plone caching solutions use private caches, but Enfold Proxy does not.)
- Forward proxy servers keep local copies of frequently requested resources, allowing large organizations and ISPs to significantly reduce their upstream bandwidth usage and cost while significantly increasing performance. For the most part, a forward proxy server operates independently of Plone and Enfold Proxy and is not really relevant to this document. However, some settings in the HTTP headers and in CacheFu are geared to forward proxy servers, so this document mentions them here simply to point out the potential for confusion.
- A caching proxy server (or reverse proxy) receives web traffic to a site and returns cached content (usually static content) whenever possible. Enfold Proxy is a caching proxy server because it forwards HTTP requests received from IIS to Enfold Server/Plone only after first making sure that it does not have a cached version which it could return. See the Enfold Proxy Architecture for more details.
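The check-the-cache-first behavior of a caching proxy can be illustrated with a short sketch. This is a toy model for illustration only, not Enfold Proxy's implementation: the class name, the `fetch_from_origin` callback, and the fixed `max_age` freshness rule are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Tuple


class CachingProxySketch:
    """Toy reverse proxy: answer from the cache when fresh, else ask the origin."""

    def __init__(
        self,
        fetch_from_origin: Callable[[str], str],  # hypothetical stand-in for a Plone request
        max_age: float = 3600.0,                  # assumed freshness lifetime in seconds
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch_from_origin
        self._max_age = max_age
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # path -> (stored_at, body)

    def get(self, path: str) -> Tuple[str, str]:
        """Return (status, body), where status is "HIT" or "MISS"."""
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and now - entry[0] < self._max_age:
            return "HIT", entry[1]          # served without touching the origin
        body = self._fetch(path)            # the expensive request to the origin
        self._store[path] = (now, body)
        return "MISS", body


if __name__ == "__main__":
    origin_calls = []

    def fake_origin(path: str) -> str:
        origin_calls.append(path)
        return "<html>" + path + "</html>"

    proxy = CachingProxySketch(fake_origin)
    print(proxy.get("/events")[0])  # MISS
    print(proxy.get("/events")[0])  # HIT
    print(len(origin_calls))        # 1
```

The second request never reaches the origin, which is the effect this topic is about: fewer direct requests to Plone.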
In the context of Enfold Proxy, we can break the concept even further according to caching method:
- Disk-based Cache. The cached item is saved as a file on the proxy server's file system. This is probably the most common method.
- RAM-based Cache. Because directories in Plone refer not to physical files but objects, these items are not cached on the file system. Instead, they are cached as objects. Object-based cache is discussed in greater detail in the Caching XSLT topic.
Clearing Private Cache with your Browser
Web surfers are used to pressing F5 or clicking the Refresh button to obtain the latest version of a web page. This does not clear the browser cache and is not suitable for testing or troubleshooting caching issues. Here is the right way to clear the cache in each browser:
- Firefox. Tools --> Clear Private Data (Ctrl + Shift + Delete). You don't need to clear all the items here. Just Cache is sufficient.
- Internet Explorer 7 or 8. Tools --> Internet Options --> (General Tab) --> Browsing History: Delete --> Delete Temporary Internet Files.
- Chrome 3. Choose the wrench icon at the top right --> Choose Clear Browsing Data from the dropdown menu. Note: Pressing Ctrl + Shift + Delete simultaneously accomplishes the same thing.
(It is not necessary to delete cookies or the other items). If you are trying to understand what percentage of items are being cached, you can view metrics for this directly in the cache logs for your proxy definition.
Viewing HTTP Headers
See also: Visualizing the EP caching process.
There are several ways to watch HTTP headers. First, you can use the logs that are generated by Enfold Proxy. The October 2008 release of Enfold Proxy includes a special log level which shows HTTP headers, with directional arrows in the log to indicate whether the request is coming or going. (Read more about how to view HTTP headers in the proxy log). The second way is to install browser plugins that let you view the headers in real time. Below we describe how to use browser plugins to view HTTP headers and how to interpret them. The HTTP headers in the proxy log may differ slightly from what you see below, but generally the syntax should be similar. Also, the directional arrows in the header logs should indicate where the responses are coming from. Mainly, you will be checking how often Zope/Plone is handling requests; the less often, the better.
To verify caching, you will need to view your HTTP response headers. Several browser tools let you do this.
- Live HTTP Headers ( http://livehttpheaders.mozdev.org/ ) is the recommended tool. This is a Firefox plugin which lets you view http headers in real time. For easier reading, you can go to the Generator tab and filter out image requests.
- Firebug ( http://www.getfirebug.com/ ) This is another Firefox plugin, which not only records HTTP headers for each item but also records other information (like download speed). You can view HTTP information for each HTTP request by clicking on the web resource. Live HTTP Headers is probably better at capturing data in real time, but Firebug shows graphically which resources on a web page are taking the longest time to load. It is commonly assumed that dynamically generated server content takes longer to load than images or scripts. In fact, Firebug reveals how often images and scripts are the culprits, not the HTML content.
- Fiddler ( http://www.fiddlertool.com/ ) This is a tool for Internet Explorer. A good MSDN tutorial about using Fiddler is here: http://msdn2.microsoft.com/en-us/library/bb250446(VS.85).aspx
Tip #1: Surf with two browsers! More than likely, you'll be using Firefox with Live HTTP Headers to inspect your HTTP headers. You can then use one browser specifically to simulate a logged-in user (where different caching rules apply) and the other browser to simulate the anonymous, non-logged-in user (where the most aggressive caching rules apply).
Tip #2: When you are logged in as an administrative user, a smaller percentage of your HTTP requests will be cached. As a result, the response time for a Plone page might seem slower to you than it would be for an anonymous user.
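If you would rather check response headers from a script than from a browser plugin, a few lines of Python will do. The following is a minimal sketch using only the standard library; the function names and the choice of which headers to print are assumptions for illustration, and the example URL is the sample site used in this topic. Because the script sends no cookies, it behaves like the anonymous user described in Tip #1.

```python
import urllib.request
from typing import Dict

# Headers that matter most when checking caching (an illustrative selection).
CACHE_HEADERS = ("X-Cache", "Age", "Cache-Control", "Expires", "Server")


def fetch_response_headers(url: str) -> Dict[str, str]:
    """Request the URL and return its response headers as a dict.

    Repeated headers collapse to the last value, which is fine for a quick check.
    """
    with urllib.request.urlopen(url) as response:
        return dict(response.headers.items())


def summarize_cache_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Keep only the caching-related headers, matching names case-insensitively."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        name: lowered[name.lower()]
        for name in CACHE_HEADERS
        if name.lower() in lowered
    }


def cache_status(headers: Dict[str, str]) -> str:
    """Return "HIT", "MISS", or "UNKNOWN" based on the X-Cache header."""
    x_cache = summarize_cache_headers(headers).get("X-Cache", "")
    first_word = x_cache.split(" ", 1)[0].upper()
    return first_word if first_word in ("HIT", "MISS") else "UNKNOWN"


if __name__ == "__main__":
    # Uncomment and substitute your own site to try it against a live server:
    # headers = fetch_response_headers("http://www.originalfunsite.com/events")
    headers = {"X-Cache": "HIT from www.originalfunsite.com", "Age": "0"}
    for name, value in summarize_cache_headers(headers).items():
        print(name + ": " + value)
    print(cache_status(headers))  # HIT
```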
Interpreting HTTP Headers
The next section covers how to read headers with the browser tools listed above. However, Enfold recommends examining EP logs directly instead of using browser tools whenever possible because they are less ambiguous and confusing. (Read more about how to view HTTP headers in the proxy log).
Ultimately the most accurate way to troubleshoot caching is to look at HTTP Headers. This can be a daunting task. Just one web page may involve 10-15 separate HTTP requests. However, if you have experience and know what to look for, you can spot problems quickly without becoming bogged down. For example, a simple URL such as http://www.originalfunsite.com/events will consist of these requests:
GET /events
GET /portal_javascripts/Enfold%20Theme/ploneScripts6490.js
GET /portal_css/Enfold%20Theme/ploneStyles6499.css
GET /portal_css/Enfold%20Theme/ploneStyles1162.css
GET /portal_css/Enfold%20Theme/ploneStyles6975.css
GET /favicon.ico
GET /info_icon.gif
GET /user.gif
GET /rss.gif
GET /mail_icon.gif
GET /print_icon.gif
GET /topheader.png
GET /input_background.gif
GET /portal_css/Enfold%20Theme/ploneStyles2247.css
GET /search_icon.gif
GET /logo.gif
GET /site_icon.gif
GET /folder_icon.gif
GET /topic_icon.gif
GET /topic_icon.gif
GET /linkTransparent.gif
GET /arrowUp.gif
GET /favicon.ico
GET /arrowLeft.gif
GET /arrowRight.gif
GET /plone_powered.gif
GET /enfold_powered.png
GET /colophon_sec508.gif
GET /colophon_wai-aa.gif
GET /colophon_xhtml.png
GET /colophon_anybrowser.png
GET /colophon_css.png

Out of these requests, only the first ( GET /events ) could conceivably be a Plone request. The rest of the HTTP GETs (images, JavaScript and CSS) are automatically cached by Enfold Proxy (how long EP considers them to be fresh is another story). In fact, though, EP might even cache GET /events as well, depending on the rules configured for it.
An HTTP exchange consists of a request and a response, each with its own headers. Usually you will be interested only in the response.
http://www.originalfunsite.com/events

GET /events HTTP/1.1
Host: www.originalfunsite.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

HTTP/1.x 200 OK
Date: Sat, 16 Feb 2008 02:04:31 GMT
Server: Microsoft-IIS/6.0, Zope/(Zope 2.9.6-final, python 2.4.4, win32) ZServer/1.1 Plone/2.5.2
X-Powered-By: EnfoldProxy 4.0.0.8015 (http://www.enfoldsystems.com/Products/Proxy)
Content-Length: 25977
Content-Language: en
X-Cache-Headers-Set-By: CachingPolicyManager: /Plone/caching_policy_manager
Expires: Sat, 16 Feb 2008 03:04:31 GMT
Cache-Control: max-age=3600, s-maxage=3600, public
Content-Type: text/html;charset=utf-8
X-Cache: MISS from www.originalfunsite.com
Via: 1.1 www.originalfunsite.com:80
Now let's try it again. This is what happens when you immediately request the same URL a second time.
http://www.originalfunsite.com/events

GET /events HTTP/1.1
Host: www.originalfunsite.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

HTTP/1.x 200 OK
Date: Sat, 16 Feb 2008 02:15:02 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: EnfoldProxy 4.0.0.8015 (http://www.enfoldsystems.com/Products/Proxy)
Content-Length: 25977
Content-Type: text/html;charset=utf-8
Cache-Control: max-age=3600, s-maxage=3600, public
Expires: Sat, 16 Feb 2008 03:04:31 GMT
Age: 0
X-Cache: HIT from www.originalfunsite.com
Now, let's interpret. In the first case, you had X-Cache: MISS from www.originalfunsite.com. This means that EP found no suitable cached copy, so it needed to make a request to Enfold/Plone. The X-Cache-Headers-Set-By header shows that a Plone product called Caching Policy Manager set the cache headers, and the Server header mentions Zope and Plone. Both are signs that the response came directly from Enfold or Plone and not from Enfold Proxy's cache.
In the second HTTP block, we see X-Cache: HIT from www.originalfunsite.com, which means that EP served this item from its cache (the copy it had saved on the file system). The max-age=3600 refers to the 1 hour expiration time for Aggressive Caching you declared in the Caching Profile (http://www.originalfunsite.com/chasseur_profiles).
With CacheFu, the headers will look different (CacheFu throws in some extra headers), but essentially you are looking for the same X-Cache: HIT from www.originalfunsite.com line. The more hits, the better. (Generally, unless you plan to declare the cache headers on your Plone templates, you will need a Plone caching product to enable caching).
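The freshness arithmetic behind the two sample responses can be checked by hand or with a short script. The sketch below uses only the Python standard library; the function names are hypothetical, and the header values are copied from the two responses shown earlier in this section.

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def seconds_until_expiry(date_header: str, expires_header: str) -> int:
    """Seconds between the response's Date header and its Expires header."""
    date = parsedate_to_datetime(date_header)
    expires = parsedate_to_datetime(expires_header)
    return int((expires - date).total_seconds())


def max_age_seconds(cache_control: str) -> Optional[int]:
    """Extract the max-age value from a Cache-Control header, if present."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


if __name__ == "__main__":
    expires = "Sat, 16 Feb 2008 03:04:31 GMT"

    # First response (the MISS): Expires is exactly max-age seconds after Date.
    print(seconds_until_expiry("Sat, 16 Feb 2008 02:04:31 GMT", expires))  # 3600
    print(max_age_seconds("max-age=3600, s-maxage=3600, public"))          # 3600

    # Second response (the HIT): same Expires, later Date, so less time remains.
    print(seconds_until_expiry("Sat, 16 Feb 2008 02:15:02 GMT", expires))  # 2969
```

Note that the Expires value is identical in both responses. That is consistent with the second response being the stored copy of the first rather than a freshly generated page.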
Measuring Cache Performance
One more thing. If you check Enfold Proxy's cache log messages for your proxy definition, you will see a rough percentage to use as a metric of how much caching is taking place:
2008-02-18 10:31:45,250|cache.host originalfunsite|STATS|3500|2856|Cache statistics: gets: 88, hits: 65 (33 validated), misses: 23 (0 uncachable) hitrate: 73% (58% excluding validations) size: 1943668 bytes, 324 items
For an explanation of what these numbers mean, see the topic on measuring cache performance and caching concepts and strategies.
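As a rough aid for reading the STATS line, the sketch below parses the sample log message and recomputes the two percentages. The formulas are an inference from this single sample line (they reproduce 73% and 58% when rounded down) and are not taken from Enfold's documentation, so treat them as an assumption. The regular expression and function names are likewise hypothetical.

```python
import re
from typing import Dict

# Pattern inferred from the sample log line in this topic (an assumption).
STATS_PATTERN = re.compile(
    r"gets: (?P<gets>\d+), hits: (?P<hits>\d+) \((?P<validated>\d+) validated\), "
    r"misses: (?P<misses>\d+) \((?P<uncachable>\d+) uncachable\)"
)


def parse_cache_stats(log_line: str) -> Dict[str, int]:
    """Pull the counters out of a 'Cache statistics' log message."""
    match = STATS_PATTERN.search(log_line)
    if match is None:
        raise ValueError("not a cache statistics line")
    return {name: int(value) for name, value in match.groupdict().items()}


def hit_rates(stats: Dict[str, int]) -> Dict[str, int]:
    """Recompute both percentages, rounding down (assumed formulas)."""
    gets, hits, validated = stats["gets"], stats["hits"], stats["validated"]
    remaining = gets - validated
    return {
        "hitrate": 100 * hits // gets if gets else 0,
        "excluding_validations": 100 * (hits - validated) // remaining if remaining else 0,
    }


if __name__ == "__main__":
    line = (
        "2008-02-18 10:31:45,250|cache.host originalfunsite|STATS|3500|2856|"
        "Cache statistics: gets: 88, hits: 65 (33 validated), misses: 23 "
        "(0 uncachable) hitrate: 73% (58% excluding validations) "
        "size: 1943668 bytes, 324 items"
    )
    stats = parse_cache_stats(line)
    print(stats["hits"] + stats["misses"] == stats["gets"])  # True
    print(hit_rates(stats))  # {'hitrate': 73, 'excluding_validations': 58}
```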
More information: See a general introduction to web caching at http://www.mnot.net/cache_docs