Now that ColdFusion 9 has been out for a few months, I've compiled a list of issues around caching that I'd like to see addressed in ColdFusion 9.0.1 or Updater 1. These are all issues that haven't already been discussed elsewhere or filed by others as bugs/enhancement requests.

To get the ball rolling, I'm listing each of the issues here along with the bug/enhancement number for the corresponding issue I filed in Adobe's ColdFusion Bug Database.

I have some additional things I'd like to see such as cache cluster configuration within the CF Admin and an upgrade to Ehcache EX so that we can do true distributed caching within ColdFusion, but I'll address those items in a later post. For now, here's my list. I'm interested to hear how others are doing with the new caching in ColdFusion 9.

Ambiguous Error Message when Distributed Caching via RMI is Enabled and Server is not Connected to a Network (Bug 81840)

If you've configured your ehcache.xml file for clustering via RMI and you are not connected to a network, ColdFusion will throw an error if you try to perform any caching activities. This bit me at MAX when my network connection dropped before I started showing my examples. When I tried running any of my examples, I got the following error:


The web site you are accessing has experienced an unexpected error.
Please contact the website administrator.
The following information is meant for the website developer for debugging purposes.

Error Occurred While Processing Request
error setting options
    
The error occurred in C:\_web\default\wwwroot\MAX2009\cache_get_put_function.cfm: line 1
1 : <cfset getArtists = cacheGet("artistQuery")>
2 :
3 :
________________________________________
Resources:
    Check the ColdFusion documentation to verify that you are using the correct syntax.
    Search the Knowledge Base to find a solution to your problem.

Browser     Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3
Remote Address     127.0.0.1
Referrer     http://localhost/max2009/
Date/Time     23-Oct-09 11:26 AM

Stack Trace
at cfcache_get_put_function2ecfm561514261.runPage(C:\_web\default\wwwroot\MAX2009\cache_get_put_function.cfm:1)
net.sf.ehcache.CacheException: error setting options
    at net.sf.ehcache.distribution.MulticastRMICacheManagerPeerProvider.init(MulticastRMICacheManagerPeerProvider.java:93)
    at net.sf.ehcache.CacheManager.init(CacheManager.java:241)
    at net.sf.ehcache.CacheManager.<init>(CacheManager.java:221)
    at net.sf.ehcache.CacheManager.create(CacheManager.java:415)
    at net.sf.ehcache.CacheManager.getInstance(CacheManager.java:436)
    at coldfusion.tagext.io.cache.ehcache.GenericEhcache.createCache(GenericEhcache.java:294)
    at coldfusion.tagext.io.cache.ehcache.GenericEhcache._getCache(GenericEhcache.java:288)
    at coldfusion.tagext.io.cache.ehcache.GenericEhcache.getCache(GenericEhcache.java:255)
    at coldfusion.tagext.io.cache.ehcache.GenericEhcache.get(GenericEhcache.java:72)
    at coldfusion.tagext.io.cache.CacheTagHelper.getFromCache(CacheTagHelper.java:226)
    at coldfusion.runtime.CFPage.CacheGet(CFPage.java:8025)
    at cfcache_get_put_function2ecfm561514261.runPage(C:\_web\default\wwwroot\MAX2009\cache_get_put_function.cfm:1)
    at coldfusion.runtime.CfJspPage.invoke(CfJspPage.java:231)
    at coldfusion.tagext.lang.IncludeTag.doStartTag(IncludeTag.java:416)
    at coldfusion.filter.CfincludeFilter.invoke(CfincludeFilter.java:65)
    at coldfusion.filter.ApplicationFilter.invoke(ApplicationFilter.java:342)
    at coldfusion.filter.RequestMonitorFilter.invoke(RequestMonitorFilter.java:48)
    at coldfusion.filter.MonitoringFilter.invoke(MonitoringFilter.java:40)
    at coldfusion.filter.PathFilter.invoke(PathFilter.java:87)
    at coldfusion.filter.LicenseFilter.invoke(LicenseFilter.java:27)
    at coldfusion.filter.ExceptionFilter.invoke(ExceptionFilter.java:70)
    at coldfusion.filter.BrowserDebugFilter.invoke(BrowserDebugFilter.java:74)
    at coldfusion.filter.ClientScopePersistenceFilter.invoke(ClientScopePersistenceFilter.java:28)
    at coldfusion.filter.BrowserFilter.invoke(BrowserFilter.java:38)
    at coldfusion.filter.NoCacheFilter.invoke(NoCacheFilter.java:46)
    at coldfusion.filter.GlobalsFilter.invoke(GlobalsFilter.java:38)
    at coldfusion.filter.DatasourceFilter.invoke(DatasourceFilter.java:22)
    at coldfusion.filter.CachingFilter.invoke(CachingFilter.java:53)
    at coldfusion.CfmServlet.service(CfmServlet.java:200)
    at coldfusion.bootstrap.BootstrapServlet.service(BootstrapServlet.java:89)
    at jrun.servlet.FilterChain.doFilter(FilterChain.java:86)
    at coldfusion.monitor.event.MonitoringServletFilter.doFilter(MonitoringServletFilter.java:42)
    at coldfusion.bootstrap.BootstrapFilter.doFilter(BootstrapFilter.java:46)
    at jrun.servlet.FilterChain.doFilter(FilterChain.java:94)
    at jrun.servlet.FilterChain.service(FilterChain.java:101)
    at jrun.servlet.ServletInvoker.invoke(ServletInvoker.java:106)
    at jrun.servlet.JRunInvokerChain.invokeNext(JRunInvokerChain.java:42)
    at jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java:286)
    at jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java:543)
    at jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.java:203)
    at jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java:428)
    at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)

This error only happens if you have RMI enabled in ehcache.xml and you aren't connected to a network. I don't expect this would ever be a problem in a production environment, but it could pop up on development machines, especially if people develop disconnected. At a minimum, I'd like to see a more meaningful error message (even though you can narrow it down to RMI via the stack trace).
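Until the message improves, one way to make the failure easier to spot on a development machine is to catch it and check the stack trace yourself. This is only a minimal sketch, not a fix. It assumes the failure always shows the MulticastRMICacheManagerPeerProvider class in the stack trace, as it did in the trace above, and the wording of the replacement message is my own.

```cfml
<cftry>
    <cfset getArtists = cacheGet("artistQuery")>

    <cfcatch type="any">
        <!--- Assumption: the RMI peer provider class shows up in the trace, as in the error above --->
        <cfif findNoCase("MulticastRMICacheManagerPeerProvider", cfcatch.stackTrace)>
            <cfthrow message="Ehcache could not start RMI clustering. Check the network connection or the RMI settings in ehcache.xml."
                     detail="#cfcatch.message#">
        <cfelse>
            <!--- Not the RMI problem: let the original error bubble up unchanged --->
            <cfrethrow>
        </cfif>
    </cfcatch>
</cftry>
```

Wrapping every cache call this way isn't practical, so treat it as a debugging aid for a single page rather than a pattern to roll out everywhere.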

DiskStore Configuration is only Configurable in ehcache.xml (Bug 81841)

In ehcache.xml, there's a configurable property called diskStore that's set by default to point to your Java temp directory:


<diskStore path="java.io.tmpdir"/>

You can change this to any path you want in ehcache.xml and it applies globally.

The ColdFusion documentation for the cacheSetProperties() function currently shows that diskstore is a valid option:

http://help.adobe.com/en_US/ColdFusion/9.0/CFMLRef/WSc3ff6d0ea77859461172e0811cbec22c24-7c18.html

My testing, however, has shown that it is not working in the shipping build of ColdFusion 9. Try the following example (you'll need to create a directory off your root, c:/temp):


<cfset myProps = structNew()>
<cfset myProps.diskstore = "c:/temp"> <!--- in the docs, but not currently implemented --->
<cfset myProps.diskpersistent = "true">
<cfset myProps.eternal = "false">
<cfset myProps.maxelementsinmemory = "5000">
<cfset myProps.maxelementsondisk = "100000">
<cfset myProps.memoryevictionpolicy = "LRU">
<cfset myProps.objecttype = "Object">
<cfset myProps.overflowtodisk = "true">
<cfset myProps.timetoidleseconds = "86400">
<cfset myProps.timetoliveseconds = "86400">

Before:


<cfdump var="#cacheGetProperties("Object")#">

<!--- update the cache properties --->
<cfset cacheSetProperties(myProps)>

After:


<cfdump var="#cacheGetProperties("Object")#">

<cfset cachePut("item1", "this is a test")>

If you run this code and check the c:/temp directory, you should find it empty. Check your Java temp directory (c:/windows/temp on wintel) and you'll see that the files are still written there, showing that the default value in ehcache.xml is being used and not the value set via the cacheSetProperties() function.
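If you'd rather not poke around the file system by hand, the same check can be sketched in code. This assumes c:/temp is the directory you passed as diskstore above, and that Ehcache's disk store files end in .data, which is my assumption rather than something the ColdFusion docs state. It asks the JVM for the real java.io.tmpdir value instead of guessing the path.

```cfml
<!--- Assumption: c:/temp is the directory set in myProps.diskstore above --->
<cfset customDir = "c:/temp">

<!--- Ask the JVM where java.io.tmpdir actually points on this machine --->
<cfset javaTempDir = createObject("java", "java.lang.System").getProperty("java.io.tmpdir")>

<!--- Assumption: Ehcache names its disk store files with a .data extension --->
<cfdirectory action="list" directory="#customDir#" filter="*.data" name="customFiles">
<cfdirectory action="list" directory="#javaTempDir#" filter="*.data" name="tempFiles">

<cfoutput>
#customDir#: #customFiles.recordCount# .data file(s)<br />
#javaTempDir#: #tempFiles.recordCount# .data file(s)
</cfoutput>
```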

The obvious action is to remove this attribute from the documentation until it can be implemented in a later version of ColdFusion. I would like to see it implemented, though, as I think it's useful to be able to set the diskstore location programmatically.

CacheGetMetadata() Returns both Cache Wide and Item Specific Metadata (ER 81842)

Currently, the cacheGetMetadata() function returns metadata that applies both to the individual cache item passed to the function as well as metadata for the entire cache region that the cached item came from. For example, if you have an item in the cache with an ID of "item1", you can get the metadata for it like this:


<cfdump var="#cacheGetMetadata("item1")#">

What this actually returns to you is a structure with two sets of information:

Cache_hitcount and cache_misscount apply to the overall cache. In other words, how many hits and misses the entire cache has received. The rest of the keys returned in the structure apply to the item passed in to the cacheGetMetadata() function.

I see a couple of potential issues with how this has been implemented. What I think we really need is two separate functions here. The existing cacheGetMetadata() should return just the metadata that is specific to the item passed to the function, not metadata on the entire cache. I would expect a separate function to retrieve metadata for the cache itself. To avoid confusion, I'd call the new function something like cacheGetStats(). In my mind it would return the cache_hitcount and cache_misscount that are currently returned by the cacheGetMetadata() function. It would also return a whole lot more that's available from ehcache but not exposed to ColdFusion – things like the total number of items currently in the cache, total size in bytes of all of the items in the cache, etc.
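In the meantime, you can approximate the split yourself. Here's a rough sketch of a helper that separates the two sets of keys. The function name splitCacheMetadata() is made up for this example, and it assumes every cache-wide key starts with "cache_", which holds for the two keys mentioned above but isn't guaranteed anywhere.

```cfml
<cffunction name="splitCacheMetadata" returntype="struct" output="false"
            hint="Hypothetical helper: separates cache-wide keys from item-specific keys.">
    <cfargument name="id" type="string" required="true">

    <cfset var meta = cacheGetMetadata(arguments.id)>
    <cfset var result = structNew()>
    <cfset var key = "">

    <cfset result.cache = structNew()>
    <cfset result.item = structNew()>

    <cfloop collection="#meta#" item="key">
        <!--- Assumption: cache-wide keys start with "cache_" (cache_hitcount, cache_misscount) --->
        <cfif left(key, 6) eq "cache_">
            <cfset result.cache[key] = meta[key]>
        <cfelse>
            <cfset result.item[key] = meta[key]>
        </cfif>
    </cfloop>

    <cfreturn result>
</cffunction>

<cfset cachePut("item1", "this is a test")>
<cfdump var="#splitCacheMetadata('item1')#">
```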

An Alternate Proposal for Additional ehcache Functionality (ER 81843)

What I think might be an even better and more future-proof solution for obtaining additional metadata (and more) from ehcache would be to provide similar functions to the ORMgetSessionFactory() and ORMgetSession() functions. If we had something like cacheGetSessionFactory() and/or cacheGetSession(), we could get access to all of the additional functionality in ehcache that's not currently exposed. Specifically this would give us an easy way to get access to more cache statistics than we can access now, without having to write a bunch of new ColdFusion functions to handle things that might not be considered core or essential to caching.
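To give a feel for what that access could look like, here's a sketch that goes straight to the Ehcache Java API. It leans on two assumptions: that the Ehcache classes bundled with ColdFusion are visible to createObject(), and that ColdFusion uses the singleton returned by CacheManager.getInstance() (the stack trace earlier in this post suggests it does). This is unsupported territory, so don't treat it as anything more than an illustration.

```cfml
<!--- Assumption: the bundled Ehcache classes can be loaded with createObject() --->
<cfset cacheManager = createObject("java", "net.sf.ehcache.CacheManager").getInstance()>

<!--- Put something in the object cache so there is at least one region to report on --->
<cfset cachePut("item1", "this is a test")>

<cfset cacheNames = cacheManager.getCacheNames()>

<cfloop from="1" to="#arrayLen(cacheNames)#" index="i">
    <cfset ehCache = cacheManager.getCache(cacheNames[i])>
    <cfset stats = ehCache.getStatistics()>
    <cfoutput>
    #cacheNames[i]#: #ehCache.getSize()# item(s),
    #stats.getCacheHits()# hit(s), #stats.getCacheMisses()# miss(es)<br />
    </cfoutput>
</cfloop>
```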

Inability to Specify Cache Key in Functions (ER 81844)

The cfcache tag has the key attribute which allows you to specify a custom cache region other than the default template or object cache:


<cfcache key="customCache" action="put" id="1" value="#now()#">

This creates a cache region at runtime called customCache. Getting, putting and flushing the cache are all supported from the cfcache tag. However, it's not currently possible to use any of the caching functions to operate on the custom cache region because there is no way to specify which cache region you want to use – the functions always apply to the default object or template cache.

I'd like to propose adding an optional attribute to all cache functions to allow a developer to pass in a custom cache region (key) for the function to operate against where applicable.
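Until then, the workaround is to stick with the tag for anything that touches a custom region. A quick sketch, assuming the key attribute is accepted for get and flush the same way it is for put:

```cfml
<!--- Put a value into the custom region, as in the example above --->
<cfcache key="customCache" action="put" id="1" value="#now()#">

<!--- Read it back; the name attribute is the variable that receives the cached value --->
<cfcache key="customCache" action="get" id="1" name="cachedTime">

<cfoutput>#cachedTime#</cfoutput>

<!--- Remove the item from the custom region --->
<cfcache key="customCache" action="flush" id="1">
```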

Naming Inconsistencies for Removing Cache Items (81845)

There are some naming inconsistencies between the cfcache tag and the new caching functions. To remove an item from the object cache using the cfcache tag, you use action="flush" like this:


<cfcache action="flush" id="itemID">

Using a function instead, you use cacheRemove():


<cfset cacheRemove("itemID")>

Fragment Cache Gotcha (bug 81846)

I was just playing around some more with the template cache in ColdFusion 9 and noticed a behavior I hadn't expected that I think could potentially cause problems for people down the road.

Consider the following example:


<cfcache>
This is a fragment
</cfcache>

<cfcache>
<cfoutput>
And so is this #now()#
</cfoutput>
</cfcache>

<cfdump var="#getalltemplatecacheids()#">

If you execute this, you'll get some output as well as a dump of all of the cached items in the template cache (getalltemplatecacheids is an undocumented function). At this point, there are two items in the cache:

Now say you weren't happy that the two lines of output on your page were all run together and you wanted to put a line break between each one. You might modify your code and add in a simple paragraph tag or line break like so:


<cfcache>
This is a fragment
</cfcache>

<p>

<cfcache>
<cfoutput>
And so is this #now()#
</cfoutput>
</cfcache>

<cfdump var="#getalltemplatecacheids()#">

Go ahead and run the code once you've inserted the <p> tag. What you'll now see is that you have three items in the cache:

What's happening is that ColdFusion is using the position of the code in your page as part of the ID for the cached item. When you inserted the <p> tag, the position of the fragment you cached also changed by moving down a few lines, and ColdFusion assumed you had added a new page fragment that you wanted to cache.

This may or may not be a big deal to people as CF will still pull the correct cached item every time. The gotcha is that if you have a lot of caching going on, and you're making a lot of changes in a development environment, it's possible to fill your cache up with a lot of junk pretty quickly.

I'm not making any statements on how this feature was implemented - I just want to make people aware of how the template cache works under the covers since it's not documented and if people are paying attention when they are developing they might see some things with the template cache that don't make sense if they don't know how it works.

cacheSetProperties() and cacheGetProperties() are Missing Configurable Parameters (ER 81847)

ehcache lets you configure a number of parameters for a cache region via the ehcache.xml file. ColdFusion exposes most of these configurable parameters at runtime using cacheSetProperties() and cacheGetProperties(). There are, however, 3 parameters which currently aren't exposed and can only be set within ehcache.xml:

diskSpoolBufferSizeMB: This is the size to allocate the DiskStore for a spool buffer. Writes are made to this area and then asynchronously written to disk. The default size is 30MB. Each spool buffer is used only by its cache. If you get OutOfMemory errors consider lowering this value. To improve DiskStore performance consider increasing it. Trace level logging in the DiskStore will show if put back ups are occurring.

clearOnFlush: It determines whether the MemoryStore should be cleared when flush() is called on the cache. By default, the MemoryStore is cleared. Useful if you want to back up a cache to the file system without clearing the MemoryStore.

diskExpiryThreadIntervalSeconds: The number of seconds between runs of the disk expiry thread. The default value is 120 seconds.

I'm not sure why these were left out but it would be nice if they were also included in the configurable properties using ColdFusion functions.

Welcome to Part 9 of my series on Caching Enhancements in ColdFusion 9. Today we're going to cover something called dependent template caching. Strange name, I know. If you remember back from Part 7, we said that by default, when you cache a web page or page fragment with the cfcache tag, that page/fragment goes into the cache and is retrieved for any subsequent visits to that page. The same content is returned for everyone. We also covered a method for caching pages based on unique URL parameters such that different versions of the same page (say a product display page) would be cached and retrieved from the cache based on URL parameters. But what about page/page fragments that vary based on other variables that aren't passed in via URL? This is where dependent template caching comes in.

Dependent template caching allows you to specify a variable or list of variables to "watch" for changes. If the value of one of these variables changes from the first page or fragment that was cached, ColdFusion will create a new variant for the page/fragment and store that in the cache as well. This is all handled by using the new dependsOn attribute of the cfcache tag in ColdFusion 9. If you are reading this and wondering where this might be useful, you aren't alone. When I first read about this feature in the ColdFusion docs, I misunderstood the intent of the attribute and how it's supposed to work. Here's what the ColdFusion 9 docs have to say about dependsOn:

A comma separated list of variables. If any of the variable values change, ColdFusion updates the cache. This attribute can take an expression that returns a list of variables.

I think the key to the misunderstanding people have about this feature is in the part that says "If any of the variable values change, ColdFusion updates the cache." To me, updating the cache means replacing an old/expired/changed value with a new one. It's a one for one swap of items in the cache. Out with the old and in with the new. But this isn't what happens when you use dependsOn. What the docs should say is that when a variable value changes, ColdFusion creates a new entry in the cache for the changed item so that both the original page/fragment as well as the new page/fragment are now in the cache. Here's a quick example to illustrate how this works:


<cfset y=true>

<cfflush>
<cfloop index="x" from="1" to="5">
<cfif x is 3>
    <cfset y=false>
<cfelseif x is 5>
    <cfset y=true>
</cfif>


<cfset sleep(1000)>
<cfflush interval="10">
<cfcache action="serverCache" dependsOn="#y#" stripWhiteSpace="true">
<cfoutput>
I'm cached dynamic data: #now()# <br/>
</cfoutput>
</cfcache>

<!--- dump what's in the template cache --->
<cfdump var="#getAllTemplateCacheIDs()#">
</cfloop>

If you run this code, you should see something that looks like this:

What this code does is create a page fragment and cache it within a loop. The cfcache tag is set to watch a variable called y for changes. The value of y is initially set to true. There's also some conditional code in the loop which waits for the third and fifth iterations of the loop to fire. We'll get to that in just a moment.

For now, let's step through each iteration of the loop and discuss what's happening. During the first iteration of the loop, the fragment is added to the cache. During the second iteration, the fragment is pulled from the cache and displayed. During the third iteration, the value of x is 3 and our cfif statement fires, updating the value of y to false. Because y is the value we set to watch in the dependsOn attribute of our cfcache tag and it has now changed from true to false, this signals ColdFusion to go ahead and cache the version of the loop output we're now on for iteration 3. This is where we end up with a second fragment in the cache, not an update to the existing fragment. The fourth iteration also displays the second cached fragment since the value of y is still false. For the fifth and final iteration, our conditional code fires again. This time it sets y back to true, a value for which we already have a fragment stored in the cache. ColdFusion knows to go grab the fragment for true from the cache and displays it.

There's one other thing to note in here. I didn't think to include this in any of the previous posts on the template cache so I've decided to add it here. If you look at the end of the cfcache tag in our example, you'll notice a parameter you probably haven't seen before: stripWhiteSpace. This is an optional parameter that only works if you are using the template cache to cache page fragments. Setting it to true (it's false by default) tells ColdFusion to strip any unnecessary whitespace from the fragment before storing it in the cache.

While this is a good example of the mechanics of dependent caching, it's not really a practical example. For that, let's consider a real world example where you would want to make use of dependent caching. Say you have an application that requires authentication. The main landing page for the application is personalized based on who is logged in. In this case, you can't cache a single version of the main page as you wouldn't want it to say "Hello Tom" when Mary logs in. Sure, you could solve this by passing the username along in the URL, but you probably don't want to do that. Who wants to deal with all of the extra validation code to make sure someone doesn't go and change that URL variable to someone else's username? No, in this case, you would probably be using session variables in your application to maintain persistence, and session variables are a perfect use case for dependent caching. Here, we could set dependsOn to watch something that uniquely identifies a user and when that changes (a different user is logged in), the personalized version of the page for them could be added to the cache. Let's take a look at some simple code that implements this idea. The first thing we'll need is an Application.cfc file to set up session management and handle security basics for us:


<cfcomponent output="false">
    <cfset this.name = "dependentCaching" />
    <cfset this.sessionManagement = true>

    <cffunction name="onRequestStart" returntype="boolean" output="false">
        <cfif StructKeyExists( URL, "logout" )>
            <cfset this.onSessionStart() />
        </cfif>

        <cfreturn true />
    </cffunction>

    <cffunction name="onRequest" returnType="void" output="true">
    <cfargument name="Page" type="string" required="true">

        <cfif session.loggedIn>
            <cfinclude template="#arguments.Page#">
        <cfelse>
            <cfinclude template="login.cfm">
        </cfif>

        <cfreturn />
    </cffunction>

    <cffunction name="onSessionStart" returnType="void" output="false">
    <cfset session.loggedIn = false>
    </cffunction>

</cfcomponent>

This code gives our application a name and turns on session management. It also has an onRequestStart() method that looks for a URL variable called logout, and if it finds one it fires off the onSessionStart() method, effectively logging the user out by changing the value of session.loggedIn to false.

The onRequest() method handles the check to see if a user is authenticated for a requested page. If session.loggedIn is true, the page they were requesting is included. Otherwise, we assume that the user is not logged in and include the login form instead.

The onSessionStart() method fires at the beginning of a user's session and sets their logged in status to false.

Remember that this is just a simple example and for that reason does not contain all of the code you would use to implement something like this in real life (validation checks, error handling, etc.).

The next file we need is our login.cfm page:


<cfif isDefined('form.submit')>
<cfset session.loggedIn = true>
<cfset session.userName = form.username>

<!--- send the user back to the main page --->
<cflocation url="index.cfm" addtoken="false" />

<cfelse>

<cfoutput>
<form method="post">
Name: <input type="text" name="userName"><br />
Password: <input type="password" name="password"><br />
<input type="submit" name="Submit" value="Submit">
</form>
</cfoutput>
</cfif>

This page is just a simple login form. It first checks to see if it was called via a form submit, and if so sets the value of session.loggedIn to true. It also sets another session variable to hold the user's username. After the variables are set, the user is redirected to the main landing page for the application (index.cfm). Again, if this were a real application we would have an actual login check here but for the purposes of this example we're just assuming that any username/password combo is valid.

If the user arrived at the page directly from the onRequest() method of our Application.cfc file, we know that they have not yet logged in so we display a login form for them. When they submit this, the page submits to itself and the code we previously discussed fires, logging the user in and redirecting them to the main application page. Here's the code for the main index.cfm page:


<cfcache action="serverCache" timespan="#createTimeSpan(0,0,5,0)#" dependsOn="session.username">
<cfoutput>
Welcome #session.username#

<p>This is your personalized page.</p>

<p>Timestamp: #timeFormat(now(), 'hh:mm:ss')#</p>

<p><a href="index.cfm?logout=true">Logout</a></p>
</cfoutput>

There's not a whole lot going on here. All we do is set a cfcache tag at the top of the page telling ColdFusion that we want to cache the contents of the entire page. A timespan of 5 minutes is set just to keep the example from staying in the cache forever. Notice we also set dependsOn="session.username". This is where the magic happens. What we've done is tell ColdFusion that every time a different user tries to call this page, it should first check the cache to see if there's already a page stored for this user and if so, grab and use that version. If not, it should generate a new version of the page and cache that value for later use.

If you want to see this in action, go ahead and open the index.cfm page in your browser. You should be redirected to the login form. You can enter anything you like for the username and password. Once you submit the login form, you should be redirected to a personalized version of the index.cfm page. Note the value of the timestamp.

Now go ahead and click on the logout link. This will clear your session and cause the login form to display again. Try logging in using a different username this time. After submitting, you'll again be redirected to a personalized version of index.cfm.

Log out again and repeat the process, but this time use the username you entered the first time. When you submit and are taken to the index.cfm page you should notice that the value of the timestamp is the same as the first time you logged in as this user. This is because ColdFusion saw that session.userName changed and found a page in the cache that corresponded to the username you logged in with (the username becomes part of the key for the page in the template cache). If you want to see that there are two distinct pages in the cache, just create a new ColdFusion page in the same directory as the rest of your application and dump the template cache using this code:


<cfdump var="#getAllTemplateCacheIds()#">

You'll end up with something like this:

As you can see, each individual user has their own copy of the index.cfm page in the cache thanks to dependsOn.

I hope these examples were straightforward and useful enough to demonstrate the usage and power of dependent caching in ColdFusion 9. This is the last post on the template cache I have planned for the series. In Part 10, we'll start to take a look at the object cache in ColdFusion 9 before moving on to more advanced topics.

When using the template cache in ColdFusion 9, you have two main options for getting pages and page fragments out of the cache: time based expiry and flushing. This blog post covers both. You should note that all of the examples here cache full pages. In most cases, the techniques discussed can be applied equally to page fragments.

Expiring Items in the Template Cache

It's also possible to set expiry periods for items in the template cache. Here, you have two optional parameters built in to the cfcache tag to help you expire pages and fragments from the cache based on time periods. The two parameters you can use are idletime and timespan. Idletime lets you specify a period of time after which to flush the cache if the cached item has not been accessed. In other words, if the cached item hasn't been accessed in the time period specified by idletime, the item will be removed from the cache. Here's an example that caches a page and will flush it after 30 seconds of inactivity:


<cfcache action="serverCache" idletime="#createTimeSpan(0,0,0,30)#">

<cfoutput>
I'm dynamic. The time is currently #timeFormat(now(),'hh:mm:ss')#
</cfoutput>

If you run this code and then run it again, you'll see that the timestamp for the page is cached. Now wait for 30 seconds or so and reload again. You should see that the time has updated.

The timespan parameter lets you specify a period of time after which the cached item should be flushed regardless of whether it's ever been accessed or not. This basically lets you say "keep this page in the cache for 30 seconds". Here's an example:


<cfcache action="serverCache" timespan="#createTimeSpan(0,0,0,30)#">

<cfoutput>
Currently #timeFormat(now(),'hh:mm:ss')#
</cfoutput>

This code sets a timespan of 30 seconds. If you run the code then run it again, you'll see that the timestamp gets cached. Go ahead and keep hitting the reload button on your browser every few seconds. After 30 seconds have gone by, the timestamp will update as the old content is flushed from the cache.
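The two parameters aren't described as mutually exclusive, so in principle you can set both on the same tag: a hard upper limit with timespan, plus an earlier eviction if nobody asks for the page. This is only a sketch that assumes the combination behaves that way, and the 5 minute and 30 second values are arbitrary.

```cfml
<!--- Assumption: timespan and idletime can be combined on one cfcache tag --->
<cfcache action="serverCache"
         timespan="#createTimeSpan(0,0,5,0)#"
         idletime="#createTimeSpan(0,0,0,30)#">

<cfoutput>
Currently #timeFormat(now(),'hh:mm:ss')#
</cfoutput>
```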

Flushing Items from the Template Cache

In addition to time based expiry, you can also manually flush both pages and page fragments from the template cache using the cfcache tag. Here's an example:


<cfcache action="flush">

There are a couple of things to note here. First, when you call this code, it flushes all templates in the template cache for your application, not just the page you call it from. Let me repeat that. Using the cfcache tag with action="flush" causes the template cache to flush all of the content for the current application. You need to be very careful using this as it's possible that on a high traffic site with a large template cache you could bring your server down when hundreds or thousands of requests hit at the same time for uncached data, causing your system to queue up requests while it's busy rebuilding the cache. In most cases what you'll want to do is to flush a single page or perhaps a group of related pages from the template cache.

So, how do you go about flushing a single page or a group of related pages from the template cache? There's another parameter of the cfcache tag you can use called expireURL just for this purpose. Let's consider an example of how we would use this:


<cfif isDefined('URL.productID') AND isDefined('URL.flush')>
    <cfcache action="flush" expireURL="*.cfm?productID=#URL.productID#">
</cfif>

<cfparam name="URL.productID" default="0">

<cfcache action="serverCache" usequerystring="true" timespan="#createTimeSpan(0,0,5,0)#">
<cfoutput>
Welcome to the page for product ID #URL.productID#<br />
Timestamp: #timeFormat(now(), 'hh:mm:ss')#
</cfoutput>

Go ahead and run this example in your browser. You should have something on your screen that looks like this:


Welcome to the page for product ID 0
Timestamp: 03:40:11

Reloading the page should return the same timestamp over and over as the template is cached after the first call. Now try adding the URL parameter ?productID=72 to the URL string in your browser and reload the page a few times to get the new version of the page into the cache. Try changing the value of productID a few more times, each time reloading the page so that we get a handful of pages in the template cache. Now try dumping the contents of the template cache by running this code in a separate ColdFusion file:


<cfdump var="#getAllTemplateCacheIds()#">

You should see a bunch of different entries, one for each unique productID you provided in the URL. Now go ahead and call the code we've been working with for this example, only this time append ?productID=72&flush to the URL. Go ahead and reload the page with those URL parameters a couple times. Notice the timestamp updating each time you reload? That's because passing in the URL parameter flush causes the ColdFusion page to call the cfcache tag with expireURL="*.cfm?productID=#URL.productID#". The expireURL parameter lets you specify a wild carded URL pattern such that only pages that match the pattern get flushed. This allows you to get as generic or as specific as you want in determining exactly what pages to flush. In this case we're telling ColdFusion to go ahead and flush any cached .cfm pages that have the URL parameter productID equal to the value that we pass in on the URL.

If you dump the contents of the template cache you'll notice that there's still an entry for the item(s) we just flushed. That's because the code in the example we've been running repopulates the cache immediately after the flush. If you didn't want the cache repopulated immediately after flushing the content, you could just as easily locate the code to flush the cache elsewhere and just remove the item.
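As a rough sketch of that approach, the flush could live in its own template that you call whenever a product changes (the file name flush_product.cfm is hypothetical). Because this template doesn't cache any output itself, nothing repopulates the entry until the product page is requested again:

```cfml
<!--- flush_product.cfm (hypothetical file name) --->
<cfparam name="URL.productID" default="0">

<!--- flush only the cached pages for this productID --->
<cfcache action="flush" expireURL="*.cfm?productID=#URL.productID#">

<!--- confirm what is left in the template cache --->
<cfdump var="#getAllTemplateCacheIds()#">
```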

If you want to remove a group of related pages from the cache, say all of the pages that have a productID, you could modify the code to wildcard the productID parameter like so:


<cfcache action="flush" expireURL="*.cfm?productID=*">

<cfdump var="#getAllTemplateCacheIds()#">

This should result in the removal of all pages from the cache that have a productID URL parameter. Note that if you run this right now in ColdFusion 9.0 it will not work. There is a bug in ColdFusion 9.0 with the wildcard feature. If you put the flush code in a separate template and run it multiple times, what you'll see is that one page at a time is flushed from the cache instead of all of the pages that match the URL pattern at once. Credit goes to Aaron West for catching this bug.

That about wraps up everything I have to say about expiring and flushing pages and page fragments from the template cache. In Part 9 we'll talk about one last feature of the template cache – Dependent Caching.

In Part 5 of this series, we mentioned that Ehcache could be configured at runtime via ColdFusion code as well as by using an XML configuration file called ehcache.xml.

On a Java EE install of ColdFusion 9, you can find the ehcache.xml file located here:

/JRun4/servers/servername/cfusion-ear/cfusion-war/WEB-INF/cfusion/lib

There are two main sections of the file you need to be concerned with for basic configuration. The first section you should take a look at is the DiskStore configuration:


<diskStore path="java.io.tmpdir"/>

This tag tells Ehcache where it should write cache files if you have the cache configured to overflow to disk or to persist to disk. We'll get into the specifics of those options, but for now it's important to know that by default Ehcache will use your Java temp directory to store cache files if it is configured to do so. On Windows, the Java temp directory is typically c:/windows/temp when ColdFusion runs as a service under the Local System account. You can change the value here to any drive/directory on your system if you wish to use a location other than the Java temp directory.
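For illustration, pointing the DiskStore at a dedicated directory might look like the following. The path d:/cfcache is a made-up example; substitute any directory that the account running ColdFusion can write to:

```xml
<!-- hypothetical location; replaces the default java.io.tmpdir entry -->
<diskStore path="d:/cfcache"/>
```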

The next section to take a look at comes at the end of the ehcache.xml file. Skip on down to the very end where you should see a block of XML that looks like this:


<defaultCache
maxElementsInMemory="10000"
eternal="false"
timeToIdleSeconds="86400"
timeToLiveSeconds="86400"
overflowToDisk="false"
diskSpoolBufferSizeMB="30"
maxElementsOnDisk="10000000"
diskPersistent="false"
diskExpiryThreadIntervalSeconds="3600"
memoryStoreEvictionPolicy="LRU"
/>

This block of XML tells ColdFusion how to configure all of the Object and Template caches that are automatically created for your application. When a cache is automatically created, the name for the cache is also automatically created using the convention appnameOBJECT for object caches and appnameTEMPLATE for template caches. Each cache has a number of configurable parameters:

  • maxElementsInMemory: Sets the max number of objects that will be created in memory. Once this limit is reached, the cache will either overflow to disk (if overflowToDisk is set to true), or the appropriate eviction policy will be executed against the cache to make enough room for the new item(s) being added.
  • eternal: Sets whether elements are eternal. If eternal is set to true, timeouts are ignored and the element is never expired.
  • timeToIdleSeconds: Sets the time to idle for an element before it expires.
  • timeToLiveSeconds: Sets the time to live for an element before it expires.
  • overflowToDisk: Sets whether elements can overflow to disk when the memory store has reached the maxElementsInMemory limit.
  • diskSpoolBufferSizeMB: Specifies the spool buffer size for the DiskStore, if enabled. Writes are made to the spool buffer before they are asynchronously written to the DiskStore.
  • maxElementsOnDisk: Sets the max number of objects that will be maintained in the DiskStore.
  • diskPersistent: Whether the disk store persists between restarts of the Virtual Machine.
  • diskExpiryThreadIntervalSeconds: Specifies the interval (in seconds) between runs of the disk expiry thread.
  • memoryStoreEvictionPolicy: Policy to enforce upon reaching the maxElementsInMemory limit (LRU, LFU, FIFO).

If you make changes to any of these parameters, ColdFusion will apply them to any new caches that it automatically creates. If you have disk persistence or overflow to disk turned on, two files will be written to your file system per cache, an index file and a data file. For an object cache you would get appnameOBJECT.index and appnameOBJECT.data.

If you want to see what properties have been set for your application's cache, you can do so using the cacheGetProperties() function. The function takes a single optional parameter that specifies the type of cache to return the properties for. Options are Template or Object. If you don't specify the cache type to return properties for, ColdFusion returns them for both cache types. Here's an example that dumps the properties for both the default Object and Template caches:


<cfdump var="#cacheGetProperties()#">

This will result in output that looks like this:

As you can see from the screenshot, the structure keys correlate to parameters from the ehcache.xml file with two notable exceptions. Neither diskSpoolBufferSizeMB nor diskExpiryThreadIntervalSeconds is reported as a property that can be changed programmatically at runtime.

If you wish to change any of these properties programmatically, you can do so using the cacheSetProperties() function. This function takes a single argument – a structure containing all of the properties that should be configured. You can configure any of the following parameters:

  • objectType: Specifies the cache type: Object, Template, or All
  • diskStore: Supposed to specify the location of the DiskStore for disk based caching but this is currently not working as of ColdFusion 9.0.
  • diskPersistent: Whether the disk store persists between restarts of the Virtual Machine. True|False
  • eternal: Sets whether elements are eternal. If eternal, timeouts are ignored and the element is never expired. True|False
  • maxElementsInMemory: Sets the max number of objects that will be created in memory. Integer
  • maxElementsOnDisk: Sets the max number of objects that will be maintained in the DiskStore. Integer
  • memoryEvictionPolicy: Policy to enforce upon reaching the maxElementsInMemory limit: LRU, LFU, FIFO
  • overflowToDisk: Sets whether elements can overflow to disk when the memory store has reached the maxElementsInMemory limit: True|False
  • timeToIdleSeconds: Sets the time to idle for an element before it expires: Integer number of seconds
  • timeToLiveSeconds: Sets the time to live for an element before it expires: Integer number of seconds

Remember, this only applies to caches automatically created by ColdFusion. If a cache doesn't yet exist and you call cacheSetProperties(), ColdFusion will automatically create the cache for you based on whether you are setting properties for an Object cache, a Template cache, or both.

The following code shows how to build the structure of parameters necessary to configure a cache and set the cache properties using the cacheSetProperties() function:


<cfset myProps = structNew()>
<!---
<cfset myProps.diskstore = "c:/temp"> <!--- in the docs, but not currently implemented --->
--->

<cfset myProps.diskpersistent = "true">
<cfset myProps.eternal = "false">
<cfset myProps.maxelementsinmemory = "5000">
<cfset myProps.maxelementsondisk = "100000">
<cfset myProps.memoryevictionpolicy = "LRU">
<cfset myProps.objecttype = "Object">
<cfset myProps.overflowtodisk = "true">
<cfset myProps.timetoidleseconds = "86400">
<cfset myProps.timetoliveseconds = "86400">

Before:
<cfdump var="#cacheGetProperties("Object")#">

<!--- update the cache properties --->
<cfset cacheSetProperties(myProps)>

After:
<cfdump var="#cacheGetProperties("Object")#">

It's also possible to create more caches than just the default template and object caches that ColdFusion creates automatically for you. This can be achieved by defining them in your ehcache.xml file or at runtime using the cfcache tag. To configure a new cache region in your ehcache.xml file, you would do so like this (place this before or after the defaultCache block in your ehcache.xml file):


<cache
name="myCustomObjectCache"
maxElementsInMemory="500"
eternal="false"
timeToIdleSeconds="86400"
timeToLiveSeconds="86400"
overflowToDisk="true"
diskSpoolBufferSizeMB="30"
maxElementsOnDisk="10000000"
diskPersistent="true"
diskExpiryThreadIntervalSeconds="3600"
memoryStoreEvictionPolicy="LRU"/>

As you can see, the only difference between this code and the code for the default cache is that you give the cache region a name using the name parameter.

You should know that neither cacheGetProperties() nor cacheSetProperties() can be used to configure the properties for a custom cache in ColdFusion 9.0. Hopefully this is a feature that will be added in a future version of ColdFusion.

If you want to read from or write to a custom cache, you can only do so using the cfcache tag. Here's an example:


<!--- attempt to get the artist query from
the custom object cache --->

<cfcache
    key="myCustomObjectCache"
    action="get"
    id="artistQuery"
    name="getArtists"
    metadata="myMeta">


<!--- if the item isn't there, it'll return null.
In that case, run the query, cache the
     results and return the data from the db instead --->

<cfif isNull(getArtists)>
    <cfquery name="getArtists" datasource="cfartgallery">
        SELECT *
        from artists
    </cfquery>

    <cfcache
        key="myCustomObjectCache"
        action="put"
        id="artistQuery"
        value="#getArtists#">

</cfif>

<!--- dump the query from cache --->
<cfdump var="#getArtists#">

<!--- dump the cache meta data --->
<cfdump var="#myMeta#">

This code is almost identical to the code we wrote in Part 5 where we introduced the object cache. (Don't worry about the extra metadata we're pulling using cfcache; we'll cover that later.) The only real difference is that here we specify a name for our cache in both cfcache tags by defining it in the key attribute. Key allows us to specify a custom name for our cache. If a cache by that name hasn't been configured in your ehcache.xml file, ColdFusion will automatically create it using the parameters set in the default cache settings. Remember, you can only work with custom caches using the cfcache tag. None of the cache functions in ColdFusion 9.0 allow you to specify the cache name you want to apply the function to. Go ahead and run the code a few times to verify it's pulling from the custom cache.

There's a lot more advanced stuff that can be configured in the ehcache.xml file such as cache clustering. These topics deserve their own posts, which we'll get to soon. For now, thanks for sticking with the series and I hope you've learned a little more about the ins and outs of cache configuration in ColdFusion 9.

In previous versions of ColdFusion (before ColdFusion 9 that is), there were three built-in mechanisms you could use for caching – persistent variable scopes, query caching and the cfcache tag. As I mentioned in Part 2 of the series, each of these methods has inherent limitations and generally requires a good deal of additional programming to gain any semblance of control over the actual cache – if that is even possible. In many cases it really isn't practical or even possible to see how much information is stored in one of the aforementioned caching mechanisms. All of this changes with the introduction of Ehcache as the underlying cache provider in ColdFusion 9. Five parts into this series, and I just realized I've mentioned Ehcache several times but I never really took the time to talk about what exactly it is and how it's been implemented in ColdFusion 9.

The Ehcache project was started by a gentleman named Greg Luck. Ehcache can best be described as "... a widely used Java distributed cache for general purpose caching, Java EE, and lightweight containers." Caches are implemented as key-value stores, much like a ColdFusion structure. One important feature of an Ehcache cache is that it can be stored in memory, on disk, or both. Optionally, disk caches can be configured to survive JVM restarts (or in our case, ColdFusion server restarts). Caches can be configured via an XML configuration file named ehcache.xml or programmatically at runtime. Ehcache ships as a JAR file and runs in-process within an application server's JVM. In the case of ColdFusion 9, the JAR file and the XML file used to configure it (ehcache.xml) are both located in:

/JRun4/servers/server_name/cfusion-ear/cfusion-war/WEB-INF/cfusion/lib

ColdFusion 9 implements three types of caches using the Ehcache engine: template caches, object caches and Hibernate caches. Each of these cache types has its own use cases and we'll cover them in depth in future blog posts. For now, here's a quick summary of what each cache type is and generally what it's used for.

The template cache is designed for caching entire web pages as well as page fragments (sections of web pages). You shouldn't confuse this with the template cache mentioned in the ColdFusion Administrator. That template cache is concerned with compiled ColdFusion templates stored in memory. It's unfortunate that this dual use of the term "template cache" exists in ColdFusion so you need to be aware of which type of cache is being referred to when you see template cache mentioned. In terms of the Ehcache template cache implementation in ColdFusion 9, it's a fairly automatic process with ColdFusion handling the bulk of the work managing keys, cache gets/puts and expiry/eviction from the cache. Working with the template cache is done exclusively using the cfcache tag. Here's a very basic example:


<cfoutput>
I'm real-time dynamic data #now()# <br/>
</cfoutput>


<cfcache action="serverCache">
<cfoutput>
I'm cached dynamic data: #now()# <br/>
</cfoutput>
</cfcache>

<cfoutput>
I'm also real-time dynamic data #now()# <br/>
</cfoutput>

<cfcache action="serverCache">
<cfoutput>
I'm cached dynamic data too: #now()# <br/>
</cfoutput>
</cfcache>

In this code, the first section displays the current date/time. The second section uses the cfcache tag with action="serverCache". This action specifies that we want to use server side caching (Ehcache) and is now the default action in ColdFusion 9. To cache a fragment of code in a ColdFusion page we simply have to wrap it in a cfcache block. The next section of code is again entirely dynamic. The fourth and final section is wrapped in another set of cfcache tags. When you execute this code for the first time, the output will show the same date/time for all four sections of code, like this:


I'm real-time dynamic data {ts '2009-11-16 10:49:50'}
I'm cached dynamic data: {ts '2009-11-16 10:49:50'}
I'm also real-time dynamic data {ts '2009-11-16 10:49:50'}
I'm cached dynamic data too: {ts '2009-11-16 10:49:50'}

Running the code a few seconds later has different results:


I'm real-time dynamic data {ts '2009-11-16 10:51:16'}
I'm cached dynamic data: {ts '2009-11-16 10:49:50'}
I'm also real-time dynamic data {ts '2009-11-16 10:51:16'}
I'm cached dynamic data too: {ts '2009-11-16 10:49:50'}

In this case, the output contains both dynamic data (the first and third lines) as well as cached data (the second and fourth lines). Using the template cache it's easy to see how you can mix both dynamic data and multiple fragments of data that need to be cached on the same page. There's really not a whole lot for you to do. You simply wrap the content you want to cache in a cfcache block and ColdFusion handles the rest. There are a few things you can control like cache timeouts, but we'll cover that in a full blog post dedicated to the template cache and all of its features.
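As a small preview, and assuming the timespan attribute behaves for a fragment the same way it does for a full page, a fragment that should be regenerated every five minutes might look like this:

```cfml
<!--- cache this fragment for at most five minutes --->
<cfcache action="serverCache" timespan="#createTimeSpan(0,0,5,0)#">
<cfoutput>
I'm cached for up to five minutes: #now()# <br/>
</cfoutput>
</cfcache>
```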

The object cache gives you much more granular control over what you cache than the template cache does. Using the object cache, you can pretty much cache anything you want – simple values, complex variables, objects, files, and just about anything else you want to throw at it. The advantage to using the object cache is that you have complete control over key names, get/put/remove operations, and cache expiry/eviction. This takes a little more work on your part but what you give up in convenience you easily gain back in flexibility and control. You can work with the object cache using both the cfcache tag as well as the new caching functions introduced in ColdFusion 9. Here's a very basic example of how you can cache the results of a query in an object cache using the cfcache tag:


<cfcache
    action="get"
    id="artistQuery"
    name="getArtists">


<!--- go to the cache. If the data is not there,
go to the db then repopulate the cache --->

<cfif isNull(getArtists)>

    <!--- call getter then setter to retrieve new
     value then update the cache --->

    <cfquery name="getArtists" datasource="cfartgallery">
        SELECT *
        from artists
    </cfquery>
    
    <cfcache
        action="put"
        id="artistQuery"
        value="#getArtists#"
        timespan="#createTimeSpan(0,0,1,0)#">

</cfif>

<h3>Query:</h3>
<cfdump var="#getArtists#">

In this code, the first thing we do is try to pull our query out of the cache using the cfcache tag with action="get". The key that identifies our cached query in the object cache is set using id="artistQuery". Of course the first time we run this the query won't be in the cache, so the result variable we specified (name="getArtists") will be Null. After we attempt to pull the query from the cache, we then use the isNull function to see if anything was returned from our get operation. If getArtists is Null, we know that the query data isn't in the cache so we need to run our query, retrieve the data and then place it in the cache using the cfcache tag with action="put". When we do our cache put, we set id="artistQuery", which is the name of our cache key. The value to store in the cache is our ColdFusion query (value="#getArtists#") and we can specify how long the data should be cached using the timespan attribute. The first time you execute the page (with debugging turned on), you'll notice that there's debugging information for your executed SQL Queries – because the query data didn't exist in the cache yet and you told ColdFusion to go off and get it from the database and put it into the cache. If you execute the page again, you won't see any debug info for your SQL Queries because ColdFusion is pulling the data from the cache and not from the database. It's important to note that in the dump of the query data from this example, Cached will always be False, regardless of whether you are accessing cached data or not. This is because Cached refers to whether or not the result set is in the ColdFusion Query Cache, which it is not because we are using ColdFusion's Ehcache implementation here and not the built-in query cache. There's a lot more you can do with the object cache which we'll cover in another blog post.
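For comparison, here's a minimal sketch of the same get/put pattern written with the caching functions instead of the cfcache tag. It assumes cacheGet() returns null on a miss and that cachePut() accepts a time span as its optional third argument; the key and datasource are the same ones used above:

```cfml
<!--- try the cache first; a miss leaves getArtists null --->
<cfset getArtists = cacheGet("artistQuery")>

<cfif isNull(getArtists)>

    <!--- cache miss: go to the database --->
    <cfquery name="getArtists" datasource="cfartgallery">
        SELECT *
        FROM artists
    </cfquery>

    <!--- store the result for one minute --->
    <cfset cachePut("artistQuery", getArtists, createTimeSpan(0,0,1,0))>

</cfif>

<h3>Query:</h3>
<cfdump var="#getArtists#">
```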

If you're using ColdFusion 9's new Hibernate ORM functionality you can configure the ORM such that it uses Ehcache as its second level cache. This is a more complicated topic than we have time to discuss now, so let's table it for another blog post.

When you create a ColdFusion application and your application uses an Application.cfc or Application.cfm file, ColdFusion automatically creates a template cache and an object cache for you. These caches are bound to your application name (defined in your cfapplication tag for an Application.cfm file or this.name if you are using Application.cfc). If you have an unnamed application, ColdFusion will create a cache that is shared and accessible among all unnamed applications. It's important to remember that caches are not tied to ColdFusion scopes. If your application times out, this does not affect the cache(s) you create with Ehcache.

That's about it for getting to know the basics of Ehcache in ColdFusion 9. In Part 6 we'll start looking at the ehcache.xml file and how it can be used to configure the behavior of the caches you create in ColdFusion 9.

So far in this series, we've covered why you would want to cache, what to cache and when, and basic caching architectures. In part 4 of this series, we're going to talk about caching strategies and eviction policies.

A caching strategy is nothing more than an architectural decision on how you're going to manage putting data in and retrieving data from your cache and the corresponding relationship between the cache and your backend data source. There are two main caching strategies you need to be aware of: deterministic and non-deterministic.

A non-deterministic caching strategy involves first looking in the cache for the object or data you want to retrieve. If it's there, your application uses the cached copy. If it's not there, you must then query the backend system for the object or data you want to retrieve. This is by far the most popular caching strategy as it's relatively simple to implement and is very flexible.

A deterministic caching strategy is one in which you always go to the cache for the object or data that you need. It's assumed that if it's not in the cache, then it doesn't exist. This strategy requires that your cache be pre-populated with data as there's no mechanism for a cache miss to query the backend system for the missing object or data.

Both deterministic and non-deterministic caching strategies have their pros and cons. For non-deterministic caching, the upside is that it's simple to implement in code and you have a lot of flexibility in how you do this. The downside to this caching strategy is an issue called stampeding requests, otherwise known as the dog pile. This occurs, usually under load, when a cache miss results in multiple threads simultaneously querying the backend system for the missing cache data. Under this scenario, it's very easy to overwhelm the backend system with requests as the database struggles to fetch the data and repopulate the cache. There are various ways that you can code around this, which we'll discuss later on in another blog post. For now, it's just important to realize that it can happen.
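One common way to soften the dog pile, offered here as a sketch and not necessarily the approach this series will settle on, is to wrap the repopulation in a named exclusive lock and check the cache a second time inside the lock. The lock name is arbitrary, and the artistQuery key and cfartgallery datasource are borrowed from the object cache example elsewhere in this series:

```cfml
<!--- first check, outside the lock, so cache hits stay fast --->
<cfset getArtists = cacheGet("artistQuery")>

<cfif isNull(getArtists)>
    <!--- only one thread at a time may rebuild this cache entry --->
    <cflock name="rebuild_artistQuery" type="exclusive" timeout="10">

        <!--- second check: another thread may have repopulated
              the cache while this one was waiting for the lock --->
        <cfset getArtists = cacheGet("artistQuery")>

        <cfif isNull(getArtists)>
            <cfquery name="getArtists" datasource="cfartgallery">
                SELECT *
                FROM artists
            </cfquery>
            <cfset cachePut("artistQuery", getArtists, createTimeSpan(0,0,1,0))>
        </cfif>

    </cflock>
</cfif>

<cfdump var="#getArtists#">
```

Keep in mind that a named lock only coordinates threads within a single ColdFusion instance, so on a cluster each server could still send one request to the backend.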

For the purposes of the rest of this discussion as well as the rest of the series, we'll be focusing on non-deterministic caching. That said, let's now turn our attention to cache eviction algorithms. Think of a cache like a box. A box has a limit on how much stuff it can hold before things start falling out when you try to pile on more. A cache is the same way when it comes to the objects and data you store in it – eventually it runs out of room.

Cache eviction policies can be broken down into two categories: time based and cost based. Time based policies let you associate a time period or an expiration date with individual cache items. This lets you do things like keep an item in the cache for 6 hours, or 30 days, or until December 15, 2040 at 10:00pm. When a request is made to a cache that contains items with time based expirations, the cache first checks to see if the item is expired. If it is, the item is evicted from the cache and is not returned to the operation that called it (most caches simply return null).
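In ColdFusion 9 terms, a time based policy can be attached to an individual item when it is put into the cache. The sketch below assumes cachePut()'s optional third and fourth arguments (time span and idle time); the dailyReport key and the structure being cached are placeholders:

```cfml
<!--- placeholder data to cache --->
<cfset reportData = structNew()>
<cfset reportData.generated = now()>

<!--- keep for at most 6 hours, but expire early if untouched for 1 hour --->
<cfset cachePut("dailyReport", reportData, createTimeSpan(0,6,0,0), createTimeSpan(0,1,0,0))>

<!--- an expired (or missing) item comes back as null --->
<cfset cachedReport = cacheGet("dailyReport")>

<cfif isNull(cachedReport)>
    Item expired or was never cached.
<cfelse>
    <cfdump var="#cachedReport#">
</cfif>
```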

Cost based eviction policies work a little differently. A cost based eviction policy doesn't kick in until a cache is full and needs to kick some items out (evict) before allowing new ones in. Most caches give you several cost based eviction policies to choose from. In this scenario, when you attempt to put a new item in the cache, the cache first looks to see if it's full. If it is, it runs whatever cost based eviction policy has been set for the cache and evicts the appropriate item(s). The following are some of the most common cost based eviction policies you'll encounter:

First In First Out (FIFO): The first item that was placed in the cache is the first item to be evicted when it becomes full. It's essential to remember that the first item in the cache is not necessarily the least important. If the first item in your cache is also the most frequently accessed item you might want to think twice about implementing an eviction policy that would result in evicting it from the cache first in the event the cache fills up.

Least Recently Used (LRU): This policy implements an algorithm to track which items in the cache were accessed least recently. Various cache providers implement this algorithm in different ways but the result is that the items in the cache that haven't been used in a while are evicted first.

Less Frequently Used (LFU): This algorithm is unique to Ehcache. It uses a random sampling of items in the cache and picks the item with the lowest number of hits to evict. The Ehcache documentation claims that an element in the lowest quartile of use is evicted 99.99% of the time with this algorithm. In a cache that follows a Pareto distribution (20% of the items in the cache account for 80% of the requests) this algorithm may offer better performance than LRU. For more detailed discussion of various cache eviction algorithms, see the cache algorithms page on Wikipedia.

That's about it for this post on caching strategies and eviction policies. In Part 5 of this series, we'll finally start to take a look at caching in ColdFusion including what's always been there and what's new in ColdFusion 9.

A quick little plug: If you're heading to Adobe MAX 2009 in LA this October and want to know more about caching in ColdFusion 9, check out my session on Advanced ColdFusion Caching Strategies where I'll be covering a lot of what's already been discussed on my blog as well as a whole bunch of new material. I hope to see you there!

Welcome to Part 3 in my series on Caching Enhancements in ColdFusion 9. In Part 2, we talked about caching granularity. This time around, we're going to spend some time discussing caching architectures. When talking about caching architectures, it's important to understand the type of cache being referred to. Basically, caches come in two flavors: in-process and out-of-process.

An in-process cache operates in the same process as its host application server. As I mentioned in Part 1 of this series, the new caching functionality in ColdFusion 9 is based on an implementation of Ehcache. Because Ehcache is an in-process caching provider, the cache operates in the same JVM as the ColdFusion server. The biggest advantage to an in-process cache is that it's lightning fast as data/object serialization is generally not required when writing to or reading from the cache. On the other side of the coin, in-process caches have limitations that you need to be aware of when it comes to system memory - particularly if you're on a 32-bit platform or a system that's light on RAM. On 32-bit systems, the JVM is typically limited to between 1.2GB and 2GB of RAM, depending on platform (although some 32-bit JVMs running on 64-bit systems may be able to use up to 4GB of RAM). Because you have to share this with your application server, that leaves considerably less RAM available to your cache.

In-process caches can be scaled up by adding more RAM, but not out by adding more servers as each cache is local to the application server's JVM it's deployed with. We'll discuss this in more depth when we talk about clustered caching. When using an in-process cache you always need to be aware of the number of items you'll be caching and how much RAM they take up to avoid a sudden spike in cache evictions if the available memory to both your application server and cache tops out. Fortunately for ColdFusion, Ehcache can be configured so that it fails over from RAM based storage to disk in the event that the cache fills up.

Out-of-process caches, like their name suggests, run outside of the same process as the application server. In the Java world, they run inside their own JVM. Out-of-process caches tend to be highly scalable on both 32-bit and 64-bit platforms as they scale both out and up. If you need to scale an out-of-process cache, you simply install more instances of the cache on any machines with spare RAM on your network. The main drawback to out-of-process caches is speed. Data and objects being written to and read from an out-of-process cache must be serialized and deserialized. Although the overhead for doing so is relatively small, it's still considerable enough to have an impact on performance.

Although Ehcache itself is not an out-of-process cache, it does come with something called Ehcache Server which is available as a WAR file that can be run with most popular web containers or standalone. The Ehcache Server has both SOAP and REST based web services APIs for cache reads/writes. Another example of an out-of-process cache is the ever popular Memcached.

Now that we've covered the basics of in-process and out-of-process caches, it's time to make things a little more complicated by adding distributed caching and cache clustering to the mix. My experience over the last few years with caching has been that the term distributed tends to be a catch-all for what most would consider a true distributed cache as well as for a clustered cache. Confused yet? Let me attempt to clarify. Most of you are probably already familiar with how clustering works. In the application server world, you take an application server such as ColdFusion and you deploy it on two or more identically configured machines (or you can deploy multiple instances to one or more machines) which you then tie together through hardware and/or software. The result is that you are able to distribute load to your application across multiple servers which allows you to scale your application out. Need to be able to support more users? Add more servers to the cluster. It's the same for caching. If you have an in-process cache, you can't make the cache hold more items

When it comes to cache clustering, the primary reason for doing so is usually that you already have or are planning to deploy your application on a cluster. If you have a clustered application that needs to make use of caching, the first problem you face is that each application server has its own in-process cache which is local to the server. If Server A writes a piece of data to its in-process cache, that data is not available to Server B. This might not be a big deal for some clustered applications that implement sticky sessions, have light load or have data that doesn't necessarily need to be synchronized, but it becomes a serious problem for clusters that are configured for failover, have heavier load, or have cached data that needs to be in sync across every server in the cluster. In these instances, standalone in-process caching doesn't work well. The solution is to cluster your in-process caches as well as your application server. In the case of ColdFusion 9, the underlying Ehcache implementation fully supports cache clustering. When configured, each local cache automatically replicates its content via RMI, JMS, JGroups, Terracotta, or other pluggable mechanisms to all other caches specified in the configuration. There's a small amount of latency while the data replicates but it's negligible in all but the most extreme use cases. I have set this up, tested, and verified it works with the ColdFusion 9 implementation. I'll explain exactly how to do this in a future blog post. The important thing to understand here is that clustering of in-process caches gets you redundancy, but the limit on the size of a single cache is still the limiting factor on scalability (e.g. if the cache you want to cluster has a limit of 500MB of data, clustering the cache between two servers means you are still limited to that 500MB of data in the cache, only now it's stored on two different servers).

Distributed caching differs from clustered caching in that a distributed cache is essentially one gigantic out-of-process cache spread across multiple machines. If you think of a clustered cache as comparable to a clustered application server, then a distributed cache is much like a computing grid. Whereas a clustered cache gets you redundancy, a distributed cache gets you horizontal scalability with respect to how much data or how many objects can be put in the cache. Different distributed caching providers handle the exact caching mechanics differently, but the basics remain the same. If you need redundancy in a distributed cache, many distributed caching providers, including Ehcache Server, let you cluster distributed cache nodes. The following diagram illustrates how a distributed, out-of-process cache cluster using Ehcache Server might look.

You should note that this is just one of many possible configurations. Using a combination of hardware and software it's possible to build out some pretty sophisticated caching architectures depending on your performance, scalability and redundancy requirements. It's even possible to create hybrid in-process/out-of-process architectures using solutions such as Terracotta.
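
For contrast with clustering, here's a rough Python sketch of the distributed idea: each key lives on exactly one node, so total capacity grows as you add nodes. The `DistributedCache` class and the hash-modulo placement are assumptions made purely for illustration. Real providers use their own partitioning schemes, and this is not how Ehcache Server places data.

```python
import hashlib


class DistributedCache:
    """A toy distributed cache: each key is stored on exactly one node."""

    def __init__(self, node_count, max_items_per_node):
        self.nodes = [{} for _ in range(node_count)]
        self.max_items_per_node = max_items_per_node

    def _node_for(self, key):
        # Stable hash of the key picks the node (assumed scheme, for illustration)
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return self.nodes[index]

    def put(self, key, value):
        node = self._node_for(key)
        if key not in node and len(node) >= self.max_items_per_node:
            raise MemoryError("node is full")
        node[key] = value

    def get(self, key):
        return self._node_for(key).get(key)

    def capacity(self):
        # Capacity is the sum of the nodes, not the size of a single node
        return len(self.nodes) * self.max_items_per_node


cache = DistributedCache(node_count=4, max_items_per_node=500)
for i in range(100):
    cache.put(f"item-{i}", i)

# Each item is stored once, somewhere in the grid
stored = sum(len(node) for node in cache.nodes)
```

Adding a fifth node to this toy raises `capacity()` by another 500 items, which is the horizontal scalability a clustered in-process cache can't give you. What it leaves out is everything hard: rebalancing when nodes join or leave, and the redundancy you'd get by clustering the nodes themselves.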

That's about it for caching architectures. If you want to learn more, a fantastic resource is the website High Scalability. I hope you continue to find this series helpful. In Part 4 we'll cover our last foundation topic, the basics of caching strategies, before moving into ColdFusion 9's specific Ehcache implementation.

One ColdFusion 9 feature I haven't heard much buzz about, but which I think has the potential to really enhance high performance and large scale ColdFusion applications, is caching. ColdFusion has always had caching capabilities, but more often than not they've been black boxed, giving the developer limited control and visibility over the process. All that changes in ColdFusion 9 with a major overhaul of the cfcache tag. The biggest single enhancement here is the implementation of the popular distributed caching provider Ehcache under the covers. What this means is that ColdFusion now implements one of the most popular and certainly one of the fastest caching mechanisms available for Java.

Before I get too deeply into configuration and code, I want to take a little time to talk about caching theory, strategy, and patterns. Ehcache changes the caching game in ColdFusion, and a lot of the knowledge we have as ColdFusion developers about caching is no longer relevant. Some of it in fact is just plain problematic, and I hope to shed some light on those issues and talk about how Ehcache helps solve those problems as well as gotchas to look out for when implementing large caching systems.

Just so that we're all on the same page, let's start with a definition of caching as found on Wikipedia:

"...a collection of data duplicating original values stored elsewhere or computed earlier, where the original data is expensive to fetch (owing to longer access time) or to compute, compared to the cost of reading the cache."

There are two important concepts here. The first is that cached data is duplicate data. The second is that we're going to duplicate it where it would otherwise be expensive to compute it or fetch it relative to how quickly it can be grabbed from the cache. Keep these two things in mind as we continue through this post.

When a lot of people talk about caching, they talk about it in terms of performance. You may want to cache a particular web page because it's slow to load, or perhaps you want to cache the stats shown on a particular page because it takes a long time to run the query that crunches the numbers you're going to display. These are both valid cases where using cached data can speed up the performance of your application. What I find to be a more compelling use case, though, is caching for scalability. What I mean by caching for scalability is using cached data to reduce the load on critical resources such as the database, app server, web server, network, or client. At each of these phases there's an opportunity to use cached data to allow you to do more with less. What's really cool here is that a byproduct of caching for scalability tends to be increased application performance.

Let's look at an example involving the database. Say, for example, your database is capable of handling 100 requests per second. Now what if you need to be able to handle more requests? One option would be to throw more hardware at the problem - increase the amount of memory available to the server, add more processors, or maybe even add a 2nd or 3rd database server to cluster and distribute the load. That's certainly one option, but it's also expensive and potentially complicated to manage. A second option would be to cache the data you're requesting. Let's assume you're able to cache the data such that you achieve a hit ratio of 90% (hitRatio = hits/(hits+misses)). That is, 9 out of every 10 requests for data go to the cache instead of to the database (certainly doable in most circumstances). What you've now gone ahead and done is effectively reduced your database load from 100 requests per second to 10. This means that the same database with the addition of a cache is now able to scale by a factor of 10. That's a pretty significant increase in scalability.
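
If you'd like to see the arithmetic play out, here's a small Python simulation. It's a sketch under stated assumptions, not ColdFusion or Ehcache code: `CountingCache`, `slow_database_lookup`, and the `record-N` keys are all hypothetical, and the workload (10 distinct records, each requested 10 times) is chosen so the numbers match the example above.

```python
def slow_database_lookup(key):
    """Stand-in for an expensive query (hypothetical)."""
    return f"row for {key}"


class CountingCache:
    """A toy cache that counts hits and misses."""

    def __init__(self):
        self.items = {}
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key, loader):
        if key in self.items:
            self.hits += 1
            return self.items[key]
        # Cache miss: go to the "database" and remember the result
        self.misses += 1
        value = loader(key)
        self.items[key] = value
        return value

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = CountingCache()

# 100 requests in total: 10 distinct records, each requested 10 times
for _ in range(10):
    for record_id in range(10):
        cache.get_or_load(f"record-{record_id}", slow_database_lookup)
```

Only the first request for each record reaches the database, so 100 requests produce 10 misses and 90 hits, for a hit ratio of 90/(90+10) = 0.9. The database sees a tenth of the traffic, which is where the factor of 10 comes from.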

That's it for Part 1 of this series. Stay tuned for Part 2 where I'll discuss what to cache and why. If you're planning to be at Adobe MAX 2009, stop by my session on Advanced ColdFusion Caching where I'll be talking about this as well as all of the great new Caching features in ColdFusion 9 in a lot more depth.




Copyright 1995-2010 Rob Brooks-Bilson. All rights reserved.