I've been spending some time recently looking into Search Engine Optimization for BlogCFC. I hadn't given this much thought until recently, when I launched a new blog and noticed that the majority of my posts weren't being indexed in Google, and the ones that were weren't being indexed as I thought they would be.

It turns out that Google is fairly picky about how it indexes sites, especially dynamic ones like those that use blogCFC as the engine for their blogs. I've been discussing with Ray Camden changes that can be made to blogCFC to make it more search engine friendly, especially for Google. The changes I'm proposing are all to the core CFC and related layout files and don't get into server-side changes that you can make, such as SES/friendly URLs. That's a topic all its own, and one beyond the simple changes I'm proposing. Ray's agreed to consider them for a future release.

The main issue as I see it is that Google and other search engines aren't indexing individual blog entries, the ones you would associate with an entry's permalink. Instead, Google seems to index entries wherever it finds them. This means that sometimes it will index a particular entry from your blog's homepage (undesirable, since the homepage changes and entries drop off), from calendar links, from category pages, etc. The point is that Google isn't being provided with one single version of any of your blogCFC posts. Instead, it's left to its own devices to crawl your site and index the content.

The calendar pod makes this worse. When Google encounters the calendar on a page, it attempts to follow all of the links. While this is a slight problem because a post can show up for day, month and year entries, it's an even bigger deal with the > and < links, which move the calendar backward and forward in time. Google could go on crawling these links backward and forward indefinitely. Luckily, Google is coded such that it won't let itself get caught in this sort of endless indexing of dynamic content. The bad news, though, is that when it realizes it's caught like this, it stops indexing your content. Google has stated on their site that while the engine will index dynamic content, it may limit how much it indexes on any given site. With this in mind, I'm of the opinion that it's better to code the calendar component so that you can't move forward or backward past the point of any entries. That is, you can't move back to a time before the first entry was made, and you can't move forward in time past the current month.
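
To make the calendar rule concrete, here is a minimal sketch of the bounds check in Python rather than CFML. It is not blogCFC's actual code: the names `calendar_nav` and `shift_month` are hypothetical, and the sketch assumes the calendar knows the date of the first entry and today's date. A result of `None` means the corresponding < or > control should be disabled.

```python
from datetime import date
from typing import NamedTuple, Optional, Tuple

YearMonth = Tuple[int, int]  # (year, month), with month in 1..12


class CalendarNav(NamedTuple):
    prev_month: Optional[YearMonth]  # None means the < control is disabled
    next_month: Optional[YearMonth]  # None means the > control is disabled


def shift_month(year: int, month: int, delta: int) -> YearMonth:
    """Move a (year, month) pair forward or backward by delta months."""
    index = year * 12 + (month - 1) + delta
    return index // 12, index % 12 + 1


def calendar_nav(shown: YearMonth, first_entry: date, today: date) -> CalendarNav:
    """Work out which months the < and > controls may link to.

    Navigation stops at the month of the first entry and at the current
    month, so a crawler never finds an endless chain of calendar pages.
    """
    earliest = (first_entry.year, first_entry.month)
    latest = (today.year, today.month)
    prev_candidate = shift_month(shown[0], shown[1], -1)
    next_candidate = shift_month(shown[0], shown[1], 1)
    return CalendarNav(
        prev_month=prev_candidate if prev_candidate >= earliest else None,
        next_month=next_candidate if next_candidate <= latest else None,
    )
```

Comparing (year, month) tuples keeps the check simple, because Python orders tuples element by element. The same comparison should be straightforward to express in the calendar pod itself.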

Here are the recommendations I'm making that should improve the search engine friendliness of blogCFC. If you have any additional suggestions/comments, please make them here and I'll be sure to compile them and pass them along to Ray:

  • Code the calendar so that the > and < controls are disabled when you have reached the month of the first entry or the last entry in the system. If you still want the buttons to be functional, but not allow search engines to follow the links after that point, consider adding rel="nofollow" to the A HREF tags for the controls.
  • Use the META Robots tag to specify to search engines how to treat a page. For the main page of your blog, use
    <meta name="robots" content="noindex,follow" >
    This tells crawlers not to index the content (since the page content changes regularly), but that it's ok to follow any links. You should also use this for pages generated by date, subject, etc. For the actual entry pages on your site, use
    <meta name="robots" content="index,follow" >
    This tells the crawlers to both index the content and to follow any links on the page. Remember, the main idea here is that you want the search engines to index each entry on your site individually. This will make it much easier on the search engines as well as visitors using the engines to locate content on your site.
  • Consider modifying blogCFC such that the titles of entries are actually links to the individual entry (permalink). Right now, blogCFC has a link at the bottom of each entry called "link" that provides the permanent link for the entry. This is how Blogger typically does it, but I've read in several places that it's better to link the title of the entry.
  • This is obvious, and was added in a previous blogCFC update, but make sure that the actual title of the blog entry appears in the HTML TITLE tag for the entry as this is what gets displayed in a search result link.
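
The robots and nofollow recommendations above could be sketched roughly as follows. This is illustrative Python, not blogCFC code: the page-type labels ("home", "date", "category", "entry"), the function names, and the example URL are all assumptions made up for the sketch.

```python
from html import escape

# Hypothetical labels for listing-style pages whose content changes over time.
LISTING_PAGES = {"home", "date", "category"}


def robots_meta(page_type: str) -> str:
    """Return the META robots tag suggested above for a given kind of page."""
    if page_type == "entry":
        # Individual entries are the one version we want indexed.
        content = "index,follow"
    elif page_type in LISTING_PAGES:
        # Listings change regularly: follow their links, but don't index them.
        content = "noindex,follow"
    else:
        raise ValueError(f"unknown page type: {page_type!r}")
    return f'<meta name="robots" content="{content}">'


def nav_link(url: str, label: str, crawlable: bool) -> str:
    """Build a calendar control link, adding rel="nofollow" when needed."""
    rel = "" if crawlable else ' rel="nofollow"'
    return f'<a href="{escape(url, quote=True)}"{rel}>{escape(label)}</a>'


# Example with a made-up URL: a > control that crawlers should not follow.
print(robots_meta("entry"))
print(nav_link("index.cfm?mode=month&month=7&year=2005", ">", crawlable=False))
```

Keeping the decision in one place means every template asks the same question ("what kind of page is this?") instead of hard-coding a tag in each layout file.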

I'm sure there's a lot more that can be done, but this should serve as a good starting point for making blogCFC more search engine friendly.

Comments
Very interesting, I'm glad you talked to Ray about it because for those of us who aren't on the most popular feeds, Google is one of the few ways to drive traffic.
# Posted By Ryan Stewart | 6/22/05 10:08 PM
This is great, Rob. I have been wondering about BlogCFC and search engines for some time. I hope that they can easily be implemented by Ray and anyone who can assist him! Thanks!
# Posted By Leif | 6/22/05 10:22 PM
Whilst I use parts of Ray's blogCFC app (www.beynon.org.uk), I got rid of the calendar (I've never liked them), and entry titles are links. If you do a site: search on Google against my site you get about 154 results, whilst against yours and Ray's you get significantly fewer results, most of which look like links followed from the calendar! You've definitely highlighted some valid points...
# Posted By johnb | 6/23/05 3:23 AM
Hi John,

This is along the lines of what I noticed as well - that only some of my posts were making it into google, and the ones that were weren't coming from the individual entries but rather what seemed like random calendar pages.

Given the number of people using blogCFC these days, I think getting these issues cleared up is going to result in a lot more traffic for everyone, not to mention a lot more CF related content in google.
# Posted By Rob Brooks-Bilson | 6/23/05 10:47 AM
I also found your article very interesting Rob, not least because I'm currently pondering the very same issues (my blog is powered by ColdFusion but I'm using my own Fusebox-based blog app rather than Ray's BlogCFC). Like you, I've been pondering what best to mark as indexable, followable and archivable. I've been wondering about the calendar problem and your comments have given me food for thought, so many thanks for that.

One thing though, you mention marking the blog main page as non-indexable. Isn't this going to put many blogs in the position of having an unlisted home/main page? My gut feeling is that provided the main page is clearly indicated as being such within a SE listing, then visitors generally understand and are tolerant of any discrepancies between the actual content and that listed within the search engine results. This issue seems to be a common problem and applies (of course) to many news sites and portals, as well as blogs.

Having said that, my gut feeling applies mainly to people who understand how Web apps and indexing work. The less experienced may well find such discrepancies baffling. But then, what to do about the main page?
# Posted By Roger Lancefield | 6/29/05 1:00 PM
Hi Roger,

I had the same thought about whether or not to make the main blog page indexable. In the end, I was worried that if Google indexed the main page, it would only index that version of those entries, creating some gaps. I'm not certain, though, as I don't really know exactly how Google treats this. It will be interesting to see how blogs implementing this show up in Google after they are next indexed. That's part of the reason Ray is keeping some of this stuff in beta, so we get a chance to test and tweak before it becomes a final release.

I'd love to hear any additional thoughts you have here.
# Posted By Rob Brooks-Bilson | 6/30/05 9:28 AM



Copyright 1995-2010 Rob Brooks-Bilson. All rights reserved.