Monday, August 11, 2008

"Brevity Is The Soul Of Wit"..... Not According To Google!

(Reproduced from The Furtive Penguin)

An Experiment With 'Code to Text' Ratios


I have only recently begun to initiate myself into the mysteries of Search Engine Optimization. We all know that inbound links from highly ranked sites are the main determinants of Page Rank. Keywords continue to play a role with some of the minor search engines. I have been told that all but 'Teoma' and 'AllTheWeb' now disregard the keywords meta tag. But what about code to text ratio? This is often overlooked, and I wondered if it might help to explain one or two anomalies.

'Code to text ratio' is exactly what you might expect: it's a comparison of the quantity of code and text on a given page, expressed as a ratio. It is usually reported as a percentage, namely the share of the page that is readable text rather than markup, so a higher figure means more text. I am reliably informed that it plays a role in the Google and Yahoo page-ranking algorithms. Of course no one really knows precisely how these algorithms work ( except Google and Yahoo ) and they are constantly changing anyway. But according a higher rank to pages which have a lot of text in them does seem logical. The search engines presumably want to prioritize content-rich pages that are informative or useful to their visitors. A page that has nothing but links on it will have a poor 'code to text ratio' because the code involved will outweigh the text.
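For anyone who would rather see the idea than take my word for it, here is a minimal sketch of how such a figure might be worked out. It assumes the ratio is simply the number of visible text characters divided by the total number of characters in the page source, with anything inside script and style tags counted as code. The online tools may well measure things differently, and the names used here ( text_ratio_percent and so on ) are my own invention rather than anything Google or Yahoo have published.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text that sits outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def text_ratio_percent(html):
    """Return visible text as a percentage of the whole page source.

    Assumption: ratio = len(visible text) / len(page source) * 100.
    Real ratio checkers may count bytes or treat whitespace differently.
    """
    if not html:
        return 0.0
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so indentation does not count as text.
    text = " ".join(" ".join(parser.chunks).split())
    return 100.0 * len(text) / len(html)


if __name__ == "__main__":
    links_only = '<html><body><a href="a.html">A</a><a href="b.html">B</a></body></html>'
    wordy = "<html><body><p>" + "Welsh history " * 50 + "</p></body></html>"
    print(round(text_ratio_percent(links_only), 2))
    print(round(text_ratio_percent(wordy), 2))
```

Run on a page of bare links and on a page with a decent paragraph of prose, the second should score far higher, which is the whole point of the exercise.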

Does anyone need to be concerned about this? Well, if you're a blogger... probably not. The code to text ratio of the average blog is fairly high, typically above 30 percent. Furtive Penguin weighs in at about 32.98%. This is a consequence of the fact that much of the code needed to generate a blog does not appear on the index page. If, however, you are serving up static HTML, it is a different matter.

I have a site called 'Americymru' ( pronounced amerikumree ) which is a Welsh American Heritage Site. It has a Page Rank of 1. Or at least some of its pages do, the 'Index' and the 'News' pages for instance. On this site there is a calendar called "This Day In Welsh History" which, for obvious reasons, has twelve pages ( here is a sample ). None of them has any Page Rank and none of them has any external links that I am aware of. In my opinion these pages offer more of value to the target audience than much of the rest of the site. When I performed a 'code to text ratio' analysis on the 'calendar' pages ( tools for this can be found here and elsewhere on the web ) they scored a miserable average of 4.1%.

So!! I have added a block of text at the bottom of each of these pages ( some of it generic and some page specific ) which has boosted the code/text ratio to 15-20%. I am now eagerly awaiting the Googlebot's next visit. Will an improved code/text ratio be enough in this case to increase Page Rank and bring the 'calendar' into line with the rest of the site?
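If you are wondering how much padding a jump like that takes, the arithmetic can be sketched as follows. This is a back-of-envelope estimate only: it assumes the ratio is text characters divided by total page characters, and that the added text brings no extra markup with it. The function name and the 10,000 character page are made up for illustration and are not the real figures for my calendar pages.

```python
import math


def extra_text_needed(page_chars, current_percent, target_percent):
    """Estimate how many plain-text characters must be added to a page
    to lift its text ratio from current_percent to target_percent.

    Assumption: ratio = text / total, and the new text adds no markup, so
    (text + x) / (page_chars + x) = target, which solves to
    x = (target * page_chars - text) / (1 - target).
    """
    if not 0 <= current_percent <= target_percent < 100:
        raise ValueError("need 0 <= current <= target < 100")
    text_chars = page_chars * current_percent / 100.0
    target = target_percent / 100.0
    return math.ceil((target * page_chars - text_chars) / (1.0 - target))


if __name__ == "__main__":
    # Hypothetical page: 10,000 characters of source at a 4.1% ratio.
    print(extra_text_needed(10_000, 4.1, 15))
```

For that hypothetical page the answer works out at 1,283 characters, or a couple of healthy paragraphs, to reach 15%.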

An interesting experiment...or at least I think so! But then I am incorrigible.

As a side note, one wonders how all this impacts sites which are constructed entirely out of image files. Unless one does some fancy footwork and includes an overlay with the text in a form which can be read by the bots, such pages will appear to be devoid of content. Doing things this way also helps with the problem of producing content which is compliant with US accessibility standards for the disabled. The text overlay would, of course, be readable by a text reader.



