Showing posts with label text. Show all posts

Monday, August 11, 2008

"Brevity Is The Soul Of Wit"..... Not According To Google!

(Reproduced from The Furtive Penguin)

An Experiment With 'Code to Text' Ratios


I have only recently begun to initiate myself into the mysteries of Search Engine Optimization. We all know that Inbound Links from highly ranked sites are the main determinants of Page Rank. Keywords continue to play a role with some of the minor search engines. I have been told that all but 'Teoma' and 'AllTheWeb' disregard the keywords metatag now. But what about code to text ratio? This is often overlooked and I wondered if it might help to explain one or two anomalies.

'Code to text ratio' is exactly what you might expect - it's a comparison of the quantity of code and text on a given page expressed as a ratio. I am reliably informed that it plays a role in the Google and Yahoo page-ranking algorithms. Of course no one really knows precisely how these algorithms work (except Google and Yahoo) and they are constantly changing anyway. But according a higher rank to pages which have a lot of text in them does seem logical. The search engines presumably want to prioritize content-rich pages that are informative or useful to their visitors. A page that has nothing but links on it will have a poor 'code to text ratio' because the code involved will outweigh the text.

Does anyone need to be concerned about this? Well, if you're a blogger...probably not. The code to text ratio of the average blog is fairly high, typically above 30 percent. Furtive Penguin weighs in at about 32.98%. This is a consequence of the fact that much of the code needed to generate a blog does not appear on the index page. If, however, you are serving up static HTML, it is a different matter.

I have a site called 'Americymru' (pronounced amerikumree) which is a Welsh American Heritage Site. It has a Page Rank of 1. Or at least some of its pages do: the 'Index' and the 'News' pages, for instance. On this site there is a calendar called "This Day In Welsh History" which, for obvious reasons, has twelve pages (here is a sample). None of them have Page Rank and none of them have any external links that I am aware of. In my opinion these pages offer more of value to the target audience than much of the rest of the site. When I performed a 'code to text ratio' analysis on the 'calendar' pages (tools for this can be found here and elsewhere on the web) they scored a miserable average of 4.1%.
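
For anyone who would rather measure this locally than rely on an online tool, here is a minimal sketch of how such a figure might be computed. It assumes that the ratio means visible text as a percentage of the whole page, that "text" is whatever remains once everything between < and > is stripped, and that whitespace is ignored. The online tools may well count differently, and the function name text_ratio is my own invention.

```shell
#!/bin/sh
# Rough text-to-page percentage for one HTML file.
# Assumptions: "text" = characters left after removing anything between
# '<' and '>'; whitespace is not counted on either side of the ratio.
# The contents of <script> and <style> elements are (wrongly) counted as text.
text_ratio() {
    file=$1
    # Non-whitespace characters in the whole page (markup included).
    total=$(tr -d '[:space:]' < "$file" | wc -c)
    # Join lines first so that tags spanning several lines are stripped too.
    text=$(tr '\n' ' ' < "$file" | sed 's/<[^>]*>//g' | tr -d '[:space:]' | wc -c)
    # Avoid dividing by zero on an empty file.
    [ "$total" -gt 0 ] || { echo 0; return; }
    # awk supplies the floating-point division that plain sh lacks.
    awk -v t="$text" -v n="$total" 'BEGIN { printf "%.1f\n", 100 * t / n }'
}
```

Called as text_ratio page.html, it prints a single percentage. As a sanity check, a file containing only <p>abc</p> has 3 text characters out of 10, so it prints 30.0.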

So!! I have included a block of text at the bottom of the pages ( some generic and some page specific ) which has boosted the code/text ratio to 15-20%. I am now eagerly awaiting the googlebot's next visit. Will an improved code/text ratio be enough in this case to increase Page Rank and bring the 'calendar' into line with the rest of the site?

An interesting experiment...or at least I think so! But then I am incorrigible.

As a side note one wonders how all this impacts sites which are constructed entirely out of image files. Unless one does some fancy footwork and includes an overlay with the text in a form which can be read by the bots, such pages will appear to be devoid of content. Doing things this way also addresses the need for content which is compliant with US accessibility standards for the disabled. The text overlay would, of course, be readable by a screen reader.




Sunday, August 10, 2008

D.I.Y Apps Part 5 Text Substitution with RPL

( Reproduced from The Furtive Penguin )

Script here

Recursive text substitution in multiple files is not a task that the average end user is called upon to perform very often. But let's suppose that you have a couple of web sites either with a hosting company or on your own server. Let us suppose further that you want to change the mailto link address on every page on your site. Not a problem if you only have 5 or 6 pages, but what if you have five or six hundred? Clearly, in the absence of an automated text replacement utility, you are going to be spending a lot of quality time with the WYSIWYG editor of your choice.

Of course you could always employ the venerable 'sed' command with 'find' and 'exec' but that has limitations and the syntax is possibly the most bizarre and grotesque construction in the whole of Unix! Here is an example:-

find ./path/to/directory -type f -exec sed -i 's/oldtext/newtext/g' {} \;
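
One of the limitations is that sed treats the old text as a regular expression, so the dots in an email address will match any character. Below is a hedged sketch of how the mailto scenario might be wrapped up more safely. The function names (escape_pattern, escape_replacement, replace_in_tree) are hypothetical, and it assumes GNU sed, whose -i option edits files in place.

```shell
#!/bin/sh
# Escape characters that are special in a basic regular expression,
# plus the '/' used below as the s-command delimiter.
escape_pattern() {
    printf '%s' "$1" | sed 's|[][\\.*^$/]|\\&|g'
}

# Escape characters that are special in the replacement half of s///.
escape_replacement() {
    printf '%s' "$1" | sed 's|[\\&/]|\\&|g'
}

# replace_in_tree DIR OLD NEW
# Replace every literal OLD with NEW in all regular files under DIR.
# Assumes GNU sed and single-line OLD/NEW strings.
replace_in_tree() {
    old=$(escape_pattern "$2")
    new=$(escape_replacement "$3")
    # '{} +' hands many files to each sed run instead of one at a time.
    find "$1" -type f -exec sed -i "s/$old/$new/g" {} +
}
```

A call such as replace_in_tree ./site 'old@example.com' 'new@example.org' would then change only the literal address. Try it on a copy of the site first, since there is no undo.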

Enter 'rpl'!! The program was written for Debian as a free replacement for the non-free rpl program by Joe Laffey which can be found here. Rpl defines its function in the following terms (from the manual):-

"Basic usage is to specify two strings and one or more filenames or directories on the command line. The first string is the string to replace, and the second string is the replacement string."

One of the joys of 'rpl' is that it will replace text recursively by simply specifying the -R option. If you are running Ubuntu/Debian 'rpl' is available from the repositories. It is of course a command line tool but the man page is amongst the most intelligible and comprehensible that I have ever read.

In keeping with the spirit of this series of articles I could not resist writing a 'Dialog' front end for the 'rpl' program which allows the user to deploy some of its most useful functionality from the GUI. Here is the help file included with the script:-

OPTION 1. prints this help file.
OPTION 2. will replace all instances of a text string with a new string in a given file.
OPTION 3. will replace all instances of a text string with a new string in all files in a given directory.
OPTION 4. will replace all instances of a text string with a new string in all files in a given directory and all its sub-directories.

WORKS WITH TEXT AND HTML FILES ONLY! You will need to enter the full path to all files and folders. This front end script should work equally well for single and multiple word substitutions. RPL is a command line program and it is capable of much more than this. In order to acquaint yourself with the full range of its capabilities consult the manual - man rpl. Enjoy!

As you can see, the script allows you to replace text in a single file, in a group of files in a directory, or in an entire directory tree. Having access to a tool like this can save hours of arduous labour with an HTML editor. In order to make this work you will need to install 'dialog' and 'rpl'. They are both in the Debian/Ubuntu repositories. I have tested this fairly extensively and it seems to work OK. If you find otherwise, please let me know so that I can fix it. Enjoy!
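
For the curious, the three replacement modes could map onto rpl invocations roughly as follows. This is only a sketch and not the actual script: the function name run_choice and the RPL variable are hypothetical, the 'dialog' menus are left out entirely, and option 3 assumes that rpl skips any sub-directories it is handed when -R is absent.

```shell
#!/bin/sh
# Hypothetical dispatcher: maps a menu choice (2, 3 or 4) to an rpl call.
# RPL may be overridden, e.g. RPL=echo, to preview the arguments
# without touching any files.
RPL=${RPL:-rpl}

# run_choice CHOICE OLD NEW TARGET
run_choice() {
    choice=$1 old=$2 new=$3 target=$4
    case $choice in
        2) "$RPL" "$old" "$new" "$target" ;;      # a single file
        3) "$RPL" "$old" "$new" "$target"/* ;;    # files directly inside a directory
        4) "$RPL" -R "$old" "$new" "$target" ;;   # a directory and all its sub-directories
        *) echo "usage: run_choice 2|3|4 OLD NEW TARGET" >&2
           return 1 ;;
    esac
}
```

Setting RPL=echo before calling run_choice 4 oldtext newtext /full/path/to/site shows the arguments that would be passed to rpl, which is a cheap way to check the quoting of multiple word strings before letting it loose on real files.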


Script here






