Last week the BBC published an article describing the changes between the old administration's Whitehouse.gov robots.txt file and the new administration's version. A quick glance would suggest that the new administration, in a push for greater perceived transparency, removed a couple thousand lines of directives that were blocking the search engines from indexing content on the site. Removing those lines may seem to open up all the dirty secrets the Bush camp was trying to hide, but in reality it may be adversely affecting the site's ability to rank properly.
After a bit more crawling through the old file, it's clear it wasn't doing anything nefarious; it was merely reducing duplicate content issues. For those who don't know: when the search engines see two pages with the same content, they make an executive decision (Chief of Staff joke) about which of the pages is more important, and will likely show only that one in results. The original robots.txt file was simply telling the engines that there is more than one copy of a page on this site, and we would sure appreciate it if you served this version, because it has a prettier page title, for instance.
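To make that concrete, here is a minimal sketch of the pattern, using Python's standard `urllib.robotparser` to show how a well-behaved crawler reads such a file. The file contents and paths below are hypothetical, not taken from the actual Whitehouse.gov robots.txt; they just illustrate blocking a printer-friendly duplicate while leaving the preferred version crawlable.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the duplicate /print/ copies of pages,
# leave the canonical versions open to all crawlers.
robots_txt = """\
User-agent: *
Disallow: /print/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The preferred version of the page stays crawlable...
print(parser.can_fetch("*", "https://www.example.gov/issues/economy"))
# ...while the printer-friendly duplicate is kept out of the index.
print(parser.can_fetch("*", "https://www.example.gov/print/issues/economy"))
```

Run as-is, the first check comes back allowed and the second blocked, which is exactly the "serve this version, not that one" nudge the old file was giving.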
Apparently, to counteract having the search engines crawl the entire site, as they now will with a two-line robots.txt file, the new webmasters have deleted several pages to nudge the engines toward serving the appropriate version of each page. These deletions have taken place without proper 301 redirects, which would have passed solid link juice to the preferred pages. This move has again damaged the site's ability to rank.
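The missing piece is a 301 (Moved Permanently) response: unlike a flat deletion, it tells the engines where the content went so existing ranking signals can follow it. Below is a minimal sketch in Python, assuming a hypothetical URL mapping (the paths are illustrative, not the site's actual URLs), rather than the actual server configuration:

```python
from http import HTTPStatus

# Hypothetical mapping of retired duplicate URLs to their preferred versions.
REDIRECTS = {
    "/print/issues/economy": "/issues/economy",
}

def handle_request(path):
    """Return (status, headers) for a request path.

    Retired duplicates get a 301 pointing at the preferred page;
    everything else is served normally.
    """
    if path in REDIRECTS:
        # 301 Moved Permanently: engines transfer the old page's
        # link juice to the URL in the Location header.
        return HTTPStatus.MOVED_PERMANENTLY, {"Location": REDIRECTS[path]}
    return HTTPStatus.OK, {}
```

Had the deleted pages been handled this way instead of simply removed, the links they had accumulated would still count toward the surviving versions.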
Now, I’m not commenting on the overall transparency of either organization, nor am I taking sides here (I know better than to play that game), but it appears the BBC may have wanted to consult with an SEO before going to print.