Saturday, September 6, 2008

How NOT To Verify Googlebot

Maintaining a blog takes a lot more effort than one would imagine. Between family, lifestyle and work, it's hard to understand how people manage to find the time to keep their blogs up to date. I suppose the trick is to fold blogging into work or lifestyle so that it becomes something you do without much ado. Definitely something I need to work on.




Type "verify googlebot" into a major search engine and you will come across many articles telling you how to verify Googlebot and other search engine crawlers using what I will call the reverse lookup verification method. This method is recommended by all the major search engines bar none. Google, Yahoo!, Microsoft, Ask and many internet experts recommend it. So it may come as a surprise to some people to read my article where I strongly recommended that this method not be used.

I feel that webmasters who use this verification method may not fully understand the implications of their actions, so I'm going to try to explain the issue once again. You see, when a suspected search engine hits your website, you come into possession of an IP address that you know absolutely nothing about. Using this IP address, you are supposed to do a reverse DNS lookup to get a list of hostnames to verify. If you are lucky and the IP address happens to belong to Google or Yahoo!, their server will respond to your DNS request with a nice list of hostnames for you to verify. However, if that IP address happens to belong to the bad guys, you would be naive to expect them to play by the rules.
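For reference, here is roughly what the reverse lookup method looks like in code. This is only a sketch to show where the risky query happens, not a recommendation: the function name, the allowed suffixes and the injectable `reverse`/`forward` parameters are my own assumptions for illustration.

```python
import socket


def reverse_lookup_verify(ip,
                          allowed_suffixes=(".googlebot.com", ".google.com"),
                          reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname_ex):
    """Return True if ip passes the reverse-then-forward check.

    allowed_suffixes is an assumed example list; adjust it per search engine.
    """
    try:
        # Step 1: reverse (PTR) query. The answer is served by name servers
        # chosen by whoever controls the reverse zone for this address,
        # which for a hostile visitor means the attacker.
        hostname, aliases, _ = reverse(ip)
    except OSError:
        return False

    for name in [hostname, *aliases]:
        if not name.lower().rstrip(".").endswith(allowed_suffixes):
            continue
        try:
            # Step 2: forward query to confirm the name maps back to ip.
            _, _, addresses = forward(name)
        except OSError:
            continue
        if ip in addresses:
            return True
    return False
```

Notice that Step 1 runs for every unknown visitor, before you know anything about who will be answering the query.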

Now, in a perfect world the bad guys may respond to your DNS requests with a nice list of names for you to verify. But in the real world they are more likely to dig out their playbook and launch every dirty trick they possess against your web server. If you have been following internet security closely, you will have heard of recent DNS attacks where the bad guys needed to find a way to make you initiate DNS requests. You know what? When you send a DNS query to their servers on your own initiative, you have saved them the trouble of needing to trick you in the first place! All they have to do now is jump straight into their attacks against your server.

In fact, if this method of verification becomes hugely popular, you can expect the bad guys to make good use of it to attempt to compromise even more web servers. It really is remarkably simple: all they need to do is send a request to your web server and wait for your DNS requests to come in, then launch their attacks. We all love to rant and rave about how insecure DNS is, but using that insecure DNS to send automated requests to random servers from your production web servers takes the insecurity to a whole new level. JUST DON'T DO IT, no matter what the big search engines tell you. Try using one of the following solutions instead.

SOLUTION #1: USE ISOLATED VERIFICATION SERVER


In this solution, any suspected search engine that hits your website is checked against a locally maintained database on your network, without sending any reverse DNS queries from your web server. If the IP address is not yet in the database, it is submitted to a separate and isolated verification server along with any other information from the request headers that you care about. The verification server does the reverse lookup verification and updates the shared database on your network.

This server should also do other kinds of verification in case the IP address does belong to a major search engine but the search engine company failed to set up reverse DNS for it. The idea here is to isolate the verification server as much as you can from your production web servers so that even if the verification server is attacked or compromised, the attacks will not immediately affect your production servers.
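The web server's side of this arrangement can be sketched as follows. This is a minimal in-memory illustration: the class name `CrawlerGate`, the dictionary standing in for the shared database and the queue standing in for the hand-off to the verification server are all hypothetical, and a real deployment would put a proper database and a network boundary between the two machines.

```python
import queue


class CrawlerGate:
    """Web-server side of the isolated-verifier design (hypothetical names)."""

    def __init__(self):
        # Stand-in for the shared database: ip -> True (verified) / False (rejected).
        self.verdicts = {}
        # Stand-in for the hand-off channel to the isolated verification server.
        self.pending = queue.Queue()
        # Addresses already submitted, so each one is queued only once.
        self._queued = set()

    def check(self, ip, headers):
        """Return 'verified', 'rejected' or 'unknown' without sending any DNS query."""
        if ip in self.verdicts:
            return "verified" if self.verdicts[ip] else "rejected"
        if ip not in self._queued:
            self._queued.add(ip)
            self.pending.put({"ip": ip,
                              "user_agent": headers.get("User-Agent", "")})
        return "unknown"

    def record_verdict(self, ip, is_search_engine):
        """Called when the verification server publishes its result."""
        self.verdicts[ip] = is_search_engine
        self._queued.discard(ip)
```

The point of the design is that `check` never resolves anything itself: the production server only reads verdicts and queues work, so any hostile DNS response lands on the verification server instead.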

Of course, if you are not in a position to maintain a verification server yourself, you can do the smart thing by signing up for a regularly updated free database like I offer at botslist.com.

SOLUTION #2: USE FORWARD VERIFICATION EXCLUSIVELY


If you insist on doing the verification on your production web servers instead of using an isolated server or signing up for a third-party database, then consider using the forward lookup verification method exclusively. I have described this verification method elsewhere. I will only say here that this method is much faster and more secure than the reverse lookup method that everyone else is promoting.
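Since the method is described elsewhere, the following is only my rough sketch of the general idea, under a stated assumption: that forward verification means resolving a hostname the search engine has published in advance and checking whether the visitor's IP address is among the results. The function name, the `published_hostnames` parameter and the injectable `resolve` parameter are hypothetical.

```python
import socket


def forward_lookup_verify(ip, published_hostnames,
                          resolve=socket.gethostbyname_ex):
    """Return True if ip is among the addresses of a known crawler hostname.

    published_hostnames is an assumed list of names you already trust;
    nothing in the query is derived from the unknown visitor's address.
    """
    for hostname in published_hostnames:
        try:
            # Forward query only: the name being resolved is one you chose.
            _, _, addresses = resolve(hostname)
        except OSError:
            continue
        if ip in addresses:
            return True
    return False
```

Because the set of hostnames is fixed, the answers can be cached, which is one reason this kind of check can be faster than a reverse lookup for every new visitor.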

Also, when I first described this method over a year ago, the botslist.com database contained only one or two records of servers that passed the forward lookup test. Today, however, I'm very pleased to count twelve matching records in the database (see http://www.botslist.com/search?name=x-verifiedmeth:%20uahome). The interesting thing here is that because the small search engines pass the forward lookup test, verifying them is much faster and safer than verifying the big search engines!

I think we can expect that many smaller search engines will continue to take the initiative on this issue while the big search engines will continue to do their thing, perhaps until a few high profile websites using the reverse lookup method are compromised with that method as the attack vector. To be forewarned is to be forearmed, as they say.