Google’s indexing algorithm is no longer a secret
Posted in: search engines, wikipedia, Google
I’m now pretty sure that about 90% of how Google ranks websites is known to many website designers and website promotion companies.
I’m a web developer and I’ve been through all the web evolutions, from basic HTML to modern AJAX design. When you create a new website, the easiest part is the coding. Anyone with a minimum knowledge of programming can learn HTML, CSS, AJAX, PHP or MySQL, and you can find so many examples, scripts and libraries that this is no longer a big obstacle.
The trickiest aspect of web design is making your fabulous website or application famous and sharing it with the Internet world. Everybody will agree that having the most efficient website, a genius idea or even the most beautiful website won’t make it a success if you’re not able to tell the world about it. You’ll end up frustrated and, even worse, a big company could steal your idea and bring it to the top.
It’s difficult, but there are ways to promote your work. You could advertise, talk about it in forums, submit your story to social networking websites or pay a marketing “buzz” company if you’re rich. But every web designer knows that one main source of traffic is search engines, especially Google (around 30% of traffic for an average website). You have to get into the top 5 on the first page of the search results for keywords related to your website. Don’t forget that people don’t look further than the first 5 results.
It cannot be denied that some designers know much better than others how to get to the top of Google’s search results. This is contrary to the basic principle of a search engine, whose goal is to index the whole World Wide Web. The problem is that some indexing rules are often too easy to figure out:
Let me take one very simple example. Here is a random list of keywords I typed and the first results that were proposed to me (you can try your own):
“key” –> 1) http://www.key.com, (…) , 4) wikipedia
“eat” –> http://www.eat.co.uk
“cat” –> 1) wikipedia, 2) http://www.cat.org.uk
“hat” –> 1) wikipedia, 2) http://www.hat.com
“friend” –> 1) friendfinder.com 2) wikipedia
The first thing that comes to mind is that the domain name has a huge importance. When you say “cat”, for instance, you naturally think of the pet first, but Google prefers to give you a “www.cat.” domain even if it refers to a website whose content is far from what a human thinks of on hearing the word “cat”. For me this is really annoying, as the ranking of the results shouldn’t be based on what is in the URL. Or maybe the weight of this criterion should be greatly reduced. In an ideal world, I think you should get only results that do not have the word “cat” in the URL, as the most interesting content about cats is probably not hosted under “Google friendly” URLs that contain the word.
One way to solve this issue could be to propose different meanings for the main word of the search string before giving results (Google would of course need to identify the main word in the keywords or sentence). That way the user would be directed towards something a lot closer to what they were looking for.
The second thing to highlight is that Wikipedia appears near the top whenever the search words match an article in the encyclopedia. That’s a bit scary for Google, because searching Google is becoming more and more like searching Wikipedia. Luckily for Google, Wikipedia has a really bad search engine, so Google can still be its search engine. Sometimes I find myself searching for something in Wikipedia through Google. The best search would in fact be a Google query like this:
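To make the observation easier to reproduce with your own keywords, here is a minimal Python sketch that checks whether a keyword shows up in the host name of a result URL. The function name `domain_match` and the "exact" / "partial" / "none" labels are my own invention for illustration, and the `http://` prefix on friendfinder.com is added only so the URL parses. It says nothing about how Google actually weighs domain names.

```python
from urllib.parse import urlparse

def domain_match(keyword, url):
    """Classify how a keyword appears in the host name of a URL.

    "exact"   : the keyword is a whole dot-separated label (www.cat.org.uk)
    "partial" : the keyword is contained in a label (friendfinder.com)
    "none"    : the keyword does not appear in the host name
    """
    host = urlparse(url).hostname or ""
    labels = host.lower().split(".")
    kw = keyword.lower()
    if kw in labels:
        return "exact"
    if any(kw in label for label in labels):
        return "partial"
    return "none"

# Non-Wikipedia results taken from the list above.
samples = {
    "key": "http://www.key.com",
    "eat": "http://www.eat.co.uk",
    "cat": "http://www.cat.org.uk",
    "hat": "http://www.hat.com",
    "friend": "http://friendfinder.com",  # scheme added so urlparse finds the host
}

for keyword, url in samples.items():
    print(keyword, url, domain_match(keyword, url))
```

The partial match is deliberately crude: any label that merely contains the letters of the keyword counts, so treat it as a rough hint rather than a measurement.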
bla bla blabla site:wikipedia.org
That way you are pretty sure to always get complete and interesting articles about exactly what you were looking for (even if you still get articles containing the word in the URL, but in that case it makes more sense).
I know there are many more criteria (links, popularity, PageRank, link text, metadata, keywords in the page, etc.), but I wanted to pinpoint something really simple to demonstrate. The bad thing is that Google is now polluted with websites that have understood very well how it indexes web pages. So I really wonder whether Google is indexing the World Wide Web or just the tiny part of it that has understood the way Google “thinks”.
If I were Google, I wouldn’t have revealed the PageRank of websites. I also wouldn’t have allowed webmasters to learn how the engine works by providing a way to list the sites that link to a website or those that have content related to it. This has given lots of clues to website designers. Large parts of the Google “secrets” are already known by more and more people.
On the other side of the coin, you can still find really poor amateur websites that reach the top results. That proves that there are still secrets to discover, or maybe it just confirms that the engine does not update so easily. How would you code a robot to update more than 8,168,684,336 pages as quickly as possible and add new web pages at the same time?
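If you use this trick often, the query can be built automatically. The sketch below assumes the usual `https://www.google.com/search?q=...` address; the helper name `site_search_url` is hypothetical and only there for illustration.

```python
from urllib.parse import urlencode

def site_search_url(terms, site="wikipedia.org"):
    """Build a Google search URL restricted to one site with the site: operator."""
    query = f"{terms} site:{site}"
    # urlencode escapes spaces and the colon so the query survives in a URL.
    return "https://www.google.com/search?" + urlencode({"q": query})

print(site_search_url("cat"))
# prints https://www.google.com/search?q=cat+site%3Awikipedia.org
```

Passing a different `site` value restricts the search to any other domain in the same way.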
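I obviously don’t know how Google’s robot does it, but to give an idea of the problem, here is a toy Python sketch of one possible approach: keep every known page in a priority queue ordered by the time it is next due for a visit, and make new pages due immediately. The class `CrawlQueue`, its methods and the `revisit_after` parameter are all invented for this illustration, and the actual fetching of pages is left out.

```python
import heapq

class CrawlQueue:
    """Toy revisit scheduler: the page with the earliest due time is crawled next."""

    def __init__(self):
        self._heap = []  # entries are (due_time, url)

    def add_new(self, url, now):
        # New pages are due immediately so they get picked up on the next pass.
        heapq.heappush(self._heap, (now, url))

    def next_page(self, now, revisit_after):
        """Return the most overdue page and reschedule it, or None if nothing is due."""
        if not self._heap or self._heap[0][0] > now:
            return None
        _, url = heapq.heappop(self._heap)
        heapq.heappush(self._heap, (now + revisit_after, url))
        return url

queue = CrawlQueue()
queue.add_new("http://www.cat.org.uk", now=0)
queue.add_new("http://www.hat.com", now=0)
print(queue.next_page(now=0, revisit_after=10))  # a page that is due now
print(queue.next_page(now=0, revisit_after=10))  # the other page
print(queue.next_page(now=0, revisit_after=10))  # None: nothing is due until time 10
```

With billions of pages, a single fixed revisit delay means every page waits its turn behind all the others, which would be one simple explanation for results that take a long time to change.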