This morning, comScore released its November 2011 search engine market share rankings. They show that Bing and AOL gained share, Yahoo and Google lost share, and Ask held steady.
The changes compared to October are small.
Here are the search engine market shares, with the change from October 2011 in percentage points shown in parentheses:
- Google: 65.4% (-0.2)
- Yahoo: 15.1% (-0.1)
- Bing: 15.0% (+0.2)
- Ask: 2.9% (0.0)
- AOL: 1.6% (+0.1)
No search engine gained or lost much, and the rankings now appear quite stable.
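As a quick sanity check on the numbers, the small Python sketch below subtracts each month-over-month change from the November figure to recover the implied October shares. The dictionary simply restates the comScore figures listed above; the function and variable names are my own.

```python
# November 2011 shares and month-over-month changes, in percentage points,
# copied from the comScore figures listed above.
NOVEMBER_2011 = {
    "Google": (65.4, -0.2),
    "Yahoo": (15.1, -0.1),
    "Bing": (15.0, +0.2),
    "Ask": (2.9, 0.0),
    "AOL": (1.6, +0.1),
}


def implied_october_shares(data):
    """Recover each engine's October share by subtracting its change.

    Rounding to one decimal place hides floating-point noise such as
    65.60000000000001.
    """
    return {name: round(share - change, 1) for name, (share, change) in data.items()}


oct_shares = implied_october_shares(NOVEMBER_2011)
nov_total = round(sum(share for share, _ in NOVEMBER_2011.values()), 1)
oct_total = round(sum(oct_shares.values()), 1)
largest_move = max(abs(change) for _, change in NOVEMBER_2011.values())

for name, (share, change) in NOVEMBER_2011.items():
    print(f"{name}: {oct_shares[name]}% -> {share}% ({change:+.1f})")
print(f"Totals: October {oct_total}%, November {nov_total}%")
print(f"Largest move: {largest_move} points")
```

The implied October figures (Google 65.6%, Yahoo 15.2%, Bing 14.8%, Ask 2.9%, AOL 1.5%) sum to 100% just like November's, and no engine moved by more than 0.2 points.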
Art J. Adams
Google will now publish a monthly post on its blog about the major algorithm changes it made that month. The series will highlight all major changes to Google's search ranking algorithms, as well as updates to indexing, snippets and other parts of its search engine. On December 1, 2011, Scott Huffman of Google posted the changes Google made to its search algorithms in November. I want to summarize these changes and discuss how they might affect websites. Here are the changes, with some comments:
- Related Queries and Semantic Search - Scott notes that sometimes, when you type in a query, the results include not only those for the original query but also results for related queries. See this patent in which Google describes how related queries could be used for ranking. This makes search somewhat more semantic, rather than matching only the words in the original query. The November 2011 change keeps this behavior but discourages related queries that drop a rare word from the original query. For example, if the original query is “vinci paintings” and the related query is “paintings”, Google will no longer use the related query for ranking, because it drops the rare word “vinci”. Rare words are words with a high inverse document frequency (IDF), i.e. they appear in relatively few documents. This implies that rare words will be given more importance in ranking than before.
- Deeper Crawling - Google will crawl more deeply and make more long-tail documents available. This is good news, since it means Google will crawl more aggressively and make its indexing algorithms less selective.
- Parked Domains – With a new parked domain classifier, Google will actively detect parked domains and demote them in search results. This will likely mean less search traffic for parked domains: unless your domain name gets a lot of type-in traffic (i.e. visitors arrive at your site by typing the domain name directly into a browser), your parked domains will receive less traffic.
- Aggressive Auto-Complete – Auto-Complete in Google Instant will now be more aggressive, and searchers will receive more suggestions than before.
- Blog Search - More indexing coverage and fresher results for blogs!
- Original and Duplicate Content – More intelligent detection of original content. Google should now be able to tell, with higher precision, which of two very similar pages is the original.
- Host Crowding – If several of the top search results come from the same domain name, Google will demote or remove some of them, introducing more diversity into the top search results.
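To make the rare-word idea from the related queries change concrete, here is a minimal Python sketch. Google has not published how it does this; the toy corpus, the function names `idf` and `drops_rare_word`, the smoothed IDF formula log((1 + N) / (1 + df)) and the cutoff of 1.0 are all assumptions chosen for illustration.

```python
import math

# A tiny, made-up document collection (each document is a set of words).
CORPUS = [
    {"vinci", "paintings", "mona", "lisa"},
    {"paintings", "for", "sale"},
    {"modern", "paintings", "gallery"},
    {"oil", "paintings", "and", "watercolors"},
    {"famous", "paintings", "museum"},
]


def idf(word, corpus):
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)).

    Words that appear in few documents (rare words) get a HIGH value.
    """
    df = sum(1 for doc in corpus if word in doc)
    return math.log((1 + len(corpus)) / (1 + df))


def drops_rare_word(original, related, corpus, threshold=1.0):
    """Return True if the related query omits a word from the original
    query whose IDF exceeds `threshold` (a hypothetical cutoff)."""
    dropped = set(original.lower().split()) - set(related.lower().split())
    return any(idf(word, corpus) > threshold for word in dropped)


print(round(idf("vinci", CORPUS), 3))      # rare word: 1.099
print(round(idf("paintings", CORPUS), 3))  # in every document: 0.0
print(drops_rare_word("vinci paintings", "paintings", CORPUS))           # True
print(drops_rare_word("vinci paintings", "da vinci paintings", CORPUS))  # False
```

Under these assumptions, “paintings” would be rejected as a related query for “vinci paintings” because it drops the high-IDF word “vinci”, while “da vinci paintings” keeps it and would still be allowed.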
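Google has not said how host crowding is implemented either, but the general idea is easy to sketch: cap the number of results per host near the top and push the extras further down. In the Python sketch below, the function name `crowd_hosts`, the cap of two results per host and the example URLs are hypothetical, and grouping by hostname (so `www.example.com` and `example.com` count separately) is a simplification of grouping by domain name.

```python
from urllib.parse import urlparse


def crowd_hosts(ranked_urls, max_per_host=2):
    """Keep the first `max_per_host` results from each host in their original
    order and move any further results from that host to the end (a demotion)."""
    kept, demoted = [], []
    seen = {}  # host -> number of results from that host seen so far
    for url in ranked_urls:
        host = urlparse(url).hostname
        if seen.get(host, 0) < max_per_host:
            kept.append(url)
        else:
            demoted.append(url)
        seen[host] = seen.get(host, 0) + 1
    return kept + demoted


# Hypothetical ranked results dominated by one host.
RESULTS = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/c",
    "https://other.org/x",
    "https://example.com/d",
    "https://third.net/y",
]

for url in crowd_hosts(RESULTS):
    print(url)
```

Here `example.com/c` and `example.com/d` drop below `other.org/x` and `third.net/y`. Returning only `kept` would model removing the extra results instead of demoting them.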
I am delighted that Google decided to introduce this level of transparency about its algorithms! I will closely follow this series of posts and discuss them on my blog.
Art J. Adams
Stanford professors Jurafsky and Manning are teaching a free online course on natural language processing that starts on January 23, 2012. I think it is a must for anyone interested in the scientific side of search engines and SEO. Anyone can join the course, watch the lectures, submit assignments and participate in the discussion forum. It is a great opportunity to learn about natural language processing, though I don’t know how they will grade all those assignments!
I went through the course syllabus and it looks really good. It covers several topics, all of them of great interest to the search engine enthusiast:
- Tokenization – How do you split sentences into words? This is not a trivial problem (except perhaps for English), because some languages, such as Chinese and Japanese, do not separate words with white space, so a single unbroken string can contain several words. This is sometimes referred to as word segmentation.
- Text Classification – How do you assign a topic to an article? This interests Google and other search engines in many ways: summarizing an article, showing relevant ads, retrieving documents that don’t match all the query words, and so on. Text classification is also used to infer user interests for personalization. See this article I wrote a few days back on how Google could infer users’ interests to show ads and to generate textual ranking signals for search.
- Spelling Correction
- Sentiment Analysis – This has many forms; the simplest is detecting whether an article expresses a negative or a positive sentiment. A site with many bad reviews on forums and discussion groups might not be what a search engine user wants.
- Parsing – Parsing builds parse trees from sentences. A parse tree groups words into constituents such as noun phrases and verb phrases. Though this is not directly related to search engines, it is an interesting topic to know about.
- Language Modeling – One of the simplest and most useful topics in NLP. Language modeling aims to quantify the uncertainty in language by assigning a probability to seeing a word given the words that precede it.
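To see why word segmentation is not trivial, here is a minimal Python sketch of greedy maximum matching (often called MaxMatch), a classic baseline for text written without spaces. I use English strings with the spaces removed so the output is readable. The tiny dictionary is made up, and this is only a simple baseline, not necessarily what the course teaches or what any search engine uses.

```python
# A tiny, made-up dictionary; real segmenters use large lexicons.
DICTIONARY = {
    "search", "engine", "optimization",
    "the", "theta", "table", "bled", "down", "own", "there",
}


def max_match(text, dictionary):
    """Greedy left-to-right segmentation: at each position take the longest
    dictionary word that starts there, or a single character if none does."""
    longest = max(len(word) for word in dictionary)
    words, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + longest), i, -1):
            if text[i:j] in dictionary:
                words.append(text[i:j])
                i = j
                break
        else:
            words.append(text[i])  # unknown character: emit it on its own
            i += 1
    return words


print(max_match("searchengineoptimization", DICTIONARY))
print(max_match("thetabledownthere", DICTIONARY))
```

The first string segments correctly into “search engine optimization”, but the second comes out as “theta bled own there” instead of “the table down there”: greedily taking the longest word at each step is not always right.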
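Text classification and sentiment analysis are often introduced with a Naive Bayes classifier. Below is a minimal multinomial Naive Bayes sketch in Python with add-one (Laplace) smoothing. The four training reviews, the “positive”/“negative” labels and all function names are made up for illustration; a real system would need far more data and better tokenization.

```python
import math
from collections import Counter, defaultdict

# Four made-up labeled reviews; real classifiers need far more data.
TRAINING = [
    ("great service and fast shipping", "positive"),
    ("excellent product, great value", "positive"),
    ("terrible support, never again", "negative"),
    ("slow shipping and poor quality", "negative"),
]


def tokenize(text):
    """Very simple tokenizer: lowercase, drop commas, split on white space."""
    return text.lower().replace(",", " ").split()


def train_naive_bayes(examples):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def classify(text, class_counts, word_counts, vocabulary):
    """Pick the class with the highest log prior + log likelihood,
    using add-one (Laplace) smoothing for word probabilities."""
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # words never seen in training are ignored
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(TRAINING)
print(classify("great shipping", *model))
print(classify("poor quality support", *model))
```

With this toy data the two calls print `positive` and `negative`. Working in log space avoids multiplying many small probabilities, and smoothing keeps an unseen word in one class from zeroing out that class entirely.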
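The simplest language model conditions only on the previous word. Here is a minimal bigram model in Python that estimates P(word | previous word) by maximum likelihood, i.e. C(previous word, word) / C(previous word). The three training sentences and the function names are made up; `<s>` and `</s>` are the usual sentence-boundary markers. Any unseen bigram gets probability zero here, which is why practical models add smoothing.

```python
from collections import Counter

# Three made-up training sentences.
CORPUS = [
    "i like search engines",
    "i like language models",
    "search engines rank pages",
]


def pad(sentence):
    """Lowercase, split on white space, and add sentence-boundary markers."""
    return ["<s>"] + sentence.lower().split() + ["</s>"]


def train_bigram_model(sentences):
    """Count each token as a context and each adjacent pair as a bigram."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = pad(sentence)
        contexts.update(tokens[:-1])  # every token that has a successor
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return contexts, bigrams


def bigram_prob(prev, word, contexts, bigrams):
    """Maximum-likelihood estimate P(word | prev) = C(prev word) / C(prev)."""
    if contexts[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / contexts[prev]


def sentence_prob(sentence, contexts, bigrams):
    """Multiply bigram probabilities across the padded sentence."""
    tokens = pad(sentence)
    prob = 1.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        prob *= bigram_prob(prev, word, contexts, bigrams)
    return prob


contexts, bigrams = train_bigram_model(CORPUS)
print(bigram_prob("like", "search", contexts, bigrams))
print(sentence_prob("i like search engines", contexts, bigrams))
print(sentence_prob("i like pages", contexts, bigrams))
```

P(search | like) is 1/2, “i like search engines” gets 2/3 · 1 · 1/2 · 1 · 1/2 = 1/6, and “i like pages” gets zero because the bigram “like pages” never occurs in the training sentences.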
There are also other courses that start on the same date. See the NLP course page for more details.
Art J. Adams