This morning, comScore released the November 2011 search engine market share and rankings. The rankings indicate that Bing and AOL gained share while Yahoo and Google lost.

The change as compared to October is not very significant.

Here is the search engine market share as compared to October 2011:

  1. Google: 65.4% (-0.2)
  2. Yahoo: 15.1% (-0.1)
  3. Bing: 15.0% (+0.2)
  4. Ask: 2.9% (0.0)
  5. AOL: 1.6% (+0.1)

No search engine gained or lost much, and the rankings have become quite stable now.

Art J. Adams

 

 


Google 2011 Search Algorithm Changes

by ringostarr on December 4, 2011

in Search Engines

 

Google will now publish a monthly post on its blog about the major algorithm changes it made that month. This monthly series will highlight all major changes Google makes to its search ranking algorithms, as well as indexing updates, snippets and other aspects of its search engine. On December 1, 2011, Scott Huffman of Google posted about the changes Google made to its search algorithms. I want to summarize the changes and discuss how they would affect websites. Here are the changes, with some comments:

  1. Related Queries and Semantic Search - Scott notes that sometimes, when you type in a query, the results that show up are not only those for the original query but also results for other related queries. See this patent in which Google describes how related queries could be used for ranking. This just makes search a bit more semantic rather than matching only on the words in the original query. The change made in November 2011 will continue doing this, but will discourage related queries that drop a rare word from the original query. For example, if the original query is “vinci paintings” and the related query is “paintings”, Google will not allow the related query to be used for ranking. Rare words are words with a high inverse document frequency (they appear in only a few documents), so this implies that rare words will be given more importance in ranking than before; see the IDF sketch after this list.
  2. Deeper Crawling - Deeper crawling and making long-tail documents more available. This is really good news: Google will crawl more aggressively and make its indexing algorithms less selective.
  3. Parked Domains – With a new parked-domains classifier, Google will actively detect parked domains and demote them in search results. This might mean less search traffic for parked domains. So unless your domain name gets a lot of type-in traffic (i.e. the visitor arrives at your site by typing the domain name directly into a browser), your parked domains will receive less traffic.
  4. Aggressive Auto-Complete – Auto-complete in Google Instant will now be more aggressive, and searchers will receive more suggestions than before.
  5. Blog Search - More indexing coverage and fresher results for blogs!
  6. Original and Duplicate Content – More intelligent detection of original content. Google will now be able to tell, with higher precision, which of two very similar pages is the original.
  7. Host Crowding – If several of the top search results come from the same domain, Google will demote or remove some of them, thereby introducing more diversity into the top results; see the diversification sketch after this list.
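
To make the “rare word” point in the related-queries item concrete, here is a minimal sketch of inverse document frequency over a toy corpus. This is my own illustration, not Google's code, and the corpus is invented: a rare term like “vinci” gets a much higher IDF than a common term like “paintings”.

```python
import math

def inverse_document_frequency(term, documents):
    """IDF = log(N / df), where df is the number of documents containing the term."""
    df = sum(1 for doc in documents if term in doc.lower().split())
    if df == 0:
        return 0.0
    return math.log(len(documents) / df)

# Toy corpus (invented): "paintings" is common, "vinci" is rare.
corpus = [
    "leonardo da vinci paintings in the louvre",
    "famous oil paintings of the renaissance",
    "modern paintings and sculpture",
    "impressionist paintings by monet",
]
print(inverse_document_frequency("vinci", corpus))      # ~1.39 (rare word, high IDF)
print(inverse_document_frequency("paintings", corpus))  # 0.0 (appears in every document)
```

Dropping “vinci” from “vinci paintings” throws away the highest-IDF word, which is exactly the kind of related query the November change discourages.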
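
The host-crowding item is also easy to picture with a small sketch. This is purely illustrative and not Google's implementation; the per-host limit and the example URLs are my assumptions. It keeps at most two results per host near the top and pushes the rest down.

```python
from collections import defaultdict
from urllib.parse import urlparse

def diversify(results, max_per_host=2):
    """Keep at most `max_per_host` results per host at the top of a ranked list
    (best first); extra results from the same host are demoted to the end."""
    kept, demoted = [], []
    seen = defaultdict(int)
    for url in results:
        host = urlparse(url).netloc
        if seen[host] < max_per_host:
            kept.append(url)
            seen[host] += 1
        else:
            demoted.append(url)
    return kept + demoted

ranked = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/c",
    "https://another.org/x",
]
print(diversify(ranked))
# ['https://example.com/a', 'https://example.com/b',
#  'https://another.org/x', 'https://example.com/c']
```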

I am delighted that Google decided to introduce this level of transparency into its algorithms! I will closely follow this series of posts and discuss them on my blog.

Art J. Adams

 


 

Stanford professors Jurafsky and Manning are teaching a free course on natural language processing, and it starts January 23rd, 2012. I think it is a must for anyone interested in the scientific aspects of search engines and also in SEO. Anyone can join the course, view lectures, submit assignments and participate in the discussion forum. It is a great opportunity to learn about natural language processing, though I don’t know how they will grade all those assignments! :)

I went through the course syllabus and it looks really good. You will learn about several topics, all of them very interesting for the search engine enthusiast:

  • Tokenization – How do you tokenize sentences into words? This is not a trivial problem (except maybe for English) because some languages do not separate words with white space, so a single run of characters can contain multiple words. This is sometimes referred to as word segmentation; see the segmentation sketch after this list.
  • Text Classification – How do you assign a topic to an article? This is of interest to Google and other search engines in many ways – summarizing an article, showing relevant ads, retrieving documents that don’t match all the query words, and so on. Text classification is also used in inferring user interests for personalization. See this article I wrote a few days back on how Google could infer users’ interests to show ads and to generate textual ranking signals for search.
  • Spelling Correction
  • Sentiment Analysis – This has many forms, and the simplest one is to detect a negative versus a positive sentiment in an article. A site with many bad reviews on forums and discussion groups might not be what a search engine user wants.
  • Parsing – Language parsing is a way to create parse trees from sentences. Parse trees contain constituents such as noun phrases and verb phrases. Though this is not directly related to search engines, it is an interesting topic to know about.
  • Language Modeling – One of the simplest and most useful topics in NLP. Language modeling aims to quantify the uncertainty in language by assigning a probability to the next word given the words that precede it; see the bigram sketch after this list.
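
For the tokenization bullet, a classic baseline for languages written without spaces is greedy maximum matching against a dictionary. The sketch below is a toy illustration of that baseline, not anything from the course; the tiny dictionary and the input string are invented.

```python
def max_match(text, dictionary, max_word_len=7):
    """Greedy left-to-right maximum matching: at each position take the longest
    dictionary word that matches, falling back to a single character."""
    tokens, i = [], 0
    while i < len(text):
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens

# Invented example; real segmenters use large dictionaries or statistical models.
dictionary = {"search", "engine", "ranking"}
print(max_match("searchengineranking", dictionary))
# ['search', 'engine', 'ranking']
```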
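
The language-modeling bullet is just as easy to make concrete. Here is a minimal bigram model: count how often each word follows the previous one and turn the counts into conditional probabilities. The toy corpus is mine, purely for illustration, and there is no smoothing.

```python
from collections import defaultdict

def train_bigram_model(sentences):
    """Estimate P(word | previous word) from raw counts (no smoothing)."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, word in zip(words, words[1:]):
            counts[prev][word] += 1
    model = {}
    for prev, following in counts.items():
        total = sum(following.values())
        model[prev] = {w: c / total for w, c in following.items()}
    return model

corpus = [
    "search engines rank pages",
    "search engines crawl pages",
    "users query search engines",
]
model = train_bigram_model(corpus)
print(model["search"]["engines"])  # 1.0 -- "engines" always follows "search" here
print(model["engines"])            # distribution over the words that follow "engines"
```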

There are also other courses that start on the same date. See the NLP course page for more details.

Art J. Adams

 


 

Compared to Facebook, Twitter is a very different social network in that it allows one-way (parasocial) relationships to be established. A user can follow another Twitter user without being followed back. Facebook allows only two-way (reciprocal) relationships. Google+, the new social network from Google, allows both one-way and two-way relationships.

The question most Twitter users ask is how to get more followers. Writers have proposed that the ideal followers-to-following ratio should be about 1. Most users expect to be followed back, except in the case of celebrities, who have a very high followers-to-following ratio. If a user follows thousands of users without being followed back, it hurts the user’s reputation. So what factors indicate that if a user follows another user, he will be followed back?

A recent paper at CIKM 2011 from Cornell University and Tsinghua University tries to answer this question by building a model that predicts whether a relationship will be reciprocal. The authors report that the model attains an accuracy of 90%.

Let’s dive into the paper.

The paper describes the following factors that affect the likelihood of a reciprocal relationship as compared to a parasocial relationship:

  • Geographic Distance and Time Zone: The probability of a reciprocal relationship is 50 times higher for two users in the same time zone than for two users who are three time zones apart.
  • Homophily: Users with similar characteristics tend to follow each other and establish reciprocal relationships. These characteristics include things like age, occupation and social status. This is quite obvious in the case of Twitter: you want to follow someone with similar interests. The authors note that elite users like celebrities are 8 times more likely to establish a reciprocal relationship than ordinary users.
  • Triadic Closure: Users who share common links, such as followers or followees, have a much stronger tendency to follow each other.
  • Retweets and Mentions: A user who retweets or mentions another user is three times more likely to be followed back.
  • Structural Balance: The authors state that the network of two-way relationships is much more balanced than the network of one-way relationships. The word balance refers to structural balance theory, which states that a network is balanced if, for any three users, either all three are friends or only one pair of them is friends. This balance or stability is a property of social networks. According to the theory, the structure indicated by “the friend of my enemy is my enemy” signifies a stronger balance than the structure indicated by “the friend of my friend is my enemy”. This is obvious, but it shows the validity of the theory; the small sketch after this list makes the balance check concrete.
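
Structural balance is easy to check mechanically: label each pair in a triad as friends (+1) or not (-1); the triad is balanced when the product of the three signs is positive, i.e. all three pairs are friends or exactly one pair is. The sketch below is my own illustration of the theory, not code from the paper.

```python
def is_balanced(ab, bc, ca):
    """A triad is balanced if the product of its edge signs is positive.
    Each argument is +1 (friends / reciprocal tie) or -1 (not friends)."""
    return ab * bc * ca > 0

# "The friend of my friend is my enemy": two friendly pairs, one hostile -> unbalanced.
print(is_balanced(+1, +1, -1))  # False
# "The friend of my enemy is my enemy": one friendly pair, two hostile -> balanced.
print(is_balanced(+1, -1, -1))  # True
# All three users are friends -> balanced.
print(is_balanced(+1, +1, +1))  # True
```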

The authors focused on Twitter users, but these factors are not specific to Twitter; they apply to the broader field of social networking. Note that the triadic closure and structural balance theories hold only for reciprocal relationships, not for one-way (parasocial) relationships.

Art J. Adams

 


New Blog Search Engine – BlogNoon – Concepts, Facets, Semantics

November 13, 2011

 A recent paper published by Google ISPRAS describes a new semantic blog search engine called BlogNoon. After using it a few times, I am impressed with the ability of BlogNoon to search for blogs using concepts and facets. The right navigation bar on its interface displays semantically related concepts about the query being searched for. [...]


Comscore October 2011 Search Engine Rankings

November 11, 2011

 ComScore released the October 2011 search engine rankings today. The verdict for October is in! Bing and Google gained slightly at the expense of Yahoo and Ask. Bing gained 0.1 points, Google gained 0.3 points, Yahoo lost 0.3 points and Ask lost 0.3 points as compared to September 2011. Here are the numbers for October 2011 [...]


SEO: How Google+, Facebook and Twitter are changing Search Ranking

November 10, 2011

 Information on social networks is valued by both Bing and Google. A +1 on Google+ or a share or like on Facebook is a vote for a site. In addition to promoting a site on social networks, these votes also have the potential to make your site rank higher for users in your circles or friends’ lists. [...]


SocialBots – A new threat to Facebook Privacy

November 8, 2011

 Researchers at the University of British Columbia in Canada have created SocialBots, a social botnet that mimics human behavior on social networks like Facebook to gain access to personal information, like email addresses of compromised users. An experiment run by UBC to infiltrate Facebook created a botnet of 102 social bots that created fake user profiles on [...]


Ambiguous Search Engine Queries, Programmable Search Engines and a Patent

November 5, 2011

 Query ambiguity is a very important and interesting problem that search engines face. On average, search engine queries are 2.4 words long, which, in most cases, does not provide enough context to understand the searcher’s intent. Also, most queries are not even grammatically correct sentences or phrases, which prevents search engines from doing any [...]


How does Google detect fresh content?

November 5, 2011

 A recent change to Google’s algorithm affects 35% of searches and is aimed at providing fresh, new content to users. This change was rolled out on November 3, 2011, supported by its new Caffeine indexing system. Fresh results are important for many queries, like news queries, or queries that relate to current [...]
