Friday, April 28, 2006

Google Page Rank Calculation

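Note that the PageRank equation (sans filters), where T1 through Tn are the pages linking to page A, C(T) is the number of outbound links on page T, and d is a damping factor, is: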

PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)) .
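To make the formula concrete, here is a minimal sketch of a single evaluation for one page. The function name `page_rank`, the `inbound` list, and the default d = 0.85 are illustrative assumptions; the post does not say which damping value it uses.

```python
def page_rank(inbound, d=0.85):
    """Evaluate PR(A) once from the pages linking to A.

    inbound: list of (PR(T), C(T)) pairs, one per page T that links to A.
    d: damping factor (0.85 is an assumed value, not taken from the post).
    """
    return (1 - d) + d * sum(pr / links for pr, links in inbound)


# Hypothetical page A, linked from one page with PR 1.0 and 4 outbound links
# and from another page with PR 2.0 and a single outbound link.
example = page_rank([(1.0, 4), (2.0, 1)])
print(round(example, 4))  # 2.0625
```

Because each PR(T) on the right-hand side is itself the result of the same formula, one evaluation like this is only a snapshot of the current estimates.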

The first observation about this equation is that it cannot be solved in a single pass: each page's PR depends on the PR of the pages linking to it, so the values have to be recalculated over many iterations before they settle.

Take a site with 5 pages that all link to each other. After the first iteration of PageRank, the homepage is at roughly PR 3.5 and every other page is at roughly PR 0.365, the largest PR gap that will exist at any iteration in this example.

This homepage PR is a surge: Google has not yet calculated how PR is distributed across the site, so the homepage carries an artificial, temporary inflation of PR (which explains the sudden and transient jump in PR, and hence in the SERPs).

In the second iteration, the homepage goes down to PR 1.4 (a drop of over 50%!), and the secondary pages get lifted to 0.9, which explains why “new” sites seem to disappear. Dramatic fluctuations continue until about the 12th iteration, when the homepage equilibrates at about a lowly 2.2, with the other pages at about 0.7.
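The iteration can be sketched in a few lines. This is not the calculation behind the post, only a reconstruction under stated assumptions: d = 0.85, every page starting at PR 1.0, all pages updated at once from the previous round's values, and a hub-and-spoke structure in which the homepage links to four subpages and each subpage links back only to the homepage. Those assumptions are mine; they were chosen because they give figures close to the ones quoted above. If every page literally linked to every other page, this formula would leave all five pages at 1.0. The page names `home` and `p1` to `p4` are hypothetical.

```python
def iterate_pagerank(links, d=0.85, iterations=12):
    """Return the PR of every page after each round of synchronous updates."""
    pr = {page: 1.0 for page in links}  # assumed starting guess of 1.0 per page
    history = []
    for _ in range(iterations):
        new_pr = {}
        for page in links:
            # Pages that link to `page`.
            inbound = [src for src, targets in links.items() if page in targets]
            new_pr[page] = (1 - d) + d * sum(
                pr[src] / len(links[src]) for src in inbound
            )
        pr = new_pr
        history.append(dict(pr))
    return history


# Assumed structure: homepage links to four subpages, each links back home.
site = {
    "home": ["p1", "p2", "p3", "p4"],
    "p1": ["home"],
    "p2": ["home"],
    "p3": ["home"],
    "p4": ["home"],
}

history = iterate_pagerank(site)
for round_number, snapshot in enumerate(history, start=1):
    print(round_number, round(snapshot["home"], 2), round(snapshot["p1"], 2))
```

Under these assumptions the first round gives about 3.55 and 0.36, the second about 1.38 and 0.90, and the twelfth about 2.18 and 0.70. The swings shrink every round; run it longer and the values keep closing in on roughly 2.38 for the homepage and 0.66 for the subpages.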

I believe that the duration of the “sandbox” is the same amount of time it takes Google to iterate through its PageRank calculations.

Therefore, I think that the “sandbox” is nothing other than the time it takes Google to iterate through the number of calculations uniquely needed to equilibrate the volume of links for a given site.

The SEO cynic will ask, “but my site withstood the ‘sandbox’, so it can’t exist!”

Revisiting the equation, sites CAN withstand the flattening effect of the PR iteration if they have optimized internal link structures (that don’t bleed PR but rather conserve it) OR an active inbound PR feed to central distributions of PR.
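A rough way to see the difference between conserving and bleeding PR is to run the same iteration on two link structures. The sketch below is illustrative only: the structures, the page names, the external page `ext`, and d = 0.85 are all assumptions, and treating `ext` as a page with no outbound links is a simplification of how links leaving a site behave.

```python
def settle(links, d=0.85, rounds=100):
    """Iterate the PageRank formula for a fixed number of rounds."""
    pr = {page: 1.0 for page in links}  # assumed starting guess
    for _ in range(rounds):
        # The whole dict is rebuilt from the previous round's values.
        pr = {
            page: (1 - d) + d * sum(
                pr[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            for page in links
        }
    return pr


subpages = ["p1", "p2", "p3", "p4"]

# Conserving: every subpage links only back to the homepage.
conserving = {"home": subpages, **{p: ["home"] for p in subpages}}

# Bleeding: every subpage also links out to a hypothetical external page.
bleeding = {"home": subpages, **{p: ["home", "ext"] for p in subpages}, "ext": []}

print(round(settle(conserving)["home"], 2))
print(round(settle(bleeding)["home"], 2))
```

In this toy setup the homepage settles near 2.38 when the subpages link only back to it, and near 0.63 when each subpage splits its vote with an outside page.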
