14th Colony

Wed 19 November 2008


Google Sandbox

by Randall McCarley
October 15th, 2007

Every web site owner committed to search marketing must be aware of Google’s sandbox. It’s "that thing" that prevents high SERP placement in Google’s organic results for new web sites. Once applied, it takes anywhere from 8 to 14 months to get out of. And it’s difficult to understand how it works because it only seems to affect competitive terms and it doesn’t appear to treat all sites equally.

Competitive terms are more susceptible to the sandbox effect and the more competitive a term is, the longer a site will stay in the sandbox.

The sandbox creates frustration and adds expense to web site owners during their most critical period of development: startup.

Because Google is the biggest search engine, with over 50% of the search market, web site owners are often desperate to get top placement, but this has seemed impossible to achieve with new sites.

One way to detect the sandbox effect is to compare search engines: a sandboxed site ranks high in Yahoo and MSN while doing poorly in Google. Since all three SEs are independent, this isn’t conclusive by itself. But if you are doing well in the other search engines and you rank well with the allin: commands at Google (allintitle:, allinanchor:, allintext:), you are quite likely sandboxed.
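The check described above can be written down as a small rule. This is only a sketch: the function name, the rank inputs, and the cutoffs (top 10 counts as "doing well", worse than 50 counts as "doing poorly") are assumptions made for illustration, not published thresholds, and the ranks have to be gathered by hand.

```python
def likely_sandboxed(google_rank, other_ranks, allin_rank, good=10, poor=50):
    """Apply the rule of thumb from the paragraph above.

    Ranks are 1-based positions for the same search phrase; None means
    the site was not found. The `good` and `poor` cutoffs are arbitrary
    assumptions chosen for this example.
    """
    def is_good(rank):
        return rank is not None and rank <= good

    # Poor placement in Google's normal results.
    poor_in_google = google_rank is None or google_rank > poor
    # Good placement in every other engine that was checked.
    good_elsewhere = bool(other_ranks) and all(is_good(r) for r in other_ranks)
    # Good placement in Google when using an allin: command.
    return poor_in_google and good_elsewhere and is_good(allin_rank)


if __name__ == "__main__":
    # Position 120 in Google, 4 and 7 elsewhere, 3 with an allin: command.
    print(likely_sandboxed(120, [4, 7], 3))
```

As the paragraph says, a positive result is a hint rather than proof, since the engines rank independently.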

I want to be clear about the intent of this article. It isn’t about how to escape the sandbox. I get that once you’re in you have to wait for it to naturally resolve itself. This article should help you avoid the sandbox.

There is still a lot of testing to do.

And since there is a lot of testing to do I’m sure I’ll hear a lot of comments that my theory is BS. I’m totally cool with that provided you are willing to discuss your opinions and observations.

Google Loves Backlinks

Google got its start as BackRub, a search engine built around analyzing backlinks and the origin of PageRank. While PR doesn’t directly affect the SERPs, it does show Google’s tradition and love of backlinks. And everyone knows Google uses backlinks to determine SERP placement, though in a different fashion (a search for [miserable failure] will show an example of links outweighing content). Google adds and removes link power based on the strength and authority of the originating site, relevance, placement of the links, context of the link within the site, surrounding text, and anchor text, and it reduces link power based on several other factors as well!

One thing Matt Cutts, a Google Engineer, has been pushing over the last few months is that Google knows if links are purchased or not. Some SEOs don’t believe this is possible.

Google employees are engineers who think creating algorithms is fun. Of course Google can determine if a link is purchased, reciprocated, or 3-way! People, including SEOs, are creatures of habit. As SEOs we tend to replicate what works, including link strategies and partners. As we replicate these "winning strategies" we create patterns Google can detect.

Almost everything about Google’s search engine results pages comes from link patterns!

Trust is Everything

"Sites that sell links can lose their trust in search engines." Matt Cutts

Cutts keeps bringing up the word "trust" in his blog. It’s interesting when you compare the name given to this oddball effect with the definition of "sandbox" in computer security: a sandbox is a safe place for running semi-trusted programs or scripts, often originating from a third party.

Sites that are sandboxed are included in Google’s index. They just don’t appear as high as they "should" in the SERPs.

Google isn’t anti-content, or anti-new-website. Google’s mission is to organize the world’s information and make it universally accessible and useful.

Google needs to know it can trust your site before it will rank it highly. To do otherwise jeopardizes its credibility.

Google trademarked the term TrustRank in March of 2005 about the same time they hired Zoltán Gyöngyi, Hector Garcia-Molina and Jan Pedersen, authors of the original TrustRank whitepaper from Stanford. Google also filed for a patent on the TrustRank concept.

TR works by taking some "seed" sites and labeling them as "trusted". By tracking links, Google can tell whether your site sits closer to one of these trusted sites or to a "bad neighborhood" (a site that is penalized or banned). If your site is more closely related to bad neighborhoods than to trusted sites, you take a hit. This doesn’t mean you’ve been penalized, per se, but it is very likely the most common trigger of the sandbox effect, especially in the SEO community because of the common methods used to get backlinks (more on this in a minute!).
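To make the seed idea concrete, here is a minimal sketch of trust spreading outward from seed sites along links, in the spirit of the TrustRank whitepaper (a PageRank-style calculation biased toward the seeds). It is not Google's implementation. The site names, the function name `trust_scores`, the damping value, and the iteration count are all assumptions for illustration.

```python
def trust_scores(links, seeds, damping=0.85, iterations=50):
    """Propagate trust from seed sites along outgoing links.

    links: dict mapping each site to the list of sites it links to.
    seeds: set of sites assumed to be hand-picked as trusted.
    """
    # Collect every site that appears as a link source or target.
    sites = set(links)
    for targets in links.values():
        sites.update(targets)

    # Trust starts only at the seeds, split evenly among them.
    seed_share = {s: (1.0 / len(seeds) if s in seeds else 0.0) for s in sites}
    scores = dict(seed_share)

    for _ in range(iterations):
        # Each round, a fixed fraction of trust is re-injected at the seeds.
        new_scores = {s: (1.0 - damping) * seed_share[s] for s in sites}
        for source, targets in links.items():
            if not targets:
                continue
            # A site splits the trust it passes on evenly across its links.
            passed = damping * scores[source] / len(targets)
            for target in targets:
                new_scores[target] += passed
        scores = new_scores
    return scores


if __name__ == "__main__":
    # Hypothetical link graph: one trusted seed and an isolated spam pair.
    links = {
        "seed.example": ["niche.example"],
        "niche.example": ["newsite.example"],
        "spam.example": ["spamfriend.example"],
        "spamfriend.example": ["spam.example"],
    }
    scores = trust_scores(links, seeds={"seed.example"})
    for site in sorted(scores, key=scores.get, reverse=True):
        print(f"{site}: {scores[site]:.4f}")
```

In this toy graph, trust fades with each click away from the seed, and the two spam sites that only link to each other never receive any.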

Google Categorizes Search and Sites

I had a lengthy discussion with a Russian developer (Geo2005) about the sandbox. He claimed it was an issue of categorizing not just web sites, but also search phrases. He also claimed that Russian sites (and other non-English sites) don’t get sandboxed!

This blew my mind.

Geo2005’s point was that the sandbox is not age-based, but category-based. When you take into account how Google sandboxes competitive terms longer it starts to make sense. It also shows how new sites can get an early jump and then fall into the sandbox.

Site "A" is launched with just a few pages. It gets backlinks through networking (i.e. people familiar with the site owner). And it gets an instant boost in Google. Then the site adds new content and starts getting more backlinks and is suddenly sandboxed. How did it happen?

At first the site was small and on-topic making it easy for Google to place it in the SERPs. When the site branched out, this confused Google. On top of that, the first round of backlinks came from friends of the site owner in related fields encouraging Google that this site was placed correctly in the SERPs.

But the second round of backlinks probably came from directories and other nonspecific and/or unrelated sites, confusing things. At this point the site does not have enough backlinks or content to make it clear where it belongs. On top of that, some of the links it got in round 2 are within 3 clicks of bad neighborhoods, making the TR score drop. Google has no choice but to sandbox the domain.

Site "B" is a foreign-language site. We’ll say it’s Russian since I know that language works. The site starts with 200 pages. It ranks well for terms that are very competitive in English. It gets tons of backlinks from all over the Russian community, both related and not related, and yet avoids the sandbox.

The first filter applied to it when sorting is the language filter. Well, the broad category is "Russian" and the sub-category is whatever the site is actually about. Google can clearly see where to place the site in the SERPs because it has a niche within the broad category and foreign language sites aren’t nearly as competitive!

The links are related because they fit the broad category (language) regardless of what the sites are actually about.

This doesn’t mean foreign language sites are exempt from the sandbox, just that it’s harder for them to fall into the trap: that whole field is less competitive, meaning broad topics haven’t been broken into as many niches. This also stems from Google’s limited understanding of foreign languages. As Google expands into these areas, they have been asking for help.

Google has Devalued Nonspecific Backlinks

Comment spam, paid-for links, reciprocal links, site-wide links, and most directories have been devalued. Cutts has mentioned that directories that specialize and have some sort of editing standards still count. Examples are sites like DMOZ and the Yahoo! directory. I know from experience thomasnet.com also works. Most quality directories have strict standards of acceptance. And Google likes that.

I think this is what explains the "age" process in Google as well. As Google devalues backlinks from these sources, your site’s actual intent becomes clear. With these backlinks devalued your TR score increases as well.

The sandbox is something you trip and then it takes time for Google to pull you back up as they "undo" your backlinks. And they do this by dropping the ones that aren’t relevant or are too broad in application – most notably directories and paid-for links!

Content is Still King

Competitive terms have longer sandbox periods because competitive terms become specialized.

Think of it this way. [SEO] is a pretty common term for search marketers to target and it has become extremely competitive. Then some clever SEOs realized that they should target two-word phrases like [SEO Success] and [best SEO] and all the two-word SEO phrases became competitive. Next were three-word phrases, geographic phrases, and then specialties.

To get top ranking for the general term [SEO] you would need to be an authority. But that doesn’t happen overnight and it requires a ton of relevant backlinks to prove others think you have a site that is a resource. Not only that, you need the content to back it up (think: completeness of information). And Google has to know what your site is about in the first place!

Many web developers move too fast for Google. They start with a smaller site that is dedicated to [SEO tactics for web hosts] and then try to get to the more general (and competitive) phrase too soon. While Google is digesting the first round of content and backlinks the developer is adding more content. Google gets confused.

Wait! I thought this site was under SEO/tactics/hosting but now there is content that would put it under SEO/tactics/discussion AND under SEO/tactics/backlinks. But this site doesn’t have enough information to fit under the broad (competitive) category of just [SEO tactics]. And all the backlinks say it’s for specific topics "hosting", "discussion" and "backlinks".

Keep in mind Google takes your whole web site into account when determining SERP placement – not just a single page. With this holistic approach, consider what Google must be "thinking" as it initially crawls your site. And that is also why Cutts has mentioned starting with a niche site is a great way to go until you are established. Then you can branch out.

Tips to avoid the sandbox:

  • Build for a tight niche within a broad category
  • Only get relevant backlinks
  • Avoid being associated with bad neighborhoods - even 2 or 3 clicks away!
  • Only put your site in relevant directories that have standards for inclusion
  • Keep your site on-topic until it is firmly established
  • Go slow when expanding into new content areas

Things that will trip the sandbox:

  • Jumping off-topic. This is difficult to determine sometimes because "off topic" doesn’t only mean outside your industry; it can also mean a different niche within your broad category. A site dedicated to [yellow widgets] that suddenly posts content about [blue widgets] may take a hit.
  • Irrelevant backlinks (BLs). These confuse Google. Google *thinks* it knows where the site should appear (hence the placement with the allin: commands) but isn’t sure.
  • General BLs. These don’t do anything for your rank anyway, may confuse Google, and are often within 3 clicks of a bad neighborhood.
  • Linking too close to untrusted sites.
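The "within 3 clicks" idea can be checked on a link map you have collected yourself. The sketch below uses a breadth-first search to count the fewest clicks from a site to any known bad neighborhood. It follows outgoing links only, which is an assumption, as are the function name, the site names, and the idea that you already have a list of bad sites.

```python
from collections import deque


def clicks_to_bad_neighborhood(links, start, bad_sites, max_clicks=3):
    """Return the fewest clicks from `start` to any site in `bad_sites`,
    or None if none is reachable within `max_clicks`.

    links: dict mapping each site to the list of sites it links to.
    """
    if start in bad_sites:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        site, depth = queue.popleft()
        if depth == max_clicks:
            # Do not look further out than the click limit.
            continue
        for target in links.get(site, []):
            if target in bad_sites:
                return depth + 1
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return None


if __name__ == "__main__":
    # Hypothetical chain: my site -> a directory -> a blog -> a banned site.
    links = {
        "mysite.example": ["directory.example"],
        "directory.example": ["blog.example"],
        "blog.example": ["banned.example"],
    }
    print(clicks_to_bad_neighborhood(links, "mysite.example", {"banned.example"}))
```

Breadth-first search visits sites in order of distance, so the first bad site it reaches is also the closest one.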

Final Thoughts

There is a certain irony about how the sandbox works. SEOs chasing PR are likely digging themselves in deeper as they get many more irrelevant backlinks to achieve a high PR.

TrustRank and the Category filter are two different things with the same result. And the sandbox isn’t so much a filter as an effect.

Age has nothing to do with the sandbox, but time does once you’re in it.

Another way to think of the sandbox is as a "confidence test". Google needs to be certain they are making the right decision when placing your site high in the SERPs. For this they need clarity about your site content. That clarity comes from a few on-page factors but mainly from the links pointing to it.

Advertising on your site may raise a flag for a closer look. Use the rel="nofollow" attribute on any purchased links or affiliate links you have on your site.
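For auditing a page, a short script can list the links that do not carry nofollow, so you can see which paid or affiliate links still need it. This sketch uses Python's standard html.parser module. The class name, function name, and sample markup are made up for the example, and deciding which of the reported links are actually paid is left to you.

```python
from html.parser import HTMLParser


class FollowedLinkFinder(HTMLParser):
    """Collect href values of <a> tags that lack rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.followed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel can hold several space-separated tokens, e.g. "nofollow sponsored".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" not in rel_tokens:
            self.followed.append(href)


def followed_links(html):
    """Return the hrefs in `html` that search engines are free to follow."""
    finder = FollowedLinkFinder()
    finder.feed(html)
    return finder.followed


if __name__ == "__main__":
    page = (
        '<a href="http://sponsor.example/" rel="nofollow">Sponsor</a> '
        '<a href="http://affiliate.example/">Affiliate</a> '
        '<a href="/about">About</a>'
    )
    print(followed_links(page))
```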

The biggest flaw in this is that I am taking other web developers and Matt Cutts at their word. Testing will show things one way or the other.