Just how (un)reliable is Netcraft?
According to Netcraft statistics, Apache has been losing market share to Microsoft’s IIS since almost the beginning of 2006:
In mid-2006 the change was really dramatic, and since then countless articles have tried to explain why Apache is losing market share while IIS makes massive gains.
I’ve seen conspiracy theories claiming that Microsoft pays Netcraft. There were reports of massive blocks of “parked” domains hosted on Linux being moved to Windows, and claims of a lower TCO for IIS compared to Apache. Lack of a GUI in Apache, superior ASP.NET, the release of Windows Vista, you name it. Perhaps IIS simply gains more sites than Apache does.
Or maybe Netcraft’s methodology is just plain wrong? What made me wonder was this article:
“Here is the Netcraft survey and here is a Security Space survey. While Netcraft says Apache represents 51% market share and rapidly shrinking, Security Space puts Apache at 74% and growing! Netcraft says Microsoft IIS has 34% market share and is rapidly growing, Security Space pegs Microsoft IIS at 20% market share, as it continues to shrink.”
The methodology of Security Space is explained in their FAQ, and can be summarized as: “We visit what we consider well-known sites. In our case, we define a well-known site as a site that had a link to it from at least one other site that we consider well-known”.
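Read literally, that definition is a reachability rule over the link graph: start from some set of sites already considered well-known and keep adding whatever they link to. Here is a minimal sketch of that idea in Python. The seed list, the `links` dictionary and the function name are all made up for illustration; the quoted FAQ passage does not say how Security Space actually seeds or runs its crawler.

```python
from collections import deque


def well_known_sites(seeds, links):
    """Return every site reachable from the seeds by following links.

    seeds: sites we assume are well-known to begin with (an assumption,
           the FAQ quote does not say how the initial set is chosen).
    links: dict mapping a site to the sites it links to.
    """
    known = set(seeds)
    queue = deque(seeds)
    while queue:
        site = queue.popleft()
        for target in links.get(site, ()):
            if target not in known:
                known.add(target)
                queue.append(target)
    return known


# Hypothetical link graph: nobody well-known links to orphan.example.
links = {
    "a.example": ["b.example"],
    "b.example": ["c.example"],
    "orphan.example": ["a.example"],
}

print(sorted(well_known_sites(["a.example"], links)))
# ['a.example', 'b.example', 'c.example']
```

The point of the example is the last line: a hostname that nothing links to never enters the survey, no matter how many times somebody types it into a form.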
I couldn’t find the methodology used by Netcraft on their website, but it seems that they add any given domain to their statistics whenever anyone visits http://uptime.netcraft.com/up/graph?site=your.domain
Security Space runs its own crawler to fetch “well-known domains”. Perhaps Netcraft also has its own crawler, but it also offers a dedicated web page which lets anyone add their own website. While it’s great that I could add my blog to Netcraft’s monitoring, I somehow feel someone with bad intentions could easily misuse that interface with automated queries.
Wildcard DNS records: does that ring a bell? In short, a wildcard record means that any hostname under the domain will resolve, whether or not anyone ever set it up. For Netcraft, each such hostname becomes another site in their statistics. Try it yourself:
http://uptime.netcraft.com/up/graph?site=b-l-a-h.godaddy.com
http://uptime.netcraft.com/up/graph?site=is-it-real.godaddy.com
http://uptime.netcraft.com/up/graph?site=web10.godaddy.com
http://uptime.netcraft.com/up/graph?site=web11.godaddy.com
And we have four new gains for IIS. Netcraft doesn’t check cookies and doesn’t require any image validation, so anyone is welcome to write a simple script and improve the market share of Apache, IIS, Google, Sun, lighttpd…
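For what it’s worth, spotting a wildcard record from the outside is not hard: ask for a few random hostnames that nobody would ever have configured, and see whether they all resolve. Below is a rough sketch in Python. The function names and the number of probes are my own choices, and this is a heuristic, not a description of anything Netcraft does; a resolver that hijacks failed lookups would fool it.

```python
import secrets
import socket


def resolves(hostname):
    """Return True if the system resolver finds an address for hostname."""
    try:
        socket.gethostbyname(hostname)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def looks_like_wildcard(domain, probes=3, resolver=resolves):
    """Guess whether `domain` has a wildcard DNS record.

    Builds `probes` random labels that are very unlikely to exist as
    real hostnames. If every one of them resolves, a wildcard record
    is the most plausible explanation.
    """
    return all(
        resolver(f"{secrets.token_hex(8)}.{domain}")
        for _ in range(probes)
    )
```

Calling `looks_like_wildcard("godaddy.com")` would run the check against the domain used in the links above; the answer depends on how that zone is configured at the time you try it.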
Was I right? Let’s see in a month.

Eimar:
I don’t know how successful they are but Netcraft claims they discard the wildcard dns domains:
http://news.netcraft.com/active-sites.html
Free online service providers, like blogging providers, will do something similar, with the variation that they usually provide their accounts hostnames under a common domain name, instead of separate domains; the domain under which these accounts occur often uses Wildcard DNS, and in some cases any hostname under the domain will return valid but computer-generated content, with the hostname taken to be as an account name and interpolated into the content. Hence it is important for the active sites methodology to discard the actual words in the page and focus instead on the page structure, as represented by the HTML tag structure.
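The idea in the quoted passage, comparing pages by their tag structure rather than by their words, can be sketched roughly as follows. This is only an illustration in Python using the standard `html.parser` module; the class and function names are invented here, and nothing in the quote says how Netcraft actually implements the comparison.

```python
import hashlib
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record the sequence of opening and closing tags, ignoring all text."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_endtag(self, tag):
        self.tags.append("/" + tag)


def structure_fingerprint(html):
    """Hash the tag sequence of a page; the words on the page play no part."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return hashlib.sha256(" ".join(collector.tags).encode()).hexdigest()


# Two generated pages that differ only in the interpolated account name,
# and one page with a genuinely different layout.
page_a = "<html><body><h1>Welcome, alice</h1><p>alice's page</p></body></html>"
page_b = "<html><body><h1>Welcome, bob</h1><p>bob's page</p></body></html>"
page_c = "<html><body><h1>News</h1><ul><li>item</li></ul></body></html>"

print(structure_fingerprint(page_a) == structure_fingerprint(page_b))  # True
print(structure_fingerprint(page_a) == structure_fingerprint(page_c))  # False
```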
13 September 2009, 11:59 pm
admin:
@Eimar: that’s all just theory.
Just go to a wildcard address like http://b-l-a-h.godaddy.com, or better, to Netcraft’s website, which is supposed to check its uptime:
http://uptime.netcraft.com/up/graph?site=b-l-a-h.godaddy.com
You will see that Netcraft first registered it on 20-Sep-2007, the day this blog post was published. Since then, Netcraft has checked the uptime for this wildcard hostname several times, and they didn’t discard anything.
14 September 2009, 9:50 am