Q: Hello! I am a consultant in web analytics, and I’m looking for a tool that simply counts how many pages the website has, so I can calculate the inclusion ratio, or the percentage of pages indexed in the search engine. It seems like such a basic thing, yet I’ve been literally searching for hours and can’t find what I’m looking for. Can you help?
A: What a great question. Although we don’t know of a tool that is designed to do specifically what you’re describing, we suggest you try an XML Sitemap generator. These tools are designed to create, basically, a list of all the pages on a site, so you could get the total number as a fringe benefit.
For a small site under 500 pages, you could use the online version at http://www.xml-sitemaps.com/ . For larger sites, you can pay for the standalone software available on that site, or review other sitemap generators at: http://code.google.com/sm_thirdparty.html
Another option would be an application such as SiteCrawler, designed to download and crawl an entire website.
Be warned: any problems that Google encounters while trying to index your website (such as pages accessible only through JavaScript form submissions) may also trip up these other programs, so getting an exact page count from any of these tools will be almost impossible. You might be better off using a trend as your metric: rather than total percentage indexed, you can report on the (+) or (-) change in number of pages indexed over time.
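If you do end up with an XML Sitemap from one of these generators, counting its entries and working out the inclusion ratio takes only a few lines. Here is a rough Python sketch, assuming the file follows the standard sitemaps.org format; the function names, the example.com URLs, and the indexed-page figure are all made up for illustration, and the result is only as accurate as the crawl that produced the sitemap.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org XML Sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def count_sitemap_urls(xml_text):
    """Count the distinct <loc> entries in a sitemap (hypothetical helper).

    Only handles a plain <urlset> file, not a sitemap index.
    """
    root = ET.fromstring(xml_text)
    locs = root.findall("sm:url/sm:loc", SITEMAP_NS)
    return len({loc.text.strip() for loc in locs if loc.text})


def inclusion_ratio(indexed_pages, total_pages):
    """Percentage of the site's pages that the search engine has indexed."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    return 100.0 * indexed_pages / total_pages


# Made-up two-page sitemap, just to show the shape of the input.
SAMPLE_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/about</loc></url>
</urlset>
"""

if __name__ == "__main__":
    total = count_sitemap_urls(SAMPLE_SITEMAP)
    indexed = 1  # assumed figure; use whatever the search engine reports
    print(total, inclusion_ratio(indexed, total))
```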


{ 7 comments… read them below or add one }
I did not understand how to install this tool, SiteCrawler. Can you help me? Thank you!
Instructions for downloading and installing the software can be found on the creator’s website, here: http://www.lightheadsw.com/sitecrawler/
(note that we have no affiliation with this software).
Another spidering tool, available since earlier in 2009, is the Microsoft IIS SEO Toolkit. Read more about it here: http://weblogs.asp.net/scottgu/archive/2009/06/03/iis-search-engine-optimization-toolkit.aspx. Although this tool must be run on Microsoft Windows Server, your website does not need to be hosted on Windows for it to be spidered.
Many times when I have SEO questions in my head, I find your blog on Google and nowhere else, and your articles have never disappointed me, this one included. However, could you please explain a little more about “using a trend as your metric”? Why is it better? And how exactly do I gather the trend info? (Don’t worry, I’m not your competitor; my SEO firm is in Thailand.) Thanks!
Hi Joe,
When an exact value is not available for a metric, you can watch how it’s trending, rather than the absolute value. In this example, you could monitor whether the number of pages on your website showing up in the search engines is increasing, decreasing, or flat. Watching this trend might allow you to catch indexing problems early, or understand when efforts you’re making are paying off!
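To make the trend idea concrete, here is a small Python sketch. It assumes you jot down the indexed-page count on a regular schedule, however you obtain it; the function name and the figures are invented for illustration and are not real data.

```python
from datetime import date


def indexed_page_changes(snapshots):
    """Return the (+) or (-) change between consecutive snapshots.

    `snapshots` is a list of (day, indexed_page_count) pairs that you
    record yourself; this helper is hypothetical, not part of any tool.
    """
    ordered = sorted(snapshots)
    return [
        (later_day, later_count - earlier_count)
        for (_, earlier_count), (later_day, later_count)
        in zip(ordered, ordered[1:])
    ]


if __name__ == "__main__":
    # Made-up monthly counts.
    history = [
        (date(2009, 1, 1), 120),
        (date(2009, 2, 1), 150),
        (date(2009, 3, 1), 140),
    ]
    for day, change in indexed_page_changes(history):
        print(day.isoformat(), f"{change:+d}")
```

A run of negative changes would be the early warning sign of an indexing problem, while steady positive changes suggest your efforts are paying off.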
I use Xenu’s Link Sleuth for small sites. I copy all the results into an Excel file, then sort and delete entries. It’s a somewhat lengthy process, but it works for me on small sites.
I want to know how to count the static pages of a website. Thanks in advance.
Hi there! Thanks for this Q/A forum; this was a particularly interesting one. I was asking myself this same question when I realized that the Google and Yahoo! indexes were substantially different. To understand how bad the situation was, I needed to know the total number of pages on the site so I could work out an indexation rate and start working on optimization. The new problem is: what am I going to do to identify the pages that are not indexed?