Q: Hello! I am a consultant in web analytics, and I’m looking for a tool that simply counts how many pages the website has, so I can calculate the inclusion ratio, or the percentage of pages indexed in the search engine. It seems like such a basic thing, yet I’ve been literally searching for hours and can’t find what I’m looking for. Can you help?
A: What a great question. Although we don’t know of a tool designed to do specifically what you’re describing, we suggest you try an XML Sitemap generator. These tools are designed to build a list of all the pages on a site, so you get the total page count as a side benefit.
For a small site (under 500 pages), you can use the online version at http://www.xml-sitemaps.com/. For larger sites, you can pay for the standalone software available on that site, or review other sitemap generators listed at http://code.google.com/sm_thirdparty.html
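Once you have the total page count from a sitemap generator, the inclusion ratio from the question is simple arithmetic: indexed pages divided by total pages, times 100. Here is a minimal Python sketch; the function name and the page counts in the example are hypothetical, and the indexed figure is assumed to be whatever count the search engine reports.

```python
def inclusion_ratio(indexed_pages: int, total_pages: int) -> float:
    """Percentage of a site's pages that a search engine reports as indexed.

    total_pages:   page count from a sitemap generator or crawler
    indexed_pages: page count reported by the search engine (an estimate)
    """
    if total_pages <= 0:
        raise ValueError("total_pages must be a positive number")
    return 100.0 * indexed_pages / total_pages


# Hypothetical figures: the generator found 480 pages, the engine reports 360.
print(f"Inclusion ratio: {inclusion_ratio(360, 480):.1f}%")  # Inclusion ratio: 75.0%
```

Because both inputs are estimates, treat the result as approximate. If the search engine has indexed URLs the generator never found, the ratio can even come out above 100%.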
Another option is an application such as SiteCrawler, which is designed to download and crawl an entire website.
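To show roughly what a crawler does when it counts pages, here is a minimal breadth-first crawler sketch in Python. It is not SiteCrawler’s actual implementation; count_pages, LinkParser, and the max_pages limit are hypothetical names chosen for illustration. It assumes pages are discovered only through ordinary <a href> links on a single host, and it takes the page-fetching function as a parameter so it can be tried without network access.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def count_pages(start_url: str,
                fetch: Callable[[str], Optional[str]],
                max_pages: int = 500) -> int:
    """Breadth-first crawl of one host; returns how many pages were fetched.

    fetch(url) must return the page HTML, or None if the page cannot be retrieved.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    count = 0
    while queue and count < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # broken or unreachable page: not counted
        count += 1
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so one page is queued once.
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return count


# Hypothetical three-page site held in memory instead of on the web.
site = {
    "http://example.com/": '<a href="/about">About</a> <a href="/blog#top">Blog</a>',
    "http://example.com/about": '<a href="/">Home</a>',
    "http://example.com/blog": '<a href="http://other.com/">Elsewhere</a>',
}
print(count_pages("http://example.com/", site.get))  # 3
```

In real use, fetch could wrap urllib.request.urlopen and return None on errors. Because the sketch only follows plain links, a page reachable solely through a JavaScript form submission is never counted, which is the same blind spot search engines can have.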
Be warned: any problems Google encounters while trying to index your website (such as pages accessible only through JavaScript form submissions) may also trip up these other programs. Getting an accurate number with any of these tools will be almost impossible. You might be better off using a trend as your metric: rather than the total percentage indexed, report the increase (+) or decrease (-) in the number of pages indexed over time.


18 comments
I did not understand how to install this tool, SiteCrawler. Can you help me? Thank you!
Instructions for downloading and installing the software can be found on the creator’s website: http://www.lightheadsw.com/sitecrawler/
(Note that we have no affiliation with this software.)
Another spidering tool, available since earlier in 2009, is the Microsoft IIS SEO Toolkit. Read more about it here: http://weblogs.asp.net/scottgu/archive/2009/06/03/iis-search-engine-optimization-toolkit.aspx. Although this tool must run on Microsoft Windows Server, your website does not need to be hosted on Windows to be spidered by it.
I often have SEO questions in my head, and I found your blog on Google and nowhere else. Your articles have never disappointed me, and this one is no exception. Could you please explain a little more about “using a trend as your metric”? Why is it better, and how exactly do I gather the trend information? (Don’t worry, I’m not your competitor; my SEO firm is in Thailand.) Thanks!
Hi Joe,
When an exact value is not available for a metric, you can watch how it’s trending, rather than the absolute value. In this example, you could monitor whether the number of pages on your website showing up in the search engines is increasing, decreasing, or flat. Watching this trend might allow you to catch indexing problems early, or understand when efforts you’re making are paying off!
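One way to put this into practice, sketched in Python under some assumptions: you record the indexed-page count (for example, the number a site: search reports) on a regular schedule, then look at the change between snapshots and at the overall direction. The dates and counts are hypothetical, as are the function names and the tolerance parameter, which treats small movements as flat because the reported counts are themselves estimates.

```python
from datetime import date

# Each snapshot: (day the count was taken, pages the search engine reported as indexed).
Snapshot = tuple[date, int]


def changes(snapshots: list[Snapshot]) -> list[Snapshot]:
    """Change in indexed pages between consecutive snapshots, oldest first."""
    ordered = sorted(snapshots)
    return [(later[0], later[1] - earlier[1])
            for earlier, later in zip(ordered, ordered[1:])]


def overall_trend(snapshots: list[Snapshot], tolerance: int = 0) -> str:
    """'increasing', 'decreasing', or 'flat', comparing first and last snapshots."""
    ordered = sorted(snapshots)
    net = ordered[-1][1] - ordered[0][1]
    if net > tolerance:
        return "increasing"
    if net < -tolerance:
        return "decreasing"
    return "flat"


# Hypothetical weekly counts.
history = [(date(2009, 6, 1), 1200), (date(2009, 6, 8), 1250), (date(2009, 6, 15), 1230)]
for day, delta in changes(history):
    print(f"{day.isoformat()}: {delta:+d}")   # 2009-06-08: +50, then 2009-06-15: -20
print(overall_trend(history))                 # increasing
print(overall_trend(history, tolerance=50))   # flat
```

The tolerance is a judgment call: a net gain of 30 pages counts as growth with no tolerance, but as noise if you decide that swings under 50 pages are not meaningful for your site.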
I use Xenu Link Sleuth for small sites: I copy all the results into an Excel file, then sort and delete what I don’t need. It’s a bit lengthy, but it works for me on small sites.
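The sort-and-delete cleanup can also be scripted. This Python sketch assumes the crawl results have been exported as a plain list of URLs, one per line; it does not parse any particular Xenu export format. It drops #fragments, lowercases the scheme and host, keeps only URLs on one host, skips file types assumed not to be pages, and removes duplicates. The extension list and the function names are hypothetical choices to adjust for your site.

```python
from urllib.parse import urldefrag, urlsplit, urlunsplit

# File types assumed not to be pages (hypothetical list; adjust for your site).
NON_PAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".css", ".js", ".pdf", ".zip")


def normalize(url: str) -> str:
    """Drop any #fragment and lowercase the scheme and host."""
    url, _ = urldefrag(url.strip())
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, ""))


def unique_pages(urls: list[str], host: str) -> list[str]:
    """Sorted, de-duplicated page URLs that belong to the given host."""
    pages = set()
    for raw in urls:
        if not raw.strip():
            continue  # skip blank lines in the export
        url = normalize(raw)
        parts = urlsplit(url)
        if parts.netloc != host.lower():
            continue  # external link
        if parts.path.lower().endswith(NON_PAGE_EXTENSIONS):
            continue  # image, stylesheet, script, download
        pages.add(url)
    return sorted(pages)


# Hypothetical export pasted from a crawl report.
exported = """http://Example.com/
http://example.com/about
http://example.com/about#team
http://example.com/logo.png
http://other.com/page
"""
pages = unique_pages(exported.splitlines(), "example.com")
print(len(pages))  # 2
```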
I want to know how to count the static pages of a website. Thanks in advance.
Hi there! Thanks for this Q&A forum; this was a particularly interesting one. I asked myself the same question when I realized that the Google and Yahoo! indexes were substantially different. To understand how bad the situation was, I needed to know the total number of pages on the site so I could calculate an indexation rate and start working on optimization. The new problem is: how do I identify the pages that are not indexed?
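Once you have both lists, finding the pages that are not indexed is a set difference: every URL on the site minus every URL the search engine shows as indexed. This Python sketch assumes you already have the indexed URLs as a list (for example, collected from site: search results), and it treats URLs that differ only by a trailing slash as the same page, an assumption that may not hold for every site. The function name and the example URLs are hypothetical.

```python
def not_indexed(site_urls: list[str], indexed_urls: list[str]) -> list[str]:
    """Site URLs that do not appear in the search engine's indexed list."""

    def clean(url: str) -> str:
        # Assumption: "http://example.com/a/" and "http://example.com/a" are the same page.
        return url.strip().rstrip("/")

    indexed = {clean(u) for u in indexed_urls}
    return sorted({clean(u) for u in site_urls} - indexed)


# Hypothetical lists: from a crawl or sitemap, and from search results.
site_urls = ["http://example.com/", "http://example.com/about", "http://example.com/contact"]
indexed_urls = ["http://example.com", "http://example.com/about/"]
print(not_indexed(site_urls, indexed_urls))  # ['http://example.com/contact']
```

Running it once per search engine (for instance, once with Google’s list and once with Yahoo!’s) shows which pages each engine is missing. Swapping the arguments lists URLs the engine has indexed that your crawl did not find, such as old or duplicate URLs.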
I am a great fan of xml-sitemaps and have been using it for my site http://www.togotutor.com for almost a year now. The only problem I face is the sitemap for my vBulletin forum, which is really complicated to generate: calendar.php takes up all the primary URLs during the crawl. So it’s useless if you can’t afford that SEO tool, which costs almost 100 bucks.
Another free tool, GSiteCrawler, lets you count the number of live pages on a website.
Thank you for the crawler info. Google is particularly picky when it comes to site maps. Thanks.
Why not just search Google for site:yoursitename.com? Go to the last page of results, and that gives the number of pages Google has covered, or approximately the number of pages on the site.
Use the online sitemap generator; it will count your pages and also report broken links. You can then fix the broken links easily.
Yes, the XML Sitemaps site has a very good tool for counting web pages. Thanks to everyone for sharing this information.
The easiest way to check in Google is to type:
site:www.(domain of your site).com
For example:
site:www.msn.com
The results Google returns are all the pages it has indexed under that domain. Look at the URLs of the displayed results and you will see they all start with www.(domain of your site).com.
Hi Trev,
Using the site: search on Google is a common way to count the number of pages indexed on a site. However, the person asking the question wanted to know how many pages exist on the site in total, so he could *compare* that number with Google’s indexed number. Telling him to use Google’s indexed number wouldn’t have been a very helpful answer.
For the past 3 years, I have been using http://www.xml-sitemaps.com to find the number of pages on a website.
Hi,
I want to find the number of pages on a website. I went to http://www.xml-sitemaps.com, but I could not find what I need. Please help me.
Thanks
Here is a little trick I use to quickly count the number of pages on a competitor’s website. I go to their sitemap, usually http://www.competitor.com/sitemap.xml, then use my browser’s search tool (Ctrl+F) and search for . It gives me the count of occurrences (Chrome does, at least).
Of course, the sitemap has to be up to date.
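The same count can be scripted with Python’s standard xml.etree.ElementTree module. This sketch assumes a standard sitemap following the sitemaps.org protocol, where each page is a <url> element containing one <loc>; the function name and the sample URLs are hypothetical. A sitemap index file lists other sitemaps rather than pages, so the sketch refuses to count one.

```python
import xml.etree.ElementTree as ET
from typing import Union

# Namespace defined by the sitemaps.org protocol, in ElementTree's {uri} form.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def count_sitemap_urls(xml_data: Union[str, bytes]) -> int:
    """Count the <loc> entries inside <url> elements of a sitemap."""
    root = ET.fromstring(xml_data)
    if root.tag == NS + "sitemapindex":
        # An index lists child sitemaps, not pages; count each of those files instead.
        raise ValueError("sitemap index: count each listed sitemap separately")
    return len(root.findall(f"{NS}url/{NS}loc"))


# Hypothetical three-page sitemap.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.competitor.com/</loc></url>
  <url><loc>http://www.competitor.com/products</loc></url>
  <url><loc>http://www.competitor.com/contact</loc></url>
</urlset>"""
print(count_sitemap_urls(sample))  # 3
```

To check a live sitemap, you could pass it the bytes returned by urllib.request.urlopen(...).read(). As noted above, the count is only as accurate as the sitemap is up to date.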