How to: Exclude part of a page from being indexed by SharePoint Search
Posted Wednesday, January 26, 2011 12:07 PM by CoreyRoth
This is something that I have been trying to figure out for quite some time. I’ve seen numerous people ask in the forums and no one seemed to have a conclusive answer (at least the last time I checked). The issue is simple. You want to index a regular non-SharePoint web site. Usually, it’s your company’s public-facing web site. That site has common navigation on every page with terms such as Contact, Locations, Privacy Policy, About, etc. that you don’t want indexed. If those terms are indexed, every time a user searches for contact, every page on the site comes back in the search results. When I was at FAST University, I had a chance to ask Leonardo Souza about this issue and he told me the secret. Let’s take a look at my example site so you can see what I mean.
Here’s the home page. As you can see, I spent hours working on the branding for it. :) The Contact Us and Privacy Policy links are considered the navigation and are repeated on each page.
The Contact Us page looks similar with the same navigation.
Lastly, the Privacy Policy page has the same navigation as well.
We want to exclude the contact us and privacy policy links in the navigation from our search results. How do we do that? It’s pretty simple actually. Just put the content that you do not want indexed in a div tag with a class of noindex. Let’s look at the complete HTML of the home page.
<html>
<head>
<title>Super Neat Home Page</title>
</head>
<body>
<div>
Welcome to our awesome site. We are the best! <a href="test.html">Awesome Stuff</a>
If you need to get a hold of us, click <a href="contactus.html">here</a>. Worried,
we'll <a href="privacy.html">sell you out?</a>
</div>
<div class="noindex">
<a href="contactus.html">Contact Us</a> <a href="privacy.html">Privacy Policy</a>
</div>
</body>
</html>
You can see that the Contact Us and Privacy Policy links are inside <div class="noindex">. You might have noticed that the body of the page also has links to these two pages. I had to include these so that those pages would get indexed. Since the common navigation is excluded, there was no way for the crawler to follow those links. This is something you will want to consider when you are designing master pages, because you will need to have at least one link to each page on the site somewhere outside the excluded section.
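To make that concrete, here is a minimal Python sketch of the behavior described above. It is not how the SharePoint crawler is implemented; it only models the observed effect, assuming that both text and links inside a div with the noindex class are ignored. The names NoIndexFilter and indexable_parts are made up for this illustration.

```python
from html.parser import HTMLParser

# The home page markup from the post.
HOME_PAGE = """<html>
<head>
<title>Super Neat Home Page</title>
</head>
<body>
<div>
Welcome to our awesome site. We are the best! <a href="test.html">Awesome Stuff</a>
If you need to get a hold of us, click <a href="contactus.html">here</a>. Worried,
we'll <a href="privacy.html">sell you out?</a>
</div>
<div class="noindex">
<a href="contactus.html">Contact Us</a> <a href="privacy.html">Privacy Policy</a>
</div>
</body>
</html>"""


class NoIndexFilter(HTMLParser):
    """Collects text and links found outside <div class="noindex"> blocks."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a noindex div
        self.text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            # Count nested divs too, so the matching close tag ends the skip.
            if self.skip_depth or "noindex" in classes:
                self.skip_depth += 1
        elif tag == "a" and not self.skip_depth and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.text.append(data.strip())


def indexable_parts(html):
    """Return (text, links) that remain once noindex sections are dropped."""
    parser = NoIndexFilter()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text), parser.links


if __name__ == "__main__":
    text, links = indexable_parts(HOME_PAGE)
    print(text)
    print(links)
```

Run against the home page, the remaining links still include contactus.html and privacy.html, but only because the body text links to them. Remove those body links and the two pages would no longer be reachable in this model, which is the master page concern mentioned above.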
Since I learned about this in the context of FAST Search for SharePoint, I decided to look at it first. The first thing I will do is show you the results of the entire content source. That way you will believe me that all of the pages are in the index. :) I do this with the ContentSource keyword as I have mentioned in my handy keywords post.
The search results show the four pages from the site. Now let’s verify that the noindex class worked. Searching for the word contact yields a single result.
Searching for privacy policy also yields a single result.
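If you want to reason about why each query comes back with one hit, the following toy model may help. It is only a sketch: the page bodies and the index.html file name are invented stand-ins (the post does not show the markup of the other pages), and the search function is a simple word match, not SharePoint or FAST query logic. Only the shared navigation block mirrors the example site.

```python
from html.parser import HTMLParser

# Shared navigation, wrapped in the noindex div as in the post.
NAV = (
    '<div class="noindex">'
    '<a href="contactus.html">Contact Us</a> '
    '<a href="privacy.html">Privacy Policy</a>'
    "</div>"
)

# Hypothetical page bodies, made up for this illustration.
PAGES = {
    "index.html": "<div>Welcome to our awesome site.</div>" + NAV,
    "contactus.html": "<div>Contact our office by phone or email.</div>" + NAV,
    "privacy.html": "<div>Our privacy policy: we never share your data.</div>" + NAV,
}


class WordExtractor(HTMLParser):
    """Gathers lowercase words, optionally skipping noindex divs."""

    def __init__(self, honor_noindex):
        super().__init__()
        self.honor_noindex = honor_noindex
        self.skip_depth = 0  # greater than 0 while inside a skipped div
        self.words = set()

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.skip_depth or (self.honor_noindex and "noindex" in classes):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.words.update(w.strip(".,:!?").lower() for w in data.split())


def search(term, honor_noindex):
    """Return the pages whose extracted words contain the term."""
    hits = []
    for url, html in PAGES.items():
        parser = WordExtractor(honor_noindex)
        parser.feed(html)
        parser.close()
        if term.lower() in parser.words:
            hits.append(url)
    return hits


if __name__ == "__main__":
    print(search("contact", honor_noindex=False))
    print(search("contact", honor_noindex=True))
```

With the navigation indexed, a search for contact matches all three toy pages. With the noindex div honored, only contactus.html matches, which lines up with the single result described above.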
The noindex class works great with FAST Search for SharePoint. At this point though, I wondered whether this would also work with Enterprise Search in SharePoint 2010. I decided to give it a try and sure enough it works there too.
Will this also work in SharePoint 2007? I haven’t had time to try it yet. If you have tried it before, please leave a comment and let us know. Maybe you already knew about this technique, but I think there are plenty of people who don’t, so I hope this post helps. I highly recommend making use of the noindex class any time you want to index a non-SharePoint site, such as your public-facing company web site. By excluding redundant sections of the page, you make your search results much more usable.