Aidan Garnish

Collaboration Not Competition

Indexing and Searching PDFs in MOSS 2007

Credit for the following guide goes to MSD2D

A Guide to Indexing and Searching PDFs with SharePoint

SharePoint document libraries are phenomenal tools for collaborative environments where files are shared. And SharePoint's ability to search files in document libraries makes finding files easy. Well, unless the document is a non-Microsoft file type, such as the ever-present PDF file.

The sad fact of the matter is that Windows SharePoint Services (WSS) 3.0 and Microsoft Office SharePoint Server (MOSS) 2007 can't index PDFs by default. That's not news to many veteran SharePoint professionals. Nor is the fact that you can add an icon for PDFs, reindex existing documents, and so forth. However, many administrators are new to SharePoint, and will hit their heads hard against this problem. I was disappointed to see that, despite extensive searching on Google, I could find no single, authoritative, and (most importantly) complete guide to solving it.

The "bottom line" is that you must install an iFilter for PDFs on your SharePoint servers--specifically, any server that performs search, which would be all WSS servers and your MOSS search server. iFilters are plug-ins that enable indexing of file types. Although iFilter is a Microsoft specification, it is generally through vendors or third parties that you'll get iFilters--not through Microsoft itself.

After you add the iFilter, you must configure SharePoint to index the file type (.PDF). But then, you still have two problems. The biggest is that SharePoint will index only newly added files or existing files whose properties change. So SharePoint will not index existing PDFs when you add the PDF iFilter. You must rebuild your index. The second challenge, purely a cosmetic one, is getting SharePoint to display an appropriate icon for PDFs.

This installment will focus on 32-bit WSS servers. Next time we'll look at MOSS and 64-bit servers.

Figure 1 shows the baseline--a document library with a Word document and a PDF. Note the PDF doesn't display an icon.

Figure 1: A PDF in a document library with no icon

Both of these documents contain the word "iFilter" in them, but a search produces only the Word document, as Figure 2 shows.

Figure 2: Search results do not return the PDF

Now, let's fix the problem!

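1. You will need two downloads: a PDF iFilter (installed in Step 2) and a small PDF icon in GIF format (copied in Step 8 and referenced as pdficon_small.gif in Step 11).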

2. Install the iFilter. Note: Many guides on the Internet suggest shutting down Microsoft IIS or the Shared Service Provider (SSP) or the WSS application(s). I found this was not necessary, and Microsoft's own KB article 927675 did not specify it was necessary.

3. Add a registry entry for the .pdf extension in the key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Applications\<GUID>\Gather\Search\Extensions\ExtensionList.

- Open the registry editor and navigate to that key.
- Identify the highest "number" value in the key. On a default installation of WSS, the highest entry is 37. Note that the values are not sorted in numeric order because registry value names are strings.
- Create a registry value for the next number (e.g. 38) by choosing Edit > New > String Value, then naming the value with that number.
- Double-click the value you just created and, in the Value Data box, type: pdf. Note there is no dot preceding the extension.

4. There are two registry keys with specific values that must exist. Verify that these exist and, if not, create them:

  • HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf

- Value Name: Default; Type: REG_MULTI_SZ; Data: {4C904448-74A9-11D0-AF6E-00C04FD8DC02}

  • HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\Filters\.pdf

- Value Name: Default; Type: REG_SZ; Data: (value not set)

- Value Name: Extension; Type: REG_SZ; Data: pdf

- Value Name: FileTypeBucket; Type: REG_DWORD; Data: 0x00000001 (1)

- Value Name: MimeTypes; Type: REG_SZ; Data: application/pdf

5. Restart the Windows SharePoint Services Search service. Open a command prompt. Type net stop spsearch, then net start spsearch.
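Step 3 points out that the entries in ExtensionList are not listed in numeric order, because registry value names are strings. The following is a minimal Python sketch of choosing the next value name. It assumes the existing value names have already been read into a list; the function name and the sample data are illustrative only and are not part of SharePoint.

```python
def next_extension_value_name(existing_names):
    """Return the next free numeric value name, as a string.

    existing_names: value names found under ExtensionList, e.g. ["1", "2", ...].
    The names are compared as integers, because comparing them as strings
    would rank "9" above "37".
    """
    numbers = [int(name) for name in existing_names if name.isdigit()]
    return str(max(numbers, default=0) + 1)


# Illustrative data: a default WSS installation whose highest entry is 37.
names = [str(n) for n in range(1, 38)]

print(max(names))                        # string comparison picks "9"
print(next_extension_value_name(names))  # integer comparison gives "38"
```

The string comparison is the trap to avoid: scanning the list by eye, or sorting the names as text, makes 9 look like the highest entry.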

Perform a search, and existing PDFs will not be returned. But newly added PDFs will (once indexed by SharePoint) appear in search results. If you modify any property of an existing PDF, it will be indexed. But who wants to modify all existing PDFs in a document library? This is where I found a lot of misinformation online. Even Microsoft's KB 927675 didn't suggest the right solution! It's easy! STSADM, SharePoint's ubercommand, to the rescue!

6. Rebuild the WSS search index.

- Open a command prompt.
- Navigate to Program Files\Common Files\Microsoft Shared\web server extensions\12\BIN and type the following commands:

stsadm.exe -o spsearch -action fullcrawlstop

stsadm.exe -o spsearch -action fullcrawlstart

The existing PDFs will, after being indexed, appear in search results. But they will still not have correct icons. So, while your site is being indexed, keep going with these steps to configure the icon.

7. Open the folder Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\Template\Images.

8. Copy the gif you downloaded in Step 1 into the folder.

9. Open the folder Program Files\Common Files\Microsoft Shared\Web server extensions\12\Template\Xml.

10. Right-click the file docicon.xml and choose Open With and select Notepad.

11. In the <ByExtension> element, you'll see a number of <Mapping Key> elements. You will add one for pdf. It does not have to be in alphabetical order. The element you need to add is:

<Mapping Key="pdf" Value="pdficon_small.gif" OpenControl=""/>

12. Save that file and close Notepad.
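If the same change has to be made on several servers, the edit in Steps 10 to 12 could be scripted. The sketch below is one possible approach in Python, not part of the original guide. The sample XML is a cut-down stand-in for docicon.xml, and the function name is invented for illustration. It adds the pdf mapping only if one is not already present.

```python
import xml.etree.ElementTree as ET

# Cut-down, illustrative stand-in for docicon.xml (the real file is much longer).
SAMPLE_DOCICON = """<DocIcons>
  <ByExtension>
    <Mapping Key="doc" Value="icdoc.gif" OpenControl="SharePoint.OpenDocuments"/>
  </ByExtension>
</DocIcons>"""


def add_pdf_mapping(docicon_xml, icon_file="pdficon_small.gif"):
    """Return docicon XML text with a Mapping element for the pdf extension."""
    root = ET.fromstring(docicon_xml)
    by_extension = root.find("ByExtension")
    if by_extension is None:
        raise ValueError("no <ByExtension> element found")

    for mapping in by_extension.findall("Mapping"):
        if mapping.get("Key", "").lower() == "pdf":
            # A pdf mapping already exists: just point it at the icon.
            mapping.set("Value", icon_file)
            break
    else:
        # No pdf mapping yet: append one, as described in Step 11.
        ET.SubElement(
            by_extension,
            "Mapping",
            {"Key": "pdf", "Value": icon_file, "OpenControl": ""},
        )
    return ET.tostring(root, encoding="unicode")


print(add_pdf_mapping(SAMPLE_DOCICON))
```

Note that ElementTree discards XML comments when it parses a file, so keep a backup of the original docicon.xml before writing the result back.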

Now, the moment of truth. A search now provides the results shown in Figure 3.

Figure 3: Search results showing PDFs and icons

Issue with Virtual PC and Laptop Keyboard

Having just installed Virtual PC on my laptop, I was experiencing an issue with the keyboard where pressing some keys resulted in the wrong character being displayed. For example, pressing the 'P' key displayed a '4' on screen.

The fix for this was to press 'Num Lock'.

Turn Off Custom Errors in MOSS 2007

To get the ASP.NET error message along with the call stack/stack trace do the following:

1. Navigate to the site directory.

2. Open web.config.

3. Switch Custom Errors off. Search for “customErrors” and set the value to “Off” instead of “On”.

4. Enable CallStack. Search for “CallStack” and set the value to “true” instead of “false”.

5. Save web.config.
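Steps 3 and 4 can also be sketched in Python, for example to apply the same change to several development sites. This is only an illustration under assumptions: the sample fragment is a trimmed-down stand-in for a real web.config, and the function name is made up here. Rather than assuming which element holds the CallStack setting, the sketch updates any element that carries a CallStack attribute.

```python
import xml.etree.ElementTree as ET

# Trimmed-down, illustrative web.config fragment (a real file has far more in it).
SAMPLE_WEB_CONFIG = """<configuration>
  <SharePoint>
    <SafeMode MaxControls="200" CallStack="false" />
  </SharePoint>
  <system.web>
    <customErrors mode="On" />
  </system.web>
</configuration>"""


def enable_debug_errors(web_config_xml):
    """Return web.config text with custom errors off and the call stack enabled."""
    root = ET.fromstring(web_config_xml)

    # Step 3: switch custom errors off.
    for element in root.iter("customErrors"):
        element.set("mode", "Off")

    # Step 4: enable the call stack wherever a CallStack attribute appears.
    for element in root.iter():
        if "CallStack" in element.attrib:
            element.set("CallStack", "true")

    return ET.tostring(root, encoding="unicode")


print(enable_debug_errors(SAMPLE_WEB_CONFIG))
```

ElementTree drops comments and the XML declaration when it rewrites a document, so treat this as a way to see which settings change rather than a drop-in replacement for editing the file by hand, and back up web.config first.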

A comprehensive guide to SharePoint debugging.

MOSS 2007 site definitions vs site templates

There are two easily confused terms in MOSS 2007: site templates and site definitions. A site template is a .stp file that contains only the differences from the site definition it was created from. A user who wants to install a custom .stp file must therefore already have installed the site definition from which the .stp file was saved.

A site definition, on the other hand, is a complete definition with a directory structure containing .aspx files and an Onet.xml file.

For more on creating site definitions see Madhur Ahuja's post.

Using SharePoint web services to get list items

A colleague has just been wrestling with the syntax to get the list items out of the XML returned by the SharePoint web services. These two blog postings helped hugely:

UPDATE - I have posted a code snippet for retrieving items from a list using web services here
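As a rough illustration of the parsing step, here is a minimal Python sketch that pulls the list items out of the returned XML. It assumes the response has the shape typically returned by the GetListItems method of the Lists web service: items appear as z:row elements in the #RowsetSchema namespace, with field values held in attributes prefixed with ows_. The sample response and the function name are illustrative, not taken from a real site.

```python
import xml.etree.ElementTree as ET

# Illustrative response body; a real one carries many more ows_ attributes.
SAMPLE_RESPONSE = """<listitems xmlns="http://schemas.microsoft.com/sharepoint/soap/"
    xmlns:rs="urn:schemas-microsoft-com:rowset"
    xmlns:z="#RowsetSchema">
  <rs:data ItemCount="2">
    <z:row ows_ID="1" ows_Title="First item" />
    <z:row ows_ID="2" ows_Title="Second item" />
  </rs:data>
</listitems>"""

# ElementTree writes namespaced tags as {namespace-uri}localname.
ROW_TAG = "{#RowsetSchema}row"


def list_items(response_xml):
    """Return one dict per z:row element, with the ows_ prefix stripped."""
    root = ET.fromstring(response_xml)
    items = []
    for row in root.iter(ROW_TAG):
        item = {}
        for name, value in row.attrib.items():
            if name.startswith("ows_"):
                name = name[len("ows_"):]
            item[name] = value
        items.append(item)
    return items


for item in list_items(SAMPLE_RESPONSE):
    print(item["ID"], item["Title"])
```

The usual stumbling block is the namespace: a search for plain row elements finds nothing, because the rows live in the #RowsetSchema namespace.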

Custom SiteMapPath to handle variations in MOSS 2007

Whilst developing a MOSS 2007 Internet site I came across an issue with the SiteMapPath control when using variations. Variations use the root site to direct the user to the correct variation for their settings. A user with settings of en-GB, for example, will be directed towards the relevant variation by variationroot.aspx.

This is great except when using the SiteMapPath breadcrumb navigation control which displays the variation root site as well as the relevant variation site by default.

By default the following is displayed:

Variation Root > English Site > Some Page

What I want to display is:

English Site > Some Page

To achieve this I have created a server control that inherits from SiteMapPath and has an additional property called IgnoreNode. I have overridden the RenderContents method so that the HyperLink whose text matches IgnoreNode is no longer rendered, along with the separator that follows it. In my case I set IgnoreNode to "Variation Root" so that it is not displayed in the breadcrumb, but you could choose any node that you don't want to display.

The code:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Text;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;

namespace CustomBreadcrumb
{
    [ToolboxData("<{0}:CustomBreadcrumb runat=server></{0}:CustomBreadcrumb>")]
    public class CustomBreadcrumb : SiteMapPath
    {
        // Text of the breadcrumb node that should not be rendered.
        public string IgnoreNode
        {
            get
            {
                object o = ViewState["IgnoreNode"];
                if (o != null)
                    return (string)o;
                return String.Empty;
            }
            set { ViewState["IgnoreNode"] = value; }
        }

        protected override void RenderContents(HtmlTextWriter output)
        {
            bool blnNodeWasIgnored = false;
            HyperLink hlLink;

            foreach (Control oControl in this.Controls)
            {
                SiteMapNodeItem oSiteMapNodeItem = (SiteMapNodeItem)oControl;

                foreach (Control aControl in oSiteMapNodeItem.Controls)
                {
                    if (aControl.GetType().ToString() == "System.Web.UI.WebControls.HyperLink")
                    {
                        hlLink = (HyperLink)aControl;

                        if (hlLink.Text == IgnoreNode)
                        {
                            //don't call render
                            blnNodeWasIgnored = true;
                        }
                        else
                        {
                            blnNodeWasIgnored = false;
                            // Assumed: the render call is missing from the listing.
                            hlLink.RenderControl(output);
                        }
                    }

                    if (aControl.GetType().ToString() == "System.Web.UI.WebControls.Literal")
                    {
                        // Skip the separator that follows an ignored node.
                        if (!blnNodeWasIgnored)
                        {
                            // Assumed: the render call is missing from the listing.
                            aControl.RenderControl(output);
                        }
                    }
                }
            }
        }
    }
}
To deploy this control, it needs to be strongly named and installed to the GAC using gacutil.exe.

An entry in the web.config of your web application needs to be added:

<SafeControl Assembly="CustomBreadcrumb, Version=, Culture=neutral, PublicKeyToken=91e52569228df0da" Namespace="CustomBreadcrumb" TypeName="*" Safe="True" />

(Use reflector to find out the public key token of your dll.)

Restart IIS or, even better, recycle the relevant application pool.

You will need to register the control by adding the following line in the master page:

<%@ Register TagPrefix="adg" Namespace="CustomBreadcrumb" Assembly="CustomBreadcrumb, Version=, Culture=neutral, PublicKeyToken=91e52569228df0da" %>

Finally add the control to your master page, publish and approve the page.