Tuesday, November 3, 2009

SharePoint Search: Some files are not crawled (indexed)!

Recently, I was asked to find out why some documents in a particular library were not indexed and therefore did not show up in the search results. I began by inspecting the crawl log for errors or warnings and found none. Better still, the log showed that some of the documents had been crawled correctly.
Comparing the crawled documents with the non-crawled ones, it turned out that the non-crawled ones were minor versions (drafts). By default, the crawler account is granted the 'Full Read' permission, which means that it cannot see draft documents, since these are visible only to authors who have the 'Edit' permission.

So what is the solution? You have three options:

- Grant the crawler account the 'Edit' permission so that it can see unpublished files and crawl them. In this case, all draft documents will show up in search results for everyone, even for visitors who are not supposed to see them, because the search results are not security trimmed for drafts (1). However, if you do not have access to a document, you will still be denied access when you try to open it, even though it appears in the search results.
- Or keep the crawler account at 'Full Read' and publish the draft documents as major versions.
- Or accept that draft documents are not indexed.
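
The three options can be pictured with a small toy model. This is only a sketch of the behaviour described above, not SharePoint's API: the `Doc` type, the permission ranking, and the functions `crawler_can_see`, `indexed`, and `publish` are all hypothetical names, and the model assumes that drafts are visible only to accounts with 'Edit' permission, as was the case in this library.

```python
from dataclasses import dataclass

# Toy model only: these names are hypothetical, not part of any SharePoint API.
PERMISSION_RANK = {"Full Read": 1, "Edit": 2}


@dataclass(frozen=True)
class Doc:
    name: str
    is_draft: bool  # True for a minor (unpublished) version


def crawler_can_see(doc: Doc, crawler_permission: str) -> bool:
    """Assumption: drafts are visible only to accounts with 'Edit' or higher."""
    if not doc.is_draft:
        return True
    return PERMISSION_RANK[crawler_permission] >= PERMISSION_RANK["Edit"]


def indexed(docs, crawler_permission):
    """Return the names of the documents the crawler would index."""
    return [d.name for d in docs if crawler_can_see(d, crawler_permission)]


def publish(doc: Doc) -> Doc:
    """Option 2: promote a draft to a major version."""
    return Doc(doc.name, is_draft=False)


library = [Doc("policy.docx", is_draft=False), Doc("plan.docx", is_draft=True)]

print(indexed(library, "Full Read"))  # option 3: the draft stays out of the index
print(indexed(library, "Edit"))       # option 1: the draft is indexed
print(indexed([publish(d) for d in library], "Full Read"))  # option 2
```

In this model the draft reaches the index only when the crawler's permission is raised or the document is published, which matches the first two options.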

I cannot recommend one solution over another. Every company must have a document management policy, and it is according to this policy that you decide whether to raise the rights of the crawler account or to keep draft documents out of the search scope.

Here are some interesting links to better understand SharePoint search behaviour:

What Does the Crawler Crawl and When?
SharePoint indexing/search behavior on major and minor versions
MOSS Enterprise Search - 16 things you might not know

Hope this helps.



(1) As far as I could observe, the search results are left untrimmed only for draft items. For all other items, the results are trimmed at query time according to the user's permissions.

2 comments:

  1. What's your experience with this on SP2010?
    Thanks

  2. I have not used search in SharePoint 2010 yet, but I know that a new section has been added to the list or document library settings. It is called ‘Draft items security’, and it lets you decide whether draft items can be viewed by:

    - all users who have read rights
    - users who have edit rights
    - only approvers

    Therefore, I assume that if the service account that performs the content crawl keeps its default configuration, i.e. read-only rights, only draft items set to be viewable by all users with read rights will be indexed.

    I will be exploring SharePoint 2010 search in the next few weeks and will post anything relevant I find.

    Regards.
