Searching in MOSS 2007
This is one of the strongest features in Office SharePoint Server. MOSS has its own search and index engine, completely independent of the Full-Text Indexing service in SQL Server. You can activate both, but doing so wastes resources, since the MOSS search and indexing feature works in any Web site, including both MOSS sites and WSS sites. The search features in MOSS can be summarized as follows:
- Search everywhere in SharePoint — any MOSS site, any team site, and any workspace site.
- Can search almost any content source outside SharePoint — file servers, MS Exchange servers, Lotus Notes, and other Web servers, including any public Web site on the Internet.
- With MOSS Enterprise Edition you can use the Business Data Catalog feature to search in external databases and applications, such as Oracle, SAP, and Navision.
- Search all MS Office file types by default, plus all neutral file formats, such as TXT, HTML, and so on.
- Can be extended to search any file type. All that is needed is an IFilter for each file type.
- You can control which file types are to be indexed, even if there is an IFilter installed for them.
- The user profile properties will be indexed. You can search for a user with a specific property.
- You can set the schedule for full and incremental indexing. You can also force a full indexing anytime.
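The difference between full and incremental indexing in the last bullet can be illustrated with a minimal Python sketch. It is not how MOSS detects changes internally; the function name `items_to_crawl` and the idea of comparing last-modified timestamps are assumptions made purely for illustration.

```python
from datetime import datetime


def items_to_crawl(items, last_crawl, full=False):
    """Pick the sources to process in one indexing run.

    items      -- dict mapping a source name to its last-modified time
    last_crawl -- time of the previous run, or None if there was none
    full       -- True forces a full indexing run
    """
    if full or last_crawl is None:
        # Full indexing: every source is processed again.
        return sorted(items)
    # Incremental indexing: only sources changed since the last run.
    return sorted(src for src, modified in items.items() if modified > last_crawl)


# Hypothetical content, for illustration only.
content = {
    "budget.docx": datetime(2007, 3, 1),
    "plan.docx": datetime(2007, 3, 10),
}
previous_run = datetime(2007, 3, 5)

print(items_to_crawl(content, previous_run))             # incremental run
print(items_to_crawl(content, previous_run, full=True))  # forced full run
```

An incremental run touches far fewer items, which is why it is normally scheduled more often than a full run.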
This indexing and search feature is activated by default for all information stored in SharePoint, both MOSS sites and WSS team sites; no special configuration is needed to activate it. Since this feature is much more advanced than the full-text search in SQL Server, there is also a lot more configuration you can do; however, this also requires more management. You, as an administrator, must understand how this feature works in MOSS and what you can do to optimize it. This is especially true when a problem arises, such as when the search results are not as expected, or when a content source isn't indexed. The following section will tell you all you need to know for your everyday work as an administrator. To extend and adjust this very important feature, you might want to read the additional coverage in Chapter 8, "Advanced Configurations," of the book, Beginning SharePoint 2007 Administration: Windows SharePoint Services 3.0 and Microsoft Office SharePoint Server 2007 (Wrox, 2007, ISBN: 978-0-470-12529-8).
There are two MOSS services engaged in this feature:
- Indexing: Responsible for crawling content sources and building index files.
- Searching: Responsible for finding all information matching the search query by searching the index files.
This is important: All searching is performed against the index files; if they don't contain what the user is looking for, there will not be a match. So, the index files are critical to the success of the search feature of MOSS. In fact, practically all configuration and management is related to the indexing service. The search functionality can be described in its simplest form as a Web page where the user defines his or her search query.
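The point that searching only ever looks in the index files can be shown with a small Python sketch. The dictionary below is a hypothetical stand-in for the index files, and the `search` function and URLs are invented for illustration; the real MOSS index format and query engine are far more elaborate.

```python
# Hypothetical stand-in for the index files: word -> set of source URLs.
index = {
    "budget": {"http://moss/sites/finance/budget.docx"},
    "sharepoint": {
        "http://moss/sites/it/intro.docx",
        "http://moss/sites/finance/budget.docx",
    },
}


def search(index, query):
    """Return the sources that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = None
    for word in words:
        # A word that was never indexed yields an empty set of sources.
        sources = index.get(word, set())
        results = set(sources) if results is None else results & sources
    return results


print(search(index, "SharePoint budget"))  # both words are in the index
print(search(index, "payroll"))            # never indexed, so no match
```

A document about payroll may well exist on a file server, but if it was never crawled, the query cannot find it.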
The index role can be configured to run on its own MOSS server, or run together with all the other roles, such as the Web service, Excel Services and Forms Services. It performs its indexing tasks following this general workflow:
- SharePoint stores all configuration settings for the indexing in its database.
- When activated, the index service will look in SharePoint's databases to see what content sources to index, and what type of indexing to perform, such as a full or incremental indexing.
- The index service will start a program called the Gatherer, which will try to open the content that should be indexed.
- For each information type, the Gatherer will need an Index Filter, or IFilter, that knows how to read text inside this particular type of information. For example, to read a MS Word file, an IFilter for
- The Gatherer will receive a stream of Unicode characters from the IFilter. It will now use a small program called a Word Breaker; its job is to convert the stream of Unicode characters into words.
- However, some words are not interesting to store in the index, such as "the", "a", and numbers; the Gatherer will now compare each word found against a list of Noise Words. This is a text file that contains all words that will be removed from the stream of words.
- The remaining words are stored in an index file, together with a link to the source. If that word already exists, only the source will be added, so one word can point to multiple sources.
- If the source was information stored in SharePoint, or a file in the file system, the index will also store the security settings for this source. This will prevent a user from getting search results that he or she is not allowed to open.
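The workflow above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the actual Gatherer implementation: the `Index` class, the `break_words` helper, the noise word list, and the user names are all hypothetical, and the IFilter step is skipped by passing in plain text directly.

```python
import re
from collections import defaultdict

# Hypothetical noise word list; MOSS reads its own list from a text file.
NOISE_WORDS = {"the", "a", "an", "and", "of"}


def break_words(text):
    """Stand-in for a Word Breaker: split a character stream into words."""
    return re.findall(r"[^\W_]+", text.lower())


def is_noise(word):
    """Noise words and plain numbers are not stored in the index."""
    return word in NOISE_WORDS or word.isdigit()


class Index:
    def __init__(self):
        self.postings = defaultdict(set)  # word -> sources containing it
        self.acl = {}                     # source -> users allowed to open it

    def add(self, source, text, allowed_users):
        """Index text that an IFilter would have extracted from a source."""
        self.acl[source] = set(allowed_users)
        for word in break_words(text):
            if not is_noise(word):
                # An existing word simply gains one more source.
                self.postings[word].add(source)

    def search(self, word, user):
        """Return only the matching sources this user is allowed to open."""
        hits = self.postings.get(word.lower(), set())
        return {source for source in hits if user in self.acl[source]}


idx = Index()
idx.add("doc1", "The budget of 2007", {"anna"})
idx.add("doc2", "A budget plan", {"anna", "bo"})

print(idx.search("budget", "anna"))  # anna may open both documents
print(idx.search("budget", "bo"))    # bo only sees doc2
```

Storing the permitted users next to each source is what lets the search step drop results a user cannot open, as described in the last bullet.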
Pretty straightforward, if you think about it. But the underlying process is a bit more complex. Fortunately you do not need to dive into these details, unless you have a very good reason to. By default, MOSS will create a single index file. This index file is not stored on the SQL server, as the other information stored in SharePoint is; instead, it is stored in the file system on the server configured to run the Index role in the SharePoint farm. This index file is stored in separate folders in the following location (assuming that you have used the default installation folder):
<Drive:>\Program Files\Microsoft Office Servers\12.0\DATA\Office Server\Applications\<Application GUID>
Application GUID is a unique hexadecimal string that identifies a specific SSP instance, such as ae0cd4fe-ed29-418f-aa0f-eecfd7956b4f. If you have more than one SSP instance created on the same server, you can check the following registry key to see exactly what portal this Application GUID is pointing to:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Applications\<GUID>\CatalogNames
DisplayName will tell you what SSP instance this is. The number of files and folders stored in each index folder may surprise you, but indexing is a complex process and it shows here. You do not need to configure these files, since everything is managed by SharePoint's administration pages.
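If you need to locate the index folder for a given SSP instance from a script, the path can be assembled as in the following Python sketch. It assumes the default installation folder and the folder name "Office Server"; the function name `index_folder` is hypothetical, and the sketch only builds and validates the path, it does not touch the file system or the registry.

```python
import uuid
from pathlib import PureWindowsPath


def index_folder(drive, app_guid):
    """Build the default index folder path for one SSP instance.

    Raises ValueError if app_guid is not a well-formed GUID.
    """
    guid = str(uuid.UUID(app_guid))  # validates and normalizes the GUID
    return PureWindowsPath(
        f"{drive}:\\",
        "Program Files",
        "Microsoft Office Servers",
        "12.0",
        "DATA",
        "Office Server",   # assumed default folder name
        "Applications",
        guid,
    )


print(index_folder("C", "ae0cd4fe-ed29-418f-aa0f-eecfd7956b4f"))
```

Validating the GUID first catches copy-and-paste mistakes before you go looking for a folder that does not exist.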
The Gatherer process keeps a log of all its activities. These log files are also stored in this folder structure, but the easiest way to view these log entries is to use SharePoint's administrative Web pages.
For more information on MOSS 2007 search configuration and advanced settings, including crawl settings, scopes, authoritative pages, errors and warnings, forcing updates, managing indexing schedules, controlling what files to index, and adding new file types with other IFilters, see Chapter 8, "Advanced Configurations," of the book, Beginning SharePoint 2007 Administration: Windows SharePoint Services 3.0 and Microsoft Office SharePoint Server 2007 (Wrox, 2007, ISBN: 978-0-470-12529-8).
This article is excerpted from Chapter 8, "Advanced Configurations," of the book Beginning SharePoint 2007 Administration: Windows SharePoint Services 3.0 and Microsoft Office SharePoint Server 2007 (Wrox, 2007, ISBN: 978-0-470-12529-8), by Göran Husman. In 1993 Göran became one of the first Microsoft Certified Trainers (MCT) in Sweden, and he has regularly conducted MS courses ever since. He is also certified by MS as an MCP (with the number 2888) and an MCSE. His great engagement in e-mail systems earned him status as Sweden's first MS Exchange MVP (Most Valuable Professional). He switched focus to MS SharePoint in 2003, and in January 2006 Microsoft awarded him status as Sweden's first SharePoint Portal Server MVP, which was renewed in January 2007. He is also a frequent speaker at conferences and seminars. Today Göran divides his time between consulting contracts, training, leading his company Human Data, and from time to time, writing books. Oh, and he is also the proud father of six great kids from the ages of 6 to 28, which may be his greatest achievement in life.