How to understand full text search


Before we begin the step-by-step process of creating and using full-text indexes, there are a few fundamentals that you'll need to understand. Full-text search isn't a fancy way of doing a LIKE search with SQL. Instead, every word is placed into a special type of index called a full-text index. These indexes are organized and stored in full-text catalogs, which act as containers to organize our indexes. Each word in a full-text index also includes a unique key for that record. You should note that in order to full-text index a table, SQL Server requires the table to have a unique, single-column key. This single-column key is used as part of the ranking functions we'll cover later in this article. All of the text-based data types are eligible for full-text searching. The complete list is char, nchar, varchar, nvarchar, text, ntext, xml, image, and varbinary( max). According to online documentation from Microsoft, text, ntext, and image data types will be deprecated in future versions of SQL Server, so I suggest avoiding these if you can. Char, nchar, varchar, and nvarchar all make sense as candidates for full-text indexing. XML also makes sense, because it's text based, but adds the advantage that markup tags are ignored-only the data is full-text indexed. The data type that might have you scratching your head is varbinary(max). To understand this, we have to briefly delve into the history of the full-text engine.

The code base for the full-text search engine included with SQL Server descended from a product called Microsoft Index Server. With it, you could index various document types stored on your server, be it a Windows NT 4.0 server or IIS. The ability to look inside documents and index their content was retained and lives on in SQL Server's full-text search engine. SQL Server allows you to store various types of unstructured documents, such as Microsoft Word, Excel, and many others inside a varbinary(max) field. If the full-text engine recognizes the type of document stored in a varbinary(max) field, it'll open the document and index all words contained in the document. The "one" in our one-two-three concerns the catalog. The catalog is a logical container to hold a group of one or more full-text indexes. Creating a catalog is fairly straightforward. First, note that you'll want to supply the name of your catalog in place of Adventure- WorksFTC. If you only have one full-text catalog for your database, I suggest using the same name as the database followed by FTC (for full-text catalog), as in the example. The optional AS DEFAULT tells SQL Server to use this particular catalog as the default for all full-text commands if no catalog is specified. It's a good idea to specify at least one catalog as the default, and if you only have one, you definitely want to add this to the statement. That's all there is to it; you now have an empty catalog waiting for your indexes. Before we start loading it with full-text indexes, though, let's take a moment to look at a few commands available for maintaining the catalog. The first two are similar to each other, in that they update all of the indexes in the catalog, but they do it in quite different ways.

The first is the REBUILD command. This will go through each index and rebuild it from the source tables. It's the fastest, most efficient way to rebuild an entire catalog, but it has the side effect of taking the catalog offline-your catalog won't be available for your users to do any full-text searching. If your operation is a 9-to-5 shop and you're doing a rebuild during off hours, then REBUILD is the way to go. But what if your operation runs 24 hours a day? For those situations, we have the REORGANIZE command. The REORGANIZE command will rebuild all indexes, without taking the catalog offline. Your users will still be able to use and query the catalog normally. The downside is that this is a lot slower than doing a rebuild. Fortunately, doing either a rebuild or a reorganize to your catalog is fairly rare. The most likely call for this is during database updates that span the majority of tables in the database. When making mass updates, you may find a significant speed increase by turning off full-text indexing, doing the updates, turning indexing back on, and then rebuilding/reorganizing the entire catalog. The final command is quite simple: it sets a catalog to be the new default catalog. Like the previous commands, this isn't something you'll use often. Perhaps in a long script, you may want to change the default catalog temporarily to make your coding easier.

Legal Disclaimer

Our website is not responsible for the information contained by this article. Articleinput.com is a free articles resource thus practically any visitor can submit an article. However if you notice any copyrighted material, please contact us and we will remove the article(s) in discussion right away.

Note: This article was sent to us by: Loyd A. Swanson at 01092010

Related Articles

1. Definitions and uses of functions in Python
This article assumes you're familiar with function definitions in at least one other computer language and with the concepts that correspond to...

2. Python functions handle variable numbers of arguments
Python functions can also be defined to handle variable numbers of arguments. You can do this two different ways. One way handles the relativel...

3. Lambda expressions and generator functions in Python
Short functions like those you just saw can also be defined using lambda expressions of the form. Lambda expressions are anonymous little funct...

4. How to create a basic program in Python
Up until now, you've been using the Python interpreter mainly in interactive mode. For production use, you'll want to create Python programs or...

5. How to make Python script execution in UNIX and Mac OS X and in Windows
If you're on UNIX, you can easily make a script directly executable. Note that if Python 3.x isn't your default version of Python, you may need...

6. The difference between scripts on Windows scripts on UNIX
The way you call scripts on Windows differs from the way scripts are called on Linux/ UNIX, and that difference can affect what kind of scripts...

7. Python applications are distributed as source files
You can distribute your scripts as source files (as .py files). You can also ship them as byte code (as .pyc or .pyo files). A byte code file w...