Over the last few days I have been researching the best way to implement search functionality for a site I am currently building. The site will consist mainly of news articles. The client wanted a search that would let a user search across all fields related to a news article.
Originally, I envisaged writing my own SQL to query a few tables in my database and return some search results. But as I delved further into designing the database architecture in the early planning stages, I found that my original (somewhat closed-minded) approach would be neither flexible nor scalable enough to search and extract all the information I required.
From what I have researched, the general consensus is to use either SQL Full Text Search or Lucene.NET. Many favour Lucene for its richer query language and because it is generally more flexible: you can build a search index tailored to your project. From what I gather, Lucene can work with any type of text data. For example, you can index not only rows in your database, but there are also solutions for indexing physical files in your application. Neat!
I have written some basic code (below) with a couple of methods to get started with creating a search index and carrying out a multi-field search across your whole index. You would enhance this code further so that a full index is carried out only once, after all the required records have been added. Most implementations of Lucene use incremental indexing, where documents already in the index are updated individually, rather than deleting the whole index and building a new one every time. I plan to optimise my Lucene code and hook it up to a service scheduled to carry out an incremental index every night at midnight.
using System;
using System.Collections.Generic;
using System.Configuration;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

namespace MES.DataManager.Search
{
    public class Lucene
    {
        //Rebuilds the whole search index from the Article table
        public static void IndexSite()
        {
            //The file location of the index
            string indexLocation = ConfigurationManager.AppSettings["SearchIndexPath"];

            //Create the index directory on disk only if it does not exist yet
            bool createDirectory = !System.IO.Directory.Exists(indexLocation);
            Directory searchDirectory = FSDirectory.GetDirectory(indexLocation, createDirectory);

            //Create an analyzer to process the text
            Analyzer searchAnalyser = new StandardAnalyzer();

            //Create the index writer with the directory and analyzer.
            //Passing true starts a new index, replacing any existing one.
            IndexWriter indexWriter = new IndexWriter(searchDirectory, searchAnalyser, true);

            try
            {
                //Iterate through Article table and populate the index
                foreach (Article a in ArticleBLL.GetArticleDetails())
                {
                    Document doc = new Document();

                    //The id is stored un-tokenized so it can be matched exactly
                    doc.Add(new Field("id", a.ID.ToString(), Field.Store.YES, Field.Index.UN_TOKENIZED, Field.TermVector.YES));
                    doc.Add(new Field("title", a.Title, Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.YES));
                    doc.Add(new Field("articletype", a.Type.TypeName, Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.YES));

                    //Optional fields are only indexed when they have a value
                    if (!String.IsNullOrEmpty(a.Summary))
                        doc.Add(new Field("summary", a.Summary, Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.YES));
                    if (!String.IsNullOrEmpty(a.ByLineShort))
                        doc.Add(new Field("bylineshort", a.ByLineShort, Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.YES));
                    if (!String.IsNullOrEmpty(a.ByLineLong))
                        doc.Add(new Field("bylinelong", a.ByLineLong, Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.YES));
                    if (!String.IsNullOrEmpty(a.BasicWords))
                        doc.Add(new Field("basicwords", a.BasicWords, Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.YES));
                    if (!String.IsNullOrEmpty(a.MediumWords))
                        doc.Add(new Field("mediumwords", a.MediumWords, Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.YES));
                    if (!String.IsNullOrEmpty(a.LongWords))
                        doc.Add(new Field("longwords", a.LongWords, Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.YES));

                    //Write the document to the index
                    indexWriter.AddDocument(doc);
                }

                //Optimize the index once all documents have been added
                indexWriter.Optimize();
            }
            finally
            {
                //Always close the writer, even if indexing fails part way through
                indexWriter.Close();
            }
        }

        //Searches all text fields for the search term and returns the matching articles
        public static List<CoreArticleDetail> SearchArticles(string searchTerm)
        {
            Analyzer analyzer = new StandardAnalyzer();

            //Search by multiple fields
            MultiFieldQueryParser parser = new MultiFieldQueryParser(
                new string[]
                {
                    "title",
                    "summary",
                    "bylineshort",
                    "bylinelong",
                    "basicwords",
                    "mediumwords",
                    "longwords"
                },
                analyzer);

            Query query = parser.Parse(searchTerm);

            //Create an index searcher that will perform the search
            IndexSearcher searcher = new IndexSearcher(ConfigurationManager.AppSettings["SearchIndexPath"]);
            List<int> articleIDs = new List<int>();

            try
            {
                //Execute the query
                Hits hits = searcher.Search(query);

                //Iterate through the hits and collect all article IDs
                for (int i = 0; i < hits.Length(); i++)
                {
                    Document doc = hits.Doc(i);
                    articleIDs.Add(int.Parse(doc.Get("id")));
                }
            }
            finally
            {
                //Close the searcher so the index files are not left open
                searcher.Close();
            }

            return ArticleBLL.GetArticleSearchInformation(articleIDs);
        }
    }
}
As you can see, my example allows you to search across as many of your fields as you require, which I am sure you will find useful. It took a lot of research to find out how to carry out a multi-field search. The majority of the examples I found on the internet showed how to search only one field.
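The listing rebuilds the whole index every time, so it does not yet cover the incremental indexing I mentioned earlier. A minimal sketch of how a single article might be added or replaced is shown here. It assumes a Lucene.NET 2.x release recent enough to include IndexWriter.UpdateDocument (2.1 or later, as far as I can tell), and the IncrementalIndexer class and UpsertArticle method are hypothetical names of my own. Only the id and title fields are shown to keep it short.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

// Hypothetical helper, not part of Lucene.NET itself.
public static class IncrementalIndexer
{
    // Adds the article to the index, or replaces the existing
    // document that carries the same "id" term.
    public static void UpsertArticle(Directory directory, int id, string title)
    {
        // Only create a new index when none exists yet; otherwise
        // open the existing one and leave its documents in place.
        bool create = !IndexReader.IndexExists(directory);
        IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(), create);

        try
        {
            Document doc = new Document();
            doc.Add(new Field("id", id.ToString(), Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.Add(new Field("title", title, Field.Store.YES, Field.Index.TOKENIZED));

            // Deletes every document whose "id" term matches, then adds
            // the new one. This relies on "id" being indexed un-tokenized.
            writer.UpdateDocument(new Term("id", id.ToString()), doc);
        }
        finally
        {
            // Always close the writer so the changes are flushed
            writer.Close();
        }
    }
}
```

A scheduled service could call something like this for each article changed since the last run. Opening and closing a writer per article is wasteful, so for larger batches you would probably keep one writer open for the whole run.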
The main advantage I can see straight away from using Lucene is that, since the search data is held on disk, there is hardly any need to query the database. The only downside I can see is the problems that a corrupt index could cause.
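One way to soften the corrupt index problem might be to rebuild the index and retry when a search fails. The sketch below is only an illustration of that idea: SearchWithRebuild and Run are hypothetical names, and it assumes that a damaged or missing index surfaces as a System.IO.IOException, which I have not verified for every failure case.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of Lucene.NET.
public static class SearchWithRebuild
{
    // Runs the search. If it fails with an IOException (assumed here to
    // signal a damaged or missing index), rebuilds the index once and
    // runs the search a second time. A second failure is not caught.
    public static T Run<T>(Func<T> search, Action rebuildIndex)
    {
        try
        {
            return search();
        }
        catch (IOException)
        {
            rebuildIndex();
            return search();
        }
    }
}
```

With the methods from the listing, the search delegate would wrap SearchArticles and the rebuild delegate would be IndexSite. A full rebuild can be slow on a large site, so this is a last resort rather than something to rely on.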
For more information on using Lucene, here are a couple of links that you may find useful to get started (I know I did):
http://www.codeproject.com/KB/library/IntroducingLucene.aspx
http://ifdefined.com/blog/post/Full-Text-Search-in-ASPNET-using-LuceneNET.aspx