ContentEnumerator.cs for stored procedure sample

Here is a new version of ContentEnumerator.cs and Accessor.cs, based on John Kohz's PH C# sample, to show how you might use a custom protocol handler to crawl a database.

Just replace the original files in his sample code with these versions and start from there.

This compiles but is not tested. I based it on work I have done in the past that does work, to give you an idea of some of the concepts. If you have any questions, just ask and I will try to clarify or fix anything in this file.

One glaring omission in this sample is the ability to add custom properties to the index besides the basic Title and URL properties. It is not a far reach to figure out how, though, from what I have provided.

More to come on the tricky stuff I discussed in my last post.

Snippets from the code:

public ContentEnumerator(string sUrl): base(sUrl)
{
 // Classify the URL: an item URL carries an id, a container URL does not.
 string lowerUrl = sUrl.ToLower();
 if (lowerUrl.Contains("fake.txt?id="))
  m_type = UriType.Item;
 else if (lowerUrl.Contains("fake.txt"))
  m_type = UriType.Container;
 else
  throw new ApplicationException("Unknown content type");

 // Expected layout: scheme://server/database/enumProc/itemProc/fake.txt
 string tURL = sUrl.Substring(sUrl.IndexOf("//") + 2); // trim off the scheme
 string[] aURL = tURL.Split('/');
 if (aURL.Length < 4)
  throw new ApplicationException("Malformed URL: " + sUrl);
 m_server = aURL[0];
 m_dbname = aURL[1];
 m_enumstoredProc = aURL[2];
 m_itemstoredProc = aURL[3];

 // Decode the query string into a lookup table keyed by lower-cased name.
 string[] parms = this.Query.Split('&');
 Hashtable ht = new Hashtable();
 for (int i = 0; i < parms.Length; i++)
 {
  string[] brk = parms[i].Split('=');
  if (brk.Length < 2)
   continue; // skip empty or malformed pairs
  ht[Microsoft.JScript.GlobalObject.decodeURI(brk[0]).ToLower()] = Microsoft.JScript.GlobalObject.decodeURI(brk[1]);
 }

 if (Type == UriType.Item)
 {
  // Convert.ToString returns an empty string for a missing (null) entry
  // instead of throwing a NullReferenceException.
  m_itemID = Convert.ToString(ht["id"]);
  m_itemType = Convert.ToString(ht["type"]);
 }

 Logging.Trace("ContentEnumerator created");
 Logging.Information("Creating ContentEnumerator for: " + sUrl);
}
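To try the URL parsing outside of the protocol handler, here is a minimal standalone sketch of the same idea. It is not code from the sample: CrawlUrlParts, CrawlUrlParser and the myph:// scheme used below are made-up names for illustration, and I swapped in Uri.UnescapeDataString for the Microsoft.JScript decodeURI call so it has no extra assembly reference.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical holder for the pieces of a crawl URL (not part of the sample).
public sealed class CrawlUrlParts
{
    public string Server;
    public string Database;
    public string EnumProc;
    public string ItemProc;
    // Parameter names are matched case-insensitively, like the ToLower() in the sample.
    public readonly Dictionary<string, string> Query =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
}

public static class CrawlUrlParser
{
    // Assumed layout: scheme://server/database/enumProc/itemProc/file[?name=value&...]
    public static CrawlUrlParts Parse(string url)
    {
        if (url == null) throw new ArgumentNullException("url");

        int schemeEnd = url.IndexOf("//", StringComparison.Ordinal);
        if (schemeEnd < 0) throw new FormatException("Missing '//' in: " + url);
        string rest = url.Substring(schemeEnd + 2);

        // Separate the query string from the path before splitting on '/'.
        string query = string.Empty;
        int q = rest.IndexOf('?');
        if (q >= 0)
        {
            query = rest.Substring(q + 1);
            rest = rest.Substring(0, q);
        }

        string[] segments = rest.Split('/');
        if (segments.Length < 5)
            throw new FormatException("Expected server/db/enumProc/itemProc/file in: " + url);

        CrawlUrlParts parts = new CrawlUrlParts();
        parts.Server = segments[0];
        parts.Database = segments[1];
        parts.EnumProc = segments[2];
        parts.ItemProc = segments[3];

        // Tolerate empty pairs and parameters without a value.
        foreach (string pair in query.Split(new char[] { '&' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int eq = pair.IndexOf('=');
            string name = eq < 0 ? pair : pair.Substring(0, eq);
            string value = eq < 0 ? string.Empty : pair.Substring(eq + 1);
            parts.Query[Uri.UnescapeDataString(name)] = Uri.UnescapeDataString(value);
        }
        return parts;
    }
}
```

Calling CrawlUrlParser.Parse("myph://sqlbox/Sales/usp_Enum/usp_Item/fake.txt?ID=42") gives you the server, database and both stored procedure names, and Query["id"] finds the ID regardless of casing.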

public void EnsureContent()
{
 // Initialize the content stream from the item's data.
 // In this case your stored proc combines all your searchable data into one field.
 String contstr = m_dsContent.Tables[0].Rows[0]["CONTENT"].ToString();

 // Encode the text as UTF-16 and expose the bytes as a stream.
 UnicodeEncoding uniEncoding = new UnicodeEncoding();
 byte[] contentBytes = uniEncoding.GetBytes(contstr);
 m_Stream = new MemoryStream(contentBytes);
}
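If you want to check the text-to-stream step on its own, a small sketch like this works in a console project. ContentStreams and its method names are my own for illustration, not part of the sample.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not part of the sample) that isolates the stream creation.
public static class ContentStreams
{
    // Encodes the text as UTF-16 (little endian, no byte order mark)
    // and wraps the bytes in a read-only stream positioned at the start.
    public static MemoryStream FromText(string text)
    {
        byte[] bytes = Encoding.Unicode.GetBytes(text ?? string.Empty);
        return new MemoryStream(bytes, false);
    }

    // Reads the stream back as UTF-16 text; handy for verifying the round trip.
    public static string ReadBack(Stream stream)
    {
        using (StreamReader reader = new StreamReader(stream, Encoding.Unicode))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Each character in the basic multilingual plane takes two bytes, so the stream for "abc" is 6 bytes long, and ReadBack returns the original text.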

public byte[] SecurityDescriptor
{
 get
 {
  if (m_SecurityDescriptor == null)
  {
   DateTime startTime = Logging.Enter(typeof(ContentEnumerator), "SecurityDescriptor-create");
   // Let's say for our example that the stored proc returns a list of users that
   // are allowed to access the item, and that they are already in domain\user format :)
   String owner = m_dsContent.Tables[0].Rows[0]["OWNER"].ToString();
   ArrayList al = new ArrayList();
   if (m_dsContent.Tables.Count == 2) // second table is the security list
   {
    for (int i = 0; i < m_dsContent.Tables[1].Rows.Count; i++)
     al.Add(m_dsContent.Tables[1].Rows[i]["USER"].ToString()); // store strings so ToArray(typeof(String)) cannot fail
   }

   String[] users = (String[]) al.ToArray(typeof(String));
   // You need to include the Win32.Security library. Do a search for it on Google.
   PHSecurityDescriptor sec = new PHSecurityDescriptor();
   sec.LoadSecurity(owner, users);
   m_SecurityDescriptor = sec.GetBuffer();
  }

  return m_SecurityDescriptor;
 }
}

See the file link at the top of this post for the rest

Published Tuesday, June 19, 2007 4:42 PM by notorioustech

Comments

Monday, July 02, 2007 10:47 AM by notorioustech

# re: ContentEnumerator.cs for stored procedure sample

Oops, sorry. I had uploaded the .cs file thinking the server would let you download it. Fixed. ContentEnumerator.cs can be downloaded now.

# Links (7/16/2007) « Steve Pietrek’s SharePoint Stuff

Pingback from Links (7/16/2007) « Steve Pietrek’s SharePoint Stuff

Friday, July 27, 2007 3:27 AM by Shanti

# re: ContentEnumerator.cs for stored procedure sample

Hi Chris,

I am trying to implement a Protocol Handler. Your code helped me see how to move on from John's code.

While I am trying to set the m_ItemEnumerator as

this.m_ItemEnumerator = ((System.Data.DataTable)m_ContainerEnumerator.Current).Rows.GetEnumerator();

It's returning null. Can you help me understand why this error occurs, and also how the code executes while the protocol handler is running?

It would be a great help.

Thanks in Advance

Shanti

Wednesday, August 08, 2007 6:24 PM by notorioustech

# re: ContentEnumerator.cs for stored procedure sample

Sorry for the delay in this response. I forgot I had moderation turned on and never got the notification.

Glad you found this post helpful. Sorry it is hard to follow. I have a completely different PH framework that I created, but wanted to help others at least get the sample running.

Ok. So now the help, if you still need it.

I see the problem. Damn logic gets me every time.

I was trying to provide a sample where you only had to change the one file. The problem is that it should be spooling through each item in each container then getting the next container. Also I am not checking for null in that line you mentioned. Here is the fix.

In accessor.cs:

if (m_bDirectory)
{
    // Next Container or Item.
    m_CurrentText = m_ContentEnumerator.GetNextItem(); // spool out all items within the container first
    if (m_CurrentText == null || m_CurrentText == string.Empty) // No more items, try the next container.
    {
        string dontcare = m_ContentEnumerator.GetNextContainer();
        m_CurrentText = m_ContentEnumerator.GetNextItem(); // first item of the new container, if there is one
    }

    if (m_CurrentText != null && m_CurrentText.Length > 0) // If we still have an item, process it.
    {
        pStat.idChunk = ++m_ulChunkId;
        pStat.breakType = CHUNK_BREAKTYPE.CHUNK_EOS;
        pStat.flags = CHUNKSTATE.CHUNK_TEXT;
        pStat.locale = m_lcid;
        pStat.attribute.guidPropSet = SearchProtocolGuids.IID_GathererPropset;
        pStat.attribute.psProperty.ulKind = PSKIND.PROPID;
        pStat.attribute.psProperty.data.propid = (uint)GathererProperties.PID_GTHR_DIRLINK;
        pStat.idChunkSource = 0;
        pStat.cwcStartSource = 0;
        pStat.cwcLenSource = 0;
        hr = (IFilterReturnCodes)SCODE.S_OK;
    }
}

And in contentenumerator.cs:

public string GetNextContainer()
{
    if (m_ContainerEnumerator != null)
    {
        this.m_bValidContainer = m_ContainerEnumerator.MoveNext();
        if (this.m_bValidContainer)
            this.m_ItemEnumerator = ((System.Data.DataTable)m_ContainerEnumerator.Current).Rows.GetEnumerator();
        else
            m_ItemEnumerator = null;
    }
    else
    {
        Logging.Warning("Logic error, trying to GetNextContainer without an enumerator");
    }

    return string.Empty;
}

I will upload both now and change this main posting.
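For anyone who wants to see the container/item walk in isolation, here is a rough sketch of the pattern, assuming (as in the cast above) that each container is a DataTable and each item is a DataRow. ContainerWalker is a made-up name, not a class in the sample. Unlike the fix above, it keeps advancing past empty containers instead of trying only the next one.

```csharp
using System;
using System.Collections;
using System.Data;

// Hypothetical walker (not part of the sample): yields every row of every
// table in a DataSet, finishing one table before moving to the next.
public sealed class ContainerWalker
{
    private readonly IEnumerator m_containers; // enumerates DataTable objects
    private IEnumerator m_items;               // rows of the current table; null until a table is selected

    public ContainerWalker(DataSet ds)
    {
        if (ds == null) throw new ArgumentNullException("ds");
        m_containers = ds.Tables.GetEnumerator();
    }

    // Returns the next row across all tables, or null when everything is exhausted.
    public DataRow Next()
    {
        while (true)
        {
            // Spool out the remaining items of the current container first.
            if (m_items != null && m_items.MoveNext())
                return (DataRow)m_items.Current;

            // Current container is done (or none selected yet): move to the next one.
            if (!m_containers.MoveNext())
                return null;

            m_items = ((DataTable)m_containers.Current).Rows.GetEnumerator();
        }
    }
}
```

The null check on m_items is what guards against the error Shanti hit: the item enumerator is never touched before a container has been selected.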

Tuesday, August 28, 2007 6:16 AM by Minu

# re: ContentEnumerator.cs for stored procedure sample

Hi Chris,

How are you? I need a little help. We implemented John's code. Crawling works and we are able to get search results, but these are some of the problems we are facing:

1. "The method or operation is not implemented" is displayed in the crawl log for that content source.

2. If we recrawl the content source, duplicate items are displayed in the search results. That is, I crawled the content source the first time, then did a full crawl again, and now a search returns the same item multiple times.

I tried to figure out where the problem lies but couldn't. Please suggest a solution. It would be a great help.

Thanks in Advance

Minu

Tuesday, August 28, 2007 8:39 AM by notorioustech

# re: ContentEnumerator.cs for stored procedure sample

Hi,

For number 1, I would need to debug it. Sorry.

For number 2, that can happen if the href property is different in any way for the same item. Maybe you changed it to point to a different page when users click on the result. You have to do a content reset of the index and a recrawl after you change that.

- Hope this helps.

Wednesday, August 29, 2007 6:59 AM by Minu

# re: ContentEnumerator.cs for stored procedure sample

Hi,

In the href property

// HREF display text.
m_Properties.Add(new ChunkInfo(SearchProtocolGuids.PSGUID_DAV, "dav:displayname", m_ContentEnumerator.Title));

in the Accessor.cs class, I am passing the file name as Title and I am changing only the launch URL.

Do we need to check anything for LastModifiedDate? I took the last write time of the file and pass it as the last modified date.

Please help me find a solution. No one except you is an expert in handling protocol handlers.

Thanks

Minu

Wednesday, August 29, 2007 9:17 AM by notorioustech

# re: ContentEnumerator.cs for stored procedure sample

Thanks for the compliment. I think I am the only one trying to help people, not the only expert.

That is not the href property, just the title. This is the URL or href property:

m_Properties.Add(new ChunkInfo(SearchProtocolGuids.PSGUID_QUERY, 9, m_ContentEnumerator.LaunchUrl));

The other URL involved is the actual custom URI you generated. In my sample it is in the ContentEnumerator.cs class:

currentItem = PHCODE + "://" + m_server + "/" + m_dbname + "/" + m_enumstoredProc + "/" + m_itemstoredProc + "/fake.myp?ID=" + id;

If this is different in any way, even though it is the same item, it will cause the duplicates.

It would help me if you would consider writing a blog post or article about your experiences with this.

-Regards

Friday, August 31, 2007 2:47 AM by Minu

# re: ContentEnumerator.cs for stored procedure sample

Chris,

I didn't find any difference in this URL. Maybe I couldn't figure out what might differ?

As for writing a blog post: once my protocol handler is up I will have free time, so I can post my experiences then.

Thanks in Advance

Minu

Monday, September 03, 2007 3:23 AM by Minu

# re: ContentEnumerator.cs for stored procedure sample

Chris,

         How to keep a check that it should create duplicates?

Thanks in Advance

Minu

Monday, September 03, 2007 3:51 AM by notorioustech

# re: ContentEnumerator.cs for stored procedure sample

Not sure what you are asking. Let's move this to private email, and I will post an update on whatever resolution we come up with. Send me your email address through the contact link on this blog.

Tuesday, September 04, 2007 4:26 AM by Jernej

# re: ContentEnumerator.cs for stored procedure sample

Hi,

can you help me with security descriptor code? I tried something with CommonSecurityDescriptor class and Win32.Security library but I was unsuccessful.

Thanks in advance

Jernej

Tuesday, September 04, 2007 5:03 AM by notorioustech

# re: ContentEnumerator.cs for stored procedure sample

Hi Jernej,

Can you register as a user on the SharePoint search site? That way I can private email you when needed.

The security code in the sample does work for simple cases. What are you trying to do that is different?

Are you getting any exceptions thrown?

As a test, create a Windows Forms project and try calling the PHSecurityDescriptor methods in a controlled environment. Verify that your domain and user names are valid and accessible.

Wednesday, September 05, 2007 6:09 AM by jernej

# re: ContentEnumerator.cs for stored procedure sample

Hi,

Thanks for the advice. I removed all invalid user names and it started working.

However, I still have one problem. I can't change the document icon on the results page. I tried it via ContentType but it didn't work.

Thanks,

jernej

Wednesday, September 05, 2007 11:38 AM by notorioustech

# re: ContentEnumerator.cs for stored procedure sample

Yeah, content type won't work; it goes by the extension on the href URL. Ontolica may check both. I know I have had conversations with Lars about this very thing. You can take another approach and alter the XSLT in the search results web part to show the icons based on whatever you want. If the href is going to be a dynamic destination like an .aspx page, I usually add a custom metadata property that identifies the file type and refer to it in the XSLT to determine which icon to show.
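To see which extension an href ends up with, a quick check like the following can help. This is only a sketch of the idea under my own assumptions, not how the search results web part actually picks the icon internally; HrefExtension is a made-up helper name.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of the sample or of SharePoint).
public static class HrefExtension
{
    // Returns the lower-cased extension of the URL path, ignoring the query
    // string, or an empty string when the href is not an absolute URL.
    public static string Of(string href)
    {
        Uri uri;
        if (!Uri.TryCreate(href, UriKind.Absolute, out uri))
            return string.Empty;
        return Path.GetExtension(uri.AbsolutePath).ToLowerInvariant();
    }
}
```

For an href such as http://portal/pages/view.aspx?id=5&file=report.pdf this returns ".aspx", because the file name in the query string is not part of the path. That is the situation where a custom metadata property is useful.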

Thursday, September 06, 2007 5:19 AM by Minu

# re: ContentEnumerator.cs for stored procedure sample

Chris,

How can I check on the indexing server that it does not contain duplicates for an item? If that is not possible, is there any way to do it in the PH code?

Thanks in Advance

Minu

Thursday, September 06, 2007 11:05 AM by notorioustech

# re: ContentEnumerator.cs for stored procedure sample

Not really an easy way to do it in the PH code given the nature of the crawling process.

But I am confused.

I think you are asking the wrong question. You have two URLs involved here: the crawling URL and the HREF URL that end users get. Both of these URLs should be a unique 1 to 1 match to your source system's item. For example, if you are crawling a CRM system's customers, then you would first need to determine the unique ID of that customer. If it is a combination of keys that makes that customer record unique, then you need to use all of them in the URLs, like ?id1=5&id2=6.
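One way to keep the URL identical from crawl to crawl is to build it deterministically from the keys. The sketch below is just an illustration of that idea; ItemUrlBuilder, the myph:// scheme and the key names are assumptions of mine, not part of the sample.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not part of the sample).
public static class ItemUrlBuilder
{
    // Builds baseUrl?key1=value1&key2=value2 with lower-cased key names in a
    // fixed (ordinal) order, so the same record always yields the same URL.
    public static string Build(string baseUrl, IDictionary<string, string> keys)
    {
        if (baseUrl == null) throw new ArgumentNullException("baseUrl");
        if (keys == null) throw new ArgumentNullException("keys");

        // Normalize the names first, then let the sorted dictionary fix the order.
        SortedDictionary<string, string> sorted =
            new SortedDictionary<string, string>(StringComparer.Ordinal);
        foreach (KeyValuePair<string, string> kv in keys)
            sorted[kv.Key.ToLowerInvariant()] = kv.Value;

        StringBuilder sb = new StringBuilder(baseUrl);
        char separator = '?';
        foreach (KeyValuePair<string, string> kv in sorted)
        {
            sb.Append(separator)
              .Append(Uri.EscapeDataString(kv.Key))
              .Append('=')
              .Append(Uri.EscapeDataString(kv.Value));
            separator = '&';
        }
        return sb.ToString();
    }
}
```

With this, a record keyed by id1=5 and id2=6 produces the same URL no matter what order or casing the key names arrive in, which avoids one source of accidental duplicates.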

Tuesday, September 11, 2007 5:58 PM by Christopher on SharePoint Search

# MOSS Enterprise or Standard Edition? Effects on Search

Here is the latest pricing from Microsoft office.microsoft.com/.../FX102176831033

Friday, September 21, 2007 2:46 AM by Christopher on SharePoint Search

# How to make Search Results Actionable

This question from your clients will always come up. You have just integrated their LOB system into SharePoint

Sunday, December 09, 2007 5:00 PM by Cipri

# re: ContentEnumerator.cs for stored procedure sample

Hi,

I tried your PH, but I got the following error: "Object reference is not set to an instance of an object." Can you please give me a hint? Did I miss something?

Thank you,

Cipri

Sunday, December 09, 2007 10:36 PM by notorioustech

# re: ContentEnumerator.cs for stored procedure sample

Can you step through it and tell me what line you are getting it on? I know of at least 2 people who have it running ok.

Monday, January 14, 2008 4:14 AM by Cipri

# re: ContentEnumerator.cs for stored procedure sample

Hi there,

With your help I also implemented my protocol handler. Thanks.

Now I'm looking to improve it. I would really appreciate if somebody can share some benchmarks with me.

Cheers,

Cipri

Monday, January 14, 2008 9:32 AM by notorioustech

# re: ContentEnumerator.cs for stored procedure sample

Well my product on http://sharepointworks.com achieves up to 200 items/second during regular crawling and incrementally will spool through up to 10K/second. This is inherently more complex than the starter crawler but has some basis in it so you should be able to do this or better.

Friday, February 22, 2008 7:52 PM by rvm@fr.com

# re: ContentEnumerator.cs for stored procedure sample

Christopher,

I am hoping to use this as a starting point for a custom protocol handler that will read a database to gather meta data and file locations to include in our searches.

I am unable to compile the project because of the missing SecurityDescriptor library. Can you tell me where to get the appropriate assemblies?

Monday, March 03, 2008 4:57 PM by lazedo

# re: ContentEnumerator.cs for stored procedure sample

Hi, you said

"...I usually add a custom metadata property to identify file types that I refer to in the xslt to determine which icon..."

How do we create custom metadata properties?

I see where the "normal" properties are added, like "m_Properties.Add(new ChunkInfo(SearchProtocolGuids.PSGUID_QUERY, 9, m_ContentEnumerator.LaunchUrl));"

I want to create my own property called ProductID, and then in SharePoint have it mapped to a managed property.

TIA

Wednesday, March 05, 2008 4:38 PM by notorioustech

# re: ContentEnumerator.cs for stored procedure sample

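First create a new GUID constant. Generate the value once and keep it fixed:

private static readonly Guid MYGUID = new Guid("00000000-0000-0000-0000-000000000000"); // placeholder, substitute your own generated GUID

m_Properties.Add(new ChunkInfo(MYGUID, "ProductID", "myproduct id here"));

When you run it you will get a crawled property, and then you can create a managed property and recrawl.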