ContentEnumerator.cs for stored procedure sample

Here is a new version of ContentEnumerator.cs and Accessor.cs, based on John Kohz's protocol handler (PH) C# sample, to show how you might use a custom protocol handler to crawl a database.

Just replace the original file in his sample code with this version and start from there.

This compiles but is not tested. I based it off of work I have done in the past that does work to give you an idea of some of the concepts. If you have any questions just ask and I will try to clarify or fix anything in this file.

One glaring omission in this sample is the ability to add custom properties to the index besides the basic Title and URL properties. It is not a far reach to figure out how from what I have provided, though.

More to come on the tricky stuff I discussed in my last post.

Snippets from the code:

public ContentEnumerator(string sUrl): base(sUrl)
{

 if (sUrl.ToLower().Contains("fake.txt?id="))
  m_type = UriType.Item;
 else if (sUrl.ToLower().Contains("fake.txt"))
  m_type = UriType.Container;
 else
  throw new ApplicationException("Unknown content type");
 
 string tURL = sUrl.Substring(sUrl.IndexOf("//") + 2); // trim off the excess
 string[] aURL = tURL.Split("/".ToCharArray());
 m_server = aURL[0];
 m_dbname = aURL[1];
 m_enumstoredProc = aURL[2];
 m_itemstoredProc = aURL[3];
 string[] parms = this.Query.Split("&".ToCharArray());
 Hashtable ht = new Hashtable();
 for (int i = 0; i < parms.Length; i++)
 {
  string[] brk = parms[i].Split("=".ToCharArray());
  if (brk.Length < 2)
   continue; // skip empty or malformed pairs (e.g. a container URL with no query string)
  ht[Microsoft.JScript.GlobalObject.decodeURI(brk[0]).ToLower()] = Microsoft.JScript.GlobalObject.decodeURI(brk[1]);
 }

 if (Type == UriType.Item)
 {
  m_itemID = ht["id"].ToString();
  // "type" is optional in the URL, so guard against a missing key
  m_itemType = ht["type"] != null ? ht["type"].ToString() : string.Empty;
 }

 Logging.Trace("ContentEnumerator created");
 Logging.Information("Creating ContentEnumerator for: " + sUrl);
}
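
The URL handling in the constructor can also be tried outside the protocol handler. The following is only a minimal sketch, assuming the crawl URL has the shape scheme://server/database/enumProc/itemProc/file?id=...; the class name CrawlUrlParts and the "myph" scheme used in the usage note are hypothetical and not part of the sample. It uses Uri.UnescapeDataString instead of the JScript decodeURI call.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original sample.
public sealed class CrawlUrlParts
{
    public string Server;
    public string Database;
    public string EnumProc;
    public string ItemProc;
    public bool IsItem;

    // Query parameters, looked up case-insensitively ("ID" and "id" are the same key).
    public Dictionary<string, string> Query =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    public static CrawlUrlParts Parse(string url)
    {
        int schemeEnd = url.IndexOf("//", StringComparison.Ordinal);
        if (schemeEnd < 0)
            throw new ArgumentException("URL has no '//' separator", "url");

        // Separate the path from the query string before splitting on '/'.
        string rest = url.Substring(schemeEnd + 2);
        string path = rest;
        string query = string.Empty;
        int q = rest.IndexOf('?');
        if (q >= 0)
        {
            path = rest.Substring(0, q);
            query = rest.Substring(q + 1);
        }

        string[] segments = path.Split('/');
        if (segments.Length < 4)
            throw new ArgumentException("Expected server/database/enumProc/itemProc", "url");

        CrawlUrlParts parts = new CrawlUrlParts();
        parts.Server = segments[0];
        parts.Database = segments[1];
        parts.EnumProc = segments[2];
        parts.ItemProc = segments[3];

        // Pairs without '=' are skipped instead of throwing.
        foreach (string pair in query.Split(new char[] { '&' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int eq = pair.IndexOf('=');
            if (eq <= 0)
                continue;
            string key = Uri.UnescapeDataString(pair.Substring(0, eq));
            string value = Uri.UnescapeDataString(pair.Substring(eq + 1));
            parts.Query[key] = value;
        }

        // An "id" parameter marks an item URL; anything else is treated as a container.
        parts.IsItem = parts.Query.ContainsKey("id");
        return parts;
    }
}
```

For example, CrawlUrlParts.Parse("myph://sql01/crm/listProc/itemProc/fake.txt?ID=42") would be treated as an item with Query["id"] equal to "42", while the same URL without the query string would be treated as a container.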

public void EnsureContent()
{
 // Initialize the content stream here.
 // In this case your stored proc combines all of your searchable data into one field.
 String contstr = m_dsContent.Tables[0].Rows[0]["CONTENT"].ToString();

 // Encode the text as UTF-16 and wrap the bytes in a stream.
 UnicodeEncoding uniEncoding = new UnicodeEncoding();
 byte[] contentBytes = uniEncoding.GetBytes(contstr);
 m_Stream = new MemoryStream(contentBytes);
}
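
EnsureContent assumes the stored procedure always returns at least one row with a non-null CONTENT value. A more defensive variant might look like the sketch below. The ContentStreams class and its FromFirstRow method are hypothetical names for illustration only, and returning an empty stream for missing data is an assumption about what you would want, not something the sample prescribes.

```csharp
using System;
using System.Data;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the original sample.
public static class ContentStreams
{
    // Returns the named column of the first row of the first table as a
    // read-only UTF-16 stream, or an empty stream when there is no data.
    public static MemoryStream FromFirstRow(DataSet ds, string column)
    {
        if (ds == null || ds.Tables.Count == 0 || ds.Tables[0].Rows.Count == 0)
            return new MemoryStream(new byte[0], false);

        object value = ds.Tables[0].Rows[0][column];
        string text = (value == null || value == DBNull.Value)
            ? string.Empty
            : value.ToString();

        // Encoding.Unicode is little-endian UTF-16; GetBytes does not emit a byte order mark.
        return new MemoryStream(Encoding.Unicode.GetBytes(text), false);
    }
}
```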

public byte[] SecurityDescriptor
{
 get
 {
  if (m_SecurityDescriptor == null)
  {
   DateTime startTime = Logging.Enter(typeof(ContentEnumerator), "SecurityDescriptor-create");
   // Let's say for our example here that the stored proc returns a list of users that
   // are allowed to access the item, and they are in domain\user format already :)
   String owner = m_dsContent.Tables[0].Rows[0]["OWNER"].ToString();
   ArrayList al = new ArrayList();
   if (m_dsContent.Tables.Count == 2) // second one is security list
    for (int i = 0; i < m_dsContent.Tables[1].Rows.Count; i++)
     al.Add(m_dsContent.Tables[1].Rows[i]["USER"].ToString()); // store strings so ToArray(typeof(String)) cannot fail

   String[] users = (String[]) al.ToArray(typeof(String));
   // You need to include the Win32.Security library. Do a search for it on Google.
   PHSecurityDescriptor sec = new PHSecurityDescriptor();
   sec.LoadSecurity(owner, users);
   m_SecurityDescriptor = sec.GetBuffer();
  }

  return m_SecurityDescriptor;
 }
}

See the file link at the top of this post for the rest.

Published 19 June 2007 04:42 PM by notorioustech

Comments

# notorioustech said on 02 July, 2007 10:47 AM

Oops. Sorry, I had uploaded the .cs file thinking the server would let you download it. Fixed. The contentenumerator.cs can be downloaded now.

# Links (7/16/2007) « Steve Pietrek’s SharePoint Stuff said on 16 July, 2007 07:57 PM

Pingback from  Links (7/16/2007) « Steve Pietrek’s SharePoint Stuff

# Shanti said on 27 July, 2007 03:27 AM

Hi Chris,

I am trying to implement a protocol handler. Your code helped me see how to move on from John's code.

When I try to set m_ItemEnumerator as

this.m_ItemEnumerator = ((System.Data.DataTable)m_ContainerEnumerator.Current).Rows.GetEnumerator();

it returns null. Can you help me understand why this error occurs, and also how the code executes while the protocol handler is running?

It will be a great help.

Thanks in Advance

Shanti

# notorioustech said on 08 August, 2007 06:24 PM

Sorry for the delay in this response. I forgot I had moderation turned on and never got the notification.

Glad you found this post helpful. Sorry it is hard to follow. I have a completely different PH framework that I created, but wanted to help others at least get the sample running.

Ok. So now the help if you still need it.

I see the problem. Damn logic gets me every time.

I was trying to provide a sample where you only had to change the one file. The problem is that it should be spooling through each item in each container then getting the next container. Also I am not checking for null in that line you mentioned. Here is the fix.

In accessor.cs:

if (m_bDirectory)
{
    // Next Container or Item.
    m_CurrentText = m_ContentEnumerator.GetNextItem(); // spool out all items within the container first
    if (m_CurrentText == null || m_CurrentText == string.Empty) // No more items, try the next container.
    {
        string dontcare = m_ContentEnumerator.GetNextContainer();
        m_CurrentText = m_ContentEnumerator.GetNextItem(); // first item of the next container, if there is one
    }

    if (m_CurrentText != null && m_CurrentText.Length > 0) // If we still have an item, process it.
    {
        pStat.idChunk = ++m_ulChunkId;
        pStat.breakType = CHUNK_BREAKTYPE.CHUNK_EOS;
        pStat.flags = CHUNKSTATE.CHUNK_TEXT;
        pStat.locale = m_lcid;
        pStat.attribute.guidPropSet = SearchProtocolGuids.IID_GathererPropset;
        pStat.attribute.psProperty.ulKind = PSKIND.PROPID;
        pStat.attribute.psProperty.data.propid = (uint)GathererProperties.PID_GTHR_DIRLINK;
        pStat.idChunkSource = 0;
        pStat.cwcStartSource = 0;
        pStat.cwcLenSource = 0;
        hr = (IFilterReturnCodes)SCODE.S_OK;
    }
}

And in contentenumerator.cs:

public string GetNextContainer()
{
    if (m_ContainerEnumerator != null)
    {
        this.m_bValidContainer = m_ContainerEnumerator.MoveNext();
        if (this.m_bValidContainer)
            this.m_ItemEnumerator = ((System.Data.DataTable)m_ContainerEnumerator.Current).Rows.GetEnumerator();
        else
            m_ItemEnumerator = null;
    }
    else
    {
        Logging.Warning("Logic error, trying to GetNextContainer without an enumerator");
    }

    return string.Empty;
}
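
The "all items in a container first, then the next container" order can also be sketched on its own, outside the sample. This is only an illustration, assuming each DataTable in a DataSet is a container and each DataRow is an item; ContainerWalker is a hypothetical class, not part of the sample. Unlike the fix above, it keeps advancing past empty containers instead of trying only one.

```csharp
using System.Collections;
using System.Data;

// Hypothetical illustration, not part of the original sample.
public sealed class ContainerWalker
{
    private readonly IEnumerator m_containers; // yields DataTable objects
    private IEnumerator m_items;               // yields DataRow objects; null until a container is open

    public ContainerWalker(DataSet ds)
    {
        m_containers = ds.Tables.GetEnumerator();
    }

    // Returns the next row, moving on to later containers as needed,
    // or null once every container has been exhausted.
    public DataRow GetNextItem()
    {
        while (true)
        {
            // Spool out the current container first.
            if (m_items != null && m_items.MoveNext())
                return (DataRow)m_items.Current;

            // Current container is done (or none is open yet): open the next one.
            if (!m_containers.MoveNext())
                return null;

            m_items = ((DataTable)m_containers.Current).Rows.GetEnumerator();
        }
    }
}
```

Because the null check on the item enumerator happens before every MoveNext, the walker never dereferences an enumerator that was not set.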

I will upload both now and change this main posting.

# Minu said on 28 August, 2007 06:16 AM

Hi Chris,

How are you? I need a little help. We implemented John's code. Crawling completes and we are able to get search results, but these are some of the problems we are facing:

1. "The method or operation is not implemented" is displayed in the crawl log for that content source.

2. If we recrawl the content source, multiple items are displayed in the search results. That is, I crawled the content source once, then did a full crawl again, and now a search returns multiple items with the same result.

I tried to figure out where the problem lies, but couldn't. Please suggest a solution. It would be a great help.

Thanks in Advance

Minu

# notorioustech said on 28 August, 2007 08:39 AM

Hi,

For number 1, I would need to debug it. Sorry.

For number 2, that can happen if the href property is different in any way for the same item. Maybe you changed it to point to a different page when users click on the result. You have to do a content reset of the index and a recrawl after you change that.

- Hope this helps.

# Minu said on 29 August, 2007 06:59 AM

Hi,

In Accessor.cs I have this for the href property:

// HREF display text.
m_Properties.Add(new ChunkInfo(SearchProtocolGuids.PSGUID_DAV, "dav:displayname", m_ContentEnumerator.Title));

So I am passing the file name as the Title, and I am changing only the launch URL.

Do we need to check anything for LastmodifiedDate? I took the file's last write time and am passing it as the last modified date.

Please help me find a solution. No one but you is an expert in handling protocol handlers.

Thanks

Minu

# notorioustech said on 29 August, 2007 09:17 AM

Thanks for the compliment. I think I am the only one trying to help people, not the only expert.

That is not the href property, just the title. This is the URL or href property:

m_Properties.Add(new ChunkInfo(SearchProtocolGuids.PSGUID_QUERY, 9, m_ContentEnumerator.LaunchUrl));

The other URL involved is the actual custom URI you generated. In my sample it is in the ContentEnumerator.cs class:

currentItem = PHCODE + "://" + m_server + "/" + m_dbname + "/" + m_enumstoredProc + "/" + m_itemstoredProc + "/fake.myp?ID=" + id;

If this is different in any way, even though it is the same item, it will cause the duplicates.

It would help me if you would consider writing a blog post or article about your experiences with this.

-Regards

# Minu said on 31 August, 2007 02:47 AM

Chris,

I didn't find any difference in this URL. Maybe I couldn't figure out what might differ.

As for writing a blog post: once my protocol handler is up I will have free time, so I can post my experiences then.

Thanks in Advance

Minu

# Minu said on 03 September, 2007 03:23 AM

Chris,

         How to keep a check that it should create duplicates?

Thanks in Advance

Minu

# notorioustech said on 03 September, 2007 03:51 AM

Not sure what you are asking. Let's move this to private email and I will post an update on whatever resolution we come up with. Send me your email address through the contact link on this blog.

# Jernej said on 04 September, 2007 04:26 AM

Hi,

Can you help me with the security descriptor code? I tried something with the CommonSecurityDescriptor class and the Win32.Security library, but I was unsuccessful.

Thanks in advance

Jernej

# notorioustech said on 04 September, 2007 05:03 AM

Hi Jernej,

Can you register as a user on the SharePoint search site? That way I can private email you when needed.

The code in the sample does work for security in simple cases. What are you trying to do that is different?

Are you getting any exceptions thrown?

As a test, create a Windows Forms project and try calling the PHSecurityDescriptor methods in a controlled environment. Verify that your domain and user names are valid and accessible.

# jernej said on 05 September, 2007 06:09 AM

Hi,

Thanks for the advice. I removed all invalid user names and it started working.

However, I still have one problem. I can't change the document icon on the results page. I tried it via ContentType but it didn't work.

Thanks,

jernej

# notorioustech said on 05 September, 2007 11:38 AM

Yeah, content type won't work; it goes by the extension on the href URL. Ontolica may check both. I know I have had conversations with Lars about this very thing. You can take another approach and alter the XSLT in the search results web part to show the icons based on whatever you want. I usually add a custom metadata property to identify file types, and refer to it in the XSLT to determine which icon to use when the href is going to be a dynamic destination like an .aspx page.

# Minu said on 06 September, 2007 05:19 AM

Chris,

How can I make sure the indexing server does not contain duplicates for an item? If that is not possible on the server, is there any way to do it in the PH code?

Thanks in Advance

Minu

# notorioustech said on 06 September, 2007 11:05 AM

Not really an easy way to do it in the PH code given the nature of the crawling process.

But I am confused.

I think you are asking the wrong question. You have two URLs involved here: the crawling URL and the HREF URL that end users get. Both of these URLs should be a unique 1-to-1 match to your source system's item. For example, if you are crawling a CRM system's customers, then you would first need to determine the unique ID of that customer. If it is a combination of keys that makes that customer record unique, then you need to use all of them in the URLs, like ?id1=5&id2=6.
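
One way to keep a URL identical for the same item on every crawl is to build it from the key values in a fixed order. The sketch below is only an illustration under that assumption; ItemUrls.Build and the "myph" scheme in the usage note are hypothetical and not part of the sample.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of the original sample.
public static class ItemUrls
{
    // Builds baseUrl?key1=value1&key2=value2 with key names lower-cased and
    // sorted, so the same set of keys always yields the same URL.
    public static string Build(string baseUrl, IDictionary<string, string> keys)
    {
        SortedDictionary<string, string> ordered =
            new SortedDictionary<string, string>(StringComparer.Ordinal);
        foreach (KeyValuePair<string, string> pair in keys)
            ordered[pair.Key.ToLowerInvariant()] = pair.Value;

        StringBuilder sb = new StringBuilder(baseUrl);
        char separator = '?';
        foreach (KeyValuePair<string, string> pair in ordered)
        {
            sb.Append(separator)
              .Append(Uri.EscapeDataString(pair.Key))
              .Append('=')
              .Append(Uri.EscapeDataString(pair.Value));
            separator = '&';
        }
        return sb.ToString();
    }
}
```

With keys id1=5 and id2=6 and a base of "myph://sql01/crm/listProc/itemProc/fake.txt", this gives the same string whichever order the keys were added in. The same idea applies to the HREF URL that end users get.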

# Christopher on SharePoint Search said on 11 September, 2007 05:58 PM

Here is the latest pricing from Microsoft office.microsoft.com/.../FX102176831033

# Christopher on SharePoint Search said on 21 September, 2007 02:46 AM

This question from your clients will always come up. You have just integrated their LOB system into SharePoint

# Cipri said on 09 December, 2007 05:00 PM

Hi,

I tried your PH, but I got the following error: "Object reference is not set to an instance of an object." Can you please give me a hint? Did I miss something?

Thank you,

Cipri

# notorioustech said on 09 December, 2007 10:36 PM

Can you step through it and tell me what line you are getting it on? I know of at least 2 people who have it running ok.

# Cipri said on 14 January, 2008 04:14 AM

Hi there,

With your help I also implemented my protocol handler. Thanks.

Now I'm looking to improve it. I would really appreciate if somebody can share some benchmarks with me.

Cheers,

Cipri

# notorioustech said on 14 January, 2008 09:32 AM

Well my product on http://sharepointworks.com achieves up to 200 items/second during regular crawling and incrementally will spool through up to 10K/second. This is inherently more complex than the starter crawler but has some basis in it so you should be able to do this or better.

# rvm@fr.com said on 22 February, 2008 07:52 PM

Christopher,

I am hoping to use this as a starting point for a custom protocol handler that will read a database to gather meta data and file locations to include in our searches.

I am unable to compile the project because of the missing SecurityDescriptor library. Can you tell me where to get the appropriate assemblies?

# lazedo said on 03 March, 2008 04:57 PM

Hi, you said

"...I usually add a custom metadata property to identify file types that I refer to in the xslt to determine which icon..."

How do we create custom metadata properties?

I see where the "normal" properties are added, like "m_Properties.Add(new ChunkInfo(SearchProtocolGuids.PSGUID_QUERY, 9, m_ContentEnumerator.LaunchUrl));"

I want to create my own property called ProductID, and then in SharePoint have it mapped to a managed property.

TIA

# notorioustech said on 05 March, 2008 04:38 PM

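First create a new GUID constant. Generate the value once and keep it fixed:

static readonly Guid MYGUID = new Guid("00000000-0000-0000-0000-000000000000"); // placeholder: substitute a GUID you generated yourself

m_Properties.Add(new ChunkInfo(MYGUID, "ProductID", "myproduct id here"));

When you run it you will get a crawled property; you can then create a managed property mapped to it and recrawl.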