June 2007 - Posts

Here is a new version of ContentEnumerator.cs and Accessor.cs, based on John Kohz's C# protocol handler (PH) sample, to show how you might use a custom protocol handler to crawl a database.

Just replace the original files in his sample code with these versions and start from there.

This compiles but is not tested. I based it on earlier work of mine that does work, so it should give you an idea of the main concepts. If you have any questions, just ask and I will try to clarify or fix anything in these files.

One glaring omission in this sample is the ability to add custom properties to the index besides the basic Title and URL properties. It is not a big stretch to figure out how from what I have provided, though.

More to come on the tricky stuff I discussed in my last post.

Snippets from the code:

public ContentEnumerator(string sUrl): base(sUrl)
{

 if (sUrl.ToLower().Contains("fake.txt?id="))
  m_type = UriType.Item;
 else if (sUrl.ToLower().Contains("fake.txt"))
  m_type = UriType.Container;
 else
  throw new ApplicationException("Unknown content type");
 
 string tURL = sUrl.Substring(sUrl.IndexOf("//") + 2); // trim off the excess
 string[] aURL = tURL.Split("/".ToCharArray());
 m_server = aURL[0];
 m_dbname = aURL[1];
 m_enumstoredProc = aURL[2];
 m_itemstoredProc = aURL[3];
 string[] parms = this.Query.Split("&".ToCharArray());
 Hashtable ht = new Hashtable();
 for (int i = 0; i < parms.Length; i++)
 {
  string[] brk = parms[i].Split("=".ToCharArray());
  if (brk.Length < 2)
   continue; // skip empty or malformed name=value pairs
  ht[Microsoft.JScript.GlobalObject.decodeURI(brk[0]).ToLower()] = Microsoft.JScript.GlobalObject.decodeURI(brk[1]);
 }

 if (Type == UriType.Item)
 {
  // Convert.ToString returns an empty string instead of throwing when a key is missing
  m_itemID = Convert.ToString(ht["id"]);
  m_itemType = Convert.ToString(ht["type"]);
 }

 Logging.Trace("ContentEnumerator created");
 Logging.Information("Creating ContentEnumerator for: " + sUrl);
}
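
As a side note on the query-string loop in the constructor, here is a minimal standalone sketch of the same idea. It swaps in Uri.UnescapeDataString for the Microsoft.JScript decodeURI call (the two do not decode exactly the same set of characters), and the class name QueryStringSketch is made up for illustration; it is not part of John's sample.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original sample.
public static class QueryStringSketch
{
    // Splits "id=55&type=doc" (with or without a leading '?') into a
    // case-insensitive name/value map. A name without '=' gets an empty value.
    public static Dictionary<string, string> Parse(string query)
    {
        Dictionary<string, string> map =
            new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        if (string.IsNullOrEmpty(query))
            return map;

        foreach (string pair in query.TrimStart('?').Split('&'))
        {
            if (pair.Length == 0)
                continue; // ignore empty segments such as "a=1&&b=2"

            int eq = pair.IndexOf('=');
            string name = eq < 0 ? pair : pair.Substring(0, eq);
            string value = eq < 0 ? string.Empty : pair.Substring(eq + 1);
            map[Uri.UnescapeDataString(name)] = Uri.UnescapeDataString(value);
        }
        return map;
    }
}
```

With this, QueryStringSketch.Parse("?ID=55") can be indexed with "id" regardless of case, which is what the ToLower() call in the constructor is doing by hand.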

public void EnsureContent()
{

 // In this case your stored proc combines all your searchable data into one field.
 String contstr = m_dsContent.Tables[0].Rows[0]["CONTENT"].ToString();

 // Encode the text as UTF-16 and expose it to the caller as an in-memory stream.
 UnicodeEncoding uniEncoding = new UnicodeEncoding();
 byte[] contentBytes = uniEncoding.GetBytes(contstr);
 m_Stream = new MemoryStream(contentBytes);
}
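
To make the encoding step in EnsureContent a bit more concrete, here is a small standalone sketch (the class and method names are mine, purely for illustration). It shows that the stream holds UTF-16 little-endian bytes, two per character for ordinary text, and that whatever reads the stream has to use the same encoding.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the original sample.
public static class ContentStreamSketch
{
    // Encodes text as UTF-16 (little-endian, no byte order mark written)
    // and wraps the bytes in a MemoryStream positioned at the start.
    public static MemoryStream ToUnicodeStream(string content)
    {
        byte[] bytes = Encoding.Unicode.GetBytes(content ?? string.Empty);
        return new MemoryStream(bytes);
    }

    // Reads the whole stream back as UTF-16 text. Disposes the stream.
    public static string ReadBack(Stream stream)
    {
        stream.Position = 0;
        using (StreamReader reader = new StreamReader(stream, Encoding.Unicode))
        {
            return reader.ReadToEnd();
        }
    }
}
```

For example, ContentStreamSketch.ToUnicodeStream("abc") produces a 6-byte stream, and ReadBack on that stream returns the original string.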

public byte[] SecurityDescriptor
{

 get
 {
  if (m_SecurityDescriptor == null)
  {
   DateTime startTime = Logging.Enter(typeof(ContentEnumerator), "SecurityDescriptor-create");
   // Let's say for our example here that the stored proc will return a list of users that
   // are allowed to access the item, and that they are in domain\user format already :)
   String owner = m_dsContent.Tables[0].Rows[0]["OWNER"].ToString();
   ArrayList al = new ArrayList();
   if (m_dsContent.Tables.Count == 2) // second one is security list
    for (int i = 0; i < m_dsContent.Tables[1].Rows.Count; i++)
     al.Add(m_dsContent.Tables[1].Rows[i]["USER"].ToString()); // ToString so the cast to String[] cannot fail on a non-string value

   String[] users = (String[]) al.ToArray(typeof(String));
   // Need to include the Win32.Security library. Do a search for it on Google.
   PHSecurityDescriptor sec = new PHSecurityDescriptor();
   sec.LoadSecurity(owner,users);
   m_SecurityDescriptor = sec.GetBuffer();
  }

  return m_SecurityDescriptor;
 }
}
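
The snippets above assume the item stored proc hands back a DataSet with a particular shape: a first table with one row carrying CONTENT and OWNER columns, and an optional second table listing USER values. Here is a rough in-memory sketch of that shape, handy for trying the logic without a database. The table names "Item" and "Acl" and the class name are my own placeholders; only the column names come from the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical test helper, not part of the original sample.
public static class ItemDataSetSketch
{
    // Builds a DataSet shaped like the item stored proc result:
    // Tables[0] = one row with CONTENT and OWNER,
    // Tables[1] = optional list of USER values (domain\user strings).
    public static DataSet Build(string content, string owner, params string[] users)
    {
        DataSet ds = new DataSet();

        DataTable item = ds.Tables.Add("Item");
        item.Columns.Add("CONTENT", typeof(string));
        item.Columns.Add("OWNER", typeof(string));
        item.Rows.Add(content, owner);

        if (users != null && users.Length > 0)
        {
            DataTable acl = ds.Tables.Add("Acl");
            acl.Columns.Add("USER", typeof(string));
            foreach (string user in users)
                acl.Rows.Add(user);
        }
        return ds;
    }

    // Mirrors the loop in SecurityDescriptor: collects USER values from the
    // second table if it exists, skipping NULL entries.
    public static string[] ReadUsers(DataSet ds)
    {
        List<string> users = new List<string>();
        if (ds.Tables.Count >= 2)
        {
            foreach (DataRow row in ds.Tables[1].Rows)
            {
                if (!row.IsNull("USER"))
                    users.Add(row["USER"].ToString());
            }
        }
        return users.ToArray();
    }
}
```

Building a set with ItemDataSetSketch.Build("body text", "DOMAIN\\alice", "DOMAIN\\bob") and passing it to ReadUsers gives a one-element array, and a set built without users gives an empty array rather than an exception.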

See the file link at the top of this post for the rest.


As mentioned in a previous post, there are limitations involved in using the BDC to crawl and make searchable all of your enterprise data. Some of these limitations are deal breakers, and you must then turn to an alternate solution for searching your data. Some of the larger systems have custom protocol handlers (PH) already developed, either by the vendor or by a third party. These can be easily located through our vendor list on the front page of the www.sharepointsearch.com site.

Vorsite supports Documentum and FileNet, XMLAW and SHAREPOINT Works support Worksite and Hummingbird, and HandShake can connect to most legal systems. More are being developed every day.

BUT. If there isn't a good PH available for your system, or it's a proprietary solution, and you have good COM or .NET interop developers available, then you can take a crack at writing your own protocol handler.

Here are some of the steps involved to get started:

  • Read the attached Word document that is extracted from John's ZIP; it will prepare you for what is to come. I won't go into all the details of what a PH is and how it is used, because this document does a great job of explaining that.
  • Once you have the solution loaded in Visual Studio 2005, open up the ContentEnumerator.cs class. This is your starting point. John did a good job of providing you with a single class to focus on in order to get your PH working. It is very much fill in the blanks.
  • You will need to figure out the following, and it is going to be different for every system:
    • URI - When writing a PH, the primary input is going to be the content source URI. It can contain any information you desire as long as it's formatted as a URI. I personally use it to pass in connection and query information so that my PH can be flexible and crawl multiple things. There are typically two types of URIs: one that points to an enumerator and one that points to the actual content. When the PH gets passed an enum URI it will spit out content URIs, which in turn get passed into new calls to the PH to return the data. Read that again a few times till it makes sense. Your URI must always end in a valid extension, or at least one that you have added to the allowed extensions list, otherwise the gatherer will exclude it. I use the URI to tell my PH where my database is located, the database name, the stored procedure to use for enumerating content, and the stored procedure to use to get the data for the item to be added to the index. Here are examples of both: myph://servername/dbname/enumproc/fake.enu and myph://servername/dbname/contentproc/fake.cnt?ID=55 . From these I can determine what each is used for and get the information I need to crawl the content. You will of course have to write the stored procs and have them return the predefined dataset columns that your PH is expecting.
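
To make the two URI shapes above a little more concrete, here is a minimal sketch of how a PH might take one apart. It assumes exactly the server/dbname/proc/file layout and the .enu / .cnt extensions from the two examples; the PhUri and PhUriKind names are invented for this illustration and are not part of John's sample.

```csharp
using System;

// Hypothetical types for illustration only.
public enum PhUriKind { Container, Item }

public sealed class PhUri
{
    public PhUriKind Kind;
    public string Server;
    public string Database;
    public string StoredProc;
    public string Query; // text after '?', or empty when there is none

    // Parses scheme://server/dbname/proc/file.ext[?query].
    public static PhUri Parse(string url)
    {
        if (url == null) throw new ArgumentNullException("url");

        int schemeEnd = url.IndexOf("://", StringComparison.Ordinal);
        if (schemeEnd < 0) throw new FormatException("Missing scheme: " + url);

        // Separate the path from the optional query string.
        string rest = url.Substring(schemeEnd + 3);
        string query = string.Empty;
        int q = rest.IndexOf('?');
        if (q >= 0)
        {
            query = rest.Substring(q + 1);
            rest = rest.Substring(0, q);
        }

        string[] parts = rest.Split('/');
        if (parts.Length != 4)
            throw new FormatException("Expected server/dbname/proc/file: " + url);

        PhUri result = new PhUri();
        result.Server = parts[0];
        result.Database = parts[1];
        result.StoredProc = parts[2];
        result.Query = query;

        // The file extension decides whether this is an enumerator or an item.
        string file = parts[3];
        if (file.EndsWith(".enu", StringComparison.OrdinalIgnoreCase))
            result.Kind = PhUriKind.Container;
        else if (file.EndsWith(".cnt", StringComparison.OrdinalIgnoreCase))
            result.Kind = PhUriKind.Item;
        else
            throw new FormatException("Unknown extension: " + file);

        return result;
    }
}
```

Parsing the enum example yields Kind = Container with StoredProc = "enumproc", while the content example yields Kind = Item with Query = "ID=55". Failing loudly on anything unexpected is a deliberate choice here, since a silently misparsed URI tends to surface much later as a confusing crawl error.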

I will upload a modified version of ContentEnumerator.cs later today that works with stored procedures.

 

Advanced Topics: I will be posting code bits and discussing each of these in the future:

  • Loading IFilters for file types inside your PH in order to extract unstructured data from files.
  • Security - constructing advanced ACLs, including groups.
  • Indexing complex property datatypes: arrays, floats, dates, integers. The current sample only allows strings. The tricky part here is defining the COM interop equivalent of a PROPVARIANT.
  • How to implement a changelog in your PH for optimized incremental crawls.
  • How to use the new ISecurityTrimmer interface as an adjunct to your security model.
  • How to fix the darn memory leak in COM interop that is inherent in the sample.
  • Dealing with threading issues and caching.

If you haven't checked out CodePlex yet, you should right now.

http://www.codeplex.com/Project/ProjectDirectory.aspx?ProjectSearchText=sharepoint

It is very interesting to see all the projects in development.

Some won't end up going far, but some of the more promising ones are:

Community Kit for SharePoint, which seeks to bridge the gaps between SP and a real community site. We still prefer CommunityServer.org integrated in, but will keep an eye on this project to see how it progresses.

Stramit SharePoint 2007 Caml Viewer, a very useful tool.

Stramit SharePoint 2007 Master Picker, as an alternative to the recursive SharePoint Designer method.

 

The BDC is the perfect choice for indexing and working with small data sources with minimal security requirements and only structured data. But as soon as you are dealing with unstructured data such as documents in combination with metadata, as in a document management system, you are out of luck and have to write your own protocol handler or hope a vendor comes out with one.

Security and the BDC
The two uses of the BDC have two different security approaches:

Search Content Source - When using the BDC as a search content source, you can't put security directly on the crawled data the way you can with your own protocol handler. With a PH you actually create AD-based ACLs that are attached to each item and applied by SharePoint before the results are returned. With the BDC, in order to provide any security on the results you have to use the new security trimming feature and write a query to verify access to the data. This is a severely limited approach, as it queries for each and every item and is applied after the results are generated, affecting relevancy. If you are dealing with large result sets and heavily secured data, then you may have to spool through thousands of results just to get enough to show to the user, if any.

Web part data source - This is when you create BDC data definitions to display data from your LOB systems in web parts. This requires you to include all your security in each and every query or web service call. If you are mapping to an API this may already be taken care of for you, but if not, be prepared to deal with large queries or stored procedures, especially if your LOB system has a complex security model with groups or roles or, even worse, one that is hierarchy based.
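
To put a rough number on the "spool through thousands of results" problem in the search content source case, here is a toy simulation. It does not touch any SharePoint API; it only models a relevance-ordered result list being filtered after the fact by a per-item access check, which is the general pattern described above. All names in it are made up.

```csharp
using System;
using System.Collections.Generic;

// Toy model of post-query security trimming; not SharePoint code.
public static class TrimmingCostSketch
{
    // Walks a relevance-ordered result list, calling canAccess once per item,
    // until pageSize visible items have been collected or the list runs out.
    // Returns the number of access checks that were needed.
    public static int ChecksNeeded(IList<string> rankedResults,
                                   Predicate<string> canAccess,
                                   int pageSize,
                                   List<string> visible)
    {
        int checks = 0;
        foreach (string url in rankedResults)
        {
            if (visible.Count >= pageSize)
                break;

            checks++;
            if (canAccess(url))
                visible.Add(url);
        }
        return checks;
    }
}
```

If a user can see only one result in every hundred, filling a page of 10 from 1,000 hits takes 1,000 access checks, each of which would be a query against the backend in the real setup. With ACLs attached at crawl time, as a PH does, that per-item round trip is not needed.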

Posted by ceven | with no comments

While playing around with the SharePoint object model and web services, I decided that I wanted to try writing a custom authentication provider that uses a SharePoint list as its source.
The reason for this is to be able to use one site to control access to another.

Anyway, I hope this source helps out any devs out there trying to do something similar:
I have a whole C# project with a readme file and everything.
http://sharepointsearch.com/cs/blogs/notorioustech/SPWorks.Freeware.zip

It is not fully functional, in that I haven't implemented all of the provider interfaces, but it authenticates fine.



Here are some key pieces:

public override bool ValidateUser(string strName, string strPassword)
{
 bool boolReturn = false;
 if (this._debugMode) writeToLog("ValidateUser:" + strName);
 bool userIsOnline = false;
 try
 {
  MembershipUser mu = this.GetUser(strName, userIsOnline);
  if (mu == null)
   return boolReturn;
  return strPassword == mu.GetPassword();
 }
 catch (Exception ee )
 {
  writeToLog("ValidateUser exception:" + ee.ToString ());
  return false;
 }
}

public override MembershipUser GetUser(string strName, bool boolUserIsOnline)
{
 if (this._debugMode) writeToLog("GetUser");
 try
 {
  XmlNode res = this.RunListQuery(false,"Text", "Title", strName,100);
  if (res.ChildNodes.Count == 0)
   return null;
  if (res.FirstChild.ChildNodes.Count == 0)
   return null;
  if (res.FirstChild.ChildNodes.Count > 1)
   throw new ApplicationException("User ID:" + strName + " matches more than one result!!");
  return GetMemberFromNode(res.FirstChild.FirstChild );
 }
 catch (Exception ee)
 {
  writeToLog("Exception:GetUser:" + ee.Message);
  throw;
 }
}

private MembershipUser GetMemberFromNode(XmlNode nd)
{
 String id = nd.Attributes.GetNamedItem("ows_ID").Value.ToString();
 String email = nd.Attributes.GetNamedItem("ows_Email").Value.ToString();
 String pwdQ = nd.Attributes.GetNamedItem("ows_Password_x0020_Question").Value.ToString();
 String pwdA = nd.Attributes.GetNamedItem("ows_Password_x0020_Answer").Value.ToString();
 String pwdChange = nd.Attributes.GetNamedItem("ows_Require_x0020_Password_x0020_Cha").Value.ToString();
 String passWd = nd.Attributes.GetNamedItem("ows_Password").Value.ToString();
 // String siteName = nd.Attributes.GetNamedItem("ows_Site_x0020_Name").Value.ToString();
 Boolean isLocked = true;
 if (nd.Attributes.GetNamedItem("ows_Is_x0020_Locked_x0020_Out").Value.ToString()=="0") isLocked = false;
 String title = nd.Attributes.GetNamedItem("ows_Title").Value.ToString();
 DateTime createDate = DateTime.Parse(nd.Attributes.GetNamedItem("ows_Created").Value.ToString());
 DateTime lastLoginDate = new DateTime (1980,1,1);
 DateTime firstLoginDate = new DateTime (1980,1,1);
 DateTime lastLockoutDate = new DateTime(1980, 1, 1);
 DateTime lastPasswordChangeDate = new DateTime(1980, 1, 1);
 // Check the same attribute that is read (the original tested ows_Last_x0020_Login here).
 if (nd.Attributes.GetNamedItem("ows_First_x0020_Login") != null)
  firstLoginDate = DateTime.Parse(nd.Attributes.GetNamedItem("ows_First_x0020_Login").Value.ToString());
 if (nd.Attributes.GetNamedItem("ows_Last_x0020_Login") != null)
  lastLoginDate = DateTime.Parse(nd.Attributes.GetNamedItem("ows_Last_x0020_Login").Value.ToString());
 if (nd.Attributes.GetNamedItem("ows_Last_x0020_Lockout") != null)
  lastLockoutDate = DateTime.Parse(nd.Attributes.GetNamedItem("ows_Last_x0020_Lockout").Value.ToString());
 if (nd.Attributes.GetNamedItem("ows_Last_x0020_Password_x0020_Change") != null)
  lastPasswordChangeDate = DateTime.Parse(nd.Attributes.GetNamedItem("ows_Last_x0020_Password_x0020_Change").Value.ToString());
 SPWorksMembershipUser mem = new SPWorksMembershipUser(_strName, title, id, email, pwdQ,
"", true, isLocked, createDate, lastLoginDate, lastLoginDate,
lastPasswordChangeDate, lastLockoutDate, passWd, pwdA);
 return mem;
}

private XmlNode RunListQuery(Boolean asLike,String fieldType,String field, String val, int maxRows)
{
 if (this._debugMode) writeToLog("RunListQuery");
 MemberListService.Lists listService = new MemberListService.Lists();
 listService.Credentials = new System.Net.NetworkCredential(this._runAsUsername, this._runAsPassword, this._runAsDomain);
 // Set the Url property of the service for the path to a subsite,
 // e.g. "http://Server_Name/Subsite_Name/_vti_bin/Lists.asmx".
 listService.Url = _splistpath;
 // Instantiate an XmlDocument object
 System.Xml.XmlDocument xmlDoc = new System.Xml.XmlDocument();
 /*Use the CreateElement method of the document object to create elements for the parameters that use XML*/
 System.Xml.XmlElement query = xmlDoc.CreateElement("Query");
 System.Xml.XmlElement ndQueryOptions = xmlDoc.CreateElement("QueryOptions");
 System.Xml.XmlElement ndUserName = xmlDoc.CreateElement("username");
 ndUserName.InnerText = val;
 if (fieldType == "")
  fieldType = "Text";
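 // NOTE: the CAML markup inside the two query strings was stripped when this post was published.
 // The Where/Contains and Where/Eq forms below are reconstructed from the method's parameters.
 if (asLike && !field.Equals(""))
 {
  query.InnerXml = "<Where><Contains><FieldRef Name='" + field + "'/><Value Type='" + fieldType + "'>" + ndUserName.InnerXml + "</Value></Contains></Where>";
 }
 else if (!field.Equals (""))
 {
  query.InnerXml = "<Where><Eq><FieldRef Name='" + field + "'/><Value Type='" + fieldType + "'>" + ndUserName.InnerXml + "</Value></Eq></Where>";
 }
 if (this._debugMode) writeToLog(query.InnerXml);
 // NOTE: the element names around these three FALSE values were lost the same way and are left as posted.
 ndQueryOptions.InnerXml = "FALSE" +"FALSEFALSE";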
 /* Declare an XmlNode object and initialize it with the XML response from the GetListItems method.*/
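 System.Xml.XmlNode nodeListItems = listService.GetListItems(this._listName, null, query, null, maxRows.ToString(), ndQueryOptions, null);
 if (nodeListItems == null)
 {
  // Stop here instead of dereferencing a null result on the next line.
  writeToLog("GetListItems Failed");
  throw new ApplicationException("GetListItems returned no result");
 }
 if (this._debugMode) writeToLog(nodeListItems.InnerXml);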
 // some simple cleanup needed
 nodeListItems.InnerXml = nodeListItems.InnerXml.Replace("\n ", "").Replace("\n","").Replace("> ",">");
 return nodeListItems;
}
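
As a standalone illustration of the kind of CAML query RunListQuery hands to GetListItems, here is a sketch that builds the Query element with the XML DOM instead of string concatenation. The Where / Eq / Contains / FieldRef / Value elements are standard CAML; the class and method names are mine, and this is one possible way to build the query, not necessarily what the downloadable project does.

```csharp
using System;
using System.Xml;

// Hypothetical helper, not part of the downloadable project.
public static class CamlQuerySketch
{
    // Builds <Query><Where><Eq|Contains><FieldRef Name=.../><Value Type=...>text</Value>...
    // asLike = true produces a Contains match, otherwise an exact Eq match.
    public static XmlElement BuildQuery(XmlDocument doc, bool asLike,
                                        string field, string fieldType, string value)
    {
        XmlElement query = doc.CreateElement("Query");
        XmlElement where = doc.CreateElement("Where");
        XmlElement op = doc.CreateElement(asLike ? "Contains" : "Eq");

        XmlElement fieldRef = doc.CreateElement("FieldRef");
        fieldRef.SetAttribute("Name", field);

        XmlElement val = doc.CreateElement("Value");
        val.SetAttribute("Type", string.IsNullOrEmpty(fieldType) ? "Text" : fieldType);
        val.InnerText = value; // InnerText escapes <, > and & in the user-supplied value

        op.AppendChild(fieldRef);
        op.AppendChild(val);
        where.AppendChild(op);
        query.AppendChild(where);
        return query;
    }
}
```

Calling CamlQuerySketch.BuildQuery(new XmlDocument(), false, "Title", "Text", userName) gives an element that can be passed as the query argument. Building it through the DOM means a user name containing quotes or angle brackets cannot break out of the Value element, which matters for something sitting on a login path.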

Posted by ceven | 3 comment(s)

I apologize now for the ramblings of this post, but it just follows how my thoughts tracked through the weekend. I will try to structure this as much as possible so it is useful to others trying to do the same.

Problem:
I need to rapidly create and deploy multiple community sites with some niche functionality. Some requirements of these sites are:

  • Great FULL FEATURED Blogging Interface
  • Easy skinning or theming for end users
  • robust user and subscription management with self registration.
  • full forums
  • Vendor lists with comparison matrix
  • searchable
  • customizable user profiles
  • versioning and rollback
  • the usual articles, links, news, announcements interface
  • easy customization in terms of adding new functionality, i.e. custom lists, actual code, db forms.
  • User rankings where more active users get points and credibility.

Some background:

In doing this search I wanted to be completely open minded, which is not easy if you know me. I am a .NET, SharePoint guy first and foremost. But I quickly realized when I began poking around that this was a mental handicap. If I could find a solution that did it all and more, then I owed it to myself (and clients) to explore it fully, whether MS based or open source.

The search:

Everyone should go to http://www.cmsmatrix.org for a truly enlightening experience. There are hundreds of CMS systems out there, many of them very mature and being used successfully at thousands of sites. The site is kind of overwhelming at first. It offers a search interface where you plug in the functionality you want and it feeds you back matching CMS systems. NOTE: SharePoint is poorly represented here, so don't bother with it in any comparisons; just focus on the alternatives for now.

I will now save you the pain of what I went through in my hunt and jump straight to my top picks, both from this site and from actually installing and trying some of them. I will leave SharePoint out of these picks for now, as I am still thinking through how I can possibly make SharePoint do all that these products can do. Plus, these are FREE.

1. www.DotNetNuke.com (.NET) - If you are an MS guy then this is a safe choice, as you will be dealing with familiar tools and architecture. This product is very robust and is ASP.NET 2.0 based. It has almost everything on my list save the versioning and comparison matrix. I have used it in the past and know I can make it do what I need with a little work.

2. http://www.webgui.org (Perl) - This is truly an awesome piece of work. Very cool, and if you don't mind open source then you have to try this out. The admin interface is all web based and very easy to use. This tool has tons of functionality and is very mature. You can create your own 24-hour test portal online at demo.plainblack.com to try it out. It is written in Perl. I hate Perl, but would be willing to suffer for all this functionality.

3. typo3.com (PHP) - This has more functionality than webgui.org. I am still evaluating this one, but can't discount it. There are some rumblings out there that it is harder to master, but complexity doesn't bother me as long as it's worth it. Stay tuned on this one; I will follow up some more.

4. www.drupal.org (PHP) - Recommended by many. Review to come.

5. www.plone.org (Python) - Another highly recommended one. Review to come.

 

The indecision:

OK, now I have done it. A little knowledge can be dangerous. I think I was better off not knowing there were powerful alternatives out there. Any road I pick will be a long one, and I don't want to back up and try another exit. If I choose SharePoint or DotNetNuke I know I can code mods easily enough, but if I pick one of the others and it does everything I need it to, then I don't have to code anything.

What about SharePoint?

Well, first off, which one: MOSS or WSS? MOSS is like $70K+ for an internet license and WSS is $8K. The differences between MOSS and WSS for community site functionality are small, so it would be WSS. But SharePoint doesn't have a subscription/registration system, so that would need to be developed; I am already part way there, though. SharePoint's forums and blog interface won't cut it yet (there is a community project for SharePoint, http://www.codeplex.com/CKS, but it won't be done until next year), so I will have to develop a new one or integrate an external one. I have played with http://communityserver.org/ and it has blogging, forums and subscriptions, and supports the .NET authentication provider model. I am theorizing that I can integrate the Community Server functionality with SharePoint easily enough. I can finish my registration authentication plugin for SharePoint and have that control Community Server, OR go the other way and create an auth provider for SharePoint that will support single sign-on from Community Server. For my needs Community Server will cost $2000 (easily worth it). This combined solution would still come up short in some features but would work for my needs.

So I am still leaning toward SharePoint. Am I being an MS snob? If a SharePoint solution is settling, why settle? Well, MS is going to get there eventually and I don't want to have to change platforms in the future. MS has the best built-in app-building functionality with its DM and custom list functionality. Oh, and I know SharePoint very well.

The decision:

None yet. Going to explore a combined SP and CommunityServer.org product, going to push DotNetNuke a little further, and going to play with Plone, Drupal and TYPO3 just to be fair. STAY TUNED.

 