It seems everybody is moving their Enterprise applications to the cloud. Some of the most popular systems are available as remotely hosted solutions: SAP (mysap.com), Salesforce.com, MS Exchange Online, MS CRM, etc. But as companies start to migrate to these remotely hosted, best-of-breed systems, they start to face challenges with data visibility, aka Knowledge Management. Searching and reporting are critical tools for the enterprise, but if your data is no longer centralized, how can you get a complete picture of it?
Enterprise reporting has always been a critical part of business: being able to run reports and analyze data across multiple systems, whether ad hoc or OLAP, is essential for making reliable business decisions. But building that data warehouse is no longer so simple, because you can't just import directly from the local databases of each of your systems. Hope is not lost, though. I have researched many of those systems, and there is usually a partner or two offering to extract that data automatically for your reporting purposes. I would be curious to find out whether anyone has built a centralized, cloud-based reporting solution that includes connectors for systems like SAP and Salesforce.com; if not, I think I have my next startup plan.
Anyway, this is a blog about search, so let's focus on that. If you are hosting SharePoint (or FAST) internally as your Enterprise Search solution, how do you maintain a centralized search interface when your data isn't so centralized anymore? The first option that comes to mind is federated search.
There has been a lot of confusion about what federated search means, so let's clear that up first. Federated search is simply combining the search results from multiple search engines in a single interface. For SharePoint, that means when you run a query you get your regular search results in one web part and a completely separate set of results from your federated sources in another. For federated search to work, either the vendor has to offer a standard federatable (is that a word?) interface or you have to write your own. There is possibly a second, pseudo-federated option to consider, depending on the remote system's search interface: adding an iframe to host the remote search interface, plus some JavaScript to pass the search parameters, has worked for me with some systems (sharepointsearch.com actually uses that model for its search) and may provide what you need.
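The heart of the iframe approach is translating the query on the SharePoint results page into the remote system's own search URL. Below is a minimal Python sketch of that mapping, assuming the SharePoint results page carries the user's query in the `k` query-string parameter (as the MOSS 2007 Search Center results page does) and that the remote system accepts its query in a parameter named `q`. The function name and `remote.example.com` are hypothetical; on the page itself this logic would live in a few lines of JavaScript that set the iframe's `src`.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def remote_search_url(sharepoint_results_url, remote_base, remote_param="q"):
    """Build the URL a hosted iframe would load for the same query.

    Assumes the query is in SharePoint's `k` parameter and that the remote
    search page reads it from `remote_param` (hypothetical; check the vendor).
    """
    params = parse_qs(urlparse(sharepoint_results_url).query)
    keywords = params.get("k", [""])[0]  # parse_qs has already URL-decoded it
    return f"{remote_base}?{urlencode({remote_param: keywords})}"


url = remote_search_url(
    "http://portal/SearchCenter/Pages/results.aspx?k=cloud%20search",
    "https://remote.example.com/search",
)
```

Re-encoding the keywords (rather than copying the raw query string across) keeps the sketch independent of how each system escapes spaces and special characters.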
While I am at it, I might as well list the pros and cons of federating search interfaces:
Federated Search Pros:
- Security can be handled by the remote system, so no translation into your native search system's security model is required (if you can resolve your single sign-on issues)
- No delays on index updates so search results are usually fresh
- Minimizes network and internet bandwidth as the data doesn't have to be extracted out
Federated Search Cons:
- Usually relies on a single sign-on model, which adds an additional step before end users can search.
- Loss of integrated features like faceted navigation, customizable search results, and advanced searching.
- Complete loss of your search engine's relevancy capabilities, since your data is not actually in the index.
You will have to judge whether the loss of these search features is enough to justify the third option: indexing your cloud data with your Enterprise search engine. How, you ask? Here are your options:
- If you are already extracting the data into a data warehouse for reporting purposes, most search engines provide a way to crawl SQL data (or there are vendors to help; see sharepointsearch.com for the list).
- Write your own connector to the remote systems' APIs (see my prior blog posts about how).
- Look to a third-party vendor (again, see sharepointsearch.com for a complete list).
NOTE: For Exchange, MS CRM, Salesforce.com/force.com and SAP cloud systems, see BA-Insight for SharePoint and now FAST (shameless plug).
I don't know how I missed this post from a couple of weeks ago, but after asking around, nobody else seemed to have caught it either, so here it is.
http://blogs.msdn.com/sharepoint/archive/2009/05/21/attention-important-information-on-service-pack-2.aspx
Here is an excerpt:
During the installation of SP2, a product expiration date is improperly activated. This means SharePoint will expire as though it was a trial installation 180 days after SP2 is deployed. The activation of the expiration date will not affect the normal function of SharePoint up until the expiration date passes. Furthermore, product expiration 180 days after SP2 installation will not affect customer’s data, configuration or application code but will render SharePoint inaccessible for end-users.
Here is the KB article on it: http://support.microsoft.com/kb/971620
The issue appears when you use host headers in IIS to distinguish between your sites. If you try to open your NTLM-based site locally, it returns a 401 error after three login attempts, and any SharePoint web service calls made locally also fail.
After installing IE 8 (or rather, letting Automatic Updates run), I found that my SharePoint web service calls were no longer working. I have web parts and sites that make web service calls back to the same machine, and they had worked for 2 years without problems.
Here is the fix: (http://support.microsoft.com/kb/896861)
Method 2: Disable the loopback check
Follow these steps:
- Click Start, click Run, type regedit, and then click OK.
- In Registry Editor, locate and then click the following registry key:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa
- Right-click Lsa, point to New, and then click DWORD Value.
- Type DisableLoopbackCheck, and then press ENTER.
- Right-click DisableLoopbackCheck, and then click Modify.
- In the Value data box, type 1, and then click OK.
- Quit Registry Editor, and then restart your computer.
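If you need to apply this setting on several front-end servers, a .reg file can be easier to distribute than clicking through Registry Editor on each one. Here is a minimal Python sketch (not part of the KB article) that produces such a file for the DisableLoopbackCheck value above. The function and file names are my own, and it assumes the standard `Windows Registry Editor Version 5.00` format, in which DWORD data is written as eight hex digits. As with any registry change, back up the key first.

```python
def reg_file_for_dwords(key_path, values):
    """Return the text of a .reg file that sets each name/value pair as a REG_DWORD."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key_path}]"]
    for name, value in values.items():
        # .reg syntax writes DWORD data as exactly eight hexadecimal digits
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\r\n".join(lines) + "\r\n"


def write_reg_file(path, text):
    # Version 5.00 .reg files are normally saved as UTF-16 (the codec adds a BOM);
    # newline="" keeps the CRLF line endings exactly as built above
    with open(path, "w", encoding="utf-16", newline="") as f:
        f.write(text)


LSA_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa"
reg_text = reg_file_for_dwords(LSA_KEY, {"DisableLoopbackCheck": 1})
```

Calling `write_reg_file("disable_loopback_check.reg", reg_text)` and importing the result with `regedit /s` sets the same value as the manual steps; the restart is still required.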
From the comments, it appears it may have been caused by one of the updates that happened at the same time.
Enterprise Search means being able to search across all your enterprise systems, but that means you need a way to extract that data and get it into your search engine.
One option is to use a third-party connector solution like BA-Insight's Longitude Enterprise Connectors.
See how BA-Insight can index any enterprise system with full security mapping; you will also see a demo of their Longitude Search for SharePoint, which provides faceted navigation, real-time previews and more.
View the webinar recorded on April 17th:
https://www2.gotomeeting.com/register/588636416
SharePoint lists and document libraries are not designed to handle large numbers of items; in fact, as soon as you add more than a few thousand, you will see a warning on the List Settings page that says:
As part of my product suite I consistently have lists with hundreds of thousands of items in them, and I have learned many valuable lessons about making numbers like these workable.
Hopefully some of what I learned can help you.
1. First, do not use the Data View web part (from SharePoint Designer) or any XML-based rendering web part. They first retrieve all the items in the list and then apply the XSLT to render them, so you will face both memory and severe performance problems. The regular list web part works fine even for large datasets, but you will not be able to get total list counts with it.
2. If you plan to sort or filter by any columns, be sure to add them to the list's indexed columns on the List Settings page.
3. SharePoint is a database-driven application, so list data is stored in database tables. The way it stores them is very denormalized, as each column's data is stored in its own row, which means that 2 list items with 3 columns produce at least 6 rows of actual SQL data. This adds up very fast and will quickly cause both disk and index fragmentation. Make sure your logical disks are defragmented once a week, and use SQL Server Management Studio to create an index maintenance plan for all your databases.
4. For Developers using the Object model:
a. Getting counts: never reference the list item collection's Count property, as that triggers a complete retrieval of all the items in the list and may throw memory errors. If you need the list count, use list.ItemCount instead of list.Items.Count.
b. Item access: never access the list item collection by index position, as it will re-retrieve each item when accessed.
for (int i = 0; i < list.Items.Count; i++) // bad
    SPListItem li = list.Items[i]; // bad
Instead, use the standard enumeration model to walk through your collection when needed:
foreach (SPListItem li in list.Items) // good
c. Paging through large numbers of list items: use SPQuery and its RowLimit property to page through your larger list collections.
SPQuery query = new SPQuery();
query.RowLimit = 10000; // maximum number of items returned per page
do
{
    SPListItemCollection myItems = list.GetItems(query);
    foreach (SPListItem li in myItems)
    {
        // do something here
    }
    // continue the next GetItems call where this page ended; null after the last page
    query.ListItemCollectionPosition = myItems.ListItemCollectionPosition;
} while (query.ListItemCollectionPosition != null);
d. Deleting large numbers of list items (new): use SPWeb.ProcessBatchData whenever possible. Taking the above advice into account, here is what I consider the best model for item deletion:
// requires: using System.Collections; using System.Text; using Microsoft.SharePoint;
SPList lst = SPContext.Current.Web.Lists["MYLIST"];
int skipId = -1; // placeholder: ID of an item to keep (e.g. the current item); -1 keeps nothing
SPQuery query = new SPQuery();
query.ViewFields = "<FieldRef Name='ID'/>"; // only the ID column is needed
query.IncludePermissions = false;
query.RowLimit = 20000;
// first get all the IDs to be deleted, one page at a time
ArrayList al = new ArrayList();
do
{
    SPListItemCollection myItems = lst.GetItems(query);
    foreach (SPListItem item in myItems) // note: use enumeration
    {
        if (item.ID == skipId) continue;
        al.Add(item.ID.ToString());
    }
    query.ListItemCollectionPosition = myItems.ListItemCollectionPosition;
} while (query.ListItemCollectionPosition != null);
// then delete them, sending at most 1000 Delete commands per ProcessBatchData call
StringBuilder sbDelete = new StringBuilder();
sbDelete.Append("<?xml version=\"1.0\" encoding=\"UTF-8\"?><Batch>");
string listguid = lst.ID.ToString();
int bcount = 0;
for (int i = 0; i < al.Count; i++)
{
    if (bcount >= 1000)
    {
        // flush the full batch and start a new one
        sbDelete.Append("</Batch>");
        lst.ParentWeb.ProcessBatchData(sbDelete.ToString());
        sbDelete = new StringBuilder();
        sbDelete.Append("<?xml version=\"1.0\" encoding=\"UTF-8\"?><Batch>");
        bcount = 0;
    }
    bcount++;
    sbDelete.Append("<Method>");
    sbDelete.Append("<SetList Scope=\"Request\">" + listguid + "</SetList>");
    sbDelete.Append("<SetVar Name=\"ID\">" + al[i].ToString() + "</SetVar>");
    sbDelete.Append("<SetVar Name=\"Cmd\">Delete</SetVar>");
    sbDelete.Append("</Method>");
}
// send the final (possibly partial) batch
sbDelete.Append("</Batch>");
lst.ParentWeb.ProcessBatchData(sbDelete.ToString());
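For readers who want to experiment with the paging and batch-deletion techniques above away from a SharePoint server, here is a minimal Python sketch of the same control flow. The page fetcher and its integer position are hypothetical stand-ins for list.GetItems(query) and ListItemCollectionPosition; only the Batch/Method/SetList/SetVar XML shape is taken from the C# code. It is an illustration of the looping and chunking logic, not a SharePoint client.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Tuple
from xml.sax.saxutils import escape

# A page fetcher takes (position, row_limit) and returns (ids, next_position);
# next_position is None on the last page, mirroring ListItemCollectionPosition.
PageFetcher = Callable[[Optional[int], int], Tuple[List[int], Optional[int]]]

BATCH_HEADER = '<?xml version="1.0" encoding="UTF-8"?><Batch>'


def iter_ids(fetch_page: PageFetcher, row_limit: int) -> Iterator[int]:
    """Walk every page, feeding each page's end position into the next request."""
    position: Optional[int] = None
    while True:
        ids, position = fetch_page(position, row_limit)
        yield from ids
        if position is None:
            break


def delete_batches(list_guid: str, ids: Iterable[int], batch_size: int = 1000) -> List[str]:
    """Group Delete <Method> elements into <Batch> documents of at most batch_size each."""
    batches: List[str] = []
    methods: List[str] = []
    for item_id in ids:
        methods.append(
            "<Method>"
            f'<SetList Scope="Request">{escape(list_guid)}</SetList>'
            f'<SetVar Name="ID">{item_id}</SetVar>'
            '<SetVar Name="Cmd">Delete</SetVar>'
            "</Method>"
        )
        if len(methods) == batch_size:
            batches.append(BATCH_HEADER + "".join(methods) + "</Batch>")
            methods = []
    if methods:  # final partial batch; an empty batch is never produced
        batches.append(BATCH_HEADER + "".join(methods) + "</Batch>")
    return batches


def make_fake_fetcher(all_ids: List[int]) -> PageFetcher:
    """In-memory stand-in for list.GetItems(query), for illustration only."""
    def fetch(position: Optional[int], row_limit: int) -> Tuple[List[int], Optional[int]]:
        start = position or 0
        next_start = start + row_limit
        return all_ids[start:next_start], (next_start if next_start < len(all_ids) else None)
    return fetch


ids = list(iter_ids(make_fake_fetcher(list(range(1, 2501))), row_limit=700))
batches = delete_batches("hypothetical-list-guid", ids)  # 2500 IDs -> 3 batches (1000, 1000, 500)
```

Unlike the C# version, this sketch skips the final ProcessBatchData-style batch when there is nothing left to delete; that is a design choice of the sketch, not a SharePoint requirement.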
I hope this helps. If you have any further advice, please comment on this post and I will incorporate it for everyone. THANKS!
Joel's list of additional performance-related information
Also see http://go.microsoft.com/fwlink/?LinkId=95450&clcid=0x409 for an earlier white paper on the subject.
There are many factors involved in the SharePoint crawling process that can impact indexing performance. There are also some steps you can take to improve that. Here are the common causes and their resolution:
- Indexing performance is set to Reduced - a common mistake on the configuration screen for the index service. Go to Central Administration > Operations > Services on Server > Office SharePoint Server Search Service Settings and set it to Maximum.
- Number of connections - by default the indexer runs a limited number of simultaneous threads (usually 6) per host. You can increase this manually by adding a Crawler Impact Rule for each host; you can really improve speed by allowing a large file server up to 64 connections. Note that this number is only a suggestion to SharePoint, which also weighs other factors such as the number of processors (8 * #procs). Also watch your network for bottlenecks and for those pesky RPC errors you may see in your logs (dial it back if you see them).
- Crawled systems are slow or hosted on remote networks - not a lot can be done here, except moving those files closer.
- Overlapping Crawls - SharePoint gives priority to the first running crawl so that if you already are indexing one system it will hold up the indexing of a second and increase crawl times.
- Solution: Schedule your crawl times so there is no overlap. Full crawls will take the longest so run those exclusively.
- IFilter issues - the Adobe PDF IFilter can only filter one file at a time, which slows crawls down, and it has a high reject rate for newer PDFs
- Solution: Use a retail PDF filter from pdflib.com or Foxit
- Not enough memory allocated to the filter process - when the filtering daemons (mssdmn.exe) use too much memory during crawling, they are automatically terminated and restarted. There is of course a wind-up time when this happens, which can slow down your crawl. The default setting is fairly low (around 100 MB), so it is easy to trip when filtering large files. You can and should increase the memory allocation by adjusting the following registry keys:
- HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Global\Gathering Manager: set DedicatedFilterProcessMemoryQuota = 200000000 Decimal
- HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Global\Gathering Manager: set FilterProcessMemoryQuota = 200000000 Decimal
- Bad file retries - a registry setting controls the number of times a file is retried on error. Because the default is 100, this can severely slow down incremental crawls. The retry count can be adjusted with this key:
- HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Global\Gathering Manager: set DeleteOnErrorInterval = 4 Decimal
- General architecture issues - ensure that you have at least 2 GB of free memory available before your crawl even starts, and at least 2 real processors available.
- Disk health - the indexing process causes extensive file system fragmentation on both the index server and the database server. Schedule defrags routinely and after every full crawl, and always ensure you have enough disk space.
- Run a 64-bit OS - the jury is still out on this one. I personally haven't seen much difference as long as there is enough memory and the processor types are the same, but MS recommends it for large deployments.
NOTE: It is a good idea to open perfmon and look at the gatherer statistics while indexing. There is a counter called Performance Level that reflects the level the indexer is actually running at, where 5 is Maximum and 3 is Reduced. Even if you set everything to Maximum, the indexer may decide to run at Reduced anyway based on some unknown factors.
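The three Gathering Manager values from the list above can be rolled out the same way on every index server. Here is a minimal Python sketch that writes them into a single .reg file, assuming they are REG_DWORD values (which is what entering them as "Decimal" in Registry Editor's DWORD dialog suggests). Note that the decimal 200000000 becomes hex 0bebc200 in .reg syntax. The function names are hypothetical; back up the key before importing.

```python
def reg_file_for_dwords(key_path, values):
    """Return the text of a .reg file that sets each name/value pair as a REG_DWORD."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key_path}]"]
    for name, value in values.items():
        # decimal values are converted to the eight-hex-digit form .reg files use
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\r\n".join(lines) + "\r\n"


def write_reg_file(path, text):
    # UTF-16 with CRLF line endings, as regedit exports Version 5.00 files
    with open(path, "w", encoding="utf-16", newline="") as f:
        f.write(text)


GATHERING_MANAGER = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Global\Gathering Manager"
)
reg_text = reg_file_for_dwords(
    GATHERING_MANAGER,
    {
        "DedicatedFilterProcessMemoryQuota": 200000000,  # bytes, per the list above
        "FilterProcessMemoryQuota": 200000000,
        "DeleteOnErrorInterval": 4,
    },
)
```

Generating the file from decimal values keeps the numbers identical to the ones in the list and avoids hand-converting them to hex.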
This is a good read too: http://technet.microsoft.com/en-us/library/cc262574.aspx (Estimate performance and capacity requirements for search environments)
SharePoint has built-in functionality to index Lotus Notes content, but unfortunately it has many shortcomings for the enterprise. Before you exhaust many cycles trying to get the built-in connector working, you should educate yourself on all of its features and functionality to determine whether it meets your needs.
BA-Insight has released a connector that meets all of the customer demands for searching Notes. Key points are that it fully honors security, from group-level security to filters to reader fields, which the MS connector does not. It also gives granular control of metadata mappings such as title.
See the full comparison matrix: http://www.ba-insight.com/collateral/notesfeaturematrix.pdf
Their UI product also gives an enhanced view of the Notes results by expanding the list of attachments and showing a real-time preview of each one.
They are giving an online webinar this Thursday (2/26/2009) at 12 noon EST, which is open to the public: https://www2.gotomeeting.com/register/730057510
Request more information: Online Request Form
Read the press release here:
http://www.microsoft.com/presspass/press/2009/feb09/02-10NewEnterpriseSearchPR.mspx
Basically, in Office 14 FAST will be included with the Enterprise CAL (with some additional costs for servers and most likely some consulting requirements).
More importantly they are not ripping out the current platform search bits, so there should be a straightforward upgrade path from MOSS 07 and no loss of investments in current technologies.
While many businesses are scaling back and shedding resources in response to this global economic crisis, BA-Insight proves that success can exist even in the worst of times. They continue to grow and expand, adding international offices and resources. See http://arnoldit.com/wordpress/2009/01/31/ba-insight-triples-in-2008/
It makes sense though, organizations, fueled by the need to make the most of their existing infrastructure and assets, are beginning to realize that they already own an enterprise class search system. Why struggle with the additional hardware, training, and costs of Autonomy, Fast, Coveo and others, and then still need to figure out how to integrate with your SharePoint portal?
A high end UI and connectivity to existing Line of Business systems are the only missing components to complete the SharePoint search solution and this is what BA-Insight provides.
Contact BA-Insight for more information about their offerings at http://ba-insight.net or directly submit a request for information or quote here: Online Request Form
Longitude Enterprise Connectors for MOSS and MS Search Server 2008
Security: All of the connectors fully map and honor security of the source system with Full AD mapping and synchronization.
Extensibility: Pull and combine metadata from multiple systems to extend your searchable record
Control: Wizards and a flexible administration site provide granular control of indexing, including ability to format and replace metadata on the fly.
Enterprise Connector Listing:
· Exchange Mailbox Stores – for desktop search replacement or discovery.
· Documentum – now you can retain your full security mappings.
· Lotus Notes – flexible and fully secure alternative.
· Hummingbird – highly customizable.
· WorkSite – documents and metadata combined with security.
· SAP – yes, you can now index SAP data securely.
· Enterprise Vault – unlock your archives to Enterprise Search.
· Interaction – legal CRM.
· Custom SQL Database Systems – define your own connectors to SQL Server or Oracle with simple SQL wizards. Leverage full security mapping capabilities and include linked files, or even file blobs, in your database.
· Custom API based Systems – use the Web Services connector to wrap any API based system.
· Releasing Soon: Salesforce.com, MS Dynamics, MS CRM, FileNet, Stellent.
· 3-5 day turnaround for new connectors. So if you don’t see yours, just ask for it!
Longitude Search UI for MOSS and MS Search Server 2008
Once you see this search center replacement in action, you and your users will want it. Once you use it, you can't imagine searching any other way. Real-time document previews save so much search time, because you know right away whether a result is the one you wanted. The faceted navigation, while similar to the CodePlex variety, is built into the product. Ask for a demo to find out more; there are search tag clouds, automated relevancy improvements based on user actions, and parametric searching.

Mark Gerow has written a great article on the practical aspects of implementing SharePoint search. Great reading.
http://www.law.com/jsp/legaltechnology/pubArticleLT.jsp?id=1202426774552#
Here are some of his topics:
WHY SEARCH IS DIFFERENT?
Why can't it just work like my favorite Internet search engine?
How do I know others won't be able to find my secured documents?
Does it search my Inbox? (Yes it can, btw, with the BA-insights.com http://www.ba-insight.com/exchange.html connector.)
Do I have to stop searching the old way? I liked it.
OF PROTOCOL HANDLERS, IFILTERS AND THE BDC
CONCEPTUAL VS. KEYWORD SEARCH
FACETED SEARCH
I have just discovered a new PDF IFilter from a German company that supports both 32-bit and 64-bit.
http://www.pdflib.com/download/tet-pdf-ifilter/
I tested it on both platforms, could not break it, and was impressed with its speed.
As many of you know, the Adobe IFilters are not 64-bit compatible. There is a "workaround", but I don't recommend it. That leaves you with either Foxit or this new company for 64-bit support.
For 32-bit you can use the Adobe ones for free, but you should know that they only process one PDF at a time and thus are not as fast as the paid versions from the other vendors.
UPDATE: Adobe releases true 64bit ifilter: http://www.adobe.com/support/downloads/detail.jsp?ftpID=4025
After updating one of my QA servers with the latest MOSS security patches from Windows Update, I am seeing some strange results.
The scope counts that are displayed are actually login dependent. If the account you are logged into the SSP with doesn't have permission to the files or items that were crawled, they aren't included in the scope count. This includes the counts displayed in the scope rules as well.
I am trying to determine which update introduced this issue. Can anyone shed some light on this, or at least confirm I am not imagining this new feature?
UPDATE: it will also apply your real time security trimmers too.
We (sharepointworks) have been getting client requests to develop a 64-bit Lotus Notes connector on our framework. It would be developed in partnership with an IBM Lotus technology partner and thus would be very robust and scalable. More importantly, it would benefit from our advanced security mapping framework to ensure that users, groups and real-time security are honored.
We are now trying to decide whether there is enough of a market to do this. I have not had the opportunity to evaluate the current packaged MS offering, and we don't really want to compete with free. But if demand keeps increasing, we will add it to our enterprise offerings (Hummingbird, PCDocs, Worksite, Exchange Mailboxes, SQL Server, OLEDB, Dynamics (in development), Symantec Enterprise Vault (in planning)).
If any of my readers has any insight as to the market for this or if you have an interest in it (or any feature requests), please let me know through this blog. As with all connectors developed on our framework, it will be code complete in a matter of weeks once we start development.
Yes, I am one of them.
More info: http://sharepointsearch.com/cs/blogs/enterprisesearch/archive/2008/01/08/microsoft-announces-offer-to-acquire-fast-search-amp-transfer.aspx
I guess I was right. http://sharepointsearch.com/cs/blogs/notorioustech/archive/2007/07/19/fast-search-integration-announcement-could-be-interesting.aspx
Until the full nature of the future integration of SharePoint and FAST is announced, we will all just be guessing.
One guess (from a vendor) is that it will become the basis for Office 14, but who knows.
One thing is sure, Microsoft was missing conceptual/semantic (http://en.wikipedia.org/wiki/Latent_semantic_analysis) search capabilities and now they have it.
Stay tuned.
I don't know how I missed this one before.
After you install the latest Search Server 2008 RC 64-bit (I didn't check the 32-bit version), go to Central Administration -> Operations -> Logging.
By default Collect Error Reports is enabled and so is the checkbox to silently send error reports to Microsoft.
NOTE: it doesn't just say SharePoint error reports and could possibly mean all error reports on the computer.
Here is the text:
Change this computer's error collection policy to silently send all reports. This changes the computer's error reporting behavior to automatically send reports to Microsoft without prompting users when they log on.
I personally have no problem helping out the development and QA process of Microsoft. I am just not sure all my clients want this on by default.