February 2009 - Posts

Digging deep to understand MOSS's behavior
23 February 09 06:53 PM | Blogging about SharePoint: Posts
Body:

I blogged about this problem earlier in my post titled Corrupted Workflow Association?.  At the time, I decided to work around it by renaming the workflow association and then NOT EVER TYRING TO DELETE IT AGAIN.  That was fine in my development environment, however then this happened in our QA environment.  The symptom was similar in the QA environment: I try to delete the workflow, and the page never returns, meanwhile the CPU on the SQL server gets very busy - in this case, sticking at 50-80%  for a long time.  Long enough for us to find a database administrator to kill the blocking process in SQL Server.  Soon as the process dies, the CPU goes back to nearly idle.

So now I had to figure out what it was doing so we could ensure it would NEVER HAPPEN AGAIN.

So some background first. 

We have several environments, and we do code migrations from one to another with a fair amount of structured process.  I develop workflow solutions on my virtual machine (VM), a MOSS Windows 2003 server with a local SQL Server and Visual Studio and a number of other tools.  The VM is a domain member. I package my code into a Web Solution Package (WSP) and upload the WSP to the "Deployment Control List" - a SharePoint List that the Farm Admin created to help structure the deployment process.  The farm admin downloads the WSP, other related files, and the link to the install doc from that list.  He then deploys it to the INTEGRATION environment.  Successfully INTEGRATION deployment means we can deploy to QA later in the week.  Likewise, successful deployment to QA means we can go to PRODUCTION.  At this client, we only go to QA once a week and to PRODUCTION once a week.  Moves to production can happen either the following day or the following week after a successful deployment to QA, depending on how much time in QA is desired.

To summarize, the code migration path is DEV > INT > QA > PROD. 

All four of our environments are MOSS Enterprise Edition.  The size of each farm varies from an all-in-one (though NOT using SQL Express!) environment for my Dev VM to QA and PROD that have two Web Front Ends, several application servers (including a SQL Server Reporting Services server), and a clustered SQL Server back end.  QA is virtualized, while I believe PROD is physical servers.

This is release 2 of this particular workflow solution, and yesterday was about the fourth time it was deployed to QA, and about the 9th time it had been deployed to INT.  In other words, we had been finding and fixing bugs for the last four weeks, as well as adding features.  The business analyst I work with is understandably more than ready to get this version into production and done with.

In hindsight, it's important to note that an intermediate build of V2 of this solution - call it v2.2, with the current build being 2.9 - had a bug in it where it could end up in a loop that created many, many tasks in the task list (THOUSANDS apparently).  This was fixed in build 2.7.  Build 2.2 made it as far as QA, but did not go to PRODUCTION.  The trigger that would send the v2.2 workflow into the task creation loop was reassigning a task.  Our testing process leans toward the informal, and because it wasn't the focus of our change in v2.2, I think we had only tested the reassign functionality in QA on a whim.  When it broke, I went back into DEV, and recreated the problem.  The net effect was that both Dev and QA had thousands of tasks, whereas Integration did not.  V1 of this workflow has been running in Production for over a month and has only 300-400 tasks in Production.

Why is the number of tasks important? Well, I'm getting ahead of my self, but you will see.  Let's go back to the front end and the primary symptom.

When you delete a particular workflow association in my Dev environment, the CPU hits the roof, stays there, and interactive performance takes a nosedive.

First of all, what is "removing a workflow association"?

Remove Workflows Screenshot

It's selecting Remove, then clicking OK, on the Remove Workflows page (get there from List Settings, then Workflow Settings).

So what happens when you click OK? Well, on the MOSS Web Front End side, you could crack open .NET Reflector and take a look. However, the symptom her is that the SQL Server process goes nuts, NOT W3P.  Therefore, let's not look at the .NET code, but rather we crack open SQL Profiler and have it trace the SQL activity between the Web Front End and the database.  We start the trace right before we click the OK button.  We stop the trace once we see that the process has definitely gone nuts.  Confirmation that SQL has definitely gone nuts is as simple as running Task Manager and watching the CPU Usage graph go solid bright green and the CPU History Graph(s) jump up and stay high.  Soon as you see that, it's time to restart SQL Server or kill the offending SQL Server Process.  During my investigations, I had the Services MMC open so that I could quickly restart SQLServer in one click.  In my Dev environment, if you let it run to long, the machine becomes unusable (e.g. you click and it may respond a minute later if it can spare the CPU cycles to deal with users and UI).

We also had SQL Server Management Studio up, and had the Activity Monitor going.  This time we got lucky and captured a particularly relevant SQL Statement as the statement that was causing SQL anguish:

image

Now we cross referenced this back to our SQL Trace in SQL Profiler, and did a search for DropWorkflowAssociation, and sure enough we found the following statement:

exec proc_DropWorkflowAssociation '7082EA6F-E14E-4B3E-9581-C10AB17F0A12','9FE5442A-C3DA-443E-B755-48A50D9BD9A4'

Oh, great...GUIDs.  <Sarcasm>They make everything so much more readable!</Sarcasm>.  However, a web search of proc_DropWorkflowAssociation reveals that the stored procedures in MOSS are actually documented in a PDF on MSDN!  http://msdn.microsoft.com/en-us/library/cc313104.aspx

In fact, if you look at section 3.1.4.29 on page 70, it provides the signature for the stored procedure call, and explains the parameters.

PROCEDURE dbo.proc_DropWorkflowAssociation ( @SiteId uniqueidentifier, @Id uniqueidentifier, @DropAll int = 0 );

So the first parameter is the site collection GUID, the second parameter is the workflow association GUID, and the third parameter is not specified so it defaults to 0.

So, is '7082EA6F-E14E-4B3E-9581-C10AB17F0A12' the GUID of the site collection that contains the site that has the list with the bad workflow association?  It sure is, and a line of PowerShell proves it:

Get-SPSite http://server/managedpath/pathToSiteCollectionRootSite | foreach {$_.Id.ToString(); $_.Dispose()}

That returns the GUID that is used in parameter 1.  See www.codeplex.com/PSBB and my other posts on PowerShell for more information about using Get-SPSite.

The second parameter is indeed the GUID for the workflow association.  We can verify that later on.

So now that we know we have found the relevant statement, why is it driving the processor nuts?  What does DropWrokflowAssociation actually do? Well, I don't want to post the entire stored proc here (might be a violation of MSFT's Intellectual Property), but if you have access to the SQL Server for a MOSS installation, you can take a look at it yourself - just look at the stored procedures for the relevant Content Database.  Nonetheless, I think I can discuss the procedure at the pseudocode level without getting in trouble.  What it does

  1. Get the GUID of the SPWeb and the GUID of the SPList from the WorkflowAssociation Table where they are for the SPSite and WorkflowAssociation that we passed in.
  2. If we find it, and since DropAll is zero, we call the following procedure:

    proc_AutoDropWorkflows @SiteId, NULL, @ListId, NULL, @Id, NULL, 1 , 200

  3. If #2 deleted all workflows, then delete the workflow association from the workflow association table.

So let's take a look at step #2 because it calls another stored procedure.  What does AutoDropWorkflows do?  Back to the PDF documentation, this time section 3.1.4.9 on page 50.

PROCEDURE dbo.proc_AutoDropWorkflows ( @SiteId uniqueidentifier, @WebId uniqueidentifier, @ListId uniqueidentifier, @ListItemId int, @TemplateId uniqueidentifier, @AutoCleanupDate datetime, @ForceDelete int, @TopBeforeQuick int = 2147483647 );

Ok, so we are passing a null for the Web Id, a Null for the List Item Id, a null for the Auto Cleanup Date, a 1 for ForceDelete, and Top Before Quick is 200.  Reading through the PDF document [apparently referred to as MS-WSSPROG],  if WebID is null, the action is for all sites (SPWebs) in the Site Collection.  Ok, in our scenario, that seems unnecessary since we are deleting a workflow association in a specific site (SPWeb).  If you run this procedure and specify the GUID of the SPWeb that has the list with the workflow association, does that make this work?   Well, there's a problem with trying to answer that question - you are modifying the Content Database directly via T-SQL, NOT through through the SharePoint API.  In other words, DON'T DO THIS.  If you execute any T-SQL commands other than select statements, Microsoft Support will no longer help you should you call them.  What? You say you want to do it anyway? It's a development database after all.  OK, then you MUST do a Full Backup of the Content Database before you run any T-SQL command that will edit anything.  Now that you have done a Full Backup, are you ready to play with the T-SQL command? NO. NO. NO.  You don't know you have a valid backup until you successfully restore from it.  So edit a list item somewhere, then restore the backup and make sure the site comes up with the list item in its previous state.

Now that you have backed up the content database, AND PROVEN VIA A RESTORE THAT THE BACKUP WORKS, you can now muck around with T-SQL in the Content Database because you can always get back to a supported state.  So let's try this: If we run AutoDropWorkflows @SiteId, @WebId instead of NULL, @ListId, NULL for the List Item ID, @Id for the workflow association, NULL for the AutoCleanupDate, 1 for ForceDelete , 200 for TopBeforeQuick, does it work better? No, the processor still goes nuts.  So let's take a look at the rest of what it does.

ListItemID - is being passed as NULL, so all list items are being worked with.  That seems ok, though what list items are these?  The items in the list with the workflow association. 

TemplateID is the workflow association. 

AutoCleanupDate is irrelevant since ForceDelete is 1 - again, per the MS-WSSPROG doc.

ForceDelete is 1, and when this is one, it deletes the first 200 workflows. The 200 is specified in the TopBeforeQuick parameter.

Note that the return value of this procedure is 1 only if it deleted all the workflows without hitting the TopBeforeQuick limit.  Only if the return value is 1 does the calling procedure, Drop All Workflows, actually do step 3 above (deleting the workflow association from the association table) However, since when I ran this, it never returned, I don't know if it will return  1 or 0.

Now thinking about my setup, I only have one or two workflow associations set up on this list, so if I am interpreting that right, since 2 < 200, I will never hit that limit.  Let's take a look into the body of Auto Drop Workflows to see why it would peg the CPU and apparently never return.

As I look at the body of Auto Drop Workflows, the fourth line after the AS statement defines a cursor.  Ah-ha, a looping construct!  A loop gone bad is consistent with the CPU going nuts. Cursors are what you use in T-SQL when you have to walk through a list item by item instead of being able to deal with a set of records as a whole.    What does the cursor do?  Well, for each workflow instance, it calls

proc_DropWorkflow @WorkflowInstanceId, @SiteId, @WebId, @ListId

Drop Workflow does the following:

  1. Looks up the TaskListID (among other things) in the first select statement
  2. Conditionally decrements an instance counter in the WorkflowAssociation table
  3. Conditionally calls proc_RemoveFailover
  4. Deletes from the Workflow table where the site and Workflow instance ID match.
  5. Calls proc_DropWorkflowTasks @WorkflowInstanceId, @SiteId, @WebID, @TaskListId
  6. Conditionally calls proc_DeleteContextCollectionEventReceivers and deletes from ScheduledWorkItems

Step 5 is the important one.  Looking at it, it declares another cursor based on data in the AllUserData table, and walks through that data calling proc_DropListRecord on each row.

Let's take a closer look at that.  Note that the documentation for this stored proc is in MS-WSSDLIM.pdf, not MS-WSSPROG.pdf.  You should download the whole package of PDFs and set up Desktop Search or MOSS Search to index them.  This way it is easy to find which document contains a given stored procedure.

EXEC proc_DropListRecord @SiteID, @WebID, @TaskListId, 0, @ItemId, 1, 0, NULL, NULL, 0, 0, NULL, 3

Look at the definition of this stored procedure.  This isn't  just a simple delete from a table either.  This stored proc has to take into account use of the recycle bin, the need to audit, and perhaps other list settings.  Also, there is no one to one correspondence between that Tasks list and a "Tasks" table in the database. 

More specifically, Drop List Record does the following:

  1. Looks up a bunch of values
  2. If it finds them, then it calls proc_VerifyUpdateConditions
  3. It calls proc_DeleteContextCollectionEventRecievers
  4. If Auditing is needed, it calls proc_AddAuditEntryFromSql
  5. it calls proc_DeleteUrlCore
  6. If it's a meeting list it ... well, never mind, it's not a meeting list
  7. It calls proc_UpdateDiskUsed

That's a lot of stuff to do for deleting a list item!  I suspect that some of those may call other stored procedures, but I have to end this somewhere, so I think this is deep enough for now.  We are now 4 layers deep, inside two cursors.  No wonder this takes a while and makes a SQL Server busy.

The net effect of the call we started with, Drop Workflow Association, is that it is trying to individually drop each task associated with each list item that used the workflow used by the workflow association in question.  When you have several thousand tasks, and you have to do stuff to each one of them, it's not quick to get rid of them.  From this I can conclude that the process to drop a workflow, wasn't really stuck spinning doing nothing.  It was working its way through a very long list, but in my two environments, doing it so slowly that it would have taken hours (at least in my Dev VM) to complete.  No wonder it looked like the SQL server was going nuts.

Overall, my resulting request to the SharePoint Product Team is that if there is more than some number of tasks in the task list, then deletion of a workflow association should be treated as a long running operation, at least from a UI perspective. It should present a UI similar to the long running operation screen you get when you click on the "hurricane button" to update a Business Data column in a list.  Unless of course they have a better way.

--Michael

Published: 2/23/2009 3:53 PM
Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks
A Sign the Farm is Quiescent
21 February 09 01:41 AM | Blogging about SharePoint: Posts
Body:

In Central Admin, in Operations, you have an option to quiesce the farm.  This stops all new connections, and is a good idea before, for example, restoring a content database.  it's the next to last operation under Global Configuration.

image

When the farm is quiescent, you will get certain errors, such as the following.

image

"The server is currently being taken down for maintenance" is InfoPath Forms Services' way of saying the farm is being quiesced, and we can't do any new connections - although the fact that the web pages leading up to this screen didn't object seems  a little inconsistent.

To fix this, go to the Quiesce Farm option in Central admin and click on Reset Farm.

image

 

--Michael

Category: MOSS 2007
Published: 2/20/2009 10:41 PM
Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks
Filed under:
Very Odd Calculated Value Column Behavior
18 February 09 06:33 PM | Blogging about SharePoint: Posts
Body:

I have a task list, and I have added a calculated value column to it.  The calculated value column returns a text string which is recognized as a URL and is a clickable link.

The formula for the column is =concatenate("http://server/managedpath/sitecoll/site/_layouts/UpdateSomethingStatus.aspx?TID=",ID).  

When you click on the link, the target page checks a web service to see if one of the columns in the task list (the "Is A" column) needs to be updated, if so updates that column,  and then redirects back to the task list.  The code does not touch the calculated value column at all.

When the column gets updated, the calculated value field drops the value of the ID field from the displayed link.  WHY?

Here is a picture.  The calculated value column is named Recheck.  See how the there is no number to the right of "TID=" in the first item, but there is in the second item?  Of course, all items have an ID, so why did this one disappear?

image

 

Ideas?

Michael

Published: 2/18/2009 3:33 PM
Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks
Corrupted Workflow Association?
16 February 09 06:20 PM | Blogging about SharePoint: Posts
Body:

I prefer to blog about solutions rather than problems, but lately I've been running into more of the later...

Here's one.  I have a list with a workflow association, and when I try to remove the workflow association in my development environment, SQL Server goes nuts and eats up all the CPU.  Interactive performance degrades and the only thing I can do is restart the SQL Server service.  My workaround at this point is just to rename the workflow association to include the word "Bad" in the name, so I know not to try to delete it again.

I suspect the only way to fix this is to go into the content database and do some table editing directly, but that is a strictly forbidden action unless under the supervision of Microsoft Support, and it's not worth a call to them at this point.

Though if anyone has seen this happen before or has any idea why it happens and how a workflow association can "go bad", I'd like to hear about it.

 

--Michael

Category: SharePoint Development
Published: 2/16/2009 3:20 PM
Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks
BDC Business Data Column - Can you limit the rows updated?
16 February 09 05:52 PM | Blogging about SharePoint: Posts
Body:

I've just posted a question on the MSDN forums as follows:

I have a Task list that has two content types that inherit from Workflow Task.  Let's call them CT1 and CT2.  CT2 has a Business Data Column while the others do not.  This Task list can contain several thousand records, though the number of active tasks at any given time is small (e.g. 20).  The Business Data Column gets its data from a web service.  I have a view that displays only the active tasks, and includes the business data column in the columns displayed.  When I click on the update link (the circular green arrows at the top of the business data column), the subsequent pages warns that it will update ALL items in the list, not just the ones in the current view.  Is it possible to restrict the update to only the items in the view?

I posted that at http://social.msdn.microsoft.com/Forums/en-US/sharepointbdc/thread/fae084a3-0fdf-4951-a8f9-d0229a4b0a83 .

In my scenario, updating only a few records is perfectly reasonable, and the time that it would take to update all records would be enough to make the users unhappy.  In my scenario, a user might want to run the update several times a day, and really only cares about making sure they have the latest data for a very small number of rows.  I'd like to be able to offer the users a way to update only a single record at a time, or only the records in their view. 

If you have any suggestions, please post a comment here, on Twitter, or in the MSDN forum.

Michael

Category: SharePoint Development
Published: 2/16/2009 2:52 PM
Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks
Twitter ROI == Minutes! AKA Better CAML Intellisense
16 February 09 12:32 AM | Blogging about SharePoint: Posts
Body:

So I've just today gotten on Twitter.  First thing I did was to start following a bunch of SharePointy tweeters, and some Magenicons (Magenic Consultants).  Andrew Connell's most recent tweet is a link to one from John Holliday. He has a package to make your CAML intellisense RADICALLY better.

Check it out here:http://johnholliday.net/post/2009/01/18/Feed-your-CAML-some-IntelliSense-so-it-doesnt-bite-you-in-the-you-know-what!.aspx

--Michael

Category: SharePoint Development
Published: 2/15/2009 9:32 PM
Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks
Debug XOML Bogus - why?
16 February 09 12:15 AM | Blogging about SharePoint: Posts
Body:

On Friday, I was debugging a SharePoint workflow that I built in Visual Studio 2008.  When I ran it, then stepped through the running workflow in the debugger, the designer view of the running XOML did NOT match the designer view of the workflow.  In fact, the "live" XOML that was generated by the debugger was a visualization of an old build of my codebase.  WTF?  In particular, I had a parallel activity in my workflow, with 5 branches.  In my current build, I had commented out branches 2-5, and renamed all the activities in branch 1.  When my code ran, I could tell by the messages I was writing to the history list that the current version of my workflow was indeed running.  But when I stepped into it, the "live" XOML designer view that the debugger created had all 5 branches of parallel activity enabled, and branch 1 had all the old names.  How is that even possible?

Michael

Category: SharePoint Development
Published: 2/15/2009 9:15 PM
Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks
Upcoming Event: SharePoint Intranet Success Strategies Webinar
06 February 09 12:05 AM | Blogging about SharePoint: Posts
Body:

Martin Muldoon from BA-Insight, my colleague Anthony Handley (MVP, Expression: Design), and I are presenting the following webinar on February 17th

clip_image001

Register Now!

Category: SharePoint Events
Published: 2/5/2009 9:05 PM
Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks
Filed under:
Upcoming Event: Chicago SharePoint User Group February Meeting
05 February 09 11:59 PM | Blogging about SharePoint: Posts
Body:

Automate Business Processes Using InfoPath Forms with Integrated SharePoint Designer Workflows…
WITHOUT CODING!
Asif Rehmani
SharePoint Server MVP
Trainer/Solution Architect
SharePoint Solutions

Forms and Workflows are essential to business processes. Organizations usually rely on programmers to create the forms and workflows using code.
Well, code no longer! If you have access to Microsoft Office InfoPath and Microsoft Office SharePoint Designer, you can create powerful data driven form solutions on your SharePoint sites.

InfoPath gives you the ability to pull data from databases and lists, and create forms with data validation and conditional formatting. SharePoint Designer's workflows let you then design powerful multi-step workflows centered around the form collected data.
In this presentation, you will see how to design a robust form using InfoPath and then design a workflow using SharePoint Designer to route this form appropriately...all WITHOUT CODE!

Asif Rhemani is a trainer at the Chicago office of SharePoint Solutions where he provides training and strategic consulting based on SharePoint products and technologies. He is a SharePoint Server MVP, a Microsoft Certified Trainer and a founding sponsor of the Chicago SharePoint User Group. He is a frequent speaker at SharePoint events including Microsoft’s SharePoint Conference, SharePoint Connections, and Information Workers Conference.

Site Content and Usage Reports in SharePoint Designer
Woody Windischman
Former SharePoint Server MVP
Consultant

SharePoint Designer is more than just an editing tool. It provides a number of features for managing both SharePoint and non-SharePoint sites. In this segment, learn about the powerful and flexible reporting options SharePoint Designer provides.

Woody Windischman is a technology consultant with over 20 years of experience. He is deeply involved in the SharePoint community, including the TechNet forums, Twitter, and SharePoint University. He was a two year SharePoint MVP and worked directly with Microsoft’s SharePoint product team for one year. Woody recently authored a book, Professional Microsoft Office SharePoint Designer 2007, published by Wrox, which has just been released.

Please register early.
Registration will be closed when maximum capacity is reached.

Register Now at ClickToAttend.com!

To find out about future events, sign up for the CSPUG mailing list at www.CSPUG.org .

Category: SharePoint Events
Published: 2/5/2009 8:59 PM
Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks
Filed under: