Recently, Iranian crackers used a compromised username and password to request certificates from the Comodo Certificate Authority. The requests succeeded, and nine certificates were issued for the domains listed on the Comodo Fraud Incident Report page: http://www.comodo.com/Comodo-Fraud-Incident-2011-03-23.html

This issue is of particular importance to me because SSL is the primary mechanism for assuring the integrity and confidentiality of Security Tokens and Security Token Requests. My latest blog post provides instructions on how to add Yahoo and Google as Identity Providers to Windows Azure AppFabric Access Control Service v2.0, and the fraudulent certificates were issued for the major Identity Provider sources on the Internet (mail.google.com, www.google.com, login.yahoo.com, login.skype.com, addons.mozilla.org, login.live.com, and “global trustee”). These certificates may be used to spoof content, perform phishing attacks, or perform man-in-the-middle attacks against Internet application users (in my view, the impact potentially extends beyond applications accessed through web browsers). Although the sky is far from falling, this breach illuminates some significant vulnerabilities in our Internet security infrastructure that need to be tightened.

Revocation of trust in these certificates is delivered through web browser updates (which is unfortunate, because it makes responding to such security threats cumbersome and hard to orchestrate). In short, you (and your application users) must update your web browsers to gain protection. Here are links for popular web browsers:

Microsoft IE Browser: http://support.microsoft.com/kb/2524375
Firefox Browser: http://www.mozilla.com/en-US/firefox/3.6.16/releasenotes/
Google Chrome: Tools/About (update will install automatically if you are online)
Apple Safari: http://www.apple.com/safari/
Opera: http://www.opera.com/download/

Each web browser is different, but to verify that you are protected, open your browser’s certificate store and find the “Untrusted Publishers” tab (or its equivalent). The domains listed above should appear in the “Issued To” column. The following is from Internet Explorer:

[Screenshot: Untrusted Publishers tab of the Internet Explorer certificate store]

Please notice that there are only EIGHT certificates in the revocation list. I am puzzled as to why the “www.google.com” certificate is missing; however, no further information was readily available at the time I wrote this blog post.

This blog post assumes that the reader knows the basics of Identity Providers and Security Token Services. Its purpose is to illustrate how to programmatically add Google or Yahoo as an Identity Provider because there isn’t much information available on how to do this. For further information about using the ManagementServices proxy, I suggest downloading the Codeplex ACS Management examples from http://acs.codeplex.com/releases/view/57595

We manage the Windows Azure AppFabric Access Control Service v2.0 through code using the ManagementService proxy and data types, which are generated when we add a service reference to the ACS Metadata endpoint located at https://{yournamespace}.accesscontrol.appfabriclabs.com/v2/mgmt/service. You can do this using either the Visual Studio “Add Service Reference” menu option or the svcutil.exe utility. There are examples of this in the code samples mentioned above.

To begin, we use the management service proxy to retrieve the list of IdentityProviders already installed in the targeted namespace. By default, Windows Live ID is already present and cannot be removed. The management service API requires that every request be accompanied by a Simple Web Token (SWT), which is also covered in the code samples mentioned above.

To create a new IdentityProvider, we first need to establish an Issuer for the tokens coming from that Identity Provider. To do this, we create a new instance of the Issuer type and set its Name property to “Google”; this “friendly name” will appear in the ACS Management Portal UI. We then add the instance to the management service’s Issuers collection and save our changes, which generates a new Id for the Issuer. Next, we create an instance of IdentityProvider, set its DisplayName and Description to values appropriate for display in the ACS Management Portal, set WebSSOProtocolType to “OpenId”, and set IssuerId to the Id of the Issuer that we just created and saved.

   

  

      // ms is an instance of the ManagementService proxy
      Issuer issuer = new Issuer { Name = "Google" };
      ms.AddToIssuers(issuer);
      ms.SaveChanges(SaveChangesOptions.Batch);

      // Create the Identity Provider
      IdentityProvider identityProvider = new IdentityProvider {
            DisplayName = "Google",
            Description = "Google",
            WebSSOProtocolType = "OpenId",
            IssuerId = issuer.Id
      };
      ms.AddObject("IdentityProviders", identityProvider);

 

 

We need a means for the token requestor and consuming applications to verify the authenticity of tokens issued by the STS. The STS publishes, in its metadata exchange document, the base64-encoded public key of the certificate that it uses to digitally sign its tokens. We set the IdentityProviderKey properties to the certificate values, add the IdentityProviderKey object to our object graph, and associate it with the IdentityProvider that will use it, as shown in the following code:

 


       // *** Create the Identity Provider key used to validate
       // the signature of IDP-signed tokens. Signing certificates
       // can be found in a WSFederation IDP's metadata.
       IdentityProviderKey identityProviderKey = new IdentityProviderKey {
              DisplayName = "GoogleIdentityProviderKeyDisplayName",
              Type = "X509Certificate",
              Usage = "Signing",
              // Truncated here; supply the full base64-encoded certificate from the metadata
              Value = Convert.FromBase64String("MIIB9DCCAWGgAwI…"),
              IdentityProvider = identityProvider,
              StartDate = DateTime.UtcNow,
              EndDate = DateTime.UtcNow.AddYears(1)
       };
       ms.AddRelatedObject(identityProvider, "IdentityProviderKeys", identityProviderKey);

 

 

Our new Google or Yahoo IdentityProvider needs an endpoint address. We create an instance of the IdentityProviderAddress class, add it to the entity data model, and then save our changes. Two properties on this class have values that are less than obvious (or even discoverable). For Google, the Address property must be set to https://www.google.com/accounts/o8/ud and the EndpointType must be set to “SignIn”. For Yahoo, set the Address property to https://open.login.yahooapis.com/openid/op/auth and the EndpointType to “SignIn”.

 

       IdentityProviderAddress googleRealm = new IdentityProviderAddress() {
              Address = "https://www.google.com/accounts/o8/ud",
              EndpointType = "SignIn",
              IdentityProvider = identityProvider,
       };
       ms.AddRelatedObject(identityProvider, "IdentityProviderAddresses", googleRealm);
       ms.SaveChanges(SaveChangesOptions.Batch);

 

 

We now need to associate our new Google IdentityProvider with the relying party applications that will depend upon it. In our case, this is every RelyingParty defined (other than AccessControlManagement), so we simply loop through them, as the following code demonstrates:

 

      

       // Make this IDP available to relying parties
       // (except for the Management RP)
       foreach (RelyingParty rp in ms.RelyingParties) {
              // Skip the built-in management RP
              if (rp.Name != "AccessControlManagement") {
                     ms.AddToRelyingPartyIdentityProviders(new RelyingPartyIdentityProvider {
                           IdentityProviderId = identityProvider.Id,
                           RelyingPartyId = rp.Id
                     });
              }
       }
       ms.SaveChanges(SaveChangesOptions.Batch);

 

 

This should be enough to supplement your knowledge of using the Windows Azure AppFabric Labs v2.0 Access Control Service Management API to programmatically set up Google (or Yahoo) as an Identity Provider for your relying party applications.

Windows Azure gives us the ability to scale our application up by specifying how many CPU cores we want in our service instances, or to scale out by specifying how many single-core instances we require. Both strategies can accomplish our scaling objectives for the same price (eight 1-core instances at 12 cents/hour each, or one 8-core instance at 96 cents/hour), but in smaller deployments (under 8 CPU cores) a couple of advantages clearly favor a greater number of small instances over a single instance with an equivalent number of cores.
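A quick check of that pricing arithmetic, as a small C# sketch. It assumes only the rate implied by the figures quoted (12 cents per core-hour); the class and method names are illustrative and not part of any Azure API.

```csharp
using System;

// Illustrative helper; the rate is derived from the prices quoted above
// ($0.12 per core-hour) and is an assumption, not a current price list.
public static class ComputeCost
{
    public const decimal PricePerCoreHour = 0.12m;

    // Hourly cost of a deployment: instances x cores per instance x rate.
    public static decimal HourlyCost(int instances, int coresPerInstance)
    {
        if (instances < 0 || coresPerInstance < 1)
            throw new ArgumentOutOfRangeException();
        return instances * coresPerInstance * PricePerCoreHour;
    }
}
```

Both HourlyCost(8, 1) and HourlyCost(1, 8) come to $0.96, so cost alone does not decide between the two layouts; availability does.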

The Windows Azure Service Level Agreement (SLA) guarantees 99.95% service uptime, but only if you deploy a minimum of two service instances. Another important feature is the rolling upgrade, a Windows Azure deployment feature that stops and upgrades service instances individually rather than bringing all of them down at the same time. This allows your service to remain operational during upgrades (albeit in a degraded state). With a single instance, nothing is left running while that instance is being upgraded.
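To make the availability argument concrete, here is a toy C# model. It assumes, as described above, that instances are upgraded one at a time; the real fabric groups instances into upgrade domains, so treat this as a simplification rather than a description of Azure’s algorithm.

```csharp
using System;

// Toy model (an assumption, not the Windows Azure fabric's algorithm): during a
// rolling upgrade, instances are stopped and upgraded one at a time, so the
// others keep serving requests while each one is down.
public static class RollingUpgrade
{
    // Fraction of the deployment's capacity still online while one instance is upgraded.
    public static double CapacityDuringUpgrade(int instances)
    {
        if (instances < 1)
            throw new ArgumentOutOfRangeException(nameof(instances));
        return (instances - 1) / (double)instances;
    }
}
```

Under this model, a single 8-core instance drops to zero capacity during each upgrade, while eight 1-core instances keep 87.5% of their capacity online.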

Be aware that installing the November 2010 Windows Azure SDK v1.3 will break support for cloud projects running under Visual Studio 2008. To the best of my knowledge this was not widely announced (in fact, I learned of it only while installing the SDK). If you have Visual Studio 2008 Windows Azure projects, make sure you have Visual Studio 2010 and a plan for migrating your projects before installing this new SDK.

When running the SDK setup on a machine with Visual Studio 2008 installed, you’ll receive the following warning: “Setup has detected that Windows Azure Tools for Visual Studio 2008 is installed. As Windows Azure Tools 1.3 does not support Visual Studio 2008, if you continue to install this software, Windows Azure Tools for Visual Studio 2008 will stop working due to incompatible Windows Azure SDK version. Do you want to continue?”

The Windows Azure Worker Role is a perfect place for code that you want to run continuously in the background, processing work as it becomes available. The information presented here is useful in web roles as well.

If you’re writing cloud applications, it’s likely that you are targeting high levels of performance and scalability. It is reasonable to expect that you want to get the most out of your investment in cloud computing, and making the best use of your purchased resources will save you money. It is therefore also reasonable to expect that most non-trivial applications you deploy to a production cloud environment would be written to perform I/O operations asynchronously.

In a Windows Azure Worker Role, the Windows Azure fabric dispatches a single thread to your worker role’s Run method. The rest of the threading model is left up to you, much as in a Windows service or a console application. If you want to make maximum use of the cores available in your service instances, it is highly recommended that you leverage the CLR thread pool.

Using the .NET Task Parallel Library (TPL) is an option if your worker roles are compute-bound, but it won’t help you much with I/O-bound operations. Because today’s CPUs are so powerful, and because the data your worker role operates on will generally have to be retrieved from a remote location such as Windows Azure Data Storage or SQL Azure, your worker roles are much more likely to be I/O-bound than compute-bound. If you use the TPL to increase concurrency for blocking I/O, it can only do so by increasing the number of threads. Threads are resource-heavy, and beyond a certain number, adding more degrades performance instead of improving it.

The entry point for a Windows Azure Worker Role is the Run() method. We would really love it if the Worker Role architecture let us return from this method without terminating the role, but if Run() returns, the role instance is recycled. So we reluctantly put the thread into an infinite “while(true)” loop with a 3-second sleep interval; when there is work in the job queues, this thread dispatches the messages to the thread pool during its next wake cycle. The thread pool typically creates one thread per CPU, and these threads process the messages concurrently and efficiently (with minimal context switching).

As depicted in the following code example, we get as many messages as we can from the Windows Azure Data Storage queue, and then create an AsyncEnumerator instance to process each message asynchronously. We call the BeginExecute method of the AsyncEnumerator, passing in the message processing routine “ProcessMsg”:

     public override void Run() {
         while (true) {
            Boolean anyMsgs = false;
            // I call GetMessages synchronously because the Run thread can't do anything else
            foreach (var msg in s_msgQueue.GetMessages(CloudQueueMessage.MaxNumberOfMessagesToPeek, TimeSpan.FromSeconds(30))) {
               anyMsgs = true;
               var ae = new AsyncEnumerator();
               ae.BeginExecute(ProcessMsg(ae, msg), ae.EndExecute, null);
            }
            // If we got messages, there may still be more in the queue, so don't sleep; try to get them
            if (anyMsgs == false) {
               // I call Thread.Sleep synchronously because the Run thread can't do anything else
               Thread.Sleep(3000);
            }
         }
      }

 

 

The ProcessMsg routine (shown below) handles the grunge work of processing each incoming message asynchronously. In this example, each message represents an image to be watermarked, but the code is meant to be generic and representative of any operation that includes I/O. Notice that every I/O operation in the ProcessMsg routine uses the Begin and End methods described by the Asynchronous Programming Model (APM).

 

      private IEnumerator<Int32> ProcessMsg(AsyncEnumerator ae, CloudQueueMessage cloudMsg) {
         QueueMessage msg = QueueMessage.Parse(cloudMsg.AsString);
         var ctx = s_tables.GetDataServiceContext();
         var query = (DataServiceQuery<MyEntity>)(from e in ctx.CreateQuery<MyEntity>(c_containerName)
                                                     where (msg.PartitionKey == e.PartitionKey) && (msg.RowKey == e.RowKey)
                                                     select e);
         query.BeginExecute(ae.End(), null);
         yield return 1;
         var p = query.EndExecute(ae.DequeueAsyncResult()).FirstOrDefault();
         if (p == null) yield break;   // Entity deleted, skip this one

         // Grab the image from the blob (could throw), watermark it, and upload it as a new blob
         var container = s_blobs.GetContainerReference(c_containerName);
         var blob = container.GetBlockBlobReference(p.PhotoBlobID);

         // Get our blob's attributes (including its length)
         blob.BeginFetchAttributes(ae.End(), null);
         yield return 1;
         blob.EndFetchAttributes(ae.DequeueAsyncResult());

         // Size the stream using the blob length fetched above
         MemoryStream ms = new MemoryStream(checked((Int32)blob.Properties.Length));
         blob.BeginDownloadToStream(ms, ae.End(), null);
         yield return 1;
         blob.EndDownloadToStream(ae.DequeueAsyncResult());
         ms.Position = 0;   // rewind so the watermarking code reads from the start

         String newBlobID = Guid.NewGuid().ToString();
         CloudBlockBlob newblob = blob.Container.GetBlockBlobReference(newBlobID);
         newblob.Properties.ContentType = blob.Properties.ContentType;
         newblob.BeginUploadFromStream(CreateWatermarked(ms), ae.End(), null);
         yield return 1;
         newblob.EndUploadFromStream(ae.DequeueAsyncResult());   // End on the blob that began the upload
      }

 

For more information on the AsyncEnumerator, you will want to read Jeffrey Richter’s June 2008 Concurrent Affairs article (see http://msdn.microsoft.com/en-us/magazine/cc546608.aspx). The AsyncEnumerator is part of the Wintellect Power Threading Library which you may download from here: http://wintellect.com/powerthreading.aspx

Microsoft published an updated FAQ (May 3, 2010) for SQL Azure.

The FAQ is very thorough and is a “must read” for any organization planning a relational database migration or a new cloud application.

“This paper provides an architectural overview of SQL Azure Database, and describes how you can use SQL Azure to augment your existing on-premises data infrastructure or as your complete database solution”

Windows Azure Content Delivery Network (CDN) caches your Windows Azure Data Storage blobs at strategically placed locations around the world (18 at the time of this blog post). The purpose of the CDN is to provide maximum bandwidth for delivery of content to our applications and users. Building massively scalable applications requires squeezing every ounce of juice possible from the infrastructure and machinery. The CDN significantly improves retrieval performance for our most frequently used anonymously accessible read-only data.

The CDN works by caching the result of the first request for a blob made through a special CDN URL that maps to our data storage account. It keeps that result in a geographically local cache, so subsequent requests for the same blob are served from the cache, which is much faster than the original trip to fetch the blob from the more distant data center. A blob requested through the CDN URL is served from the local cache until its Time To Live (TTL) expires, at which point a fresh copy is retrieved from data center blob storage with a fresh TTL. Because the first request still requires retrieval from data center storage, frequently used blobs receive the greatest performance boost; there is no performance advantage to serving infrequently used blobs through the CDN.

Because the CDN’s emphasis is on throughput, it is available only for anonymous access to public blob containers, which eliminates the overhead of authentication and authorization. At the time of this blog post, the CDN was still a Community Technology Preview feature. You can turn it on in the Data Storage configuration page of the Windows Azure Developer Portal.
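The TTL behavior described above can be illustrated with a toy C# edge cache. This is a sketch of the caching idea only; the origin-fetch delegate, the injected clock, and the class name are hypothetical stand-ins, not CDN APIs.

```csharp
using System;
using System.Collections.Generic;

// Toy edge cache illustrating TTL expiry. The origin-fetch delegate and the
// injected clock are hypothetical stand-ins, not CDN APIs.
public sealed class TtlEdgeCache
{
    private readonly Func<string, byte[]> _fetchFromOrigin;
    private readonly Func<DateTime> _clock;
    private readonly TimeSpan _ttl;
    private readonly Dictionary<string, (byte[] Data, DateTime Expires)> _entries =
        new Dictionary<string, (byte[] Data, DateTime Expires)>();

    // Number of trips back to data center storage, for observing cache behavior.
    public int OriginFetches { get; private set; }

    public TtlEdgeCache(Func<string, byte[]> fetchFromOrigin, TimeSpan ttl, Func<DateTime> clock)
    {
        _fetchFromOrigin = fetchFromOrigin;
        _ttl = ttl;
        _clock = clock;
    }

    public byte[] Get(string blobUrl)
    {
        DateTime now = _clock();
        if (_entries.TryGetValue(blobUrl, out var entry) && now < entry.Expires)
            return entry.Data; // cache hit: served from the edge location

        // First request or expired TTL: fetch from the data center and start a fresh TTL.
        byte[] data = _fetchFromOrigin(blobUrl);
        OriginFetches++;
        _entries[blobUrl] = (data, now + _ttl);
        return data;
    }
}
```

Only the first request and the first request after each expiry reach the origin, which is why frequently requested blobs benefit most.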

To learn more about the CDN, please start with the CDN article on the Windows Azure Team Blog.

I’ve assembled a short list of training materials and utilities that are helpful in learning the Windows Azure platform.

Before we can begin using the Windows Azure AppFabric Access Control Service (ACS) to decouple our applications from security concerns and enable claims-based identities, we need to understand the Resources contained in the Service Namespace and the role each plays in the authentication and authorization infrastructure. This brief blog entry is meant to provide you with the basic understanding and vocabulary required to get started.

Service Namespace

The Service Namespace is an abstraction for the collection of ACS Resources including Token Policies, Scopes, Issuers, and Rules (which are described in more detail below).

The Service Namespace comprises a hierarchy of related entities. At the root of this hierarchy is the AppFabric Service Account Project. The Service Namespace can be broken into three constituent parts, as shown in Figure 1: the Token Policy, the Scope, and the Issuer.

Figure 1 - Service Namespace Object Hierarchy

Token Policy

A Token Policy defines token expiration periods and digital signing keys. A Token Policy may be shared across Service Namespaces and is used by the ACS to sign the response tokens and to set their expiration periods.
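To my understanding, ACS v1 issued Simple Web Tokens (SWT) signed with HMAC-SHA256, with the Token Policy supplying the signing key and lifetime. The C# sketch below shows how those two settings shape an SWT: form-encoded name/value pairs including ExpiresOn (seconds since 1970-01-01 UTC), followed by an HMACSHA256 signature over the preceding string. The claim set, class name, and key are illustrative; this is not the ACS implementation.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Illustrative sketch of an SWT shaped by a token policy's lifetime and signing key.
public static class SwtSketch
{
    private static readonly DateTime Epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);
    private const string SignatureMarker = "&HMACSHA256=";

    private static string Sign(string unsignedToken, byte[] key)
    {
        using (var hmac = new HMACSHA256(key))
            return Convert.ToBase64String(hmac.ComputeHash(Encoding.ASCII.GetBytes(unsignedToken)));
    }

    // The token policy's lifetime sets ExpiresOn; its key produces the signature.
    public static string Create(string issuer, string audience, TimeSpan lifetime, byte[] key, DateTime utcNow)
    {
        long expiresOn = (long)(utcNow + lifetime - Epoch).TotalSeconds;
        string unsignedToken = "Issuer=" + Uri.EscapeDataString(issuer)
            + "&Audience=" + Uri.EscapeDataString(audience)
            + "&ExpiresOn=" + expiresOn;
        return unsignedToken + SignatureMarker + Uri.EscapeDataString(Sign(unsignedToken, key));
    }

    // Checks the signature, then the expiry. A production verifier should
    // compare signatures in constant time.
    public static bool IsValid(string token, byte[] key, DateTime utcNow)
    {
        int i = token.LastIndexOf(SignatureMarker, StringComparison.Ordinal);
        if (i < 0) return false;
        string unsignedToken = token.Substring(0, i);
        string signature = Uri.UnescapeDataString(token.Substring(i + SignatureMarker.Length));
        if (signature != Sign(unsignedToken, key)) return false;

        foreach (string pair in unsignedToken.Split('&'))
        {
            if (pair.StartsWith("ExpiresOn=", StringComparison.Ordinal))
                return long.Parse(pair.Substring("ExpiresOn=".Length)) > (long)(utcNow - Epoch).TotalSeconds;
        }
        return false; // no expiry claim: reject
    }
}
```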

Issuer

An Issuer is a party that requests tokens from the ACS. An Issuer may not be shared across Service Namespaces.

Scope

A Scope groups rules governing ACS token issuing behavior. A Scope contains exactly one internal RuleSet object which can be populated with multiple Rules. A Scope may not be shared across Service Namespaces.

Rule

A Rule defines a transformation between one or more input claims and one or more output claims. Rules cannot be shared across Service Namespaces. The Rule feature is one of the most powerful and innovative features of the Windows Azure ACS. 

RuleSet

A RuleSet is a collection of individual Rule objects. We do not directly create the RuleSet object; one is created automatically for us as part of a Scope.

Claim

The ACS Rules engine applies Rule objects to incoming claims to produce outgoing claims. A Claim is a statement that can be made about an entity. Applications and services, such as the ones that you will build, specify which claims are necessary to perform a given operation.

Identity

Simply stated, an Identity is a collection of claims. Your ACS-enabled application will accept identities from the ACS, an identity provider that your application implicitly trusts. The ACS verifies the claims made by your application users and transforms them into claims usable by your application, using the Rules defined in the RuleSet of the Scope that applies to your Service Namespace.
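To tie this vocabulary together, here is a hypothetical C# object model; these types are illustrative and are not the ACS management API. It simplifies each Rule to map one input claim to one output claim, gives each Scope exactly one automatically created RuleSet, and treats an Identity as a collection of claims that the Scope transforms.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical types modeling the ACS vocabulary; NOT the ACS API.
public sealed record Claim(string Type, string Value);

// Simplified Rule: one matching input claim produces one output claim.
public sealed record Rule(Claim Input, Claim Output)
{
    public bool Matches(Claim c) => c == Input; // record value equality
}

public sealed class RuleSet
{
    public List<Rule> Rules { get; } = new List<Rule>();
}

public sealed class Scope
{
    // Exactly one RuleSet per Scope, created automatically with the Scope.
    public RuleSet RuleSet { get; } = new RuleSet();

    // An identity is a collection of claims; every matching rule contributes
    // an output claim that the relying application understands.
    public IReadOnlyList<Claim> Transform(IEnumerable<Claim> identity) =>
        identity
            .SelectMany(c => RuleSet.Rules.Where(r => r.Matches(c)).Select(r => r.Output))
            .Distinct()
            .ToList();
}
```

For example, a Rule mapping (group, Managers) to (action, approve) turns an identity carrying (group, Managers) and (group, Staff) into a single outgoing (action, approve) claim; the Staff claim matches no rule and is dropped.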

Before you begin your Windows Azure development experience in earnest, you should be aware that leaving development and test instances deployed and running in the cloud can be expensive. You will be billed for deployed service instances even if they are suspended, so it is important that you actually remove the instances when you are done using them. Microsoft offers many options that give developers their own “little patch” of the cloud fabric quilt; however, it is easy to exceed the limits of these offers if you are not careful or simply forget to remove your deployments.

To give you some idea, my Windows Azure bill has been running over $500 per month for four hosted services and four storage services (plus a few extra instances in staging environments). This is for mostly idle instances (used for demo and training purposes). There are many variables in pricing outside the scope of this short blog post, so your costs could be much different. My purpose in drawing your attention to it here is to give you some financial sense of why I view the information in this blog post as important.

When developing Windows Azure cloud applications, you will want to make heavy use of the DevFabric and DevStorage. You should deploy to the cloud only when you need to test your application in a way that cannot easily be done on your desktop. For example, it is impossible to learn much about the scalability of the Windows Azure Platform from an application running solely on the desktop, or even to observe many of the features of the cloud fabric, such as a simulated instance failure.

The Windows Azure Developer Portal allows us to install and remove application deployments. The portal is straightforward and easy to operate, but the process requires operator interaction. A deployment can take 30 minutes to get running once it has been uploaded and deployed, so a developer may also have to wait and monitor the deployment before taking subsequent steps. As developers, we want to automate the steps of our deployment that are easily identified and highly repeatable. Good news! Windows Azure can be managed through the developer portal, but it also exposes a RESTful API for automation purposes. You can read more about the Windows Azure Service Management REST API on the MSDN website.

Microsoft has built a set of PowerShell CmdLets that leverage this RESTful API, allowing us to script our deployments and service removals and make them rapid and repeatable. You can get the PowerShell CmdLets from the MSDN site, which also hosts a great “getting started” blog post. Using the Windows Azure CmdLets, I have been able to automate my deployments and service removals, potentially saving myself hundreds of dollars per month in unnecessary charges (I’ll let you know next month exactly how much I saved).

My first experience with the Windows Azure Service Management CmdLets wasn’t entirely painless. I couldn’t get the “New-Deployment” CmdLet to work out of the box, and I ended up spending numerous hours trying to diagnose why. The traffic is encrypted over HTTPS, and Windows Azure error messages are often deliberately vague for security reasons. Fiddler wasn’t of much use either, as Azure detected its man-in-the-middle certificate and refused to let me monitor the decrypted HTTPS traffic.

Unable to watch the traffic, I attached a debugger to the Windows Azure Service Management CmdLet source code and monitored execution. This revealed that the WCF behavior interceptor, which inspects outbound messages sent to the Windows Azure management endpoint and appends the required version header (x-ms-version), could not find the httpRequest property in the outbound message. The code assumed that this property would always be present (it didn’t check first), so an unhandled exception was thrown and the deployment failed. I did not get to the bottom of why the property was missing (I’m hoping to find out at a later time), but I revised the ClientOutputMessageInspector interceptor code to try to get the property first, and to add it if it did not exist. This seemed to fix the problem, as I am now able to deploy successfully. My code revision follows. You can find the BeforeSendRequest method in the ServiceManagementHelper.cs file, near line 206:

[Image: revised BeforeSendRequest method]

I’m interested to know whether others have run into the same issue, or whether it was somehow local to my environment. Please drop me a note and let me know.
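Because the revised method survives only as a screenshot, here is a C# sketch of the fix as described, written as a standalone IClientMessageInspector rather than the CmdLets’ actual ClientOutputMessageInspector source. It assumes a reference to System.ServiceModel; x-ms-version is the header the Service Management API expects, while the version string is supplied by the caller (the value used in any test or usage is only an example).

```csharp
using System.ServiceModel;
using System.ServiceModel.Channels;
using System.ServiceModel.Dispatcher;

// Sketch of the defensive fix described above, not the CmdLets' actual code:
// reuse the outbound HTTP request property if present, otherwise create it.
public sealed class VersionHeaderInspector : IClientMessageInspector
{
    private readonly string _version;

    public VersionHeaderInspector(string version) => _version = version;

    public object BeforeSendRequest(ref Message request, IClientChannel channel)
    {
        HttpRequestMessageProperty httpRequest;
        if (request.Properties.TryGetValue(HttpRequestMessageProperty.Name, out object existing))
        {
            httpRequest = (HttpRequestMessageProperty)existing;
        }
        else
        {
            // The property was missing: add it instead of throwing.
            httpRequest = new HttpRequestMessageProperty();
            request.Properties.Add(HttpRequestMessageProperty.Name, httpRequest);
        }
        httpRequest.Headers["x-ms-version"] = _version;
        return null; // no correlation state needed
    }

    public void AfterReceiveReply(ref Message reply, object correlationState) { }
}
```

The inspector is registered through an endpoint behavior as usual; the essential change is the TryGetValue-then-Add sequence instead of assuming the property exists.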

 

Each Windows Azure role instance runs its own diagnostic monitor to gather instance-specific diagnostic data. The problem that immediately presents itself is knowing exactly what is being collected, where the data is being saved, and how to retrieve it for inspection. The purpose of this blog post is to shed a little more light on these areas.

So let’s start at the beginning… When you create a new Windows Azure Web Role, Visual Studio automatically adds a boilerplate WebRole.cs file to your project. By default, the WebRole overrides the OnStart() method with an implementation that starts the Windows Azure Diagnostic Monitor, which logs Windows Azure’s own diagnostics, IIS 7.0 logs, and Windows Diagnostics.

[Image: default OnStart() implementation in WebRole.cs]

The argument to the static Start method of the DiagnosticMonitor class is the name of the configuration setting that holds the Windows Azure Data Storage connection string in the ServiceConfiguration.cscfg file.

[Image: connection string setting in ServiceConfiguration.cscfg]

When the value of the connection string is “UseDevelopmentStorage=true”, the Developer Fabric uses the local Development Storage to simulate storage in the cloud. In staging or production, this string would point to the RESTful data storage endpoint and contain your Windows Azure Data Storage AccountName and AccountKey.

We can inspect the “wad-control-container” in Blob storage to see how diagnostics are configured for each instance; the collected data itself is transferred to tables (such as WADLogsTable) and blob containers (such as wad-iis-logfiles) in the same storage account. Run your favorite Windows Azure Storage exploration tool. In my example, I am using the Windows Azure Storage Explorer from the CodePlex site. You can use this tool to download a container and its contents to your local file system for further analysis.

[Image: wad-control-container viewed in Windows Azure Storage Explorer]

We can also augment the diagnostic data collected to include other data sources as well.

Let’s say you’re also interested in capturing failed IIS and ASP.NET requests. You can augment the data that Windows Azure already captures by adding a <traceFailedRequests> element to the <system.webServer/tracing> section of web.config. You can control the paths of the page(s) to be tracked, set the verbosity to a tracing level appropriate for your circumstances, and filter the general areas of coverage, such as Authentication and Security. An example might look like this:

[Image: example <traceFailedRequests> configuration]

We can also collect Windows Event Logs by adding an XPath expression for each event source to be captured (for example, “Application!*”) to the WindowsEventLog.DataSources collection of the diagnostic monitor configuration object.

[Image: adding Windows Event Log data sources]

It is possible that a hardware or software defect is causing mysterious or intermittent failures. Fortunately, we can also configure our instances to collect full or partial crash dumps by calling the static EnableCollection method of the Microsoft.WindowsAzure.Diagnostics.CrashDumps type. Passing true to this method captures complete crash dumps; passing false collects partial dumps.

[Image: enabling crash dump collection]

Although the path may be slightly more illuminated now, there are still many dark areas beyond our present location. In my opinion, much work remains in tooling and in making this data usable in “real world” scenarios. It is trivial to sift through a dozen or so entries from a single service instance, but finding what you are looking for in the potentially massive data collected by multiple simultaneous service instances of a busy, high-volume application is another matter entirely. Several parties are working to provide solutions in this space, but there are no clear leaders at this time.

Idempotency is the mathematical term for an operation that produces the same result no matter how many times it is applied to the same target. In software systems, this translates to the ability to perform an operation more than once with the knowledge that the resulting state of the system will be consistent. Idempotency does not dictate the mechanism by which this consistency is achieved, only that it must be achieved.

Queues are useful in Windows Azure for delivering work requests to worker roles; they are the primary architectural means by which web roles signal worker roles to begin performing work asynchronously. When a worker role accepts a message from a queue, the queue hides that message from other workers for a visibility timeout (30 seconds by default) to reduce the probability that multiple workers will operate on it simultaneously. This greatly reduces the probability that the system will perform redundant work, but it does not prevent it!

If a message takes longer to process than its visibility timeout allows, the message becomes visible again for other workers to pick up and process. It is therefore possible for more than one worker to work on the same message at the same time: the original recipient, plus the new worker that picks it up when it reappears in the queue. In addition, the typical pattern for handling failed or corrupted message receipt in a fault-tolerant system is to retry delivery, which can also lead to redundant work.
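The redelivery scenario can be reproduced with a toy in-memory queue in C#. This models only the visibility-timeout idea; it is not the Azure Storage client API, and deleting by message body is a simplification of the real receipt-based delete.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy in-memory queue modeling visibility timeouts; NOT the Azure Storage client API.
// A received message is hidden until the timeout elapses. If it has not been
// deleted by then, it becomes visible again and another worker can receive it.
public sealed class ToyQueue
{
    private sealed class Entry
    {
        public string Body = "";
        public DateTime VisibleAt;
    }

    private readonly List<Entry> _entries = new List<Entry>();
    private readonly TimeSpan _visibilityTimeout;

    public ToyQueue(TimeSpan visibilityTimeout) => _visibilityTimeout = visibilityTimeout;

    public void Add(string body, DateTime now) =>
        _entries.Add(new Entry { Body = body, VisibleAt = now });

    // Returns the first visible message body (or null) and hides it until now + timeout.
    public string Get(DateTime now)
    {
        Entry entry = _entries.FirstOrDefault(e => e.VisibleAt <= now);
        if (entry == null) return null;
        entry.VisibleAt = now + _visibilityTimeout;
        return entry.Body;
    }

    // A worker deletes the message once processing succeeds (simplified: by body).
    public void Delete(string body) => _entries.RemoveAll(e => e.Body == body);
}
```

If worker A receives a message at t = 0 and is still busy at t = 31 seconds, worker B receives the same message, so both process it unless the work is idempotent.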

Because multiple workers may work on the same message, we must design our cloud software with idempotence in mind. The argument that duplicate processing will happen only once in hundreds of thousands of transactions offers little comfort if your system may process millions of transactions, or if the integrity of your data is mission critical.

There are plenty of blog posts and forum entries on the importance of writing idempotent services, but I was able to find very little constructive guidance on how developers should go about achieving idempotency, hence the purpose of this blog post.

One technique for achieving idempotency that I saw suggested on several blogs and in forum discussions while grokking material on this topic was to avoid the problem altogether. Many people suggested creating a table of message IDs and then forcing the workers to verify the state of a message by consulting the table before processing it. Even one author of a book on SOA architectures put this idea forward. To my way of thinking, avoiding the problem does not make your software idempotent; such schemes are merely a pattern to sidestep the problem rather than to design for it. This isn’t necessarily a bad way to go for some software systems, but be aware that the pattern itself may contain its own set of flaws: an error could keep the table from being updated, and there is a time window during which the table might hold inaccurate state information, allowing two workers to execute simultaneously anyway. The old two-phase commit solution starts to rear its ugly head. Since such schemes have problems of their own, a better question to ask yourself is this: what will the state of your data be after a message has been received and processed multiple times? Is your data consistent or inconsistent?
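The time window mentioned above is the classic check-then-act race. The sketch below is a hypothetical, single-threaded replay of one unlucky interleaving: the `processed_ids` set stands in for the message-ID table and `ship_order` for the real work (both names are illustrative). Both workers consult the table before either one records the message, so both perform the work.

```python
processed_ids = set()  # stands in for the table of processed message IDs
shipments = []         # side effects performed by the "real" work


def already_processed(msg_id: str) -> bool:
    return msg_id in processed_ids


def ship_order(msg_id: str, worker: str) -> None:
    shipments.append((msg_id, worker))


def mark_processed(msg_id: str) -> None:
    processed_ids.add(msg_id)


# One possible interleaving of two workers that received the same message.
a_should_run = not already_processed("m1")  # worker A checks: not processed yet
b_should_run = not already_processed("m1")  # worker B checks before A records it
if a_should_run:
    ship_order("m1", "A")
    mark_processed("m1")
if b_should_run:
    ship_order("m1", "B")
    mark_processed("m1")
```

Both shipments happen even though every worker followed the rule of consulting the table first. Closing the window means making the check and the write a single atomic step, which brings us back to asking what state the data is in after a duplicate.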

For a system to be truly idempotent, it must be able to process the same message twice and still be in a consistent state afterward.

Let’s say that we want to update a customer’s address. Our service receives a message from some application with the customer’s new street address. We process the message and the address is changed in our database. If we receive this message again, the work will be performed a second time. However inefficient or unsavory this may be, the resulting state of the customer’s address will be identical. In other words, our overly simplified address change operation would be considered idempotent. If two messages for the same customer arrive carrying two different addresses, the first one would succeed and so would the second one. Again, we would still be idempotent in the sense that our data is consistent; however, we have set ourselves up for a “last-in-wins” model. This is not necessarily a bad thing, but we should be aware of it in our design.
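Here is a minimal sketch of the address example using Python’s built-in `sqlite3` module; the `Customer` table and its columns are hypothetical. Replaying the same message leaves the row unchanged, and two different messages for the same customer show the last-in-wins behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, Street TEXT)")
conn.execute("INSERT INTO Customer VALUES (1, '1 Old Road')")


def change_address(customer_id: int, street: str) -> None:
    # Overwrites the street with an absolute value, so replays are harmless.
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE Customer SET Street = ? WHERE CustomerId = ?",
            (street, customer_id),
        )


def street_of(customer_id: int) -> str:
    row = conn.execute(
        "SELECT Street FROM Customer WHERE CustomerId = ?", (customer_id,)
    ).fetchone()
    return row[0]


change_address(1, "10 Main St")
change_address(1, "10 Main St")  # duplicate delivery of the same message
after_duplicate = street_of(1)

change_address(1, "22 Elm St")   # a second, different message arrives
after_second = street_of(1)      # last-in wins
```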

Many businesses extend credit to their customers, and no reasonable business would do so without placing limits on it. Instead of the customer address example, let’s imagine that our messages are new orders from our customers. If such a message were processed twice without any concern for idempotency, our customer might receive twice as much product as they ordered, and they might prematurely exceed their credit limit on subsequent orders. This would clearly not be idempotent. So how do we get to where we want to go?
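To see why the order example differs from the address example, here is a deliberately naive sketch (hypothetical names and amounts, plain Python rather than a database) in which each order message adds to the customer’s outstanding balance. Processing the same message twice double-counts the order, so a later legitimate order is rejected against the credit limit.

```python
CREDIT_LIMIT = 1000
balance = 0
orders = []


def process_order_naively(invoice_number: int, amount: int) -> bool:
    """Accept the order if it fits under the credit limit; no duplicate check."""
    global balance
    if balance + amount > CREDIT_LIMIT:
        return False
    orders.append(invoice_number)
    balance += amount
    return True


process_order_naively(12345, 400)
process_order_naively(12345, 400)             # same message delivered again
accepted = process_order_naively(12346, 300)  # a genuine new order
```

Without the duplicate, the balance would be 700 after the third order; with it, the balance is already 800 and the genuine order is refused.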

If the message contains the invoice number, we can construct our business and database operations so that the data is never inserted into the table twice. For example, we could perform the insertion into the invoice table within a transaction, conditional on the invoice number not already being present in the table. The first receipt would insert one row, and subsequent attempts would insert zero rows. In other words, our add operation would leave the data in a consistent state no matter how many times we replayed the message.
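One way to express this with Python’s `sqlite3` module is an `INSERT ... SELECT ... WHERE NOT EXISTS`, so that the existence check and the insert run as a single statement; the `Invoice` schema is a hypothetical stand-in. Declaring `InvoiceNumber` as the primary key adds a backstop: if two inserts ever did race, the database would reject the second row rather than store a duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Invoice (InvoiceNumber INTEGER PRIMARY KEY, Amount INTEGER)")


def add_invoice(invoice_number: int, amount: int) -> int:
    """Insert the invoice only if it is not already present; return rows inserted."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO Invoice (InvoiceNumber, Amount) "
            "SELECT ?, ? "
            "WHERE NOT EXISTS (SELECT 1 FROM Invoice WHERE InvoiceNumber = ?)",
            (invoice_number, amount, invoice_number),
        )
        return cur.rowcount


first = add_invoice(12345, 400)   # first delivery inserts one row
replay = add_invoice(12345, 400)  # replayed message inserts zero rows
total_rows = conn.execute("SELECT COUNT(*) FROM Invoice").fetchone()[0]
```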

If the message is to update an existing invoice, things get a little more sophisticated, but still very manageable. Suppose the incoming message carries the value of the invoice’s timestamp column as it was when the data was issued to the sender. We can then pass that value as a parameter to the WHERE clause, so that the update applies only to a row whose timestamp still matches:

UPDATE Invoice SET Amount = @Amount WHERE InvoiceNumber = @InvoiceNumber AND tstamp = @tstamp

If no other worker has updated the data since it was issued, the update modifies the row with the matching invoice number. If the message is a duplicate, however, the first update will already have given the tstamp column a new value, so zero rows are updated (no row satisfies the timestamp equality constraint). We can now replay the same update any number of times while remaining idempotent. Of course, this technique would also be a better approach for the simpler address change example I provided above.
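SQLite has no equivalent of SQL Server’s automatically maintained `timestamp`/`rowversion` column, so this sketch (Python `sqlite3`, hypothetical schema) uses an integer `Version` column that every update increments explicitly. It plays the same role as `tstamp` in the statement above: the message carries the version the sender saw, and a duplicate arrives with a stale version and updates zero rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Invoice (InvoiceNumber INTEGER PRIMARY KEY, Amount INTEGER, Version INTEGER)"
)
conn.execute("INSERT INTO Invoice VALUES (12345, 400, 1)")


def update_amount(invoice_number: int, amount: int, seen_version: int) -> int:
    """Apply the update only if the row still has the version the sender saw."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE Invoice SET Amount = ?, Version = Version + 1 "
            "WHERE InvoiceNumber = ? AND Version = ?",
            (amount, invoice_number, seen_version),
        )
        return cur.rowcount


message = {"invoice": 12345, "amount": 450, "version": 1}
first = update_amount(message["invoice"], message["amount"], message["version"])
replay = update_amount(message["invoice"], message["amount"], message["version"])
amount, version = conn.execute(
    "SELECT Amount, Version FROM Invoice WHERE InvoiceNumber = 12345"
).fetchone()
```

A zero row count alone does not say whether the message was a duplicate or another writer changed the invoice first, so a worker may need to re-read the row to tell those cases apart.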

Clearly there is much more that can be said in this space, but that is all I have time for in this blog post. Look for additional advice and commentary in future posts.

  

Entity Framework supports inheritance, and its derived types can participate in relationships, just as you’d expect from any object-relational mapper (ORM).

Windows Communication Foundation (WCF) Data Services (Astoria) throws an exception if your derived entities have relationships. That means that if a Contact, a Lead, and a Doctor all inherit from a Person… and the derived entities each need to link to different external data… too bad! Suddenly our Sales Leads have properties for writing medical prescriptions! Good grief! That’s not going to be pretty… so ugly, in fact, that we probably don’t even want to go there. The official prescription for this from the MS forums and various blogs is to move those relationships into the base Person class and then refactor your database storage as necessary… perhaps with separate and distinct tables for Doctor, Lead, and Contact. Wait a sec, you say… you picked an ORM because you didn’t want your object model to be dictated by your data model… well… that’s true… but here you find yourself anyway.

This bit me in the butt big-time on a project I was working on. There was nothing in the documentation or in the service’s behavior to lead me to believe that this constraint would be present… (well… okay… other than the initials CTP </grin>, but still…). We learned of the constraint far too late in the development cycle: not until after my team had invested heavily in developing our ORM and database models. These models worked well as long as we were only unit testing them. It wasn’t until we tried to expose them through Astoria that the big bomb dropped in the room.

Having the service tier force a rigid database implementation upon a team is the tail wagging the dog. Also, development teams often have no control over what a DBA requires in the database, and WCF Data Services should not be so brittle and constrictive as to disallow common scenarios like the practical, real-world use of inheritance. There is nothing complex about this business use case… in fact, I would say it is quite representative of the norm.

My friend Julie Lerman (author of Programming Entity Framework, published by O’Reilly) created a suggestion for this basic “feature” on the Microsoft Connect site. If you agree that it is important, please take a second to click the link and vote on it:

https://connect.microsoft.com/data/feedback/details/532592/derived-entites-should-be-allowed-to-have-relationships-in-wcf-data-services

Another item to be aware of regarding inheritance in Entity Framework: two derived types cannot share the same primary key. That means that a Sales Lead cannot also be a Contact, which forces duplication of the Person and all of the data related to that person (duplicate addresses, duplicate phone numbers, duplicate emails, etc.). The designer gives no warning of this snake in the grass… you’ll find out the first time you attempt to retrieve data from the database where these conditions occur (a Person existing in more than one of the derived database tables). You can save the data this way… you just can never retrieve it afterward! Again… this appears to be a very naïve constraint.

If you have Silverlight v3.0 installed, running the WCF / Windows Azure samples (see my previous blog entry) may fail with an error indicating that the Polling Duplex binding element cannot be loaded: The type 'System.ServiceModel.Configuration.PollingDuplexElement, System.ServiceModel.PollingDuplex' registered for extension 'pollingDuplex' could not be loaded.

To fix this, replace the WcfSamples project’s reference to the v2.0 Silverlight Polling Duplex assembly, C:\Program Files (x86)\Microsoft SDKs\Silverlight\v2.0\Libraries\Server\System.ServiceModel.PollingDuplex.dll, with the v3.0 version: C:\Program Files (x86)\Microsoft SDKs\Silverlight\v3.0\Libraries\Server\System.ServiceModel.PollingDuplex.dll

Set the Copy Local property to true.

Code samples demonstrating how to host WCF Services under Windows Azure can be found on the MSDN site at:

http://code.msdn.microsoft.com/wcfazure
