Power of startup tasks in Azure


Hello rock stars, ready for some hacking? Of course!

Some of us might know that Azure (or for that matter many cloud providers) presents two main implementation paradigms:

  • PaaS : Platform-as-a-service – is more like renting a serviced apartment. The focus is on implementing requirements. Consumers need *not* think about procuring infrastructure, installing operating system patches etc. Just zip the package and give it to Azure, which then manages the show for you.
  • IaaS : Infrastructure-as-a-service – is like renting a condo. Apart from implementing business requirements, it’s the consumer’s duty to maintain and manage the virtual infrastructure.

IaaS provides more power for sure, but remember what Uncle Ben told Peter Parker: with great power comes great responsibility! :)

The PaaS world is a little different, and especially if you are a Dev-Test-Deploy kind of shop it makes sense to invest in the PaaS paradigm. But then many have this question: if Microsoft is providing all the software and the platform, how do I run my custom software in Azure PaaS? How do I install a piece of software that Azure does not provide out of the box? Azure answers that question in the form of start up tasks.

The idea is – you run a scripted command in the form of a *.bat or *.cmd file in a bootstrap manner. So even if Azure recycles or reimages your machines, the start up task/script always makes sure that things are correctly installed. It is a very powerful approach when you have a small task to perform or a tiny custom widget to install. E.g. you want to install an ISAPI extension so that you can run PHP on the hosted IIS. Or you want to install your custom diagnostics monitor that records a stream of SLL events and does something about it.

Just to be clear, start up tasks are *not* meant to perform heavy duty operations. E.g. if you want to install, let’s say, an SAP instance on Azure – you should go for IaaS; start up tasks are not meant for you.

Breaking shackles


We all might know that Azure offers a distributed Redis cache-as-a-service. It is a true platform-as-a-service, wherein consumers need not worry about disaster recovery, business continuity planning etc. Let’s just say that you want to run and consume your own Redis cache instance as a local cache inside an Azure web role. Let’s see if we can pull it off using start up tasks.

MS Open Tech has forked Redis and ported it to Windows. We will be using those bits.

Step 1 : Building the correct file structure

Let’s download the Redis cache bits for Windows from the MS Open Tech page. We need to add two files at the root of the role that will be used in the start up task.

  • InstallPatch.bat : You can name this file whatever you please. This file is where we write the command that will be executed by the start up task.
  • Redis-x64.msi  : This is the actual Redis cache Windows service installer (msi in this case) we are installing.

Make sure both these files are marked as “copy to output directory”. This ensures the two files are copied as part of the package.

Step 2 : Building a bootstrap command

Let’s build a command that can be used to install the Redis cache Windows service (.msi in this case) on the Azure VMs. The idea is – you want to install this MSI in silent/quiet mode, since you do not want a GUI waiting for a user’s action. After looking into the documentation (that gets downloaded along with the bits), here is how we can achieve it:

As you can see below, we can pass the /quiet switch to the msiexec command, which makes sure the installation happens silently. Step 1 already ensures that the command file and the MSI are copied into the package.

To get to the root of the role we can use an environment variable called %RoleRoot%. The PORT=6379 parameter makes sure that the Redis cache server starts listening on port 6379 after installation.

msiexec /quiet /i %RoleRoot%\AppRoot\Redis-x64.msi PORT=6379 FIREWALL_ON=1

Step 3 : Encapsulating the command in a start up task

Now that the command is set up, let’s see how to encapsulate it in a start up task. The good news is that it is configuration driven; service definition configuration (csdef) to be specific. Insert the following config in your csdef file, inside the <Startup> element of the role.

  <Task commandLine="InstallPatch.bat" executionContext="elevated" 
    taskType="background" />

The above config instructs the Azure role to run the InstallPatch.bat file in an elevated manner (as an admin) in the background. It executes the command we prepared and makes sure that Redis-x64.msi is installed correctly on the Azure VM. Apart from background, taskType can take the following two values:

  • foreground : foreground tasks are executed asynchronously, like background tasks. The key difference is that a foreground task keeps the role in a running state, which means the role cannot recycle until every foreground task has completed or failed. If the task is a background task, the role can be recycled even if the task is still running.
  • simple : simple tasks are run synchronously, one at a time. This is the default.

Step 4 : Using the Redis cache

If the start up task completes successfully, we have a Redis Windows service installed and listening on port 6379. Let’s see how we can use it in an app.

Let’s pull a nuget package called ServiceStack.Redis.Signed, which is a client library to consume the Redis service. There are many Redis clients available for .NET; you can choose any as per your liking. We can do something like below:

using System;
using ServiceStack.Redis;

namespace WebRole1
{
    public partial class Index : System.Web.UI.Page
    {
        protected void Page_Load(object sender, EventArgs e)
        {
            // Creates a client for the Redis service listening on port 6379
            // (the host is assumed to be the local role instance)
            using (var client = new RedisClient("localhost", 6379))
            {
                var key = Guid.NewGuid().ToString();

                // Sets a key-value pair in the cache
                client.Set(key, Guid.NewGuid().ToString());

                // Gets the value from the cache as a string
                var val = client.Get<string>(key);

                Response.Write("Cache Value : " + val);
            }
        }
    }
}

Once again, I am not advocating that we should use a Redis cache as described above. We should actually use the Azure Redis cache service for any cache implementation. I used it just to demonstrate what we can achieve using start up tasks. It’s a little hack, which I am proud of :)

Happy coding, hacking!

Azure worker role as a service


Hello rock stars, ready to try something crazy? Ok, here you go. We know that Azure worker roles are meant to perform long running, background tasks. E.g. the most popular implementation pattern is that the frontend website/service accepts the requests and then a worker role running in a separate process goes through the requests in an asynchronous fashion. It is a very powerful approach, especially since you can just scale out and crank through millions of incoming requests.

One important point to note here though is – worker roles do not have IIS. They can have endpoints, but they do not have IIS (or hosted web core for that matter). This means that although you can have endpoints, you cannot really have IIS hosted services in worker roles. You need to self host them.

Another angle that bothers many is the worker role’s testability. As far as white-box testing goes, you can just use the worker role dll/code and use hooks and mocks to test it as a unit. But there is no really easy story for testing a worker role as a black box from outside.

Breaking Shackles

Here I am proposing to host the worker role code as a service, which can then be invoked from outside in a request-response manner. Again, this is a little crazy, out-of-the-box approach, so weigh it carefully and use it if it fits your bill.

Defining input endpoint

Azure worker roles can have internal or input endpoints. To keep it simple – input endpoints are exposed externally while internal endpoints are not. More about internal and input endpoints in some other post (or you can Google it). In this scenario, since we want the worker role code to be invokable from outside, we are going with an input endpoint.

As you can see below, a new input http endpoint has been created, which uses port 80.


Self hosting service

Now that we have an endpoint, we can host the service using it. Going back to the worker role phenomenon we discussed above – worker roles do not contain IIS. So we need to host this http service without IIS. While I was thinking about it (and believe me, there are many ways of doing it), the cleanest approach I came across was to use OWIN, or what we refer to as Katana in the Microsoft world. Katana is probably the simplest, most light-weight way to self-host an http service. Again, I am not going into much Katana detail here.

Go ahead and pull these nuget packages in your worker role project

As you can see in the code snippet below, we are using the endpoint (Name : Endpoint1) that we created in the above step. The best place to write the code that self-hosts the service is the worker role’s OnStart() method, which gets executed first thing after the deployment. We are hosting a basic ASP.NET Web API style REST service using Katana.

public class WorkerRole : RoleEntryPoint
{
    public override bool OnStart()
    {
        // Build the base URI from the input endpoint defined on the role
        var endpoint = RoleEnvironment.CurrentRoleInstance.InstanceEndpoints["Endpoint1"];
        var baseUri = String.Format("{0}://{1}", endpoint.Protocol, endpoint.IPEndpoint);

        // Self-host the Web API service using Katana
        WebApp.Start<KatanaStartup>(new StartOptions(baseUri));
        return base.OnStart();
    }
}

public class KatanaStartup
{
    public void Configuration(IAppBuilder app)
    {
        var configuration = new HttpConfiguration();
        configuration.Routes.MapHttpRoute("Default", "{controller}/{id}", new { id = RouteParameter.Optional });
        app.UseWebApi(configuration);
    }
}

Once the service is hosted, it’s just a matter of defining the controller that matches the URI template defined in the service host. Here is what my basic controller looks like:


/// <summary>
/// Actual worker role code that gets called in WorkerRole Run()
/// </summary>
public class Processor
{
    public bool Process(int value)
    {
        return true;
    }
}

/// <summary>
/// Service controller implementation that invokes the actual worker role code
/// </summary>
public class ProcessorController : ApiController
{
    public HttpResponseMessage Get()
    {
        try
        {
            var flag = new Processor().Process(12);
        }
        catch (Exception ex)
        {
            return new HttpResponseMessage() { Content = new StringContent(ex.Message + ex.StackTrace) };
        }

        return new HttpResponseMessage() { Content = new StringContent("Success!") };
    }
}

As shown in the above code snippet – the Processor class encapsulates the actual worker role logic that gets executed every time the worker role’s Run() method is called. ProcessorController (forgive me for naming it ProcessorController :) ) is the class that gets instantiated and invoked by the OWIN hosted service. This class is nothing but a pass-through piece that ultimately invokes the actual worker role code. Here I have shown a basic Get implementation, where we respond with the exception details in case of an exception and with a dummy string in case of success. You are encouraged to be creative and implement the best REST service patterns and practices, or pass parameters using a Post implementation.

Using service for testing

Go ahead and host it in Azure or in the local emulator. Try accessing using:

Now that the service is hosted, it can be invoked from outside to perform the black box testing as we were planning.
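A black-box check against the hosted service could look like the sketch below. It is only a sketch under assumptions: ProcessorSmokeTest, IsSuccess and CheckAsync are names invented here, the base address is whatever your deployment or the emulator exposes, and the Processor path simply follows from the {controller}/{id} route and the ProcessorController name used above. Because the sample controller answers with a body even when it catches an exception, the check looks at the body as well as the status code.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class ProcessorSmokeTest
{
    // Pure decision: the call counts as a success only when the status code
    // is below 400 and the body is the "Success!" string the controller returns.
    public static bool IsSuccess(HttpStatusCode status, string body)
    {
        return (int)status < 400 && body == "Success!";
    }

    // Calls GET {baseUri}/Processor and applies the decision above.
    // baseUri is an assumption: pass the address of your own deployment.
    public static async Task<bool> CheckAsync(HttpClient client, Uri baseUri)
    {
        using (var response = await client.GetAsync(new Uri(baseUri, "Processor")))
        {
            var body = await response.Content.ReadAsStringAsync();
            return IsSuccess(response.StatusCode, body);
        }
    }
}
```

From a test project this could be called as ProcessorSmokeTest.CheckAsync(new HttpClient(), new Uri(yourBaseAddress)), where yourBaseAddress is a placeholder for the role’s public address.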

PS : Here I haven’t given much thought to endpoint security. I know it is extremely important, but that’s not the focus of this post.

Service slow? Blame it on IIS..


Alright, here is the situation we were thrown into recently. Our bread n’ butter API got released. It went well initially and then intermittently started producing extraordinarily high response times. Everybody freaked out. We had tested the API under large, consistent load, and here it was not withstanding moderate, intermittent traffic.

Our initial guess was that the cache (we are using the in-memory, old fashioned System.Runtime.Caching) was getting invalidated every 30 odd minutes due to an absolute expiry. But the instrumentation logs were *not* suggesting that. There was no fixed pattern. It was totally weird! After some more instrumentation and analysis we zeroed in on some heavy singleton objects. We use Ninject as our IoC container. We have a couple of objects created on startup and thrown into the Ninject singleton bucket to be used later during the lifetime. What we observed was that these objects were getting destroyed and re-created after a certain amount of time! And creating the objects was obviously taking a hit. We freaked out even more. Isn’t the definition of a singleton – create the object only once? Or is it Ninject that is destroying the objects? What we found would surprise some of you. Ready for the magic?


While just browsing through the event logs we found the following warning message in the System log (event source – WAS):

The default idle timeout value is twenty minutes, which means your app pool is shut down after twenty minutes if it’s not being used. Some people want to change this, because it means their apps are a bit slow after twenty minutes of inactivity.

That started the ball rolling. Ah, so it was the good old buddy IIS and the app pool. Even if it’s Azure and you want to call it hosted web core, at the end of the day it is IIS. And IIS hosted websites/web services run under app pools. And app pools, the good babies they are, tend to recycle. :) They shut down after 20 minutes (the default value) if kept idle.

We had 4 instances and, with the moderate traffic, the Azure load balancer was not routing traffic to all instances consistently. So a couple of instances were going idle and the app pools on them were recycling. After some time, when the Azure load balancer diverted traffic to these idle instances, the singleton objects that had been destroyed would get created again, and the incoming service call would pay the price of creating these objects, which is unfair!

To resolve this, we did a couple of things (this solution worked for us; I do not claim that it applies to your scenarios as well, so if you want to try it at home, make sure to wear a helmet :) ):

1. We had Akamai (or traffic manager) ping the service every 5/10 seconds. This way the app pool does not go idle and hopefully never recycles.

2. We had a start up task in the web role set the IIS app pool idle timeout to 0. This means the app pool is never shut down for being idle. Obviously every now and then the Azure fabric controller recycles the machines itself, but that’s ok. I do not think we can prevent that.

Start up task

We added the following cmd as the start up task in the web role.

%windir%\system32\inetsrv\appcmd set config -section:applicationPools -applicationPoolDefaults.processModel.idleTimeout:00:00:00

Curious case of Windows Phone


I like Microsoft. I like their products. I am comfy and feel at home while using their stuff. Call me a Microsoft fan boy, but I like Windows Phone as well. That’s why, when I was planning to buy a new phone, I wanted to go for the new shiny flagship Windows Phone on the block. After some research and reading reviews I zeroed in on the Nokia Lumia 930.

Like any other guy interested in buying a smart phone, I went to Best Buy, hoping to touch and feel the device. While Best Buy had too many people jumping on the new iPhone 6 [on a side note – if you have so many people jumping on your phone, your device is going to bend for sure ;)], Microsoft’s section was relatively empty. When I asked the attendant about the Nokia Lumia 930,

He said “You mean 920?”

I said “No, I mean the 930, and I am looking to buy it.”

He said “Well, there is nothing called Nokia Lumia 930”. I google-binged on my phone and showed him.

He felt convinced and said “Oh, you mean Nokia Icon?”

I replied “No, I mean 930. I want to buy it unlocked and I am not interested in buying the Verizon version.”

Blue shirt replied “We do not have one!”

Duh! So the new shiny flagship Nokia Lumia 930 was not available in the Best Buy store. Please note that the store I am talking about is in Redmond/Bellevue, where Microsoft is headquartered! It’s like I am ready to spend money and the opportunity has been denied.

Being a good citizen, I walked out of Best Buy and went to the Microsoft Store in Bellevue. In the Microsoft Store too, the unlocked Lumia 930 was not on the shelves, and folks there didn’t know the estimated arrival date.

I was saddened. I was taken aback. How can you announce a device and not have it available on shelves for people to walk in and buy?

After thinking for a few days I think I might know the reason. I might be wrong, but do not correct me. The reason could be exclusivity. Verizon got exclusivity rights (at least in the early days of the 930, and maybe only in the US) and packaged it as the Nokia Icon. And they are preventing the 930 from appearing in an unlocked version, so that other people cannot buy it. That’s evil. Pure evil.

I still love Windows Phone and would still wait for the 930 to come in an unlocked format. But maybe this is one of the reasons why Windows Phone is not working as it should. Software is great. Hardware is great. The ecosystem is nice and improving as we speak. But what about marketing it? What about making the extra effort to make sure devices are ready to be picked up by customers from various stores? What about retail? Maybe that’s not Microsoft’s forte and they are losing the race in the last round. Maybe.

Stepping out with Azure WebJobs

A while back, the Microsoft Azure team announced a preview of WebJobs and Hanselman blogged about it. Since then, I always wanted to check out WebJobs and finally I did it recently. And... holy mother of god! :)


WebJobs are cron jobs. For operation(s) that are to be performed repetitively or in a scheduled manner, a WebJob is a good proposition. WebJobs are packaged along with Azure Websites. The popular use case Microsoft is targeting with WebJobs is as a backend for Websites.

Let’s say you have an e-com website and you are receiving tons of orders. For better scalability, your website (front end) would submit those orders to, let’s say, an Azure queue. You would then have WebJob(s) configured as the backend to read messages off the top of the queue and process them. Or you can have a nightly cron job which goes through the orders and sends a digest or suggestions in an email etc. Or you can have it do monthly invoicing. So all those backend jobs that you do not want to perform in the front end for obvious reasons can be done using WebJobs.

Even if WebJobs come packaged along with Azure Websites – nobody is stopping us from using them without websites. For doing certain operations repetitively, Azure PaaS already offers what is termed a Worker Role. But I personally find worker roles very ceremonial. They do make sense when you want to perform long running, heavy-duty operations and you need horizontal/vertical scaling to do that. But for tiny, repetitive, loopy code blocks, or for doing certain things in a scheduled manner, worker roles are expensive (time and money-wise). Worker roles are powerful, no doubt, but with great power comes great responsibility. (Can’t believe I just said that :) ) Primarily what they offer is an infinite while loop, and then it’s your responsibility to implement scheduling, triggers etc. WebJobs are light-weight, deployment-friendly and provide an in-built mechanism to schedule stuff.

Implementing popular use cases using WebJobs

Conceptually, a WebJob (the way I understand it) is an infinite while loop that listens for an event or trigger. You can make this listener trigger again and again continuously, or you can schedule it if that’s the story you want to tell.

For all sorts of WebJobs development, the Azure WebJobs SDK and the respective nuget packages need to be pulled.

Here is an example of a WebJob that is implemented as a command line .NET exe. This one reads a message off the top of a storage queue and writes that entity to Azure table storage. Every time a new message is available in the queue, the ProcessQueueMessage() method is triggered automatically along with the respective trigger inputs.
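A queue-triggered WebJob along those lines might look roughly like this. Treat it as a sketch, assuming the 1.x WebJobs SDK (the Microsoft.Azure.WebJobs nuget package) and storage connection strings configured for the job; the queue name "orders", the table name "Orders" and the OrderEntity type are made up for illustration.

```csharp
using System;
using Microsoft.Azure.WebJobs;
using Microsoft.WindowsAzure.Storage.Table;

// Hypothetical table entity; only the payload of the queue message is stored.
public class OrderEntity : TableEntity
{
    public string Payload { get; set; }
}

public class Program
{
    public static void Main()
    {
        // JobHost discovers the public static functions below and keeps
        // listening for their triggers until the process is stopped.
        var host = new JobHost();
        host.RunAndBlock();
    }

    // Runs whenever a new message lands on the (assumed) "orders" queue and
    // adds one row to the (assumed) "Orders" table.
    public static void ProcessQueueMessage(
        [QueueTrigger("orders")] string message,
        [Table("Orders")] ICollector<OrderEntity> table)
    {
        table.Add(new OrderEntity
        {
            PartitionKey = "orders",
            RowKey = Guid.NewGuid().ToString(),
            Payload = message
        });
    }
}
```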


The code block below shows another type of trigger, one that listens on a topic/subscription. As soon as a new message is delivered to the subscription, it picks up that brokered message and writes it to table storage.
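A topic/subscription trigger could be sketched as follows. The assumptions are loud ones: this targets the 1.1 or later WebJobs SDK with the Microsoft.Azure.WebJobs.ServiceBus package, the topic name "orders-topic", the subscription name "orders-subscription", the table name "Orders" and the OrderEntity type are all hypothetical, and the sender is assumed to have put a plain string in the message body.

```csharp
using System;
using Microsoft.Azure.WebJobs;
using Microsoft.ServiceBus.Messaging;
using Microsoft.WindowsAzure.Storage.Table;

// Hypothetical table entity holding the message body.
public class OrderEntity : TableEntity
{
    public string Payload { get; set; }
}

public class Program
{
    public static void Main()
    {
        // Service Bus triggers have to be enabled on the host configuration.
        var config = new JobHostConfiguration();
        config.UseServiceBus();

        var host = new JobHost(config);
        host.RunAndBlock();
    }

    // Runs whenever a brokered message is delivered to the subscription and
    // writes its body to table storage.
    public static void ProcessTopicMessage(
        [ServiceBusTrigger("orders-topic", "orders-subscription")] BrokeredMessage message,
        [Table("Orders")] ICollector<OrderEntity> table)
    {
        table.Add(new OrderEntity
        {
            PartitionKey = "orders",
            RowKey = Guid.NewGuid().ToString(),
            Payload = message.GetBody<string>()
        });
    }
}
```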
In both examples, an instance of JobHost is created and registered in Main(), which ensures the timely or scheduled execution of the WebJob.


Interestingly, WebJobs support multiple ways to deliver the package. (I prefer the command line .NET exe, but suit yourself.) Using the Azure management portal you can drop a file in a certain location on the IIS box and the rest is magic. The following file types are supported:

  • .exe – .NET assemblies
  • .cmd, .bat, .exe
  • .sh
  • .php
  • .py
  • .js

Once the deployment is done, one can manage the WebJob – start, stop, delete etc.

Using the Website => WebJobs => Add a job option, a new job can be added.


A WebJob can be set to run continuously or scheduled.


Schedule can be as granular as your requirement demands. Interestingly a WebJob can be configured as recurring or as a one-time-deal as well.


Quick word of caution. WebJobs are deployed on the IIS box, so make sure you are not doing an operation in them that hogs all the memory and CPU. The good news though is that they do not use IIS’s thread pool. To scale WebJobs you need to scale your website.

Feel free to ping me in case you need any help.

Importance of dead lettering in Azure Service Bus

Not sure we have a word called “dead lettering”. But in the post below I have used it repetitively. :)

In every messaging/queuing technology, there is a provision for a poison queue or a mechanism to deal with bad messages. The idea is – once you are done retrying a message and the processing still falters, move the message aside and deal with it later. With Microsoft Azure Service Bus topics and subscriptions there is a solid provision to move such poison messages aside, so that the next set of message(s) can be picked up for processing. In the Azure Service Bus world, this is called dead lettering the messages.

When and how should we dead letter the message?

There are many scenarios in which messages are moved to dead letter implicitly, and in some scenarios it’s wise to move the message to dead letter explicitly.

Like all other Azure managed services, Service Bus throttles, as it needs to protect itself from denial of service (DoS) attacks. As a developer, it’s your responsibility to have a retry mechanism and retry policy in place (recommended: an exponential backoff retry interval policy). One of the ways to implement it is by using the Delivery Count attribute on the message itself. The delivery count gets incremented every time the message is read/de-queued off the top of a subscription. Max Delivery Count can be set on the subscription to make sure the retry does not happen indefinitely. Once the Max Delivery Count is reached, the retry attempts can be assumed to be exhausted and the message can be moved to dead letter. The following screenshot shows the Max Delivery Count property config for a subscription.
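The retry side of this can be sketched as a small helper. It is an illustration, not Service Bus API: RetryPolicy, NextDelay and ShouldDeadLetter are names made up here, and the delivery count passed in would come from the DeliveryCount property of the received message. Once Max Delivery Count is exceeded Service Bus dead letters the message on its own, so ShouldDeadLetter is only useful for code that wants to give up explicitly.

```csharp
using System;

public static class RetryPolicy
{
    // Exponential backoff: baseDelay * 2^(deliveryCount - 1), capped at maxDelay.
    // deliveryCount is 1 for the first delivery of a message.
    public static TimeSpan NextDelay(int deliveryCount, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        if (deliveryCount < 1)
        {
            throw new ArgumentOutOfRangeException("deliveryCount");
        }

        // Work in milliseconds as doubles so large exponents cannot overflow.
        double delayMs = baseDelay.TotalMilliseconds * Math.Pow(2, deliveryCount - 1);
        return TimeSpan.FromMilliseconds(Math.Min(delayMs, maxDelay.TotalMilliseconds));
    }

    // True when the retries can be considered exhausted.
    public static bool ShouldDeadLetter(int deliveryCount, int maxDeliveryCount)
    {
        return deliveryCount >= maxDeliveryCount;
    }
}
```

With a base delay of one second and a cap of one minute, the third delivery would wait four seconds before the next attempt, and the tenth would wait the full minute.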


In some cases we should not retry multiple times, e.g. when the message itself is bad: required attributes are missing or the message is not getting serialized correctly. In such non-transient cases, even if you retry multiple times, the processing is going to fail every time. In scenarios like this, we can explicitly dead letter the message and save some CPU cycles.

This is how you can move messages to dead letter explicitly.

if (message != null)
{
    // First argument: dead letter reason. Second argument: error description.
    message.BrokeredMessage.DeadLetter(
        string.Format("DeadLetter Reason {0} Error message {1}", "SerializationException", ex.Message),
        ex.StackTrace);
}

Important things to remember while dead-lettering the message

As shown in the above snippet, make sure you specify the dead letter reason and the exception details, which obviously helps in debugging.

Also make sure the following two properties are enabled for the subscription (in the .NET SDK they are EnableDeadLetteringOnFilterEvaluationExceptions and EnableDeadLetteringOnMessageExpiration on SubscriptionDescription). This makes sure the messages are dead lettered implicitly if the filter evaluation goes wrong or the message expires.


Remember, dead lettered messages will not be processed until something is done about them. The recommended practice here is to fix the messages (fix the missing attributes etc.) and move them back to the subscription. How to do that? That’s for another blog post. :)

Azure Endpoint Monitoring

Came across this cool feature on the Windows Azure Management Portal called “Endpoint Monitoring”. The feature is still in preview, but worth giving a shout-out. Azure Org lately has hit a nice momentum of releasing features one after the other. They initially release a feature as a preview and, once it stabilizes, once enough hands are dirty and issues are ironed out, they GA it. This is the right way of releasing features to production, in my opinion.

Endpoint Monitoring, as the name suggests, monitors whether your web service/web site endpoint is up or not. The idea is, you provide an endpoint in the Azure configuration and Azure calls that endpoint periodically and maintains a log of the results. The good thing is, you can make Azure call your endpoint from different datacenters across geographies. This helps in scenarios where the endpoint is up in, let’s say, the Chicago DC but down in Dublin.

Endpoint monitoring lets you monitor the availability of HTTP or HTTPS endpoints from geo-distributed locations. You can test an endpoint from up to 3 geo-distributed locations at a time (for now). A monitoring test fails if the HTTP response code is greater than or equal to 400 or if the response takes more than 30 seconds. An endpoint is considered available if its monitoring tests succeed from all the specified locations.
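To make the rule concrete, here is how the pass/fail logic reads when written out in code. This is my own restatement of the rule above and not Azure’s implementation; ProbeResult and EndpointAvailability are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One monitoring test result from one location (hypothetical type).
public sealed class ProbeResult
{
    public string Location { get; set; }
    public int StatusCode { get; set; }
    public TimeSpan Elapsed { get; set; }
}

public static class EndpointAvailability
{
    private static readonly TimeSpan Timeout = TimeSpan.FromSeconds(30);

    // A test fails on a status code of 400 or above, or on a response
    // that takes more than 30 seconds.
    public static bool TestPassed(ProbeResult result)
    {
        return result.StatusCode < 400 && result.Elapsed <= Timeout;
    }

    // The endpoint is available only if the tests from all locations pass.
    public static bool IsAvailable(IEnumerable<ProbeResult> results)
    {
        var list = results.ToList();
        return list.Count > 0 && list.All(TestPassed);
    }
}
```

So a 200 from Chicago in one second combined with a 503 from Dublin counts as unavailable, and so does a 200 that takes 31 seconds to arrive.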

In the Configuration tab of your service/website on the Azure management portal, you will see the Endpoint Monitoring section.



As you see above, two endpoints have been configured, called Ping-Chicago and Ping-Dublin. This means whichever endpoint you provide there will be called periodically from Chicago and Dublin.

The results of the endpoint monitoring are shown on the Dashboard as below:


The detailed log can be found by clicking on the endpoint hyperlink


A typical ping endpoint should ping all the dependencies the service relies on. E.g. if the service uses a SQL Azure database, Azure Storage etc., then your ping endpoint should call these dependencies and return HTTP 200 if all is good and HTTP 500 if not. Here is some simple code that can be used in your SOAP/REST service as the ping method.
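A minimal sketch of such a ping method, under a few assumptions: PingEndpoint and Ping are names picked here, each dependency check is an Action that returns normally when the dependency is reachable and throws when it is not, and wiring the returned status code into your SOAP/REST framework is left to you.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

public static class PingEndpoint
{
    // Runs every dependency check in turn. Returns 200 when all of them
    // complete, and 500 as soon as one of them throws.
    public static HttpStatusCode Ping(IEnumerable<Action> dependencyChecks)
    {
        foreach (var check in dependencyChecks)
        {
            try
            {
                check();
            }
            catch (Exception)
            {
                return HttpStatusCode.InternalServerError;
            }
        }

        return HttpStatusCode.OK;
    }
}
```

A check for a SQL Azure database could be as small as opening and closing a connection, and a storage check could read a known blob or table; what matters is that a failure surfaces as an exception.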


Know it. Learn it. Love it.