This is the third article in my series on system monitoring for application developers. The previous article covered the basic code architecture for the system monitor.
This article covers deploying your system monitor so it can start monitoring production systems. A lot of what I discuss here comes from experience, especially as I work in a regulated industry (Financial Services), which threw up a few interesting challenges around achieving rapid deployments and application feedback without breaking any rules.
Deploying into a Production Environment
The goal of your monitoring tool is to regularly check your critical production systems, which means you need to deploy it into production. If you work in a smaller company that doesn’t have a long and complex deployment process (by ‘complex deployment process’, I mean one heavily bogged down in paperwork), then this should be straightforward.
When developing monitors you want to be able to iterate quickly, and deploy early and often. If you work in an environment where you can release into production quickly, then great: do just that. Every time you get a piece of functionality ready for prime time, release it. You just need to make sure that the server you deploy the monitor to can access the log files and databases it gathers its information from.
The diagram above shows what a typical deployment might look like. You have your monitoring server set up with the monitor application and the task scheduler. This application then processes the logs held on each target system. The example above has web services, a website, a payment gateway, and payment processors. This is the most basic type of deployment, and it works well if you can deploy quickly and often.
Regulated and Restricted Environments
The above scenario may not be suitable for you, though. It certainly wasn’t when I was developing my monitoring solution. I work for an American Financial Services organisation based in the UK. At this company we are regulated by local regulators as well as being in scope of Sarbanes-Oxley. This means you cannot just go deploying code into production. Well, not easily anyway. There has to be a clear separation of concerns between the development environments and production environments. There also has to be a separation between the people doing the deployments, i.e. developers can’t perform the deployment into production.
This doesn’t mean you can’t deploy your monitor into production at all; it just means you can’t do it quickly and often. We did manage to get around this with a compromise, though: we ran the system monitor from the development environment and had it read the production logs.
“You did what?” I hear you shout.
“How is that a separation of concerns?”
We reached this compromise by talking with our technical services team. It all hinged on our belief that all developers should have read-only access to production log files to aid fault finding.
Log File Access
I believe all developers should have access to the production log files. Before we had this access, a support scenario would look like this:
The phone rings.
“Hey Steve, this is Mike in Tech Support. We are getting calls from users that the credit scoring service has stopped processing customers.”
“Is this a total system failure, or is it intermittent?” said Steve.
“The users are reporting that it is happening around 1 in 5 times,” said Mike.
“I will be down in 5 minutes.”
Five minutes later Steve is sitting next to Mike, a little out of breath as he ran down the stairs.
“Can you load up today’s log file for the Credit Scoring Service?” said Steve, eager to find out what’s going on.
“Sure, here you go” said Mike.
5 minutes later…
“Hmm, that looks odd. We are getting a strange exception once we get a score back from the credit reference agency. Can you email me the log file so I can look at it upstairs with the code open?”
“No problem.” said Mike, wanting to get the issue resolved quickly before he gets more calls.
In the scenario that played out here, Steve first went to look at the logs, and then he requested that they be sent to him. This is quite normal. The logs are there to help you find out what has gone wrong. Things would have been much quicker if Steve already had access to the logs on the production box.
I must admit, it did raise a few eyebrows when we requested access to the production logs, but this can be done safely while remaining compliant. In your deployed system, you should ensure that your log files are written into a folder that is separate from the installation of your software, preferably on a second data drive. Then you share that folder with a specific user group, say “IT – SOFTWARE DEVELOPMENT” for example. When you set up the share, you need to make it read-only so no one can change anything or accidentally delete a file.
With this access in place, as soon as any problems are reported, the developers can save time by just going straight to the relevant logs. The next step from this is to have your monitor checking the logs automatically.
The deployment diagram above shows what this would look like. On the left you have your secured systems running in the production environment. On the right you have your development environment hosting the system monitor. The system monitor can then process the log files directly if it is running under a user account that has permissions to access the log folder shares.
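To make the idea concrete, here is a minimal sketch of a monitor pass over shared log folders. It is written in Python purely for illustration and is not the code from this series. The function names, the "*.log" file pattern and the "ERROR" / "Exception" markers are all assumptions; your logs will have their own format and your shares their own paths.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def find_error_lines(log_path: Path,
                     markers: Iterable[str] = ("ERROR", "Exception")) -> List[str]:
    """Return the lines of one log file that contain any of the markers.

    The markers are hypothetical; adjust them to match your own log format.
    """
    markers = tuple(markers)
    matches = []
    # Open strictly for reading: the share is read-only, so the monitor
    # must never try to write, rename or delete anything here.
    with log_path.open("r", encoding="utf-8", errors="replace") as log_file:
        for line in log_file:
            if any(marker in line for marker in markers):
                matches.append(line.rstrip("\n"))
    return matches


def scan_log_folders(folders: Iterable[Path],
                     pattern: str = "*.log") -> Dict[Path, List[str]]:
    """Scan every matching log file in each folder and collect error lines.

    Only files that contain at least one error line appear in the result.
    """
    results = {}
    for folder in folders:
        for log_path in sorted(folder.glob(pattern)):
            errors = find_error_lines(log_path)
            if errors:
                results[log_path] = errors
    return results
```

You would call scan_log_folders with the paths of the log folder shares, running under the user account that has been granted read access. Because the code only ever opens files for reading, it behaves the same whether the share is read-only or not, which is exactly what you want from something running outside production.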
The above is great if you are only processing log files, but what if you are getting some of your data from a database? Surely a developer directly accessing production data in a regulated environment is bad? This can be a blocker, but for us it wasn’t. The reason it wasn’t an issue is down to how we structured our database architecture. First, if you’re not already familiar with CQRS (Command Query Responsibility Segregation), you should check out this article on Martin Fowler’s blog (http://martinfowler.com/bliki/CQRS.html).
To quote Martin:
“[CQRS] at its heart is a simple notion that you can use a different model to update information than the model you use to read information. This simple notion leads to some profound consequences for the design of information systems.”
Using this idea we have two database environments. The first holds the main transactional data that our systems (payment systems, point of sale, etc.) interact with. The second is a series of read-only reporting servers, to which the transactional data is replicated every 20 minutes. We don’t have access to the transactional data, but we do have access to the reporting servers because they are read-only. This is a workable solution. The only downside is that the data isn’t real time, as it can be up to 20 minutes old, but this is better than nothing.
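Because the reporting data lags behind, it can help for the monitor to know how old its data is before it raises an alert. The following Python sketch shows one way to do that. Only the 20 minute interval comes from our setup; the five minute grace period, the function names and the idea that you can obtain a "last replicated" timestamp from your reporting server are all assumptions.

```python
from datetime import datetime, timedelta

# The replication interval described above.
REPLICATION_INTERVAL = timedelta(minutes=20)
# Assumed allowance for the replication job itself taking some time.
GRACE_PERIOD = timedelta(minutes=5)


def is_replica_stale(last_replicated_at: datetime,
                     now: datetime,
                     interval: timedelta = REPLICATION_INTERVAL,
                     grace: timedelta = GRACE_PERIOD) -> bool:
    """Return True when the reporting data is older than one full
    replication interval plus the grace period."""
    return now - last_replicated_at > interval + grace


def data_age_minutes(last_replicated_at: datetime, now: datetime) -> int:
    """Return the age of the reporting data in whole minutes."""
    return int((now - last_replicated_at).total_seconds() // 60)
```

A monitor could include the data age in its report so that whoever reads an alert knows it reflects the reporting copy and not the live transactional data. If the replica turns out to be stale, that is worth an alert in its own right, as it suggests replication has stopped.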
Benefits of Deploying to a Development Environment
The benefits gained from deploying the monitors into a development environment have been staggering. When I write software I like to iterate very quickly. I am a firm believer in Agile and Lean development, so as soon as I have something that works, I deploy it. Sometimes I would deploy the monitor as many as 5 times in a day. This means I could react to changes quickly, add new monitors quickly and more importantly fix issues as soon as I found them without the usual paper trail and release process treadmill.
System Monitor for Development Testing
The system monitor doesn’t just need to be used for monitoring production systems. It also makes sense to use the monitor against the same systems in your test environments when you are making changes to them. As your QA testers are running their normal test suites, you can run the monitor to check the test logs. This gives you another level of testing and metrics for the system under test and also some extra testing for the monitor itself.
One thing that is useful to build into the monitor (especially if you are using the one process, multiple monitors pattern) is the ability to disable some of the monitors in the configuration file. In your development test environment you may not have deployed all of the systems that the monitor reads, so you need to be able to disable the monitors for the missing ones.
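As a rough illustration of that configuration switch, here is a small Python sketch. The JSON layout, the "enabled" flag, the monitor names and the two placeholder check functions are all made up for the example; the configuration format you use will depend on your own platform.

```python
import json
from typing import Callable, Dict, List, Tuple


def check_credit_scoring() -> str:
    """Placeholder monitor; a real one would read the service's logs."""
    return "credit scoring checked"


def check_payment_gateway() -> str:
    """Placeholder monitor; a real one would read the gateway's logs."""
    return "payment gateway checked"


# Maps the names used in the configuration file to monitor functions.
MONITOR_REGISTRY: Dict[str, Callable[[], str]] = {
    "credit_scoring": check_credit_scoring,
    "payment_gateway": check_payment_gateway,
}


def enabled_monitors(config_text: str,
                     registry: Dict[str, Callable[[], str]]
                     ) -> List[Tuple[str, Callable[[], str]]]:
    """Return (name, function) pairs for monitors not disabled in the config.

    A monitor with no "enabled" entry is treated as enabled.
    """
    config = json.loads(config_text)
    selected = []
    for entry in config["monitors"]:
        if not entry.get("enabled", True):
            continue  # Switched off for this environment.
        name = entry["name"]
        if name not in registry:
            raise KeyError(f"Unknown monitor in configuration: {name}")
        selected.append((name, registry[name]))
    return selected


# Example configuration for a test environment with no payment gateway.
TEST_ENVIRONMENT_CONFIG = """
{
  "monitors": [
    {"name": "credit_scoring", "enabled": true},
    {"name": "payment_gateway", "enabled": false}
  ]
}
"""
```

With the configuration above, enabled_monitors(TEST_ENVIRONMENT_CONFIG, MONITOR_REGISTRY) returns only the credit scoring monitor, so the same build of the monitor can run in test and production with nothing but a configuration change between them.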
Well, that concludes the third article in this series on system monitoring for application developers. I hope you have found the series useful so far.