The system monitor discussed previously will collect a lot of data from different systems. The amount of information that is collected can be vast and in its raw form may not be that useful except to people who really understand what the data represents. The key thing with a dashboard is to break this information down for different users. This may be a complete view of the data, or just a subset for different purposes, so you have to think about the context in which the data is going to be provided and the intended audience.
For me, a key requirement of the dashboard I built was to make the data from the monitor available to my development and operations team in an easy to view format. This dashboard contains more technical information including exceptions as that is more appropriate for the intended audience. You may need to provide another dashboard to key business stakeholders. The level of information you provide here would most likely be different and contain more business level data.
Your dashboard may be viewed in different places. This might include a large TV attached to the wall, projected against a wall, or the dashboard might be run on a person’s desktop computer. If you are displaying the dashboard in a place where people cannot interact with it, i.e. on a large TV Screen, then you want to make sure the dashboard automatically refreshes itself. I will cover this more later on in the article.
Types of Dashboard
You really are spoilt for choice these days when it comes writing a dashboard. You can keep the display simple or you can really embellish the presentation. As I said earlier, it depends on your target audience. I personally find that keeping the display simple is the best thing to do. I have seen some dashboards that go overboard with dials, speedometers and other fancy graphics, and these can confuse the information you are trying to communicate if you are not careful.
You can develop your dashboard as a native application. This is what I did with my own solution for the company I work for. I work at a company that is based around the Microsoft stack, so this really gave me 2 choices as a .NET developer; Winforms or WPF. As I needed to get the dashboard up and running quickly I used Winforms, mainly because I am very familiar with this particular technology. WPF would also have been a very good choice.
My dashboard was aimed squarely at a technical audience, so the level of detail could be higher. The context of the dashboard was to allow developers spot any problems in our critical systems. I decided to keep the screen design very simple. I used a tabbed control, where each tab represented one sensor from the xml stream written out by the monitoring system.
This article will talk about the deployment of your system monitor to start monitoring production systems. A lot of what I discuss here is from experience, especially as I work in a regulated industry (Financial Services), which threw up a few interesting challenges on how I could achieve rapid deployments and application feedback without breaking any rules.
Deploying into a Production Environment
The goal of your monitoring tool is to regularly check over your critical production systems, so that means you need to deploy into production. If you work in a smaller company that doesn’t have a long and complex deployment process (By ‘complex deployment process’, I mean a process heavily bogged down with paper work a company processes), then this should be straight forward.
When developing monitors you want to be able to iterate quickly, and deploy early and often. If you work in an environment where you can do this and release into production quickly, then great. You need to do, just that. Every time you get a piece of functionality ready for prime time, release it. You just need to make sure that the server you deploy the monitor too can access any log files or databases to gather its information.
The diagram above shows what a typical deployment might look like. You have your monitoring server setup containing the monitor application and the task scheduler. This application then processes logs contained on each target system. The example above has web services, a website, and payment gateway and payment processors. This shows you your most basic type of deployment. This is great if you can do this type of deployment quickly and often.
This is the second article in a series on application systems monitoring for software developers. In the first article I discussed the basic idea and concepts around systems monitoring. In this second article I will go over the basic architecture of how an application monitor can work. The system will start out simple to begin with and be extended over time. This iterative approach to building up the design mirrors how I implemented this type of system for my current employer. In true agile style I wanted to get something basic working as quickly as possible so we could start getting the benefit from it early on. Once the system was out there and working, I then built upon the basic idea with new features.
First of all let’s look at the basic use case diagram above. This diagram shows 2 subsystems, the monitor sensors and the monitor dashboard. This article will focus on the first subsystem, Monitor Sensors. At this stage in the systems evolution the main actors here are Developers and Technical Support.
Roles may differ in your own organisation, but because I work in Financial Services we have to have a clear separation of concerns between the development infrastructure and the production infrastructure. This means developers can not directly deploy too or modify a production environment. This is for good reason; we are very good at breaking stuff!! This is why we and many other large organisations have a separate technical services teams who maintain the production environment.
That means both developers and technical support have a vested interest in the results of the monitor. Just because developers can’t make changes to a production environment, there is no reason why they can’t see the telemetry data collected by the monitoring systems. In fact it would be a very bad idea if they couldn’t. By seeing how the production systems are running directly from a snapshot data stream, developers can get a good impression of how their systems perform in a real production environment. This may well help to influence future design decisions when the systems are extended. This also gives developers a realistic impression of what volumes their systems cope with in real life. As we all know a development test environment, no matter how hard we try, never matches what we see in a real production scenario with real users and data.
This is the first part of a set of articles about systems monitoring for software developers. I am writing about this as it is something I have been working with a fair amount over the last 6 months in my current role here in the UK.
The company I work for is pretty typical for a large firm. We have a mix of different legacy systems that we have to keep running as we try to update them, and integrate new systems. A lot of these systems vary in quality and the majority of the original system developers no longer work for the company. When any of these systems go wrong, it can be quite difficult to diagnose what the problem is.
As an example, we have one particular system that handles file synchronisation between multiple sites that runs over night. This system does write out log files, but they are the most unfriendly log files I have ever seen. For a start they are so large, people do not even bother to look at them anymore. Even if you do open up these files, the formatting is so strange and complicated you really struggle to see what is going on. This again is why people don’t bother to look at them. This particular system is one of the more extreme cases. Other systems operate a mix of writing data to log files or adding huge amounts of data into a logging database. In some cases the amount of data logged is quite extreme, which makes searching of meaningful information in the event of a system failure extremely difficult. Even more so if you are under pressure with a system outage which is having a direct revenue impact to the business.
In the rest of this series, I will talk about the system I designed and developed. I will cover the design and implementation. I will also talk about some of the problems I faced, and how this system has helped divert a few major live incidents. For obvious reasons I can’t discuss actual systems and internal process at my company as I don’t think they would like that too much, so some details have had to be changed or omitted, but that really doesn’t matter for the purpose of this article. The intention is to discuss the architecture of the system. I hope this series will help you in thinking about your monitoring needs in your own organisation.