System Monitoring – Part 2 : Basic Architecture

This is the second article in a series on application systems monitoring for software developers. In the first article I discussed the basic ideas and concepts around systems monitoring. In this second article I will go over the basic architecture of how an application monitor can work. The system will start out simple and be extended over time. This iterative approach to building up the design mirrors how I implemented this type of system for my current employer. In true agile style I wanted to get something basic working as quickly as possible so we could start getting the benefit from it early on. Once the system was out there and working, I built upon the basic idea with new features.

Use Case

Use Case Diagram for a Basic System Monitor.

First of all let’s look at the basic use case diagram above. This diagram shows two subsystems, the monitor sensors and the monitor dashboard. This article will focus on the first subsystem, Monitor Sensors. At this stage in the system’s evolution the main actors are Developers and Technical Support.

Roles may differ in your own organisation, but because I work in Financial Services we have to have a clear separation of concerns between the development infrastructure and the production infrastructure. This means developers cannot directly deploy to or modify a production environment. This is for good reason; we are very good at breaking stuff!! This is why we, and many other large organisations, have separate technical services teams who maintain the production environment.

That means both developers and technical support have a vested interest in the results of the monitor. Just because developers can’t make changes to a production environment, there is no reason why they can’t see the telemetry data collected by the monitoring systems. In fact it would be a very bad idea if they couldn’t. By seeing how the production systems are running directly from a snapshot data stream, developers can get a good impression of how their systems perform in a real production environment. This may well help to influence future design decisions when the systems are extended. It also gives developers a realistic impression of the volumes their systems cope with in real life. As we all know, a development test environment, no matter how hard we try, never matches what we see in a real production scenario with real users and data.
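
To make the split between the two subsystems a little more concrete, here is a minimal sketch of what the sensor side could look like. It is written in Python purely for brevity and is not the system described in this series; the names SensorReading, Sensor and collect_snapshot are hypothetical, and the probes are stand-ins for whatever a real sensor would query (a database, a queue, a service). The idea is only that each sensor produces a small, timestamped reading, and that a read-only snapshot of those readings is what developers and technical support would both look at.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SensorReading:
    """One telemetry value captured by a sensor at a point in time."""
    sensor_name: str
    value: float
    taken_at: datetime


class Sensor:
    """Hypothetical sensor: wraps a probe function that returns a number."""

    def __init__(self, name: str, probe: Callable[[], float]) -> None:
        self.name = name
        self._probe = probe

    def read(self) -> SensorReading:
        # Run the probe once and stamp the result with the current UTC time.
        return SensorReading(self.name, float(self._probe()),
                             datetime.now(timezone.utc))


def collect_snapshot(sensors: List[Sensor]) -> Dict[str, SensorReading]:
    """Read every sensor once and return the readings keyed by sensor name."""
    return {sensor.name: sensor.read() for sensor in sensors}


if __name__ == "__main__":
    # Stand-in probes with made-up values, for illustration only.
    sensors = [
        Sensor("orders_processed_last_hour", lambda: 1250),
        Sensor("failed_logins_last_hour", lambda: 3),
    ]
    for name, reading in collect_snapshot(sensors).items():
        print(f"{name}: {reading.value} at {reading.taken_at.isoformat()}")
```

Because the readings are immutable and the snapshot is just a dictionary, a dashboard could display them without being able to change anything in production, which fits the separation between development and production described above.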

System Monitoring – Part 1

This is the first part of a set of articles about systems monitoring for software developers. I am writing about this as it is something I have been working with a fair amount over the last 6 months in my current role here in the UK.

The company I work for is pretty typical for a large firm. We have a mix of different legacy systems that we have to keep running as we try to update them, and integrate new systems. A lot of these systems vary in quality and the majority of the original system developers no longer work for the company. When any of these systems go wrong, it can be quite difficult to diagnose what the problem is.

System Monitoring - Programmers Desk
Taken by Duncan Verrall

As an example, we have one particular system that handles file synchronisation between multiple sites and runs overnight. This system does write out log files, but they are the most unfriendly log files I have ever seen. For a start they are so large that people do not even bother to look at them anymore. Even if you do open up these files, the formatting is so strange and complicated you really struggle to see what is going on. This again is why people don’t bother to look at them. This particular system is one of the more extreme cases. Other systems operate a mix of writing data to log files or adding huge amounts of data into a logging database. In some cases the amount of data logged is quite extreme, which makes searching for meaningful information in the event of a system failure extremely difficult. Even more so if you are under pressure with a system outage which is having a direct revenue impact on the business.

In the rest of this series, I will talk about the system I designed and developed. I will cover the design and implementation. I will also talk about some of the problems I faced, and how this system has helped avert a few major live incidents. For obvious reasons I can’t discuss actual systems and internal processes at my company as I don’t think they would like that too much, so some details have had to be changed or omitted, but that really doesn’t matter for the purpose of this article. The intention is to discuss the architecture of the system. I hope this series will help you in thinking about your monitoring needs in your own organisation.

The Obligatory First Post

Image supplied by http://www.freeimages.co.uk/galleries.htm

Wow, this is the first post!! It feels a little empty around here at the moment. My name is Stephen Haunts and I am a software developer currently working in the financial services industry in the United Kingdom. I have decided to start this site as a way of capturing some of my ideas and sharing them with the world.

I mainly work in C# and .NET. This is because that is what my job requires, so I will tend to bias towards the Microsoft Development Stack when talking about actual software development. I also enjoy working with .NET; it is a fun system to write software with.

I don’t intend to just talk about code. This site covers more of the architectural aspects of enterprise software development. Most of these ideas can sit above the actual language and implementation details.

I have ideas for several series of articles on topics that I have been thinking a lot about recently. As a little hint of what is to follow, some of these themes are application systems monitoring, PCI DSS, Zero Downtime deployment models, Agile/Lean development and software development leadership.

I hope you find the content enjoyable, interesting and thought-provoking. Please subscribe to the blog using the subscription widget to the right of the screen. You can also use the site’s RSS feed in your favourite news reader. Also, please comment and contribute to the articles if you feel you have something to say. I don’t expect you to agree with everything I say, but the initial articles serve as the starting point of the subject. They can be extended much further by your contribution to the conversation.

Well, that’s the obligatory first post out of the way. That wasn’t as scary as I thought it would be.
