When you are working in the real world (especially on enterprise software) you will find yourself having to support and enhance an older code base. These code bases can vary quite considerably in quality. In the worst case you have legacy code that contains no unit tests. When you need to maintain and enhance this code you really should try to get some tests wrapped around the code, but this is easier said than done.
The code may contain lots of hard coded dependencies on objects that access the file system, make database calls or touch other external resources, all of which makes writing clean, isolated unit tests difficult.
What do I mean by a hard coded dependency? Well, take a look at the following simple example.
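public class ExampleClass
{
    public string GetText { get { return "Hello, I am a hard dependency."; } }
}

public class MyProgram
{
    public void DoSomething()
    {
        ExampleClass example = new ExampleClass(); // the hard dependency is created here
        System.Console.WriteLine(example.GetText);
    }
    static void Main()
    {
        MyProgram program = new MyProgram();
        program.DoSomething();
    }
}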
In this simple example we have a class called ExampleClass, which has a property that returns a string. The class MyProgram has a method called DoSomething() that creates an instance of ExampleClass and calls the property to display the returned string. You may be thinking that nothing untoward is happening here, but what you see is an example of a hard dependency, or coupling, between MyProgram and ExampleClass.
Let’s imagine ExampleClass is doing something much more complicated, like reading data from a database. If you try to write a unit test to cover DoSomething() in MyProgram, that test will make a database call, so it is no longer a unit test. It may run fine on your development PC, but when the tests run on your build server they will also try to call the database, and a build server shouldn’t have access to an application’s database. Also, what if that call changes the state of the data, so that the next time you run the test the data is different and the test fails? This is not a good situation: you now not only have tight coupling in your code, but you are also coupled to the state of the data in your database.
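One common way to loosen this kind of coupling is to put an interface between the two classes and pass the dependency in through the constructor. The sketch below is only an illustration of that idea, not code from the system described here: ITextSource and FakeTextSource are hypothetical names I have made up, and DoSomething() has been changed to return the string so that a test has something to assert on.

```csharp
using System;

// Hypothetical abstraction over ExampleClass (not part of the original example).
public interface ITextSource
{
    string GetText { get; }
}

public class ExampleClass : ITextSource
{
    public string GetText
    {
        get { return "Hello, I am a hard dependency."; }
    }
}

// Hypothetical test double that never touches a database or the file system.
public class FakeTextSource : ITextSource
{
    public string GetText
    {
        get { return "Hello from a fake."; }
    }
}

public class MyProgram
{
    private readonly ITextSource _source;

    // The dependency is passed in instead of being created with "new".
    public MyProgram(ITextSource source)
    {
        _source = source;
    }

    // Prints the text and also returns it so a test can check the result.
    public string DoSomething()
    {
        string text = _source.GetText;
        Console.WriteLine(text);
        return text;
    }
}

public static class Demo
{
    public static void Main()
    {
        // Production wiring uses the real class.
        new MyProgram(new ExampleClass()).DoSomething();

        // A unit test can substitute the fake instead.
        new MyProgram(new FakeTextSource()).DoSomething();
    }
}
```

With this shape, a test for MyProgram only ever talks to FakeTextSource, so it can run on a build server that has no access to the real database.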
I was reading an interesting article on The Register today called “What Compsci textbooks don’t tell you: Real world code sucks”. Whilst the title of the post is quite brash, I did find myself nodding my head at pretty much everything in it. As a Lead Developer at a financial services company, code quality is something that bothers me constantly, both in the legacy systems I have inherited and the new systems we are developing. I thought I would add a comment on one of the points in the original post.
“A tremendous amount of source code written for real applications is not merely less perfect than the simple examples seen in school — it’s outright terrible by any number of measures. Due to bad design, sloppy or opaque coding practices, non-scalability, and layers of ugly “temporary” patches, it’s often difficult to maintain, harder still to modify or upgrade, painful or impossible for a new person joining the dev team to understand, or (a different kind of problem) slow and inefficient. In short, a mess.
Of course there are many exceptions, but they’re just that: exceptions. In my experience, software is, almost as a rule, bad in one way or another.”
I feel like I have been living this every day over the past 18 months. I have inherited some systems that are very challenging to work with. They contain code that is very hard to maintain, doesn’t scale and has virtually no unit test coverage. Pretty much all of the original authors have now moved on, but that still leaves a wasteland of code for my team to maintain.
I now have a decent team working under me, and they are trying as best they can to tame this huge legacy beast. They are doing a pretty good job with what they have got, but we still find ourselves having to make compromises that make us uneasy because of time pressures, shifting priorities and many other reasons.
This article will talk about the deployment of your system monitor to start monitoring production systems. A lot of what I discuss here is from experience, especially as I work in a regulated industry (Financial Services), which threw up a few interesting challenges on how I could achieve rapid deployments and application feedback without breaking any rules.
Deploying into a Production Environment
The goal of your monitoring tool is to regularly check over your critical production systems, so that means you need to deploy into production. If you work in a smaller company that doesn’t have a long and complex deployment process (by ‘complex deployment process’, I mean a process heavily bogged down with paperwork and company processes), then this should be straightforward.
When developing monitors you want to be able to iterate quickly, and deploy early and often. If you work in an environment where you can do this and release into production quickly, then great. You need to do just that. Every time you get a piece of functionality ready for prime time, release it. You just need to make sure that the server you deploy the monitor to can access any log files or databases it needs to gather its information.
The diagram above shows what a typical deployment might look like. You have your monitoring server set up, containing the monitor application and the task scheduler. This application then processes logs contained on each target system. The example above has web services, a website, a payment gateway and payment processors. This is the most basic type of deployment, and it is great if you can do it quickly and often.
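To make the idea of a monitor processing logs on a target system a little more concrete, here is a minimal sketch of the kind of check a scheduled task could run. It is an assumption on my part rather than the actual monitor: the LogCheck class, its method names and the "ERROR" marker are all hypothetical, and the file path would be whatever share or folder your monitoring server is allowed to read.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper; the names and the marker text are illustrative only.
public static class LogCheck
{
    // Counts the lines that contain the marker, ignoring case.
    public static int CountMatches(IEnumerable<string> lines, string marker)
    {
        return lines.Count(line =>
            line.IndexOf(marker, StringComparison.OrdinalIgnoreCase) >= 0);
    }

    // Same check against a log file the monitoring server can reach.
    // File.ReadLines streams the file, which helps with very large logs.
    public static int CountMatchesInFile(string path, string marker)
    {
        return CountMatches(File.ReadLines(path), marker);
    }
}

public static class LogCheckDemo
{
    public static void Main()
    {
        // In-memory sample so the sketch runs without a real log file.
        string[] sample =
        {
            "INFO start",
            "ERROR payment timeout",
            "INFO done",
            "error retry"
        };

        Console.WriteLine("Errors found: " + LogCheck.CountMatches(sample, "ERROR"));
    }
}
```

Keeping the counting logic separate from the file access means the check itself can be unit tested with plain strings, and only the thin file-reading wrapper depends on the production environment.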
This is the second article in a series on application systems monitoring for software developers. In the first article I discussed the basic idea and concepts around systems monitoring. In this second article I will go over the basic architecture of how an application monitor can work. The system will start out simple and be extended over time. This iterative approach to building up the design mirrors how I implemented this type of system for my current employer. In true agile style I wanted to get something basic working as quickly as possible so we could start getting the benefit from it early on. Once the system was out there and working, I then built upon the basic idea with new features.
First of all let’s look at the basic use case diagram above. This diagram shows two subsystems, the monitor sensors and the monitor dashboard. This article will focus on the first subsystem, Monitor Sensors. At this stage in the system’s evolution the main actors here are Developers and Technical Support.
Roles may differ in your own organisation, but because I work in Financial Services we have to have a clear separation of concerns between the development infrastructure and the production infrastructure. This means developers cannot directly deploy to or modify a production environment. This is for good reason; we are very good at breaking stuff! This is why we and many other large organisations have separate technical services teams who maintain the production environment.
That means both developers and technical support have a vested interest in the results of the monitor. Just because developers can’t make changes to a production environment, there is no reason why they can’t see the telemetry data collected by the monitoring systems. In fact it would be a very bad idea if they couldn’t. By seeing how the production systems are running directly from a snapshot data stream, developers can get a good impression of how their systems perform in a real production environment. This may well help to influence future design decisions when the systems are extended. It also gives developers a realistic impression of what volumes their systems cope with in real life. As we all know, a development test environment, no matter how hard we try, never matches what we see in a real production scenario with real users and data.
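As a rough illustration of how a set of monitor sensors could produce a snapshot for developers and technical support to look at, here is a small sketch. Everything in it is hypothetical and chosen just for the example (IMonitorSensor, DelegateSensor, SensorReading and SnapshotBuilder are not names from the real system), so treat it as one possible shape rather than the actual design.

```csharp
using System;
using System.Collections.Generic;

// One reading from one sensor (hypothetical shape).
public class SensorReading
{
    public string SystemName { get; set; }
    public bool Healthy { get; set; }
    public string Detail { get; set; }
}

// Hypothetical contract each monitor sensor would implement.
public interface IMonitorSensor
{
    string SystemName { get; }
    bool IsHealthy();
}

// Simple sensor that wraps any yes/no check supplied by the caller.
public class DelegateSensor : IMonitorSensor
{
    private readonly Func<bool> _check;

    public DelegateSensor(string systemName, Func<bool> check)
    {
        SystemName = systemName;
        _check = check;
    }

    public string SystemName { get; private set; }

    public bool IsHealthy()
    {
        return _check();
    }
}

public static class SnapshotBuilder
{
    // Runs every sensor once. A sensor that throws is reported as
    // unhealthy instead of stopping the whole run.
    public static List<SensorReading> TakeSnapshot(IEnumerable<IMonitorSensor> sensors)
    {
        var snapshot = new List<SensorReading>();
        foreach (IMonitorSensor sensor in sensors)
        {
            var reading = new SensorReading { SystemName = sensor.SystemName };
            try
            {
                reading.Healthy = sensor.IsHealthy();
                reading.Detail = reading.Healthy ? "OK" : "Check failed";
            }
            catch (Exception ex)
            {
                reading.Healthy = false;
                reading.Detail = "Sensor error: " + ex.Message;
            }
            snapshot.Add(reading);
        }
        return snapshot;
    }
}

public static class SnapshotDemo
{
    public static void Main()
    {
        var sensors = new List<IMonitorSensor>
        {
            new DelegateSensor("Website", () => true),
            new DelegateSensor("Payment gateway", () => false)
        };

        foreach (SensorReading reading in SnapshotBuilder.TakeSnapshot(sensors))
        {
            Console.WriteLine(reading.SystemName + ": " + reading.Detail);
        }
    }
}
```

The snapshot is just a list of plain objects, so the same data could be shown to developers without giving them any ability to change the production environment.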
This is the first part of a set of articles about systems monitoring for software developers. I am writing about this as it is something I have been working with a fair amount over the last 6 months in my current role here in the UK.
The company I work for is pretty typical for a large firm. We have a mix of different legacy systems that we have to keep running as we try to update them and integrate new systems. A lot of these systems vary in quality and the majority of the original system developers no longer work for the company. When any of these systems goes wrong, it can be quite difficult to diagnose what the problem is.
As an example, we have one particular system that runs overnight and handles file synchronisation between multiple sites. This system does write out log files, but they are the most unfriendly log files I have ever seen. For a start they are so large that people do not even bother to look at them anymore. Even if you do open up these files, the formatting is so strange and complicated that you really struggle to see what is going on. This again is why people don’t bother to look at them. This particular system is one of the more extreme cases. Other systems operate a mix of writing data to log files or adding huge amounts of data into a logging database. In some cases the amount of data logged is quite extreme, which makes searching for meaningful information in the event of a system failure extremely difficult, even more so if you are under pressure from a system outage that is having a direct revenue impact on the business.
In the rest of this series, I will talk about the system I designed and developed. I will cover the design and implementation. I will also talk about some of the problems I faced, and how this system has helped avert a few major live incidents. For obvious reasons I can’t discuss actual systems and internal processes at my company as I don’t think they would like that too much, so some details have had to be changed or omitted, but that really doesn’t matter for the purpose of this article. The intention is to discuss the architecture of the system. I hope this series will help you in thinking about your monitoring needs in your own organisation.
Wow, this is the first post!! It feels a little empty around here at the moment. My name is Stephen Haunts and I am a software developer currently working in the financial services industry in the United Kingdom. I have decided to start this site as a way of capturing some of my ideas and sharing them with the world.
I mainly work in C# and .NET. This is because that is what my job requires, so I will tend to bias towards the Microsoft Development Stack when talking about actual software development. I also enjoy working with .NET; it is a fun system to write software with.
I don’t intend to just talk about code. This site covers more of the architectural aspects of enterprise software development. Most of these ideas can sit above the actual language and implementation details.
I have ideas for different series of articles that revolve around ideas that I have been thinking a lot about recently. As a little hint of what is to follow, some of these themes are application systems monitoring, PCI DSS, Zero Downtime deployment models, Agile/Lean development and software development leadership.
I hope you find the content enjoyable, interesting and thought-provoking. Please subscribe to the blog using the subscription widget to the right of the screen. You can also use the site’s RSS feed in your favourite news reader. Also, please comment and contribute to the articles if you feel you have something to say. I don’t expect you to agree with everything I say, but the initial articles serve as the starting point of the subject. They can be extended much further by your contribution to the conversation.
Well, that’s the obligatory first post out of the way. That wasn’t as scary as I thought it would be.