Majornetwork

Gateway of last resort is not set

Life of a Change

I’ve been quite busy doing “stuff” the whole winter. Here is one piece of “stuff”: Life of a Change.

T minus 7 weeks

I created the Change Request (CR). It was based on a discussion we had had with the customer and a service provider (SP) about developing the customer’s connections to the SP.

During the years of being a network consultant for the customer I’ve learned to understand the customer business requirements and to suggest projects and changes that benefit the customer. It is also natural for us to drive and steer projects having several third parties involved so I tend to keep in contact with the service providers as well. This CR was one result of my work as an experienced consultant.

In the CR I described the goals of the change. I also estimated the actions to be executed (a more detailed plan would only be created if we got the green light), the impact of the changes and the amount of work. Risks were discussed as well, to give the customer enough information to consider the change. I also suggested an implementation date and time, based on my experience of the customer’s business, so that the implementation would cause as little disruption as possible.

T minus 5 weeks

The CR was approved by the Change Advisory Board (CAB), which consists of people from the customer and our organization. The CAB reviewed the need for the change, the risks, costs and benefits, and decided that the change was allowed to proceed. I was informed, and I started the actual planning of the changes.

The SP installed the new connection in the data center as they had got the order from the customer.

T minus 3 weeks

Later I noticed that the suggested implementation date and time overlapped with the newly announced recurring maintenance window of the SP. The risks of overlapping changes were assessed, and finally it was decided that it was best not to overlap the changes. Since the SP maintenance schedule is fixed and affects many of their customers, it was decided that our change was to be rescheduled. There was no problem with that, as the change was not based on very urgent needs. Because of Easter and the business the customer expected around it, we postponed the implementation by three weeks.

(Effectively for this post, the T clock was stopped for three weeks.)

T minus 2 weeks

While planning the change with the SP I also created the communication template for the change: who to inform and what to tell them. Informing is a very delicate piece of art, as we all know. If you don’t inform enough, it raises questions about the change. On the other hand, if you inform too much, it raises questions or concerns because of details that the recipients didn’t understand or simply misunderstood. I formulated the communication template and it was approved by the customer.

T minus 1 week

Servicedesk sent the preliminary announcement email to the delivery list that was agreed earlier, consisting of customer IT contact people, project/service delivery managers and technical teams that may receive questions or may otherwise be affected by the change.

I was finalizing my detailed Change Implementation Plan (CIP): in this change it consisted of 40 lines of actions for a 2-hour maintenance window, each line consisting of a varying number of configuration commands for the devices. Earlier this year I led a change that had a 6-hour maintenance window, six specialists doing the changes (from four different companies, plus the normal night shift servicedesk and IMOC (Infrastructure Management Operations Centre) people) and 320 lines of actions in the CIP (with some 1500 lines of configuration commands). In that sense this change was a somewhat smaller deal. I really like having a detailed plan because it enables me to follow a clear path during the change. My reviewing colleagues also get a clear picture of the plan when almost nothing is assumed and everything is written out.
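To illustrate the idea of a plan made of numbered action lines, each carrying its own configuration commands, here is a minimal sketch in Python. It is not the actual CIP format: the `Step` fields, the device names, the time estimates and the IOS-style commands are all placeholders invented for the example. It only shows how such a plan could be checked against the length of the maintenance window.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """One action line of a (hypothetical) change implementation plan."""
    number: int
    device: str
    action: str
    commands: List[str] = field(default_factory=list)
    rollback: List[str] = field(default_factory=list)
    minutes: int = 5  # estimated duration of this step


def total_minutes(plan: List[Step]) -> int:
    """Sum of the estimated durations of all steps."""
    return sum(step.minutes for step in plan)


def command_count(plan: List[Step]) -> int:
    """Total number of configuration command lines in the plan."""
    return sum(len(step.commands) for step in plan)


def fits_window(plan: List[Step], window_minutes: int) -> bool:
    """True if the estimated work fits into the maintenance window."""
    return total_minutes(plan) <= window_minutes


# Placeholder plan: device names and commands are made up for illustration.
plan = [
    Step(1, "ce-router-1", "Enable the new uplink interface",
         commands=["interface GigabitEthernet0/1",
                   " description New SP connection",
                   " no shutdown"],
         rollback=["interface GigabitEthernet0/1", " shutdown"],
         minutes=10),
    Step(2, "ce-router-1", "Verify that the interface is up",
         commands=["show interfaces GigabitEthernet0/1"],
         minutes=5),
]

print(total_minutes(plan), command_count(plan), fits_window(plan, 120))
```

Writing the rollback commands next to each step follows the same principle as the plan itself: nothing is assumed and everything is written out.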

T minus 2 days

I confirmed with the participating SP specialist that everything was still fine and no new questions had arisen. I also checked that there were no open questions from the customer, so there was nothing to prevent us from proceeding with the change.

T minus 12 hours

I prepared myself for the following day’s actions: printed the CIP, set the wakeup alarm for 02:30 (AM), put some sandwiches in the fridge for a takeaway breakfast, and set my phone to silent mode to ensure some sleep.

T minus 90 minutes

The alarm clock went crazy. Apparently I had had some sleep, but it is always a bit lighter when you have an important change coming in the night. I took a quick shower, grabbed the sandwiches and a banana, and drove off to the office complex. The change involved implementing some brand-new connections in the data center, so I decided it was best to be near the data center while making the changes, just in case anything was needed onsite.

T minus 15 minutes

I asked the Servicedesk to send the predefined starting announcement. I also made some last-minute checks in the monitoring systems to get a baseline of the network status, and enabled some specific extra monitoring probes to get more direct information during the changes.
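The point of taking a baseline is that the state after the change can be compared against it. The sketch below shows that comparison in Python under simple assumptions: the status snapshots are plain dictionaries, and the item names, addresses and states are invented for the example rather than taken from any real monitoring system.

```python
from typing import Dict, Tuple


def diff_status(baseline: Dict[str, str],
                current: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return {item: (before, after)} for every item whose status changed."""
    changes = {}
    for name in sorted(set(baseline) | set(current)):
        before = baseline.get(name, "missing")
        after = current.get(name, "missing")
        if before != after:
            changes[name] = (before, after)
    return changes


# Hypothetical snapshots taken before and after the change.
baseline = {
    "ce-router-1 Gi0/0": "up",
    "ce-router-1 Gi0/1": "down",
    "bgp 192.0.2.1": "established",
}
after_change = {
    "ce-router-1 Gi0/0": "up",
    "ce-router-1 Gi0/1": "up",
    "bgp 192.0.2.1": "established",
    "bgp 192.0.2.5": "established",
}

for item, (before, after) in diff_status(baseline, after_change).items():
    print(f"{item}: {before} -> {after}")
```

Anything in the output that is not an expected result of the change is worth a closer look before the change is declared completed.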

T minus 5 minutes

I joined the conference call with the SP specialists. This time they had two guys doing the change, both of whom I had worked with many times before. It was nice to be working with people you really trust and who are able to adapt to the situation, whatever it turns out to be, based on their experience. I also knew that their attitude is customer-focused. It is very frustrating to try to fix something with a guy who thinks “it’s not a big deal, maybe we can leave it here” when something is broken. These guys were something totally different.

T plus 0 minutes

We started the changes and proceeded according to my CIP. I won’t go into details here; it would be too boring. Interfaces, IP addresses, VRFs, BGP sessions, route maps, prefix lists: all those features you usually need when interfacing customer devices with the SP devices.

T plus 60 minutes (or thereabouts)

Pay day: I ate my banana.

T plus 100 minutes

Our 2-hour maintenance window was nearing its end. We had had only one problem: one port of the SP switch did not show packet counters correctly. Traffic was flowing nicely, no problems there. But the reporting clearly did not work as it was supposed to. We had already tried some tricks: changed SFPs, cleared some of the configuration, changed the port. At that point we decided to reboot the SP switch.

T plus 115 minutes

The reboot helped! All the interface counters were reporting realistic numbers, and everything looked fine! We completed everything as planned within the scheduled maintenance window, and didn’t cause any problems for the customer’s business. We made the last checks, declared the change completed, and closed the conference call, but promised to be reachable both ways in case something unexpected appeared during the next hours when the new day began for the customer’s business.

T plus 135 minutes

I rechecked that all the network monitoring consoles and probes were OK, and then called the Servicedesk and asked them to send the final announcement for the change.

T plus 3 hours

I had updated the most important operational documents for our production team to be able to handle the new environment. Time for breakfast.

T plus 4 hours

It was 8 o’clock and workmates were entering the building. Some of them were looking tired (“bah, morning”) but I was full of power: one more successfully completed change in the customer’s systems.

T plus 1 day

I had finalized the rest of the documentation and written the Post-implementation Review document: the change was closed successfully.

Updated: April 26, 2012 — 18:20


Majornetwork.net © Markku Leiniö 2011-2017