10/3/2022 Quest server outage - after action report

Quest users,

 

Now that the Quest server is up and running again, I want to take a moment to explain what happened and what steps we plan to take to reduce the chance of something like this happening again during a regular business day. First, though, I want to apologize again for the inconvenience this has caused everyone. We take our responsibility for hosting Quest for you very seriously. This is the first time in over 20 years that we've had an issue like this, and hopefully it will be the last.

 

CAUSE:

G&W has purchased a new server for Quest and hired an IBM Certified Specialist Systems Engineer to perform the migration and install. We have not installed that server yet; that is planned for the weekend of October 14th, as mentioned in a previous email. In preparation for that installation, however, we need to apply patches/updates to the operating system of our current server. These updates cannot be installed all at once, as some are required before others can be installed. We've been doing them in batches and, to limit downtime on our server, they are set to be installed automatically immediately after our weekly backup process on Monday morning.

 

Unfortunately, one of these patches prevented our server from restarting. Early this morning, once more time had elapsed than a typical update needs and it became apparent the server was having issues, we immediately contacted technical support. Together with the technician, we tried multiple ways to bypass the error and restart the server, but nothing worked.

 

With IBM's help, after hours on the phone with them, we were able to start the system from a DVD and restore the OS from our backup tape to where it was before the updates were applied. (On a side note, they were very appreciative of our regular backups, as they have many clients who never make them, and we would have been down much longer without them.) Once that was done, we were able to restart the server normally and restart Quest.

 

This whole situation was very unusual, as these updates/patches are normally routine.

 

NEXT STEPS:

Unfortunately, we still need to apply those operating system updates before the weekend of the 14th. This may require us to do several "bounces" (restarts) of the server outside of our regularly scheduled backups. If we do need to bounce the system, it will be in the evening around 10 pm ET, and we will send an email notification to this group in advance.

 

FUTURE MITIGATION STEPS:

For most of the time we've hosted Quest, we've scheduled our regular backups to occur early Monday morning (3 am ET, 2 am CT). Going forward, we are going to schedule the backups for the same time on Sunday mornings instead. That way, if any issues occur, they can hopefully be resolved during the day on Sunday, impacting as few people as possible.

 

Again, we're sorry for the inconvenience this has caused.

 

Take care,

 

Bill Gottlieb

Gottlieb & Wertz, Inc.
