While we hope that it never occurs, a disaster or emergency can happen unexpectedly. Establishing a disaster or emergency recovery plan is an important element of a standard operating procedure (“SOP”).
Emergency Contact List
In the event of an emergency, it’s important to know where the emergency contact information is kept and to follow the established notification procedures:
- Emergency Contact #1 (phone # and/or email)
- Emergency Contact #2 (phone # and/or email)
- Emergency Contact #N (phone # and/or email)
- Continuous Monitoring (CM) System Help Desk: Phone #, Email, Website – As applicable, contact your CM system manufacturer for assistance.
Recovery Procedure Guideline
Your CM system manufacturer should have draft procedures on how to recover from a disaster or emergency event. A failure can be isolated to a single device, or it can be a system-wide outage. The general recovery steps, which vary with the failure mode, are outlined in Table A.
Table A – General Recovery Steps
|Step #||Description||Notes & Comments|
|1||Secure Vital Assets||The first step in an emergency is to secure vital assets. The procedure may include verifying the status of storage appliances and/or moving assets to emergency storage appliances or locations.|
|2||Identify Failure Mode(s)||Once vital assets are secure, the recovery procedure begins with identifying the failure mode. Examples of common failure modes include: (1) Power Loss; (2) Network Outage; (3) Server or Host Failure; (4) Device and/or Probe Failure.|
|3||Replace Failed Device(s)||As applicable, repair and/or replace any damaged or failed devices.|
|4||Data Restoration||Depending on system capabilities, any gaps in data may be restorable. Sensors and/or system devices often have on-board memory to temporarily store data when it cannot be written to the central database. Restoration of data can be automatic, or it may require following the manufacturer’s data restoration procedures. In certain cases, data may not be fully restorable, and applicable documentation and corrective action follow-up are required.|
|5||Verify Normal Operations||Once the system is back up and running, it is important to verify that the system is operating normally and can be relied on to monitor and safeguard the assets. Follow the manufacturer’s guidelines on how to verify Normal Operations. In certain cases, it may be necessary to perform an IQ/OQ/(PQ) validation.|
|6||Perform Select Alarm Checks||After it has been verified that the system is up and running normally, it is advisable to perform select alarm checks to verify that alarms are functioning normally. Alarm checks are typically a standard part of an IQ/OQ/(PQ) protocol.|
|7||Verify Alarm Notification Protocol||All expected alerts should be received (no missing alerts), in the desired escalation sequence, and with the correct alarm notification information and content. Verification of alarm notification protocols is usually a standard part of an IQ/OQ/(PQ) protocol.|
“( )” Designates an optional protocol
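The three conditions in step 7 (no missing alerts, correct escalation sequence, correct content) can be checked mechanically once the alerts received during a test have been written down. The following is only a minimal sketch in Python: the `Alert` record, the `verify_notifications` function, and the assumption that each recipient appears once in the escalation sequence are all hypothetical and are not part of any particular CM system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """One alarm notification (hypothetical record for illustration)."""
    recipient: str  # who was notified
    message: str    # notification content


def verify_notifications(expected, received):
    """Compare received alerts against the expected escalation sequence.

    Both arguments are lists of Alert in chronological order.
    Assumes each recipient appears once in the expected sequence.
    Returns a list of problems; an empty list means the check passed.
    """
    problems = []
    received_by_recipient = {a.recipient: a for a in received}

    # Check 1 and 3: every expected alert arrived, with the expected content.
    for alert in expected:
        got = received_by_recipient.get(alert.recipient)
        if got is None:
            problems.append(f"missing alert for {alert.recipient}")
        elif got.message != alert.message:
            problems.append(f"unexpected content for {alert.recipient}")

    # Check 2: the alerts that did arrive came in the expected order.
    expected_order = [a.recipient for a in expected
                      if a.recipient in received_by_recipient]
    expected_set = set(expected_order)
    received_order = [a.recipient for a in received
                      if a.recipient in expected_set]
    if received_order != expected_order:
        problems.append("alerts arrived out of escalation sequence")

    return problems
```

An empty result means all three conditions held for the test. Any entries in the list can be copied into the corrective action documentation.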
Redundancy & Automatic Failover
Device redundancy and automatic failover may already be designed into your CM system. If this is the case, there may not be any lost data or interruption to the operation of your system. Follow the manufacturer’s guidelines on how to “reset” or “reconfigure” the system to address any failed devices and/or components.
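One way to confirm that a failover really left no lost data is to scan the recorded timestamps for gaps larger than the sampling interval. The sketch below assumes the readings can be exported as a list of Python `datetime` values and that the sampling interval is known. The `find_gaps` function and its parameters are hypothetical, and your CM system may already provide an equivalent report.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, interval, tolerance=timedelta(0)):
    """Return (last_reading_before, first_reading_after) pairs for each gap.

    A gap is any pair of consecutive readings that are further apart
    than the sampling interval plus an optional tolerance.
    """
    ordered = sorted(timestamps)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > interval + tolerance:
            gaps.append((earlier, later))
    return gaps


# Example: readings every 5 minutes, with readings missing after minute 10.
start = datetime(2024, 1, 1, 8, 0)
readings = [start + timedelta(minutes=m) for m in (0, 5, 10, 25, 30)]
gaps = find_gaps(readings, interval=timedelta(minutes=5))
```

Here `gaps` holds a single pair, spanning the readings at 08:10 and 08:25. Each gap found this way is a candidate for the data restoration step in Table A, or for documentation if the data cannot be restored.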
Establishing a disaster or emergency recovery plan should be a standard component of your SOP. Periodic testing of recovery procedures is also a good practice – consult with your CM manufacturer for recommendations and guidelines. Don’t wait until an emergency or disaster occurs!