PSIM, System Availability - Network Harbor

One area of any comprehensive security solution that nonetheless lies largely outside the purview of Network Harbor software is the reliability and fault tolerance of the infrastructure on which the integrated security system operates. This infrastructure includes the computing hardware that the security software runs on, the network used for communications, electrical power, and necessary non-security software and hardware systems such as databases.

Methods of ensuring system availability fall into three broad categories, which in practice often overlap across different portions of a system:

1) Recovery - The ability of a system to be returned to nominal operation given manual intervention.

Options in the recovery category include offline virtual machine backups, restorable hardware images for servers and workstations, backup networking hardware, readily available spare card readers and door strikes, and documentation sufficient to put these replacements into service. In the worst case, simply having enough information and documentation to recreate a system from components is a form of recovery. The costs here are keeping spare hardware on hand, keeping backups and images up to date, and maintaining the documentation.

2) High Availability - The ability of a system to tolerate hardware failure and automatically recover.

The High Availability or Failover mechanisms of popular virtual machine infrastructures ensure that if a virtual machine crashes, whether from hardware failure on the host or from another cause, another copy of the VM is started on another host. Network infrastructure that routes around failures automatically also fits in this category when a network connection drops but is automatically re-established.

3) Fault Tolerance - The ability of a system to continue operations with no interruption due to hardware failure.

Fault Tolerant mechanisms or versions of various virtual machine implementations can simultaneously run two instances of a virtual machine on two separate hardware hosts, so that if one of the hosts goes down, the other VM continues running with no interruption of service whatsoever. Hardware measures such as hard drives in a RAID configuration that tolerates disk failure, networks that route around issues without dropping connections, redundant computer power supplies, uninterruptible power supplies, and generators may also fall into this category.
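The benefit of running redundant copies can be estimated with the standard parallel-availability formula. The sketch below is illustrative only: it assumes the redundant components fail independently and that any single surviving copy is enough to keep the service running, which real deployments only approximate. The function names and the 99% figure are hypothetical.

```python
HOURS_PER_YEAR = 8760  # 365 days; leap years ignored for simplicity


def parallel_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant components when one survivor suffices.

    Assumes failures are independent, which shared power, shared storage,
    or a common software bug can easily violate in practice.
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be a fraction between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # The service is down only when every copy is down at the same time.
    return 1.0 - (1.0 - single) ** copies


def annual_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year for a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


# Hypothetical example: one host that is up 99% of the time,
# compared with a fault-tolerant pair of such hosts.
single_host = 0.99
host_pair = parallel_availability(single_host, 2)

print(f"single host: {annual_downtime_hours(single_host):.2f} h/year down")
print(f"host pair:   {annual_downtime_hours(host_pair):.2f} h/year down")
```

Under these assumptions the second host cuts expected downtime by roughly a factor of one hundred, but the independence assumption is what makes that number optimistic.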

These categories are by no means mutually exclusive - a RAID array, for instance, protects only against hard drive failure. It does nothing about data loss from malware, physical events such as fire, or hardware faults on the server or workstation other than hard drive failure. Some form of backup for recovery is therefore still needed.

For many components of a security solution, levels of availability assurance beyond recovery are simply not possible. There is no useful way for a malfunctioning relay to keep operating while broken, nor can it repair itself or be replaced automatically. Recovery from this failure is therefore the best that can be done. How quickly this recovery is performed depends on how quickly the failure is discovered (monitored vs. unmonitored relays), and how long it takes to resolve (availability of replacement parts and service personnel).
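The effect of discovery time on a recovery-only component can be made concrete with the usual steady-state availability estimate, MTBF / (MTBF + MTTR), where the time to restore is split into time to detect and time to repair. This is a rough sketch rather than a sizing tool, and every number in it is a hypothetical placeholder, not a measured relay figure.

```python
def availability(mtbf_hours: float, detect_hours: float, repair_hours: float) -> float:
    """Steady-state availability of a component that can only be recovered.

    mtbf_hours:   mean time between failures
    detect_hours: mean time until someone notices the failure
    repair_hours: mean time to replace or repair once noticed
    """
    if mtbf_hours <= 0 or detect_hours < 0 or repair_hours < 0:
        raise ValueError("mtbf must be positive; detect and repair must be non-negative")
    time_to_restore = detect_hours + repair_hours
    return mtbf_hours / (mtbf_hours + time_to_restore)


# Hypothetical relay: fails on average every 10,000 hours, 4 hours to replace.
MTBF = 10_000.0
REPAIR = 4.0

# A monitored relay raises an alarm within minutes; an unmonitored one
# might go unnoticed until someone tries the door days later.
monitored = availability(MTBF, detect_hours=0.1, repair_hours=REPAIR)
unmonitored = availability(MTBF, detect_hours=72.0, repair_hours=REPAIR)

print(f"monitored:   {monitored:.5f}")
print(f"unmonitored: {unmonitored:.5f}")
```

With identical hardware and identical repair effort, the unmonitored relay is unavailable far longer per failure, which is why monitoring and spare-part availability are the main levers for components in this category.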

Each component of the comprehensive availability scheme for a system has an associated benefit and cost. The cost of an availability failure varies widely between environments, and even between systems or sensors in the same environment. For this reason, it is important that system integrators and their customers discuss availability concerns and requirements early in the system design process, to ensure that the reliability of the underlying infrastructure meets the needs of the customer.
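One way to frame that discussion is to compare, for each candidate approach, its yearly cost plus the expected cost of the outages it still allows. The sketch below is a deliberately simple model with hypothetical option names and made-up figures; a real assessment would also weigh safety, compliance, and reputational factors that do not reduce to an hourly rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    annual_cost: float            # yearly cost of the measure itself
    expected_outage_hours: float  # expected outage hours per year with it in place


def total_annual_cost(option: Option, outage_cost_per_hour: float) -> float:
    """Cost of the measure plus the expected cost of remaining outages."""
    return option.annual_cost + option.expected_outage_hours * outage_cost_per_hour


def cheapest(options: list[Option], outage_cost_per_hour: float) -> Option:
    """Return the option with the lowest total expected annual cost."""
    return min(options, key=lambda o: total_annual_cost(o, outage_cost_per_hour))


# Hypothetical figures for one subsystem, for illustration only.
options = [
    Option("recovery only", annual_cost=1_000, expected_outage_hours=24),
    Option("high availability", annual_cost=8_000, expected_outage_hours=1),
    Option("fault tolerance", annual_cost=20_000, expected_outage_hours=0.01),
]

# The same three options evaluated for sites where an hour of outage
# costs very different amounts.
for hourly_cost in (100, 5_000, 50_000):
    best = cheapest(options, hourly_cost)
    print(f"outage cost {hourly_cost}/h -> {best.name}")
```

The same menu of options yields a different answer as the hourly cost of an outage rises, which is the practical reason the requirement has to come from the customer rather than from a default design.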

Resources

Contact us for your Security needs by calling us at 309-633-9118 or fill out our contact form