Defensive design means designing systems on the basic assumption that everything that can fail eventually will. In practice, it means implementing features that cope with common types of failure at all operational levels.
Bulkhead
A bulkhead is a design pattern that limits the scope of a failure to a particular component, so that errors, failures or damage do not spread to other parts of the system or to other systems. For example, preventing fire from spreading from one building to another.
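In software, one common way to build a bulkhead is to cap how many callers may be inside a component at once. The sketch below is only an illustration under that assumption: the `Bulkhead` class and its `call` method are hypothetical names, not part of any library.

```python
import threading


class Bulkhead:
    """Caps concurrent calls into one component (hypothetical helper)."""

    def __init__(self, max_concurrent: int) -> None:
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, func, *args, **kwargs):
        # Reject immediately instead of queueing, so that a slow or
        # failing component cannot tie up every caller in the system.
        if not self._slots.acquire(blocking=False):
            raise RuntimeError("bulkhead full: call rejected")
        try:
            return func(*args, **kwargs)
        finally:
            # Always free the slot, even if func raised.
            self._slots.release()
```

Giving each downstream dependency its own `Bulkhead` instance means that exhausting one of them leaves the others untouched, which is the containment the pattern is after.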
Edge Cases
Designing for edge cases means covering all rare but possible states or conditions in which the system may be placed or may have to operate. For example, designing a system to operate at critically low temperatures.
Mistake proofing
Mistake proofing is designing the system so that human errors and operator mistakes are impossible to make in the first place. For example, preventing users from uploading data in the wrong format.
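The upload example can be sketched as a check that runs before any data is accepted. This is a minimal illustration, assuming the only permitted format is a JSON object; `accept_upload` and `ALLOWED_EXTENSIONS` are hypothetical names chosen for the example.

```python
import json

# Assumed policy for this sketch: only JSON uploads are accepted.
ALLOWED_EXTENSIONS = {".json"}


def accept_upload(filename: str, payload: bytes) -> dict:
    """Return the parsed upload, or raise ValueError before anything is stored."""
    if not any(filename.lower().endswith(ext) for ext in ALLOWED_EXTENSIONS):
        raise ValueError(f"unsupported file type: {filename!r}")
    try:
        data = json.loads(payload.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError("file content is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data
```

Checking both the file name and the content means a wrongly formatted file is refused at the boundary rather than discovered later as corrupted data.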
Decoupling
Decoupling is designing the parts of a system to be independent of each other, so that each part can easily be replaced by a different implementation.
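One way to picture decoupling in code is a component that depends only on an interface, so any implementation of that interface can be swapped in. The names `Storage`, `InMemoryStorage` and `ReportService` below are hypothetical and exist only for this sketch.

```python
from typing import Protocol


class Storage(Protocol):
    """The only thing ReportService knows about its storage backend."""

    def save(self, key: str, value: str) -> None: ...

    def load(self, key: str) -> str: ...


class InMemoryStorage:
    """One possible implementation; a file or database version could replace it."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def save(self, key: str, value: str) -> None:
        self._data[key] = value

    def load(self, key: str) -> str:
        return self._data[key]


class ReportService:
    def __init__(self, storage: Storage) -> None:
        # The backend is injected, never constructed here.
        self._storage = storage

    def publish(self, name: str, body: str) -> None:
        self._storage.save(name, body)

    def read(self, name: str) -> str:
        return self._storage.load(name)
```

Because `ReportService` never names a concrete backend, replacing `InMemoryStorage` requires no change to the service itself.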
Redundancy
Redundancy is a design choice that duplicates resources or instances so that backup resources or copies of instances can take over the workload in case of a failure.
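A simple form of redundancy is to keep several equivalent replicas and fall through to the next one when a call fails. The function below is a sketch of that idea only; `call_with_failover` is a hypothetical name, and each replica is assumed to be a plain callable.

```python
def call_with_failover(replicas, request):
    """Try each replica in order and return the first successful response."""
    errors = []
    for replica in replicas:
        try:
            return replica(request)
        except Exception as exc:
            # Remember the failure and move on to the next copy.
            errors.append(exc)
    raise RuntimeError(f"all {len(errors)} replicas failed")
```

A real deployment would usually add health checks and load balancing on top; the sketch only shows how a duplicate instance absorbs the work of a failed one.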
Retry
Retry is a design concept in which the system makes multiple attempts to reach or connect to an external resource that has become temporarily unavailable, thereby preventing a system failure caused by intermittent errors.
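A retry is often paired with a growing delay between attempts so that a struggling resource is not flooded. The helper below is a minimal sketch under that assumption; the name `retry`, its parameters and the choice of exponential backoff are illustrative, not prescribed by the text above.

```python
import time


def retry(operation, attempts=3, base_delay=0.1,
          retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                # Out of attempts: let the last error propagate.
                raise
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Only the exception types listed in `retry_on` are retried; anything else is treated as a permanent error and raised at once. The `sleep` parameter exists so that tests can replace the real delay.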
Undo
Undo is a design concept that allows a system to be reverted to a previous state, thereby allowing human mistakes to be corrected or data corruption to be prevented.
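Undo can be sketched as a history of snapshots taken before each change. The class below is a minimal illustration; `UndoableState` is a hypothetical name, and copying the whole state on every change is an assumption that suits small states only.

```python
import copy


class UndoableState:
    """Keeps a snapshot before every change so the change can be reverted."""

    def __init__(self, state: dict) -> None:
        self.state = state
        self._history: list[dict] = []

    def apply(self, change: dict) -> None:
        # Snapshot first, so a later undo restores exactly this state.
        self._history.append(copy.deepcopy(self.state))
        self.state.update(change)

    def undo(self) -> bool:
        """Revert the most recent change; return False if there is none."""
        if not self._history:
            return False
        self.state = self._history.pop()
        return True
```

Larger systems tend to store the inverse of each operation rather than full snapshots, but the observable behavior is the same: every change can be stepped back.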
Cold standby
Cold standby is the concept of providing spare resources that are ready to be started when needed, typically acting as backups for their primary resources.
Derating
Derating is a design concept in which the system changes its mode of operation when a mistake is detected, to prevent things from getting worse.
Fault Tolerance
Fault tolerance is the ability of a system to continue operating when an error is detected, so that the whole system does not halt at the first error or at the first exception raised to prevent unpredictable results.
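At the level of a single routine, fault tolerance can look like a batch job that records a failing item and carries on with the rest. The function below is a sketch of that behavior; `process_all` and its return shape are hypothetical choices made for the example.

```python
def process_all(items, handler):
    """Apply handler to every item, collecting failures instead of halting."""
    results = []
    failures = []
    for item in items:
        try:
            results.append(handler(item))
        except Exception as exc:
            # Isolate the fault to this item and keep going.
            failures.append((item, exc))
    return results, failures
```

Returning the failures alongside the results keeps the errors visible, so tolerating a fault does not turn into silently ignoring it.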
Graceful degradation
Graceful degradation is a concept that allows a system to keep operating partially after a failure, so that some functions or areas of the system continue to work when other parts have failed.
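A typical case is a page with one essential part and one optional part. The sketch below assumes a hypothetical `render_page` function that receives two loader callables: the article is required, while recommendations may fail without taking the page down.

```python
def render_page(load_article, load_recommendations):
    """Build a page that still works when the optional part is unavailable."""
    # The core content is essential: if this fails, the error propagates.
    page = {"article": load_article(), "degraded": False}
    try:
        page["recommendations"] = load_recommendations()
    except Exception:
        # Optional feature failed: serve the page without it.
        page["recommendations"] = []
        page["degraded"] = True
    return page
```

The `degraded` flag is there so that callers and operators can tell a reduced page from a complete one.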
Monitoring
Monitoring is the practice of observing a system during operation so that, when anomalies are detected, workarounds can be applied and the affected parts can be taken offline and diagnosed for abnormal or suspicious execution.
Durability
Durability is the concept of designing a system so that it can handle a wide variety of stress conditions and different levels of workload.
Resilience
Resilience is the concept of designing a system that handles different levels of stress and load through its intrinsic properties.