Resiliency Testing (Part-II)

20 Dec 2023

Daman Dev Sood

This informal CPD article, ‘Resiliency Testing (Part-II)’, was provided by Daman Dev Sood, an International Resilience Trainer & Consultant.

Introduction

Businesses today face an ever-increasing number of challenges, from technological disruptions to economic uncertainties. To survive and thrive in this dynamic environment, companies need resilient and reliable components (people, processes, technology, systems etc.) that can withstand unexpected events and adapt to changing conditions.

In Part I, I covered the following:

 - Components of Resiliency

 - Resiliency Testing

  • Importance and benefits
  • Good practices
  • Actors (participants)
  • Sample scenarios
  • Training and development investment

Here is more about Resiliency Testing.

Communication in Resiliency Testing

Communication is as important in Resiliency Testing as it is in any management system or part of one. The following cycle is useful and addresses the communication-related requirements specified in ISO 22301:2019 (clause 7):

WHO-WHAT-WHOM-WHEN-HOW

WHO = the author

WHAT = the content

WHOM = the audience/ target/ interested party

WHEN = time/ frequency

HOW = medium
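As a minimal sketch, the cycle can be captured as one record per communication, so that a blank element is easy to spot before a test begins. The class name, the helper function and the sample values below are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class CommunicationItem:
    who: str   # the author
    what: str  # the content
    whom: str  # the audience / target / interested party
    when: str  # time / frequency
    how: str   # the medium


def missing_elements(item):
    """Return the names of any WHO-WHAT-WHOM-WHEN-HOW elements left blank."""
    return [name for name, value in asdict(item).items() if not value.strip()]


# Hypothetical entries in a communication plan.
notice = CommunicationItem(
    who="Test coordinator",
    what="Notice of upcoming Resiliency Test",
    whom="Participating staff",
    when="One week before the test",
    how="Email",
)
draft_report = CommunicationItem(
    who="Test coordinator",
    what="Resiliency Testing Report",
    whom="",
    when="",
    how="Email",
)

print(missing_elements(notice))        # []
print(missing_elements(draft_report))  # ['whom', 'when']
```

A register of such records can be reviewed before each test to confirm that every planned communication has an author, an audience, a time and a medium.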

Identifying Interested Parties

An individual or an organisation that can affect your business, can be affected by it, or perceives itself to be affected by it is an Interested Party for you. The term replaces the older ‘stakeholder’ and carries a broader meaning. Identifying your interested parties (for everything you do, or even for part of what you do) is an important but difficult task, and one that most people do not grasp easily.

It is not enough to identify them; you must also identify their needs and expectations of you, and then actually fulfil those needs and expectations.

The full cycle (Interested Parties Management) looks like this:

1.     Identify relevant interested parties

2.     Seek/ identify/ establish their needs & expectations

3.     Develop plans to fulfil those needs & expectations

4.     Act according to those plans (investment of time, money, effort will be needed)

5.     Check effectiveness of these plans and actions (how well are you doing)

6.     Go back to interested parties with results and seek their satisfaction level

The same cycle should be adopted in Resiliency Testing, and in particular for Communication in Resiliency Testing.

Communication points in a Resiliency Testing Program

Here are some of the communications you will need to make in Resiliency Testing:

  • About the start of Resiliency Testing Program
  • Benefits of Resiliency Testing
  • Selection of an actor/ interested party
  • Roles, responsibilities, and authorities of various actors/ interested parties
  • Information about upcoming Resiliency Test
  • Start, stop, and intermediate communication for a Resiliency Test
  • Resiliency Testing Report/ Results
  • Thanks for participation in a Resiliency Test
  • Improvements based on Resiliency Testing
  • Needle movement on maturity of the Resiliency Testing Program

Resiliency Testing Metrics Program

Picking up from the last point, to be able to claim maturity, you will need to define your Resiliency Testing Metrics Program, the components of which are shown below:

a)     WHAT will be monitored and measured

b)     HOW will this monitoring and measurement be done

c)     WHAT analysis will be done on the data collected this way

d)     WHO will conduct this analysis

e)     WHAT will be the frequency of measurement/ data collection and analysis

f)      WHAT reports will be generated

g)     WHAT will be the format of the reports

h)     WHO will receive the reports

i)      HOW will the reports be utilised

Again, this is true for any management system or part of it. However, our focus here is on Resiliency Testing.
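As an illustration, the nine components can be written down as one record per metric, with component (c) expressed as a small calculation. This is only a sketch: the metric chosen (share of tests that met their objectives), the field names and the sample values are assumptions, not something prescribed by any standard.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MetricDefinition:
    what_measured: str     # a) what is monitored and measured
    how_measured: str      # b) how monitoring and measurement are done
    analysis: str          # c) what analysis is done on the data
    analyst: str           # d) who conducts the analysis
    frequency: str         # e) frequency of data collection and analysis
    reports: str           # f) what reports are generated
    report_format: str     # g) format of the reports
    recipients: List[str]  # h) who receives the reports
    use_of_reports: str    # i) how the reports are utilised


def pass_rate(outcomes: List[bool]) -> float:
    """Analysis step (c): percentage of tests that met their objectives."""
    if not outcomes:
        raise ValueError("no test outcomes recorded")
    return 100.0 * sum(outcomes) / len(outcomes)


# Hypothetical metric definition and sample data.
metric = MetricDefinition(
    what_measured="Tests meeting their objectives",
    how_measured="Outcome recorded in each test report",
    analysis="Percentage of tests that met their objectives",
    analyst="Resilience manager",
    frequency="Quarterly",
    reports="Resiliency Testing summary",
    report_format="One-page dashboard",
    recipients=["Top management", "Test participants"],
    use_of_reports="Prioritise improvements for the next test cycle",
)
outcomes = [True, True, False, True]

print(f"{metric.what_measured}: {pass_rate(outcomes):.0f}%")
# Tests meeting their objectives: 75%
```

Tracking the same figure from one period to the next gives a simple way to show movement in the maturity of the program.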

Diverse types of Resiliency Tests

There are many types of Resiliency Tests that can be conducted:

1. Endurance Testing: Checks a system's ability to handle sustained loads or stress for an extended period of time.

2. Redundancy Testing: Checks a system's ability to continue functioning when one or more components fail by activating backup systems or components.

3. Disaster Recovery Testing: Evaluates a system's ability to recover from an IT disaster.

4. Chaos Engineering: Involves intentionally introducing failures or disruptions into a system to test its ability to recover.

5. Performance Testing: Measures a system's response time, throughput, and other performance metrics to identify potential bottlenecks and areas for improvement.

6. Security Testing: Evaluates a system's ability to withstand security threats and attacks, such as denial of service (DoS) attacks, intrusion attempts, and malware infections.

7. Configuration Testing: Checks a system's ability to handle different configurations and combinations of software, hardware, and network components.

8. Environmental Testing: Assesses a system's ability to operate in various environmental conditions, such as extreme temperatures, humidity, and altitude.

9. Scalability Testing: Measures a system's ability to handle increased workloads and user demand by adding more resources, such as servers, storage, and network bandwidth.

10. Availability Testing: Evaluates a system's ability to remain available and responsive during planned or unplanned downtime, such as system maintenance, upgrades, or power outages.

11. Recovery Testing: Evaluates a system's ability to recover from failures, errors, or other unexpected events, such as crashes, data corruption, or network outages.

12. Load Testing: Measures a system's ability to handle heavy workloads and traffic, such as multiple users accessing the system simultaneously.

13. Compatibility Testing: Checks a system's ability to work with other software and hardware components, such as different operating systems, web browsers, or database systems.

14. Interoperability Testing: Evaluates a system's ability to communicate and exchange data with other systems and applications, such as web services, APIs, or mobile apps.

15. Configuration Recovery Testing: Verifies a system's ability to recover from configuration errors, such as incorrect settings, missing files, or corrupted data.

16. Fault Injection Testing: Involves intentionally injecting faults or errors into a system to simulate real-world failure scenarios and assess the system's resilience.

17. Incident Response Testing: Evaluates a system's ability to detect, respond, and recover from security incidents, such as data breaches or cyber-attacks.

18. Stress Testing: Measures a system's ability to handle extreme workloads, such as sudden spikes in user traffic or resource demand (similar to load testing).

19. Compliance Testing: Verifies a system's compliance with industry or government regulations, such as HIPAA, PCI-DSS, or GDPR.

20. Usability Testing: Evaluates a system's ease of use, user interface, and user experience to identify potential issues that may affect user adoption and satisfaction.

21. Network Resiliency Testing: Evaluates a system's ability to function properly under adverse network conditions, such as high latency, packet loss, and network congestion.

22. Recovery Time Testing: Measures a system's recovery time from a failure or disruption, such as the time it takes to restore services and data after a system outage.

23. Recovery Point Testing: Measures the amount of data loss a system can tolerate in the event of a failure or disruption, such as testing the recovery of data to a specific point in time.

24. Disaster Simulation Testing: Simulates different disaster scenarios to assess a system's preparedness and resilience, such as earthquakes, hurricanes, or cyber-attacks.

25. Business Continuity Testing: Evaluates a system's ability to maintain critical business operations during disruptions or disasters, such as ensuring access to data, applications, and services.

26. Geographical Redundancy Testing: Evaluates a system's ability to remain operational in the event of regional disasters or outages by utilizing geographically dispersed infrastructure and data centers.

27. Threat Modelling: Assesses a system's security posture by identifying potential threats and vulnerabilities and developing strategies to mitigate and manage them.

28. Configuration Drift Testing: Evaluates a system's ability to detect and recover from configuration changes that occur over time, such as software updates, patches, or user modifications.

29. Chaos Testing: Similar to chaos engineering, but instead of intentionally causing failures, it simulates unpredictable or random events to test a system's ability to cope with uncertainty.

30. Performance Benchmarking: Compares a system's performance against industry standards or similar systems to identify areas for improvement and optimize system resources.
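To make two of these types concrete, here is a minimal sketch that combines Fault Injection Testing (item 16) with Redundancy Testing (item 2): a failure is deliberately injected into a primary component and the test confirms that the backup takes over. All names are hypothetical, and the functions stand in for real services.

```python
class ComponentDown(Exception):
    """Raised by a component that is unavailable."""


def primary():
    return "served by primary"


def backup():
    return "served by backup"


def inject_fault(component):
    """Return a stand-in for `component` that always fails."""
    def broken(*args, **kwargs):
        raise ComponentDown(f"{component.__name__} is down (injected)")
    return broken


def serve(main, standby):
    """Use the main component; switch to the standby if it is down."""
    try:
        return main()
    except ComponentDown:
        return standby()


print(serve(primary, backup))                # served by primary
print(serve(inject_fault(primary), backup))  # served by backup
```

In a real test the injected fault would be something like a stopped service or a disconnected link, but the question is the same: does the system keep working when a component fails?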

Many of these are IT-oriented, but they can also be applied to non-IT components simply by replacing ‘system’ with ‘process’ or ‘organisation’. We will close Part II with some discussion of Methods & Techniques for Resiliency Testing (the type and the objective/ features of each test).
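Recovery Time Testing and Recovery Point Testing (items 22 and 23 in the list) come down to simple time arithmetic once the key moments of a test have been recorded. The sketch below assumes three recorded timestamps and two target values; the timestamps, targets and function names are all hypothetical.

```python
from datetime import datetime, timedelta


def recovery_time(failure_at, restored_at):
    """Time taken to restore service after the failure."""
    return restored_at - failure_at


def data_loss_window(last_good_copy_at, failure_at):
    """Span of work lost: from the last recoverable copy to the failure."""
    return failure_at - last_good_copy_at


# Hypothetical timestamps recorded during a test.
last_backup = datetime(2024, 1, 15, 9, 30)
failure = datetime(2024, 1, 15, 10, 0)
restored = datetime(2024, 1, 15, 11, 45)

# Hypothetical targets agreed with the business.
target_recovery_time = timedelta(hours=2)
target_data_loss = timedelta(minutes=15)

measured_recovery = recovery_time(failure, restored)
measured_loss = data_loss_window(last_backup, failure)

print(measured_recovery, measured_recovery <= target_recovery_time)  # 1:45:00 True
print(measured_loss, measured_loss <= target_data_loss)              # 0:30:00 False
```

Here the service was restored within its target, but 30 minutes of data would have been lost against a 15-minute target, so the test points to more frequent backups.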

Methods & Techniques for Resiliency Tests

I.  Review: Can be performed individually by one or many

II. Tabletop: Check the structure and elements of the plan

III. Call Tree: Specifically focuses on contact details

IV. Walkthrough: Thoroughly discuss the theory of the plan to check that it is usable

V.  Simulation: Use the plan to undertake theoretical response to an incident

VI. IT Disaster Recovery: Focuses on IT components

VII. Work Area Recovery: Checks the building failure recoverability

VIII. Limited rehearsal: Confirm that a recovery procedure or the recovery of a piece of technology works

IX. Live test: Confirm that full recovery of the complete activities of the organisation works

X. Integrated: Most comprehensive; a mix and match can be achieved:

a.  Building, IT, People, Utilities, Supplies, Information – all have been compromised to an extent

b.  Building, IT, People, Utilities, Supplies, Information – all have been compromised to a great extent

c.  Building, IT, People, Utilities, Supplies, Information – all have been compromised fully

d.  Critical third party fails at the same time

e.  Multiple disasters take place at the same time – flood, building-IT failure, epidemic etc.

f.   Financial fraud

g.  Sickness or death of some staff

h.  Political instability in the major customer region

i.   Sexual harassment

j.   Negative campaign on social media

k.  Ransomware attack

l.   Intruder in the building

m. Power crisis

n.  Terrorist attack

o.  Industry wide test

p.  Multiple interested parties – customers, regulators, general public
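As one illustration of the methods above, a Call Tree test (method III) can be rehearsed on paper before any phone is picked up. The sketch below assumes the call tree is held as a mapping from each caller to the people they must call, and shows who would be missed if one contact does not answer. The roles and the function name are hypothetical.

```python
from collections import deque


def reached_contacts(call_tree, root, unreachable=frozenset()):
    """Return everyone who gets the message when `root` starts the calls.

    A contact in `unreachable` does not answer, so nobody below them is called.
    """
    if root in unreachable:
        return set()
    reached = {root}
    queue = deque([root])
    while queue:
        caller = queue.popleft()
        for contact in call_tree.get(caller, []):
            if contact in unreachable or contact in reached:
                continue
            reached.add(contact)
            queue.append(contact)
    return reached


# Hypothetical call tree: each caller and the people they must call.
call_tree = {
    "Incident manager": ["IT lead", "Facilities lead"],
    "IT lead": ["DBA", "Network engineer"],
    "Facilities lead": ["Security desk"],
}

everyone = reached_contacts(call_tree, "Incident manager")
with_gap = reached_contacts(call_tree, "Incident manager", unreachable={"IT lead"})

print(sorted(everyone - with_gap))  # ['DBA', 'IT lead', 'Network engineer']
```

A result like this suggests that the IT lead needs a deputy in the tree, which is exactly the kind of finding a Call Tree test is meant to surface.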

The current expectation, especially from regulators (e.g. the Bank of England in the UK, for its Operational Resilience Program), is to test ‘rare but plausible scenarios’.

Conclusion

The landscape of business today demands not only resilience but a proactive and comprehensive approach to testing that resilience. As explored in this article, Resiliency Testing is a multifaceted process encompassing communication strategies, metrics programs, and a diverse array of test types.

Understanding and engaging with interested parties, coupled with the meticulous identification of needs and expectations, form the foundation of effective Resiliency Testing. The extensive list of testing types presented here serves as a testament to the evolving challenges businesses face, requiring them to prepare for a myriad of scenarios.

As we navigate this dynamic environment, the emphasis on 'rare but plausible' scenarios, as advocated by regulatory bodies, becomes paramount. The intricate web of methods and techniques outlined underscores the need for organizations to adopt a holistic testing approach.

In Part III, we will delve further into the intricacies of Resiliency Testing, exploring additional dimensions and considerations crucial for enhancing an organization's capacity to not only withstand unforeseen challenges but emerge stronger in their aftermath.

We hope this article was helpful. For more information from Daman Dev Sood, please visit their CPD Member Directory page. Alternatively, you can go to the CPD Industry Hubs for more articles, courses and events relevant to your Continuing Professional Development requirements.

