Thursday, May 22, 2014

VAT - Validation of Architecture in Tests

1. Flood test / capacity test
We call the component with the assumed load and look at the distribution of response times (min, max, avg, median, 80th, 95th and 99th percentile) to see whether they are deterministic. We then increase the load linearly up to 10x (one order of magnitude) and monitor performance. It should be possible to tune components to handle twice the load of the current business requirement. The tests should establish the limit below which the application stays responsive and above which it crashes.
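The response-time summary and the linear load ramp described above can be sketched as follows. This is a minimal illustration, not a load-testing tool: the function names are hypothetical, the nearest-rank percentile is just one of several common definitions, and the number of ramp steps is an assumption.

```python
import math
import statistics


def percentile(sorted_samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not sorted_samples:
        raise ValueError("no samples")
    rank = max(1, math.ceil(p * len(sorted_samples) / 100))
    return sorted_samples[rank - 1]


def summarize(response_times_ms):
    """Return the statistics listed in the flood test for one load level."""
    s = sorted(response_times_ms)
    return {
        "min": s[0],
        "max": s[-1],
        "avg": statistics.fmean(s),
        "med": statistics.median(s),
        "p80": percentile(s, 80),
        "p95": percentile(s, 95),
        "p99": percentile(s, 99),
    }


def load_steps(assumed_load, factor=10, steps=10):
    """Linear ramp from the assumed load up to factor times the assumed load."""
    increment = assumed_load * (factor - 1) / (steps - 1)
    return [round(assumed_load + i * increment) for i in range(steps)]
```

Running `summarize` once per entry of `load_steps(...)` gives one row of statistics per load level, which makes it easy to see where the percentiles start to diverge from the median.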
2. Duplicates test
We call services with duplicated messages and check how duplicates are handled. The application may fail to recognize duplicates, may throw errors when it detects repeated unique IDs, or may accept duplicates but skip processing them internally. The tests should check for unintended data multiplication and for flows stalled by errors from rejected duplicates.
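The "accept but skip" behaviour and the check for data multiplication could look like the sketch below. The `IdempotentConsumer` class and the in-memory set of seen IDs are assumptions made for illustration; a real service would typically keep the seen IDs in durable storage.

```python
class IdempotentConsumer:
    """Hypothetical consumer that accepts duplicates but processes each ID only once."""

    def __init__(self):
        self.seen_ids = set()
        self.processed = []
        self.skipped = 0

    def handle(self, message_id, payload):
        if message_id in self.seen_ids:
            # Duplicate: accepted without error, but not processed again.
            self.skipped += 1
            return "skipped"
        self.seen_ids.add(message_id)
        self.processed.append(payload)
        return "processed"


def run_duplicates_test(consumer, messages, copies=3):
    """Send every message `copies` times and report (processed count, skipped count)."""
    for _ in range(copies):
        for message_id, payload in messages:
            consumer.handle(message_id, payload)
    return len(consumer.processed), consumer.skipped
```

If the processed count is larger than the number of distinct messages, data was multiplied; if `handle` raised instead of returning, the flow would stall on the first duplicate.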
3. Timeout test
We inject long sleep activities and check timeout handling. We look for a snowball effect (a backlog building up because of retries on timeout), for undefined capacity limits that cause whole components or flows to get stuck or fail with OutOfMemory errors, and for overloaded components and servers.
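A small sketch of sleep injection with a bounded number of retries is shown below. The `slow_service` function stands in for the component under test and is purely hypothetical. Note that a timed-out call keeps running in its worker thread, so every retry adds more work; that is exactly the snowball mechanism the test is meant to expose.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def slow_service(delay_s):
    """Stand-in for the component under test, with an injected sleep."""
    time.sleep(delay_s)
    return "ok"


def call_with_timeout(executor, delay_s, timeout_s, max_attempts=3):
    """Call the service, retrying on timeout up to max_attempts times.

    Returns (result, attempts). Timed-out calls are not cancelled; they keep
    occupying a worker until the injected sleep finishes.
    """
    attempts = 0
    for _ in range(max_attempts):
        attempts += 1
        future = executor.submit(slow_service, delay_s)
        try:
            return future.result(timeout=timeout_s), attempts
        except FutureTimeout:
            continue
    return "gave up", attempts
```

Without the `max_attempts` cap, or with a worker pool smaller than the number of outstanding retries, the backlog grows with every timeout.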
4. Kill & restart test
We kill and restart the whole application or its application servers. We look for missing (lost) data, duplicates and other inconsistencies. We verify the transactional requirements against the actual implementation.
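After a restart, the check for lost and duplicated data can be reduced to comparing the IDs that were sent with the IDs found in the target store. The sketch below assumes both sides can be exported as plain lists of IDs; the `reconcile` function name is hypothetical.

```python
from collections import Counter


def reconcile(sent_ids, stored_ids):
    """Compare what was sent with what was stored after the kill and restart."""
    stored = Counter(stored_ids)
    sent = set(sent_ids)
    return {
        # Sent before or during the kill, but never stored.
        "lost": sorted(sent - set(stored)),
        # Stored more than once, for example replayed after the restart.
        "duplicated": sorted(i for i, n in stored.items() if n > 1),
        # Stored but never sent: some other inconsistency.
        "unexpected": sorted(set(stored) - sent),
    }
```

An empty result in all three lists is what a fully transactional implementation should produce.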
5. Error injection and negative path
We inject errors during process execution and check for missing error handlers. We verify that error codes match the actual exception. We also verify the assigned error categories (business, technical, repeatable, non-repeatable).
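One way to make the error-code and category checks testable is a lookup table from exception type to code, category and repeatability. Everything named here (the exception classes, the `E_...` codes, the table itself) is hypothetical and only illustrates the shape of such a check.

```python
from enum import Enum


class Category(Enum):
    BUSINESS = "business"
    TECHNICAL = "technical"


class ValidationFailed(Exception):
    """Hypothetical business error: retrying with the same input will not help."""


class BackendUnavailable(Exception):
    """Hypothetical technical error: the call may succeed when repeated."""


# Exception type -> (error code, category, repeatable)
ERROR_TABLE = {
    ValidationFailed: ("E_VALIDATION", Category.BUSINESS, False),
    BackendUnavailable: ("E_BACKEND", Category.TECHNICAL, True),
}


def classify(exc):
    """Return the table entry for exc, or a marker for a missing handler."""
    for exc_type, entry in ERROR_TABLE.items():
        if isinstance(exc, exc_type):
            return entry
    return ("E_UNHANDLED", Category.TECHNICAL, False)


def run_with_injected_error(step, exc):
    """Run a process step with an injected error and report how it was classified."""
    try:
        step(exc)
    except Exception as caught:
        return classify(caught)
    return None


def failing_step(exc):
    """Process step that raises whatever error the test injects."""
    raise exc
```

Any injected error that comes back as `E_UNHANDLED` points to a missing error handler.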
6. Logging test
We check that logging works, that payloads are available and that there is enough file/database storage. We check that flows are traceable using the log frontend.
7. Active-Active, Active-Standby
We kill components and check switching between endpoints. We measure switchover times and verify them against the configured application timeouts. We look for weak points, single points of failure and recovery errors. It is important to test all possible permutations, with various timings. We should also create and execute a test case in which the whole HA infrastructure fails to deliver working HA for the connected applications.
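Measuring the switch between endpoints can be sketched on the client side as below. The endpoints are modelled as plain callables and a killed component as one that raises `ConnectionError`; both are assumptions for the sake of a self-contained example.

```python
import time


def call_with_failover(endpoints, request):
    """Try endpoints in order and return (name, response, elapsed seconds, failures).

    `endpoints` is a list of (name, callable) pairs. The elapsed time covers
    the failed attempts as well, so it can be compared with the configured
    application timeout.
    """
    started = time.monotonic()
    failures = []
    for name, endpoint in endpoints:
        try:
            response = endpoint(request)
            return name, response, time.monotonic() - started, failures
        except ConnectionError as exc:
            failures.append((name, str(exc)))
    # No endpoint answered: the case where HA does not deliver working HA.
    raise ConnectionError(f"all endpoints failed: {failures}")


def killed_primary(request):
    """Simulates a component that has been killed."""
    raise ConnectionError("primary is down")


def standby(request):
    """Simulates the surviving endpoint."""
    return request.upper()
```

Running the same call with every ordering of killed and live endpoints covers the permutations, including the one where all endpoints are down.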
8. Poller test
We run many pollers at once and check whether they are protected against concurrent (multithreaded) execution. We configure limits to enforce a singleton design, or implement the missing locking/blocking. We check data consistency and duplication.
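The poller test can be illustrated with several threads draining one shared inbox, where claiming an item is protected by a lock. The `SharedInbox` class is a hypothetical in-memory stand-in for whatever table, directory or queue the real pollers read from.

```python
import threading


class SharedInbox:
    """Source of work items; claiming an item is protected by a lock."""

    def __init__(self, items):
        self._items = list(items)
        self._lock = threading.Lock()

    def claim_next(self):
        # Check and removal happen under one lock, so no item is claimed twice.
        with self._lock:
            return self._items.pop() if self._items else None


def poller(inbox, processed, processed_lock):
    """Poll until the inbox is empty, recording every claimed item."""
    while True:
        item = inbox.claim_next()
        if item is None:
            return
        with processed_lock:
            processed.append(item)


def run_pollers(items, poller_count=8):
    """Run several pollers at once and return everything they processed."""
    inbox = SharedInbox(items)
    processed, processed_lock = [], threading.Lock()
    threads = [
        threading.Thread(target=poller, args=(inbox, processed, processed_lock))
        for _ in range(poller_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed
```

The consistency check is then that the processed items, once sorted, equal the input exactly: nothing missing and nothing duplicated.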
9. Data republication
We check whether data republication is possible, how to trigger it, how to detect republication from external systems, and how to protect against a flood, a snowball effect or an uncontrollable backlog.
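One possible protection against a republication flood is to republish in bounded batches and to mark every record so that receivers can tell it apart from first-time data. Both the `republished` flag and the batching are assumptions for this sketch, not something the checklist prescribes.

```python
def republish(records, batch_size):
    """Yield records in batches of at most batch_size, each marked as republished.

    Copies are yielded, so the stored originals are not modified.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        yield [dict(record, republished=True) for record in batch]
```

Because `republish` is a generator, the caller decides when to request the next batch and can wait for the downstream backlog to drain in between.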
10. Long term stability test
We run normal load tests for 24/48/72 hours and check for stability issues caused by daily IT usage patterns.
