Measuring performance variability of EC2
Databases and storage systems
Henrik Ingo is a solutions architect at MongoDB, living near Helsinki, Finland. He specializes in improving MongoDB performance and ensuring high availability, and occasionally in converting XML documents to JSON.
Before MongoDB he worked for many years in the MySQL and LAMP world with MySQL, MariaDB, Drizzle, Percona, WebScaleSQL, MySQL Cluster and Galera Cluster. He is also a contributor to Drupal 7 core.
Author of the book "Open Life: The Philosophy of Open Source".
Henrik Ingo has worked at MongoDB for 4½ years. Initially a Solutions Architect working with European customers, he is now in R&D on the performance team. Prior to MongoDB he was active, and an activist, in the MySQL space, and was employed by MySQL (MySQL NDB Cluster team), MariaDB and Galera Cluster. Henrik is the author of the book "Open Life: The Philosophy of Open Source".
In the MongoDB Server Performance Testing team, we use Amazon EC2 for system-level testing. This allows us to flexibly deploy and tear down MongoDB clusters of various topologies, day after day. On the other hand, using a public cloud for performance testing makes it challenging to get repeatable test results, to put it mildly. We therefore ended up spending several months just benchmarking EC2 itself. We compared combinations of different instance types and disks (ephemeral SSD vs PIOPS EBS). In the end we found that the largest reduction in variability came from the same configuration options that we also use on physical hardware: turning off hyperthreading, using numactl and turning off CPU power saving states. Thus, you could argue that blaming "the cloud" for our performance trouble was wrong. It is possible to get performance characteristics from EC2 similar to those of physical hardware when it is used correctly, and when used incorrectly, both physical and cloud hardware will perform poorly.
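As an illustration of the hyperthreading step, here is a minimal sketch of how one might decide which logical CPUs to take offline on Linux. It is not taken from the talk, which does not say how hyperthreading was turned off. It assumes the sibling lists have already been read from `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`; the function names and the example topology are hypothetical.

```python
def parse_cpu_list(text):
    """Parse a Linux CPU list such as "0,2" or "0-1,8-9" into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def secondary_hyperthreads(siblings_by_cpu):
    """Return every CPU that is not the lowest-numbered thread of its core.

    siblings_by_cpu maps a CPU number to the text of its
    thread_siblings_list file (hypothetical input, read elsewhere).
    """
    to_offline = set()
    for text in siblings_by_cpu.values():
        siblings = parse_cpu_list(text)
        to_offline.update(siblings[1:])  # keep one thread per core online
    return sorted(to_offline)


# Made-up 4-vCPU topology: two cores with two hardware threads each.
example = {0: "0,2", 1: "1,3", 2: "0,2", 3: "1,3"}
print(secondary_hyperthreads(example))  # prints [2, 3]
```

On a real host, each returned CPU could then be taken offline as root by writing `0` to `/sys/devices/system/cpu/cpuN/online`. The actual sibling layout varies by instance type, so it should be read from the machine rather than assumed.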
With the new configuration we have been able to greatly lower the variability of our daily performance tests and increase trust in the test results. For WiredTiger tests even the worst case has a min-max range of less than 10%, and MMAPv1 is close to that. We consider this to be below the threshold of performance change that most end users are able to observe anyway, hence it is sufficient for our performance testing purposes.
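To make the "min-max range" figure concrete, here is a minimal sketch of how such a metric could be computed over repeated runs of one test. The abstract does not define the formula, so normalizing by the median is an assumption here, and the function name and sample numbers are made up for illustration.

```python
from statistics import median


def min_max_range_pct(results):
    """Spread between best and worst run, as a percentage of the median.

    Assumption: the range is normalized by the median; the abstract does
    not state which baseline was actually used.
    """
    if not results:
        raise ValueError("need at least one result")
    return 100.0 * (max(results) - min(results)) / median(results)


# Made-up throughput numbers from five repeated runs of the same test.
runs = [100, 104, 98, 102, 96]
print(min_max_range_pct(runs))  # prints 8.0
```

Under this definition, the made-up series above would fall within the 10% threshold mentioned in the text.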
The results also emphasized a golden rule of performance engineering: measure everything, assume nothing. It turned out that the configuration originally used for our performance testing actually had the worst variability of all the configurations we tested!