AliExpress, Maintaining High Availability and High Performance Globally
Dongbai Guo is an expert in Internet-scale software architecture. He has 15 years of experience managing large-scale software development projects across a wide spectrum of applications, including cross-border e-commerce, medical imaging, healthcare, and digital entertainment; 12 years of experience in large-scale software architecture covering domain models, logical models, physical models, semantic models, and implementation architectures; and 11 years of experience developing international standards and setting standards strategies for global enterprises.
Dongbai Guo received a PhD in engineering in 2001, an MSc in mechanical engineering in 1997, and an MSc in computer science in 1998. He has worked for Oracle USA (2000–2010), Microsoft USA (2010–2012), Amazon.com (2012–2014), and Alibaba Inc (2014–present), where he has been responsible for software architecture, engineering development, and standards development.
He was an adjunct faculty member at Brown University and Northwestern University of China, and has given presentations at over 70 conferences.


AliExpress is the largest cross-border trading platform in the world. Every day, millions of consumers from over 200 countries visit AliExpress to order products from global merchants. Keeping AliExpress operating at high performance at all times is a significant technical challenge. Over the past few years, we have refined our best practices for high availability and high performance. In this session, we will explain both the process and some of our findings.

For high availability, we will share some high-profile failure cases, explain their root causes, and show why failures are unavoidable for an Internet business. We analyze the cost of a failure and the cost of avoiding one. From these analyses, we illustrate the necessity of failure management: managing failures effectively is the most economical way to sustain the growth of an Internet business while keeping the operational budget at a reasonable level. From this conclusion, we explain the best practices of failure management: risk identification, risk detection, failure detection, monitoring and alarm, and loss minimization. Furthermore, from a deep analysis of thousands of failure cases, AliExpress has built a set of guiding principles for failure management. These principles have safeguarded the growth of AliExpress over the past five years.

For performance management, we introduce the concept of performance loss, which measures how website performance affects business results, and explain how it can be measured in real time with Big Data infrastructure. We will explain the tactics we have employed to increase the performance of an Internet business spanning multiple continents. In 2015, AliExpress achieved a 50% performance increase globally by applying these tactics. This result illustrates how engineering innovations can lead to significant business results.

