Migrating functionality between large-scale production systems seamlessly
Lessons from Uber’s migration of its large and complex systems to a new production environment:
- Incorporate shadowing to forward production traffic to the new system for observation, so you can confirm there are no regressions. This also lets you gather performance stats.
- Use this opportunity to settle any technical debt incurred over the years, so the team can move faster and be more productive in the future.
- Carry out validation on a trial-and-error basis. Don’t assume it will be a one-time effort; plan for multiple iterations before you get it right.
- Have a data analyst on your team to find issues early, especially if your system involves payments.
- Once you are confident in your validation metrics, you can roll out to production. Uber started with a test plan in which a couple of employees were dedicated to testing various success and failure cases, followed by a rollout to all Uber employees, and finally an incremental rollout to cohorts of external users.
- Push for a quick final migration: a rollback option that stays available is often misused, which prevents the migration from ever being completed.
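
The shadowing and cohort-rollout steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not Uber's implementation: `ShadowRouter`, `in_rollout`, the dict-shaped request, and the hash-based bucketing are all hypothetical choices made for the example.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical handler type: takes a request dict, returns any comparable result.
Handler = Callable[[dict], Any]


@dataclass
class ShadowRouter:
    """Serves every request from the legacy system and mirrors it to the new one."""

    legacy: Handler
    candidate: Handler
    total: int = 0
    mismatches: list = field(default_factory=list)

    def handle(self, request: dict) -> Any:
        expected = self.legacy(request)  # the caller always gets the legacy answer
        self.total += 1
        try:
            # Pass a shallow copy so the shadow call cannot alter top-level keys.
            actual = self.candidate(dict(request))
        except Exception as exc:
            # A failure in the shadowed system is recorded, never surfaced.
            self.mismatches.append((request, expected, repr(exc)))
            return expected
        if actual != expected:
            self.mismatches.append((request, expected, actual))
        return expected

    def mismatch_rate(self) -> float:
        """Fraction of shadowed requests whose results differed or failed."""
        return len(self.mismatches) / self.total if self.total else 0.0


def in_rollout(user_id: str, percent: int) -> bool:
    """Deterministically place a user in the first `percent` of 100 buckets."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent
```

The recorded mismatches are what a validation pass would review on each iteration. Because the bucketing is deterministic, a user who is already in the rollout stays in it as `percent` is raised cohort by cohort. In a real system the shadow call would more likely run asynchronously so that it adds no latency to the legacy path.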
Full post here, 6 mins read