Saturday, 18 May 2019

Amazon's service-oriented collaboration principles

https://www.theregister.co.uk/2019/05/14/amazons_away_teams/?page=2
  1. Team structure
    • Each of the groups that owns a service has a set of goals and possibly a P&L that represents success. A roadmap is generally in place to meet those goals.
    • The teams are ostensibly autonomous and can make any important decision needed to meet their goals.
    • The "value to the customer" is part of the mission for each team. This is codified using content such as mock press releases to ensure developers keep end-user needs in mind.
    • As much as possible, teams are kept small, adhering to the two-pizza rule, meaning about six people.
    • Services can be refactored or new services can be spun out to new teams. Teams that don’t work are shut down and the technology they created is distributed to other teams or discarded.
    • New teams often are created to solve urgent, end-to-end problems.
  2. Development process
    • Teams use a shared set of development tools for managing source code and the development pipeline, some offered as shared services. Many tools and services are commonly or universally used, but there are no hard requirements. Every team can do what makes sense to get the job done fast. That said, at some point you may have to show with data why you deviated.
    • The DevOps model is fully embraced. Each team performs operational support for its service.
    • Access to most source code is not hard to get. One group can usually quite easily take a look at the source code of another without prior restraint. There are some exceptions.
    • A/B testing and detailed monitoring are widespread and used for almost every aspect of the site and infrastructure. The testing is based on the WebLab service, supported by a team that trains staff on how to make testing statistically significant.
    • Teams do not generally have to worry about the rates of internal use of resources. There is no internal currency changing hands for tracking such usage. Rates of usage internally across services are allocated as part of the budget process and monitored by finance teams who meet periodically with teams to discuss any unusual growth in services and encourage optimisation.
    • Decreasing technical debt is not considered a good reason to do anything unless it has an impact on reaching the goals of the team.
  3. Collaboration practices
    • Changes to one team’s service may be implemented by another team that needs the enhanced capability, through what is called an Away Team. This team works on the Home Team’s code to add what it needs according to established engineering standards, then leaves that code in good order to be maintained, with help when needed, by the Home Team that owns the service.
    • When an Away Team is not an option because the requestor doesn’t have the ability to implement improvements to the service, the request leads to a management discussion about how to optimise the big-picture roadmap. Usually roadmaps are bursting, so accommodating a new request means reshuffling the existing roadmap.
    • If extending a service using an Away Team doesn’t work out for some reason, it is perfectly fine to duplicate and create whatever you need to accelerate your progress. There is no concern about duplication across the platform as long as you have a need that will help you move forward.
    • A team creating a service is given credit when they do something that has a positive downstream impact on other services. Management recognises contributions to the big picture, usually on the P&L of the higher entity.
    • "Bar raisers" are Amazon staff, often from other teams, who act as independent experts and approve key decisions. They are widely known for their role in hiring, but they are also used for high-impact decisions on design, customer experience, architecture, and A/B testing. It is possible to go against the recommendation of a bar raiser, but such a move is noted and made visible to higher levels of management.
Amazon’s principles for collaboration set it apart and avoid some of the problems that routinely come up in other large organisations:
  • Months of begging may be required to get access to another team’s source code.
  • A feature that radiates power by generating revenue with customers or enabling control of key decisions will be kept by a team rather than passed to the natural team.
  • Getting the attention of management to make decisions about refactoring roadmaps may take months. Often such attention does not lead to true collaboration.
  • A team may delay providing help to another until incentives are adjusted.
  • It is not uncommon for local optimisation of the team roadmap to take precedence over work that might be transformational for the business.
Here are some of the positive properties of Amazon’s system that flow from the principles:
  • The Away Team model and generally easy access to source code mean that investment can easily cross service boundaries to enhance the power of the entire system of services. Teams with a vision for making their own service more powerful by improving other services are free to execute.
  • Management time needed to resolve collaboration issues and refactor roadmaps is dramatically reduced. In Amazon’s model, roadmaps need to be rebalanced at times, but those events are minimised because successful services fund the Away Teams.
  • Team autonomy also reduces the need for management input. The policy is to do whatever it takes to provide value to the customer and not worry about duplication or deviating from standards. There is no waiting while the perfect shared service is being developed. There is no friction from having to use the perfect shared service.
  • The lack of transfer pricing means that teams are generally focused on making better services, not on keeping track of funny money. Resource usage is tracked by the finance team, which looks for spikes and requests explanations or optimisations.
  • The emphasis on data reduces ideological passion. I may be right or you may be right, but we don’t need to fight it out. We will see in the end because the data is always right.
  • The beneficial effects of cross team scrutiny are built in. Your code will be visible. Your decisions may be subject to bar raisers. The code an Away Team writes must be accepted by another team. If you are going to be sloppy, it will become public.
  • Over time, public AWS versions of services tend to be preferred over internal ones. Amazon’s customer obsession often makes them higher in quality and performance, and more usage accelerates the improvement process.
The downside of Amazon’s model is that the architecture may contain duplication or many versions of the same service. A common motto is “two is better than none,” meaning do what you need to do. Another balancing motto is “none is better than five,” so don’t get carried away and create a new service each time something doesn’t exactly fit.
