Problem
We have too big PRs and they are merged too slowly. This leads to general frustration, regressions, harder to revert commits, additional workload to rebase, a longer feedback cycle, etc.
Context
We discussed this during the last Core dev call. See below for notes. This is mostly relevant to Status Core and related swarms, though expertise from other projects that have a solid workflow is much appreciated (cc @iurimatias e.g.).
Additionally, we are practicing trunk-based development https://trunkbaseddevelopment.com/, which means we use things like feature flags to hide non-ready features.
Thereās also a release process that impacts this: release process DOC
As well as a manual QA process: <TODO: Add link cc @anna>
And a automated test process: <TODO: Add link cc @Anton)
Hereās the current contribution/PR guide: https://status.im/docs/contributor_guide.html
Design review process / design ā code process doc: <TODO Add link cc @andmironov @hester)
Hereās the pipeline we use for contributions: Pipeline for QA Ā· GitHub
Metrics
Sample from biggest projects at Status. Spot the difference.
PR stats 2019-02-13
status-react: 30 open PRs. Oldest PR: 111 days. Median PR: 7 days.
status-go: 5 open PRs. Oldest PR: 65 days ago. Median PR: 7 days ago
embark: 4 open PRs. Oldest PR: 12 days. Median PR: 3 days.
nimbus: 1 open PRs. 15 hours ago.
Problem space
- Too long-lived PRs
- Lack of consistent use of feature flags for new flows (add code, not change, toggle at last point)
- Hard to revert commits
- Manual QA bottleneck, especially near release => add code to existing PRs
- Hard to do good review on huge PRs
- Current process doesnāt help dev get things merged due to manual QA rules
- QA lacks resources for manual testing, and automated testing
- Mobile and multiple build targets means PRs and automated testing more complex
Acceptance criteria and design review:
- Implementation doesnāt match design
- developer components and Figma are different
- Figma uses fixes styles / lack of automatic responsiveness
Solution space
- Use flags wherever possible, allowing for feature to be turned on in dev mode, then do final design review
- Remove manual QA as a blocker for PR merging
- Donāt dedicate all QA to release
- QA measuring ātime to test/unblockā as a target to optimize for
- Devs writing automated tests
- QA testing nightly
- Decouple automated tests from PRs
- Figure out way to decrease distance between design and dev components (responsive layout, fiddle, etc)
- Donāt allow WIP PRs
- Reject PRs as ācanāt review, too bigā
- Make it easier to revert commits (no two allow)
- Close PRs older than a week by default
- Measure above until we are better
- Generally culturally enforce it and make expectation that PRs are merged within a day norm
- Devs taking more responsibility to test their own stuff more and liberally reverting bad commits
Notes from Core Dev Call
Better PRs
Oskar: we tend to have too long-lived PRs. Previously we decided to use feature-flags much more liberaly, but it seems we have fallen out-of-practice with that. Andrea P.: In some situations we donāt want to use feature flags. When we near releases, testers are too busy testing the release, which leads to people adding code to existing PRs. Julien: It becomes to do a good review on a huge PR. Pedro: thereās a lot to be said about breaking PRs into smaller ones, since people will be more keen to act on a PR review if they antecipate it will take a short time. If we see that still doesnāt work, we can try to diagnose where the bottleneck is. Andrea P.: The system we have in place doesnāt help the dev. We should relax rules for manual QA, since some PRs which have big impact are delayed before they reach develop. Igor: agree we should relax manual QA rule on small PRs, otherwise weāll start having dozens on PRs in testing and be a bottleneck for QA. Anton: makes sense to have small PRs, since thereās less stuff to update on the tests codebase for each PR. Oskar: taking the example of the wallet redesign PR, why do we replace the UI flow, instead of adding the new one behind a feature flag, allowing QA to decide when to move to the new tests? Iuri: we have a rule in that we donāt allow WIP PRs, with the exception of bounty PRs. If thereās a PR open, the idea is that the PR should be merged the same day, or possibly the next. Otherwise thereās a lot of churn with constant rebasing. Itās important for PR reviewers to be sensible and not ask for changes that are unrelated to the PR at hand (e.g., no asking for bug fixes that were uncovered by the PR work, but unrelated) Oskar: we should strive to make status-react have a cleaner PR backlog like with Embark (oldest open PR is 11 days, with 9 open, even though the repo is more active). Igor/Pedro: how does the Embark QA process look like? It has been an impediment in recent weeks, since with the focus on 0.9.33 release testing, the resources allocated for PR testing have been almost non-existent. Oskar: that doesnāt give enough credit to Embark, we shouldnāt assume thereās nothing to learn because they are too different from status-react. Barry: If anything, itās despite of the tooling, not because of the tooling that we manage to be quick. Oskar: how can we improve testing speed without adding new resources? this is the reason why we dedicated capacity to automated testing; Anton: we donāt have capacity to cover each scenario, so we do it manually. We plan to do it in the future, but the future never comes because there are more and more PRs; Oskar: I have a hard time buying that we need more resources, there must be something else we can do to improve speed; Igor: maybe we can not do all-hands-on-deck for release testing, and then theyāll take longer to complete. There were a couple of situations that required this type of organization to be able to respond in time. Oskar: this is a long topic, does any one else want to contribute to the discussion? Andrea P will start a discuss post about this subject.
Acceptance criteria and design review
Rachel: Iāve been observing a lot of PRs where the implementation doesnāt exactly match the design. Itās a general problem, not specific to one or two PRs. Hester: Eric has done a lot of good work implementing components, so hopefully these issues will be less and less as more things are componentized. Barry: Components as laid out by designers in Figma is different than the way that developers implement them. Figma uses fixed styles. Can we lay things out in Figma using Flexbox or percentages instead? (To-do: ask designers to follow up) Eric: we need a way to sync on what components have been implemented, as Iāve seen components being implemented twice (not necessarilt the same aspects of it). We could do a separate call to sync up (Rachel will set it up).
Next steps
- Continue to discuss solutions space here.
- Make decision about specific things weāll try and who will ensure it happens.
- Follow up next core dev call and possibly TH.
ADDENDUM: The above āproblem spaceā and āsolution spaceā sections arenāt set in stone, they are merely a first attempting at aggregating problems and potential solutions people have pointed out. Some people below have proactively taken these and crystallized them further, separating the wheat from the chaff to understand what the actual problems are and what some solutions might be to these. More of this please! Donāt get too hung up on whatever the above laundry list says. I should have made this clearer in the initial post.