Core OKR Q2 scoring

oskarth · June 18, 2018, 8:24am

NOTE: This topic is continuously updated as new information comes to light

Context

At the offsite in Bangkok we collectively decided on a set of OKRs and their priorities. They can be found here: https://docs.google.com/spreadsheets/d/1BhWKyjkpxhavkqtk9VYB3EHNDIzQtUlkwlg7_lQNAws/edit#gid=0

As we are almost at the end of the quarter, this is a good time to do a quick overview of our OKRs and see where we are at so we can improve, both in execution and planning.

(Mid-term discussion: Core OKR scoring mid-Q2 - Google Docs)

How are OKRs scored?

On a scale from 0 to 1, where 0 means no progress and 1 means completed. Since these are stretch goals, a score of around 0.7 is the sweet spot.

An additional dimension worth capturing is confidence, i.e. how likely we are to achieve the goal, from 0 to 100%.

For us, each swarm/team/individual with the most context can simply suggest a score with a rationale. We then use rough consensus to get a preliminary score.

Individual KR scores are then aggregated at the objective and then whole team level.
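
As a rough illustration of the aggregation step, the sketch below assumes an unweighted arithmetic mean at each level. The thread does not state the exact formula, and the helper name `aggregate` is hypothetical. The sample values are the three KR scores listed under "Messaging is reliable" further down.

```python
from statistics import mean


def aggregate(scores):
    """Unweighted mean of scores, rounded to two decimals (assumed formula)."""
    return round(mean(scores), 2)


# KR scores for "Messaging is reliable" as listed in this topic
messaging_krs = [0.7, 0.6, 0.5]
messaging_objective = aggregate(messaging_krs)
print(messaging_objective)
```

With these inputs the result is 0.6, which matches the score shown for that objective.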

OKRs

Score: 0.50
Confidence: 70%
Comment: Not weighted by priority and not final; P0 and P1 ~0.61 – Oskar

Raw

O0 0.6 / 80%
O1 0.63 / 90%
O2 0.47 / 70%
O3 0.1 / 50%
O4 0.7 / 70%
O5 0.5 / 60%
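
The overall numbers appear consistent with a plain average of the raw objective values above. A minimal sketch, assuming an unweighted mean and assuming that "P0 and P1" refers to O0 and O1 (the thread does not spell out either):

```python
from statistics import mean

# Raw objective scores and confidences (percent) from the list above
raw = {
    "O0": (0.6, 80),
    "O1": (0.63, 90),
    "O2": (0.47, 70),
    "O3": (0.1, 50),
    "O4": (0.7, 70),
    "O5": (0.5, 60),
}

team_score = mean(score for score, _ in raw.values())
team_confidence = mean(conf for _, conf in raw.values())

# Assumption: the P0 and P1 objectives are O0 and O1
top_priority_score = mean(raw[key][0] for key in ("O0", "O1"))

print(f"{team_score:.2f} {team_confidence:.0f}% {top_priority_score:.2f}")
```

Under these assumptions the team score comes out at about 0.5 with 70% confidence, and the O0/O1 subset lands a little above 0.6.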

Messaging is reliable

Note: Pinged Chad and Pedro, likely have most context

Score: 0.6
Confidence: 80%
Comment: (See below)

Send/delivered ratio >99%
Score: 0.7
Confidence: 70%
Comment: Current metrics point to around 98% – Pedro

More than 95% of 20+ people surveyed trust Status for messaging
Score: 0.6
Confidence: 80%
Comment: From June 5th survey, ~75% of 30 users found sending/receiving messages moderately to very reliable. There is bound to be a lag between quality being delivered (which has been increasing) and user opinion in polls. – Pedro

0 message reliability Instabug reports
Score: 0.5
Confidence: 90%
Comment: We’ve had 2-3 Instabug reports related to chat functionality – Pedro

Beta is launched successfully

Score: 0.63
Confidence: 90%
Comment: (See below)

Note: Pinged Chad and Rachel for product metrics and Adam and Jakub re cluster metrics.

5k daily active users
Score: .5
Confidence: 90%
Comment: Our peak DAU is < 400 users. – Rachel. I’d use a geometric mean for this (10->100 ~ 100->1000) so with that heuristic we are more than 50% there IMO (~10->300 ~ 300->500) – Oskar
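
To make the geometric heuristic concrete, here is a small sketch that measures progress on a logarithmic scale and compares it with the linear view. The baseline of 10 users and the current value of about 300 are taken from the comment above; the function names are hypothetical and this is only one way to read the heuristic.

```python
import math


def log_progress(current, baseline, target):
    """Fraction of the way from baseline to target on a log scale, clamped to [0, 1]."""
    fraction = math.log(current / baseline) / math.log(target / baseline)
    return max(0.0, min(1.0, fraction))


def linear_progress(current, baseline, target):
    """Fraction of the way from baseline to target on a linear scale, clamped to [0, 1]."""
    fraction = (current - baseline) / (target - baseline)
    return max(0.0, min(1.0, fraction))


# Assumed inputs: baseline 10 users, roughly 300 DAU now, target 5,000 DAU
dau_log = log_progress(current=300, baseline=10, target=5000)
dau_linear = linear_progress(current=300, baseline=10, target=5000)
print(f"log scale: {dau_log:.2f}, linear scale: {dau_linear:.2f}")
```

On the log scale the result is a bit above 0.5, while the linear view stays well below 0.1, which is roughly the gap between the two suggested scores.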

More than 80% of users retained 7 days after recovering an account
Score: 0
Confidence: 90%
Comment: ~15% recurring users retained after day 7. Nearly 0% of first-time users retained. – Rachel

Cluster can handle 500 concurrent users
Score: 0.9
Confidence: 95%
Comment: It should already be able to do that, but some stress tests are in order. As before, scaling vertically and horizontally with the current setup is easy. – Jakub

More than 20% of users send a transaction
Score: 0.5
Confidence: 95%
Comment: Last 30 days DTU/DAU x 100 = ~10%

More than 20% open at least 1 Dapp
Score: 1
Confidence: 100%
Comment: Last 30 days DDU/DAU x 100 = 42%
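
The scores for the two ratio-based KRs above are consistent with dividing the observed percentage by the 20% threshold and capping the result at 1. That is a reading of the numbers rather than a formula stated in the thread, and `threshold_score` is a hypothetical helper.

```python
def threshold_score(observed_pct, target_pct):
    """Observed share divided by the target share, capped at 1.0."""
    return min(1.0, observed_pct / target_pct)


transaction_score = threshold_score(10, 20)  # DTU/DAU ~10% against a 20% target
dapp_score = threshold_score(42, 20)         # DDU/DAU 42% against a 20% target
print(transaction_score, dapp_score)
```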

More than 99% cluster ~~uptime~~ availability
Score: 0.9
Confidence: 90%
Comment: I already commented before on using the word “uptime”, which is confusing in this context. Uptime is the availability of a specific server; what you mean here is cluster “availability”. With a 2 DC setup and multiple hosts of each type, our availability prospects are good, though a 3rd DC would be recommended. – Jakub
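
A back-of-the-envelope sketch of why an extra DC helps, assuming DCs fail independently and the cluster counts as available while at least one DC is up. Both are simplifying assumptions, and the per-DC figure is hypothetical, not a measurement from this thread.

```python
def cluster_availability(per_dc_availability, dc_count):
    """Probability that at least one DC is up, assuming independent failures."""
    return 1 - (1 - per_dc_availability) ** dc_count


# Hypothetical per-DC availability, chosen only for illustration
per_dc = 0.99
two_dc = cluster_availability(per_dc, 2)
three_dc = cluster_availability(per_dc, 3)
print(f"2 DCs: {two_dc:.6f}, 3 DCs: {three_dc:.6f}")
```

Correlated failures (shared provider, bad deploys) would make the real figures lower than this model suggests.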

SNT is a powerful utility in Status

Score: 0.47
Confidence: 70%
Comment: (See below)

2x launched SNT use cases
Score: 0
Confidence: 100%
Comment: 0 launched SNT use cases – Rachel

2x demo’s/proof of concepts using SNT
Score: 0.4
Confidence: 50%
Comment: ENS registration will be on testnet soon. – Rachel

2x Fleshed out description of the utility
Score: 1
Confidence: 100%
Comment: Tribute to Talk, paid mail nodes, usernames and voting DApp all have thorough write-ups. – Rachel

Status is used everyday internally

Note: Pinged Chad

Score: 0.1
Confidence: 50%
Comment: (See below)

80% of core contributors use Status (mobile or desktop) every workday
Score: 0.1
Confidence: 30%
Comment: Not much outside of testing – Chad

10% more usage of Status Desktop than Slack
Score: 0
Confidence: 70%
Comment: Essentially zero right now – Oskar

Performance significantly improves

Note: Pinged Igor

Score: 0.7
Confidence: 70%
Comment: (See below)

Reduce data consumption to <10Mb/day
Score: 0.7
Confidence: 50%
Comment: Need to re-check this one – Igor.

Reduce power consumption to <120% of Telegram/Skype
Score: 0.7
Confidence: 90%
Comment: We are at 2x worse than the goal now, starting at >~600%. – Igor
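
One way to relate these figures to a score is to measure how much of the gap between the starting point and the goal has been closed. The sketch below reads "2x worse than the goal" as 240% and takes the ">~600%" start as exactly 600%; both are assumptions, and `progress_toward_target` is a hypothetical helper.

```python
def progress_toward_target(start, current, target):
    """Share of the gap between start and target that has been closed, clamped to [0, 1]."""
    closed = (start - current) / (start - target)
    return max(0.0, min(1.0, closed))


# Power use relative to Telegram/Skype, in percent (assumed readings, see above)
power_progress = progress_toward_target(start=600, current=240, target=120)
print(power_progress)
```

Under these assumptions the result is 0.75, in the neighbourhood of the 0.7 given above.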

UI interaction time <100ms
Score: 0.8
Confidence: 60%
Comment: Most of the common issues are fixed. Some scenarios aren’t great and we have room for improvement, but the UI is for sure much more responsive. – Igor

Implement continuous delivery

Note: Pinged Jakub, Anton, Igor

Score: 0.5
Confidence: 60%
Comment: (See below)

100% of iOS and Android releases are automated
Score: 0.5
Confidence: 30%
Comment: XCode is a nightmare. I have very little confidence in this. (0) – Jakub. Basic Jenkins jobs to release with Fastlane, changelog manual (0.7) – Chad

More than 80% automated test coverage
Score: 1
Confidence: 100%
Comment: 80% of the `Functional tests for nightly build` suite is covered in TestRail – Anton

Get nightly to two sigma reliability
Score: 0
Confidence: 30%
Comment: Replacing artifactory might have a good effect on this, but the rest is up to devs and their merging policy. – Jakub. Testing last 60 builds randomly shows 30% success – Oskar
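
For reference, a sketch of what a "two sigma" target could mean numerically, assuming it refers to the share of a normal distribution within two standard deviations of the mean. The thread does not define the term, so this interpretation is an assumption.

```python
import math

# Assumption: "two sigma" = share of a normal distribution within +/- 2 standard deviations
two_sigma_rate = math.erf(2 / math.sqrt(2))

observed_success_rate = 0.30  # build success rate quoted in the comment above
meets_goal = observed_success_rate >= two_sigma_rate

print(f"target: {two_sigma_rate:.4f}, observed: {observed_success_rate:.2f}, met: {meets_goal}")
```

Under that reading the target is a success rate of roughly 95%, so a 30% success rate is far from it.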

TODOs

Messaging is reliable
– Send/delivered ratio >99%
– More than 95% of 20+ people surveyed trust Status for messaging
– 0 message reliability Instabug reports
Beta is launched successfully
– 5k daily active users
– More than 80% of users retained 7 days after recovering an account
– Cluster can handle 500 concurrent users
– More than 20% of users send a transaction
– More than 20% open at least 1 Dapp
– More than 99% cluster uptime
SNT is a powerful utility in Status
– 2x launched SNT use cases
– 2x demo’s/proof of concepts using SNT
– 2x Fleshed out description of the utility
Status is used everyday internally
– 80% of core contributors use Status (mobile or desktop) every workday
– 10% more usage of Status Desktop than Slack
Performance significantly improves
– Reduce data consumption to <10Mb/day
– Reduce power consumption to <120% of Telegram/Skype
– UI interaction time <100ms
Implement continuous delivery
– 100% of iOS and Android releases are automated
– More than 80% automated test coverage
– Get nightly to two sigma reliability
All preliminary scores set
Sanity check with everyone
Final Core OKR scores

For Q3 OKRs, please go here: Status.app

If you know something, please edit or comment
If you disagree with a score, please speak up
Uncertainty in scoring is OK. Estimates are fine.

igor · June 18, 2018, 9:14am

My take on it

Performance significantly improves

Score: 7/10
Confidence: 8/10
Comment: XXX

Reduce data consumption to <10Mb/day
Score: 7/10
Confidence: 5/10
Comment: Need to re-check this one.

Reduce power consumption to <120% of Telegram/Skype
Score: 7/10
Confidence: 9/10
Comment: We are at 2x worse than the goal now.

UI interaction time <100ms
Score: 8/10
Confidence: 6/10
Comment: Most of the common issues are fixed. Some scenarios aren’t great and we have room for improvement, but the UI is for sure much more responsive.

jakubgs · June 18, 2018, 11:18am

Let’s see:

Beta is launched successfully

5k daily active users
Score: 0.4
Confidence: 95%
Comment: This is just a matter of scaling when we see lack of resources, and that is already trivial in the current setup.

Cluster can handle 500 concurrent users
Score: 0.9
Confidence: 95%
Comment: It should already be able to do that, but some stress tests are in order. As before, scaling vertically and horizontally with the current setup is easy.

More than 99% cluster ~~uptime~~ availability
Score: 0.9
Confidence: 90%
Comment: I already commented before on using the word “uptime”, which is confusing in this context. Uptime is the availability of a specific server; what you mean here is cluster “availability”. With a 2 DC setup and multiple hosts of each type, our availability prospects are good, though a 3rd DC would be recommended.

Implement continuous delivery

100% of iOS and Android releases are automated
Score: ?
Confidence: 10%
Comment: XCode is a nightmare. I have very little confidence in this.

More than 80% automated test coverage
Score: ?
Confidence: ?
Comment: I don’t write the tests, so can’t really tell.

Get nightly to two sigma reliability
Score: ?
Confidence: 30%
Comment: Replacing artifactory might have a good effect on this, but the rest is up to devs and their merging policy.

Just a rough estimate.

pedro · June 18, 2018, 3:10pm

Messaging is reliable

Send/delivered ratio >99%
Score: 0.7
Confidence: 70%
Comment: Current metrics point to around 98%

More than 95% of 20+ people surveyed trust Status for messaging
Score: 0.6
Confidence: 80%
Comment: From June 5th survey, ~75% of 30 users found sending/receiving messages moderately to very reliable. There is bound to be a lag between quality being delivered (which has been increasing) and user opinion in polls.

0 message reliability Instabug reports
Score: 0.5
Confidence: 90%
Comment: We’ve had 2-3 Instabug reports related to chat functionality

rachel · June 18, 2018, 4:27pm

Beta is launched successfully

5k daily active users
Score: .1
Confidence: 90%
Comment: Our peak DAU is < 400 users.

More than 80% of users retained 7 days after recovering an account
Score: 0
Confidence: 90%
Comment: ~15% recurring users retained after day 7. Nearly 0% of first-time users retained.

More than 20% of users send a transaction
Score: .5
Confidence: 95%
Comment: Last 30 days DTU/DAU x 100 = ~10%

More than 20% open at least 1 DApp
Score: 1
Confidence: 100%
Comment: Last 30 days DDU/DAU x 100 = ~42%

SNT is a powerful utility in Status

2x launched SNT use cases
Score: 0
Confidence: 100%
Comment: 0 launched SNT use cases

2x demo’s/proof of concepts using SNT
Score: .5
Confidence: 90%
Comment: ENS registration will be on testnet soon.

2x Fleshed out description of the utility
Score: 1
Confidence: 100%
Comment: Tribute to Talk, paid mail nodes, usernames and voting DApp all have thorough write-ups.

cryptowanderer · June 19, 2018, 2:01pm

My assessment is not qualitatively different from @rachel’s.

I would score the 2x demo’s/proof of concepts using SNT as 0.3 though, as it has technically not been deployed as of today.

Also, @oskarth, can we update the TODOs in the OP to reflect this? The first x is in the wrong place.

oskarth · June 19, 2018, 4:37pm

Thanks all!

Updated state. The main thing I changed was DAU, as I think this is better reflected on a geometric scale than on an arithmetic one (10->100 same as 100->1000).

Still missing better numbers on:

– 80% of core contributors use Status (mobile or desktop) every workday
– 100% of iOS and Android releases are automated
– More than 80% automated test coverage
– Get nightly to two sigma reliability

oskarth · June 25, 2018, 2:31pm

Pinged Anton (automated tests) and Chad (usage and release automation) about preliminary scores. In the meantime, tested “Get nightly to two sigma reliability” here: Jenkins: Get nightly to two sigma reliability · Issue #2878 · status-im/status-mobile (GitHub). Scoring 0.

While we are waiting for these, I’m going to preliminarily score them 0 to calculate a basic score. This will likely be updated. EDIT: Got some preliminary scores.

UPDATE:

Preliminary Q2 OKR scores done, still waiting for some numbers.

TLDR
Score: 0.5
Confidence: 70%
Comment: Not weighted by priority and not final; P0 and P1 ~0.61

Anton · June 25, 2018, 2:52pm

More than 80% automated test coverage:

80% of the `Functional tests for nightly build` suite is covered in TestRail.

We’ve reached that coverage in Q2, so score is 1.

oskarth · July 29, 2018, 8:51am

Let’s call the preliminary results final, as there was no follow-up for a month and we are well into Q3.