Core OKR Q2 scoring

NOTE: This topic is continuously updated as new information comes to light

Context

At the offsite in Bangkok we collectively decided on a set of OKRs and their priorities. They can be found here: https://docs.google.com/spreadsheets/d/1BhWKyjkpxhavkqtk9VYB3EHNDIzQtUlkwlg7_lQNAws/edit#gid=0

As we are almost at the end of the quarter, this is a good time for a quick overview of our OKRs, to see where we stand so we can improve both execution and planning.

(Mid-term discussion: Core OKR scoring mid-Q2 - Google Docs)

How are OKRs scored?

On a scale from 0 to 1, where 0 means no progress and 1 means completed. Since these are stretch goals, a score of around 0.7 is the sweet spot.

An additional dimension worth capturing is confidence, i.e. how likely we are to achieve the goal, from 0 to 100%.

For us, whichever swarm, team, or individual has the most context can simply suggest a score with a rationale. We then use rough consensus to get a preliminary score.

Individual KR scores are then aggregated at the objective level, and then at the whole-team level.
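A minimal sketch of the KR-to-objective step, assuming a plain unweighted mean of the KR scores (an assumption, but it reproduces the 0.63 listed below for "Beta is launched successfully"). The function name `objective_score` is hypothetical.

```python
from statistics import mean


def objective_score(kr_scores: list[float]) -> float:
    """Unweighted mean of key-result scores, each on the 0..1 OKR scale."""
    if not kr_scores:
        raise ValueError("an objective needs at least one key result")
    return mean(kr_scores)


# KR scores for "Beta is launched successfully", in the order listed below.
beta_krs = [0.5, 0.0, 0.9, 0.5, 1.0, 0.9]
print(round(objective_score(beta_krs), 2))  # 0.63
```

The same mean applied to the other objectives matches their listed scores, with a couple of them rounded to one decimal (e.g. 0.1 and 0 average to 0.05, shown as 0.1).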

OKRs

Score: 0.50
Confidence: 70%
Comment: Not weighted by priority and not final; P0 and P1 ~0.61 – Oskar

Raw

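O0 0.6 / 80% (Messaging is reliable)
O1 0.63 / 90% (Beta is launched successfully)
O2 0.47 / 70% (SNT is a powerful utility in Status)
O3 0.1 / 50% (Status is used every day internally)
O4 0.7 / 70% (Performance significantly improves)
O5 0.5 / 60% (Implement continuous delivery)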
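The headline score of 0.50 is consistent with an unweighted mean of the six objective scores in the Raw list above. A quick check of that arithmetic, assuming equal weights (it deliberately ignores the P0/P1 priorities mentioned in the comment):

```python
from statistics import mean

# Objective scores from the "Raw" list above (O0..O5).
raw_objective_scores = {
    "O0": 0.6,
    "O1": 0.63,
    "O2": 0.47,
    "O3": 0.1,
    "O4": 0.7,
    "O5": 0.5,
}

# Unweighted: every objective counts equally, regardless of its priority.
team = mean(raw_objective_scores.values())
print(f"{team:.2f}")  # 0.50
```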

Messaging is reliable

Note: Pinged Chad and Pedro, who likely have the most context.

Score: 0.6
Confidence: 80%
Comment: (See below)

Send/delivered ratio >99%
Score: 0.7
Confidence: 70%
Comment: Current metrics point to around 98% – Pedro

More than 95% of 20+ people surveyed trust Status for messaging
Score: 0.6
Confidence: 80%
Comment: From the June 5th survey, ~75% of 30 users found sending/receiving messages moderately to very reliable. There is bound to be a lag between the quality being delivered (which has been increasing) and user opinion in polls. – Pedro

0 message reliability Instabug reports
Score: 0.5
Confidence: 90%
Comment: We’ve had 2-3 Instabug reports related to chat functionality – Pedro

Beta is launched successfully

Score: 0.63
Confidence: 90%
Comment: (See below)

Note: Pinged Chad and Rachel for product metrics and Adam and Jakub re cluster metrics.

5k daily active users
Score: 0.5
Confidence: 90%
Comment: Our peak DAU is < 400 users. – Rachel. I’d use a geometric (logarithmic) scale for this (10->100 ~ 100->1000), so by that heuristic we are more than 50% there IMO (~10->300 ~ 300->5000) – Oskar
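The geometric-scale heuristic measures progress in multiplicative steps: progress = log(current / baseline) / log(target / baseline). A minimal sketch, assuming a starting point of 10 DAU (taken from the example in the comment, not a measured figure); `log_progress` is a hypothetical name.

```python
import math


def log_progress(current: float, baseline: float, target: float) -> float:
    """Fraction of the way from baseline to target on a logarithmic scale.

    Equal multiplicative steps count equally, so 10 -> 100 is worth
    the same as 100 -> 1000.
    """
    if not (0 < baseline < target) or current <= 0:
        raise ValueError("expected 0 < baseline < target and current > 0")
    progress = math.log(current / baseline) / math.log(target / baseline)
    return min(max(progress, 0.0), 1.0)  # clamp to the 0..1 OKR range


# Assumed baseline of 10 DAU; peak DAU is reported as < 400.
for dau in (300, 400):
    print(dau, round(log_progress(dau, baseline=10, target=5000), 2))
```

Both values land between 0.5 and 0.6, whereas a linear reading (400 / 5000 = 0.08) gives roughly the 0.1 Rachel proposed. The choice of baseline matters a lot on a log scale, so the result is only as good as that assumption.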

More than 80% of users retained 7 days after recovering an account
Score: 0
Confidence: 90%
Comment: ~15% of recurring users are retained after day 7. Nearly 0% of first-time users are retained. – Rachel

Cluster can handle 500 concurrent users
Score: 0.9
Confidence: 95%
Comment: It should already be able to do that, but some stress tests are in order. As before, scaling vertically and horizontally with the current setup is easy. – Jakub

More than 20% of users send a transaction
Score: 0.5
Confidence: 95%
Comment: Last 30 days DTU/DAU x 100 = ~10%

More than 20% open at least 1 DApp
Score: 1
Confidence: 100%
Comment: Last 30 days DDU/DAU x 100 = 42%
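The transaction and DApp scores read like linear progress toward the 20% target, capped at 1. The thread does not state this rule explicitly, so this is a minimal sketch under that assumption; `ratio_score` is a hypothetical name.

```python
def ratio_score(actual_pct: float, target_pct: float) -> float:
    """Linear progress toward a 'more than X%' target, capped at 1.0."""
    if target_pct <= 0:
        raise ValueError("target must be positive")
    return min(actual_pct / target_pct, 1.0)


# Last-30-day ratios quoted above: DTU/DAU ~10%, DDU/DAU 42%, target 20%.
print(ratio_score(10, 20))  # 0.5
print(ratio_score(42, 20))  # 1.0
```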

More than 99% cluster ~uptime~ availability
Score: 0.9
Confidence: 90%
Comment: I already commented before on using the word “uptime”, which is confusing in this context. Uptime is the availability of a specific server; what you mean here is cluster “availability”. With a 2 DC setup and multiple hosts of each type, our availability prospects are good, though a 3rd DC would be recommended. – Jakub
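To make the 99% target and the redundancy argument concrete, here is a rough sketch. It assumes a ~91-day quarter, a hypothetical per-DC availability of 98%, and independent DC failures; real failures are often correlated, so the redundant figures are optimistic. Function names are hypothetical.

```python
HOURS_PER_QUARTER = 91 * 24  # assumption: a ~91-day quarter


def downtime_budget_hours(availability: float,
                          period_hours: float = HOURS_PER_QUARTER) -> float:
    """Hours of unavailability allowed while still meeting the target."""
    return (1.0 - availability) * period_hours


def redundant_availability(per_dc: float, n_dcs: int) -> float:
    """Availability if the cluster is up whenever at least one DC is up,
    assuming DC failures are independent (a strong simplification)."""
    return 1.0 - (1.0 - per_dc) ** n_dcs


print(round(downtime_budget_hours(0.99), 1))  # 21.8 hours per quarter
# Hypothetical 98% per-DC availability, for illustration only.
for n in (1, 2, 3):
    print(n, round(redundant_availability(0.98, n), 6))
```

Each added DC multiplies the chance of a total outage by the per-DC failure probability, which is why a 3rd DC helps even when 2 already clear 99% on paper.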

SNT is a powerful utility in Status

Score: 0.47
Confidence: 70%
Comment: (See below)

2x launched SNT use cases
Score: 0
Confidence: 100%
Comment: 0 launched SNT use cases – Rachel

2x demos/proofs of concept using SNT
Score: 0.4
Confidence: 50%
Comment: ENS registration will be on testnet soon. – Rachel

2x fleshed-out descriptions of the utility
Score: 1
Confidence: 100%
Comment: Tribute to Talk, paid mail nodes, usernames and voting DApp all have thorough write-ups. – Rachel

Status is used every day internally

Note: Pinged Chad

Score: 0.1
Confidence: 50%
Comment: (See below)

80% of core contributors use Status (mobile or desktop) every workday
Score: 0.1
Confidence: 30%
Comment: Not much outside of testing – Chad

10% more usage of Status Desktop than Slack
Score: 0
Confidence: 70%
Comment: Essentially zero right now – Oskar

Performance significantly improves

Note: Pinged Igor

Score: 0.7
Confidence: 70%
Comment: (See below)

Reduce data consumption to <10Mb/day
Score: 0.7
Confidence: 50%
Comment: Need to re-check this one – Igor.

Reduce power consumption to <120% of Telegram/Skype
Score: 0.7
Confidence: 90%
Comment: We are now at roughly 2x the goal, having started at >~600%. – Igor
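As a cross-check of the 0.7 score, progress on a lower-is-better metric can be measured as the share of the gap from start to target that has been closed. A minimal sketch, assuming "2x the goal" means ~240% and taking the ">~600%" start as 600%; the function name is hypothetical.

```python
def linear_progress_lower_is_better(start: float, current: float,
                                    target: float) -> float:
    """Fraction of the gap from start to target closed so far (lower is better)."""
    if not start > target:
        raise ValueError("expected start > target for a lower-is-better metric")
    progress = (start - current) / (start - target)
    return min(max(progress, 0.0), 1.0)  # clamp to the 0..1 OKR range


# Power use relative to Telegram/Skype, in percent (assumed values, see above).
print(linear_progress_lower_is_better(start=600, current=240, target=120))  # 0.75
```

That lands close to the 0.7 given; a higher actual starting point would push it up slightly.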

UI interaction time <100ms
Score: 0.8
Confidence: 60%
Comment: Most of the common issues are fixed. Some scenarios aren’t great and there is room for improvement, but the UI is certainly much more responsive. – Igor

Implement continuous delivery

Note: Pinged Jakub, Anton, Igor

Score: 0.5
Confidence: 60%
Comment: (See below)

100% of iOS and Android releases are automated
Score: 0.5
Confidence: 30%
Comment: Xcode is a nightmare. I have very little confidence in this. (0) – Jakub. Basic Jenkins jobs release with Fastlane; the changelog is manual. (0.7) – Chad

More than 80% automated test coverage
Score: 1
Confidence: 100%
Comment: 80% of the ‘Functional tests for nightly build’ suite is covered in TestRail – Anton

Get nightly to two sigma reliability
Score: 0
Confidence: 30%
Comment: Replacing Artifactory might have a good effect on this, but the rest is up to devs and their merging policy. – Jakub. Testing the last 60 builds at random shows 30% success – Oskar
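The thread does not define "two sigma reliability". One reading, assumed here, is that at least ~95.45% of nightly builds succeed (the share of a normal distribution within ±2 standard deviations). A minimal sketch of that check; the sample is reconstructed from the quoted 30% of 60 builds, and the names are hypothetical.

```python
TWO_SIGMA = 0.9545  # share of a normal distribution within +/-2 standard deviations


def success_rate(results: list[bool]) -> float:
    """Share of sampled nightly builds that succeeded."""
    if not results:
        raise ValueError("no build results")
    return sum(results) / len(results)


# Illustrative sample matching the comment above: 18 of 60 builds succeeded.
sample = [True] * 18 + [False] * 42
rate = success_rate(sample)
print(f"{rate:.0%} success, meets two sigma: {rate >= TWO_SIGMA}")
```

At 30% against a ~95% bar, any reasonable scoring rule ends at or near 0.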

TODOs

  • Messaging is reliable
    Send/delivered ratio >99%
    More than 95% of 20+ people surveyed trust Status for messaging
    0 message reliability Instabug reports

  • Beta is launched successfully
    5k daily active users
    More than 80% of users retained 7 days after recovering an account
    Cluster can handle 500 concurrent users
    More than 20% of users send a transaction
    More than 20% open at least 1 DApp
    More than 99% cluster uptime

  • SNT is a powerful utility in Status
    2x launched SNT use cases
    2x demos/proofs of concept using SNT
    2x fleshed-out descriptions of the utility

  • Status is used every day internally
    80% of core contributors use Status (mobile or desktop) every workday
    10% more usage of Status Desktop than Slack

  • Performance significantly improves
    Reduce data consumption to <10Mb/day
    Reduce power consumption to <120% of Telegram/Skype
    UI interaction time <100ms

  • Implement continuous delivery
    100% of iOS and Android releases are automated
    More than 80% automated test coverage
    Get nightly to two sigma reliability

  • All preliminary scores set

  • Sanity check with everyone

  • Final Core OKR scores


For Q3 OKRs, please go here: Status.app

  • If you know something, please edit or comment
  • If you disagree with a score, please speak up
  • Uncertainty in scoring is OK. Estimates are fine.

My take on it

Performance significantly improves

Score: 7/10
Confidence: 8/10
Comment: XXX

Reduce data consumption to <10Mb/day
Score: 7/10
Confidence: 5/10
Comment: Need to re-check this one.

Reduce power consumption to <120% of Telegram/Skype
Score: 7/10
Confidence: 9/10
Comment: We are now at roughly 2x the goal.

UI interaction time <100ms
Score: 8/10
Confidence: 6/10
Comment: Most of the common issues are fixed. Some scenarios aren’t great and there is room for improvement, but the UI is certainly much more responsive.


Let’s see:


Beta is launched successfully

5k daily active users
Score: 0.4
Confidence: 95%
Comment: This is just a matter of scaling when we see a lack of resources, and that is already trivial in the current setup.

Cluster can handle 500 concurrent users
Score: 0.9
Confidence: 95%
Comment: It should already be able to do that, but some stress tests are in order. As before, scaling vertically and horizontally with the current setup is easy.

More than 99% cluster ~uptime~ availability
Score: 0.9
Confidence: 90%
Comment: I already commented before on using the word “uptime”, which is confusing in this context. Uptime is the availability of a specific server; what you mean here is cluster “availability”. With a 2 DC setup and multiple hosts of each type, our availability prospects are good, though a 3rd DC would be recommended.


Implement continuous delivery

100% of iOS and Android releases are automated
Score: ?
Confidence: 10%
Comment: Xcode is a nightmare. I have very little confidence in this.

More than 80% automated test coverage
Score: ?
Confidence: ?
Comment: I don’t write the tests, so can’t really tell.

Get nightly to two sigma reliability
Score: ?
Confidence: 30%
Comment: Replacing Artifactory might have a good effect on this, but the rest is up to devs and their merging policy.


Just a rough estimate.


Messaging is reliable

Send/delivered ratio >99%
Score: 0.7
Confidence: 70%
Comment: Current metrics point to around 98%

More than 95% of 20+ people surveyed trust Status for messaging
Score: 0.6
Confidence: 80%
Comment: From the June 5th survey, ~75% of 30 users found sending/receiving messages moderately to very reliable. There is bound to be a lag between the quality being delivered (which has been increasing) and user opinion in polls.

0 message reliability Instabug reports
Score: 0.5
Confidence: 90%
Comment: We’ve had 2-3 Instabug reports related to chat functionality


Beta is launched successfully

5k daily active users
Score: 0.1
Confidence: 90%
Comment: Our peak DAU is < 400 users.

More than 80% of users retained 7 days after recovering an account
Score: 0
Confidence: 90%
Comment: ~15% of recurring users are retained after day 7. Nearly 0% of first-time users are retained.

More than 20% of users send a transaction
Score: 0.5
Confidence: 95%
Comment: Last 30 days DTU/DAU x 100 = ~10%

More than 20% open at least 1 DApp
Score: 1
Confidence: 100%
Comment: Last 30 days DDU/DAU x 100 = ~42%




SNT is a powerful utility in Status

2x launched SNT use cases
Score: 0
Confidence: 100%
Comment: 0 launched SNT use cases

2x demos/proofs of concept using SNT
Score: 0.5
Confidence: 90%
Comment: ENS registration will be on testnet soon.

2x fleshed-out descriptions of the utility
Score: 1
Confidence: 100%
Comment: Tribute to Talk, paid mail nodes, usernames and voting DApp all have thorough write-ups.


My assessment is not qualitatively different from @rachel’s.

I would score the 2x demos/proofs of concept using SNT as 0.3 though, as it has technically not been deployed as of today.

Also, @oskarth, can we update the TODOs in the OP to reflect this? The first x is in the wrong place.

Thanks all!

Updated state. The main thing I changed was DAU, as I think it is better reflected on a geometric scale than an arithmetic one (10->100 is the same as 100->1000).

Still missing better numbers on:

  • 80% of core contributors use Status (mobile or desktop) every workday
  • 100% of iOS and Android releases are automated
  • More than 80% automated test coverage
  • Get nightly to two sigma reliability

Pinged Anton (automated tests) and Chad (usage and release automation) about preliminary scores. In the meantime I tested “Get nightly to two sigma reliability” here: Jenkins: Get nightly to two sigma reliability · Issue #2878 · status-im/status-mobile · GitHub. It scores 0.

While we wait for these, I’m going to score them 0 preliminarily to calculate a basic score. This will likely be updated. EDIT: Got some preliminary scores.

UPDATE:

Preliminary Q2 OKR scores done, still waiting for some numbers.

TLDR
Score: 0.5
Confidence: 70%
Comment: Not weighted by priority and not final; P0 and P1 ~0.61

  • More than 80% automated test coverage:

80% of the ‘Functional tests for nightly build’ suite is covered in TestRail.

We’ve reached that coverage in Q2, so the score is 1.

Let’s call the preliminary results final, as there has been no follow-up for a month and we are well into Q3.