Status app privacy policy refresh

ceri · June 15, 2020, 11:50am

It’s high time our app privacy policy got a refresh to make it less legalese, and more human-readable. @hester put together an initial brief and @andre & I had a call today with an external advisor to run through the technical details.

Notes of call

Important distinction for privacy law purposes to be made between being a data Controller and a data Processor:
- If a Controller, you are in control of the data and you can do interesting things with data - this leads to more stringent requirements for privacy policies and the compliance exposure for getting it wrong is higher.
- If a Processor - your role must be very passive and you must not do much with the data; there are fewer legal requirements. One of the key tests is: are you willing to delete the information of the user if requested?
Messaging - when a message leaves your phone it is fully encrypted; we don’t know who it’s from or who it’s going to. When it reaches our servers a 2-day timer starts, after which the message is deleted regardless. To us it is garbled text: we can’t read it and there is no metadata. Even when encrypted, it is still considered personal data and falls under GDPR. We cannot retroactively delete messages at a user’s request: we don’t store the data, and if you decide you don’t want us to have it, your messages will still be on other people’s phones. We received it; they remember it.
- Example: Alice knows Bob’s contact key and vice versa. Alice sends a message to Bob and her phone encrypts it. Her phone is connected to peers (think BitTorrent - similar technology; we bounce the message off different peers to get a specific route). The message ends up on one of our mail servers, which is like a glorified text message storage system: it remembers the time the message came in and its topic (a random collection of letters and numbers), with no identifying info at all. When Bob comes online, messages are passed back and forth between peers; Bob and Alice subscribe to the same topic, so messages on that topic reach both of them. If Bob is offline, the message stays on the mail server for 2 days; once Bob is online, Status asks the mail server whether there are messages he has not yet received and looks them up. Bob has to have Alice’s key to decrypt them, otherwise they are garbage. After 2 days we don’t want to retain data, so we delete the messages. If Bob’s phone is off for a long period, he will never receive the message and it gets deleted.
- Confirmed: for messaging purposes, Status has a data Processor role.
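To make the retention behaviour described above concrete, here is a minimal sketch of a store that keeps opaque envelopes per topic and drops them once a retention window has passed. It is only an illustration of the idea, not the actual mail server code: the names `Envelope`, `MailStore`, `store`, `prune` and `fetch` are hypothetical, and the 2-day window is simply the figure quoted on the call, kept as a parameter.

```python
import time
from dataclasses import dataclass

# Assumption: the 2-day window quoted on the call; the real value is configuration.
RETENTION_SECONDS = 2 * 24 * 60 * 60


@dataclass
class Envelope:
    topic: str          # opaque topic string, not a user identity
    payload: bytes      # ciphertext; the store never decrypts it
    received_at: float  # arrival time on the server, in seconds


class MailStore:
    """Hypothetical in-memory stand-in for a mail server's message store."""

    def __init__(self, retention_seconds=RETENTION_SECONDS):
        self.retention_seconds = retention_seconds
        self._envelopes = []

    def store(self, topic, payload, now=None):
        now = time.time() if now is None else now
        self._envelopes.append(Envelope(topic, payload, now))

    def prune(self, now=None):
        # Drop every envelope older than the retention window,
        # whether or not it was ever delivered.
        now = time.time() if now is None else now
        cutoff = now - self.retention_seconds
        self._envelopes = [e for e in self._envelopes if e.received_at > cutoff]

    def fetch(self, topic, now=None):
        # A client coming back online asks for everything on a topic.
        self.prune(now)
        return [e.payload for e in self._envelopes if e.topic == topic]
```

In this sketch the store only ever sees a topic, a timestamp and ciphertext, and a recipient who stays offline longer than the window gets nothing back from `fetch`.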
Wallet - when someone places a transaction with their wallet, we can’t delete it. But we are not handling that data ourselves - it lives on the blockchain.
Web browser - allows users to browse distributed apps (special websites that interact with the blockchain) through your wallet. Your wallet is your identity. If I looked at your wallet, I could see your address, but that has zero identifying info about you; it is your window to the blockchain world. Can you search for info? Yes. Does Status keep a trail of search terms that the user enters? We don’t process anything with regards to the browser on our servers. Your phone remembers the last N websites you visited. When someone searches for something with the browser, they go to Google. When they go to a DApp, they go to IPFS (a decentralised file storage system). When they interact with the blockchain, the interaction is directly between them and the blockchain provider. The data never crosses through Status at all. It’s a connection, as opposed to us processing the data.
Is there any personal data that Status would process, analyse, store or need for any purpose? Definition of personal data is wide, including dynamic IP addresses and encrypted data. We don’t do any data analysis. We are aware of IP addresses, we use those to determine how many people are online on our platform at any given time, we don’t receive that info in a way that is traceable back to someone’s wallet addresses or chat key.
What would be considered a data Controller role? Something like a referral program where Alice can invite Bob to install Status, when Bob installs, he and Alice both get an incentive. If IP addresses were tracked and remembered for any period (a temporary cache key) this would constitute a Controller role.

Next steps

Advisors will map out any Controller vs Processor roles Status has and how the PP should be structured
Draft privacy policy.

cc @carl

hester · June 16, 2020, 8:17am

@cammellos @andre I’d like to understand if rate limiting by IP address puts us in the controller category. Are IP addresses stored, and if so, for how long?

cammellos · June 16, 2020, 8:32am

As a note, currently we don’t keep messages for 2 days, it’s 30 I believe.

2 days is too short to give users a good UX, as for example people might not use the app during weekends if used for work. 7/14 days seems more appropriate.

For IP addresses, I believe we don’t clean them up, so they are stored until node restart.
There’s no reason though we should be using the IP addresses in clear text, we can just Hash(IPAddress) so that no info is stored.
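As a rough sketch of what rate limiting on Hash(IPAddress) could look like: the example below is not the status-go implementation, and `ip_key`, `RateLimiter` and the salt handling are hypothetical names made up for illustration. One caveat worth stating: IPv4 has only about 4.3 billion possible addresses, so a plain unsalted hash can be reversed by trying them all. The sketch therefore swaps in a keyed hash (HMAC-SHA256) with a random secret that lives only in memory, which is an assumption on my side rather than something decided in this thread.

```python
import hashlib
import hmac
import secrets

# Hypothetical per-process secret: regenerated on every restart, never written to disk.
_SALT = secrets.token_bytes(32)


def ip_key(ip_address: str, salt: bytes = _SALT) -> str:
    """Return a keyed hash of the IP address to use as the rate-limit key."""
    return hmac.new(salt, ip_address.encode("utf-8"), hashlib.sha256).hexdigest()


class RateLimiter:
    """Sliding-window limiter that only ever stores hashed keys."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = {}  # hashed key -> list of request timestamps

    def allow(self, ip_address: str, now: float) -> bool:
        key = ip_key(ip_address)
        # Keep only the timestamps that are still inside the window.
        recent = [t for t in self._hits.get(key, []) if now - t < self.window_seconds]
        allowed = len(recent) < self.max_requests
        if allowed:
            recent.append(now)
        self._hits[key] = recent
        return allowed
```

Whether a key derived this way still counts as personal data is a legal question and not something the code settles.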

hester · June 16, 2020, 8:48am

Thanks I actually thought 2 weeks, which is what we reflect in FAQ. How can this be verified? By whom?

There’s no reason though we should be using the IP addresses in clear text, we can just Hash(IPAddress) so that no info is stored.

This would be great. Can this be part of the ongoing work on rate limiting, or do you need me to create and plan an issue? Ideally we can give guarantees on how this works within the next two weeks. Implementation would need to be in line with when the Privacy Policy is published, which I anticipate to be alongside release 1.6, roughly end of July

cammellos · June 16, 2020, 8:54am

Thanks I actually thought 2 weeks, which is what we reflect in FAQ. How can this be verified? By whom?

@jakubgs would be the source of authority on the matter, but I can also take a look.

This would be great. Can this be part of ongoing work on rate limiting or do you need me to create and plan an issue?

I can take care of it, it’s a one line change, I’ll create an issue in status-go.

hester · June 16, 2020, 10:47am

@andre can this approach be used for the referral program as well?

jakubgs · June 17, 2020, 8:17am

@hester it’s 30 days:

https://github.com/status-im/infra-eth-cluster/blob/d8dca7058c9e25ba8a718c8385749a5688d43429/ansible/roles/statusd-mailsrv/defaults/main.yml#L58-L59

ceri · June 23, 2020, 9:01pm

Some additional info from our advisors to help us confirm that we are indeed a processor role as it relates to users’ personal data (e.g. IP addresses):

These are Yes/No questions - if majority is yes then likely a processor:

You follow instructions from another party with regard to the processing of personal data.
You do not decide to collect personal data from individuals.
You do not decide on the legal basis for the collection and use of that data.
You do not decide the purpose or purposes for which the data will be used.
You do not decide whether to disclose the data, or to whom.
You do not decide the data retention period.
You make certain decisions on how data is processed, but implement such decisions under a contract or another legal act or binding arrangement with the controller.
You are not interested in the end result of the processing.

@hester @andre would be interested to hear your interpretation on these, thanks!

hester · June 24, 2020, 9:37am

Ai double negatives. Here we go

You follow instructions from another party with regard to the processing of personal data.
No, we do not follow instructions from another party
You do not decide to collect personal data from individuals.
No, we do decide
You do not decide on the legal basis for the collection and use of that data.
No, we do decide
You do not decide the purpose or purposes for which the data will be used.
No, we do decide
You do not decide whether to disclose the data, or to whom.
No, we do decide
You do not decide the data retention period.
No, we do decide
You make certain decisions on how data is processed, but implement such decisions under a contract or another legal act or binding arrangement with the controller.
No, we do not implement decisions under a contract or another legal act or agreement
You are not interested in the end result of the processing.
Yes, we are not interested
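For anyone who wants to sanity-check the count, the answers above can be tallied against the advisors’ "majority yes" rule with a few lines. This is only a sketch: the short labels are my paraphrases of the checklist items, `tally` is a hypothetical helper, and the result is an indication, not a legal conclusion.

```python
# Answers transcribed from the list above: True = "Yes", False = "No".
PROCESSOR_CHECK = [
    ("Follows instructions from another party", False),
    ("Does not decide to collect personal data", False),
    ("Does not decide on the legal basis", False),
    ("Does not decide the purpose of use", False),
    ("Does not decide whether or to whom to disclose", False),
    ("Does not decide the retention period", False),
    ("Implements decisions under a contract with the controller", False),
    ("Not interested in the end result of the processing", True),
]


def tally(answers):
    """Return (yes_count, total, majority_yes) for a list of (label, bool) pairs."""
    yes = sum(1 for _, answer in answers if answer)
    total = len(answers)
    return yes, total, yes > total / 2


yes, total, majority_yes = tally(PROCESSOR_CHECK)
print(f"{yes} of {total} answered yes; majority yes: {majority_yes}")
```

With one "Yes" out of eight, the majority-yes condition for a processor role is not met on these answers.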

Status stores the data on behalf of the user and uses the data on initiative of the user.

I’m unclear if data stored is personal identifiable information, if a hash of the IP address is stored rather than the address itself.

ceri · June 26, 2020, 6:07pm

Another thing to review that the advisors pointed out to me is that there is a second checklist:

the checklists on page 20-21 and on page 13 are both helpful in assessing which role Status.im has when processing the personal data within the mobile application - data processor or data controller.

If the majority of the responses to the checklist on page 13 is ‘YES’, Status.im will likely be a data controller for a specific set of processing operations.

If the majority of the responses to the checklist on page 20-21 is ‘YES’, Status.im will likely be a data processor for a specific set of processing operations.

The above checklist is the Processor check (page 20-21 in the guidance), and it looks like we need to clarify whether a hash of the IP address counts as PII. If so, we might be a controller, based on answering mostly NO to the questions.

Another thing I’d like a legal opinion on is whether our storage of PII for the users’ benefit matters (i.e. we do it to enable the technology that protects the users’ privacy - so in substance this doesn’t feel like the spirit of the regulations which is around using data for our own benefit).

The checklist on page 13 is the Controller check, would be grateful for your thoughts on this one too:

You have decided to process personal data or caused that another entity processes it.
You decided what purpose or outcome the processing operation needs to have.
You decided on the essential elements of the processing operation, i.e. what personal data should be collected, about which individuals, the data retention period, who has access to the data, recipients etc.
The data subjects of your processing operations are your employees.
You exercise professional judgement in the processing of the personal data.
You have a direct relationship with the data subjects.
You have autonomy and independence (within the tasks assigned to you as a public institution) as to how the personal data is processed.
You have appointed a processor to carry out processing activities on your behalf, even if the entity chosen for that purpose implements specific technical and organisational means (non-essential elements).

hester · July 22, 2020, 1:41pm

Learning of the day: Never underestimate the in-depth knowledge lawyers have of new technologies. Received some great feedback on the app from a privacy perspective

Had a call with the agency writing our privacy policy. Some notes and questions to circle back:

For Hash(IPAddress), I understand they are stored till ‘node’ restart. Is this ‘history’ node restart? If so, for the history nodes controlled by Status, can we auto-restart after a given timeframe. Can this be a configuration of history nodes in general? @cammellos @jakubgs
A hash counts as pseudonymisation and doesn’t change this being personal information (PI)
A feature to switch off use of any mailserver would go a long way in making the case for the user being a controller and Status a processor. @maciej can you propose what this looks like in the UI? Please correct me if I’m wrong in understanding that this means you can only send and receive messages when you’re online and will not receive history
Copy in onboarding says we don’t Collect, Share, Sell; we likely need to remove or rephrase Collect, waiting on final policy to confirm cc @maciej for future reference
It’s recommended to also explain in the Privacy Policy that we do not use key storage offered by the OS @petty @roman do we have any documentation I can share, for the agency to translate to human language
The fact that a peerID is known to the history node, alongside an IP address, instead of identifiable info like a chat key is critical, but not explained well (no action, cc @andre for future reference)
Who manages the history nodes is critical and needs to be explained well in the privacy policy (no action)
ENS copy does not clearly explain pros and cons @hester @maciej this needs revisiting
Adding an account directly from the bottom sheet in the browser, or when selecting to earn a referral bonus, is a nice-to-have that further demonstrates and eases optionality in exposing data @Ferossgp @maciej @John
Need to understand if the handling of Play store receipt and referral code forwarding in the referral program can be seen as using cookies @andre @Ferossgp
I understand image and audio messages to be a content type, stored like any other message. Can you provide a 2-3 sentence explanation to clarify this? @cammellos
There is no legal reason to use email as a contact method. As we’re likely asking people with questions on the privacy policy to give up more privacy by sending an email, we can consider setting up an account with a chat key and offer it as an alternative to email. This needs governance @petty @ceri would appreciate any suggestions

cc @iurimatias I think it’s good to review the policy and these questions from Desktop perspective as we ideally use one and the same policy.

ceri · July 22, 2020, 3:45pm

Would it work to have people ping their questions to a specified channel in Status?

hester · July 22, 2020, 3:56pm

I believe the requirement is ‘low barrier’; if you’re already in Status I think that surely qualifies. We can always keep the email address for good measure. The upside and downside is that anyone can answer

andre · July 24, 2020, 5:13pm

Can totally be seen as using cookies, as we’re storing a bit of information across a flow.

I’m not 100% sure but I suspect we don’t log or store the client IP. Will verify with @jakubgs.

jakubgs · July 24, 2020, 7:22pm

We do not log IPs alongside peer ID. For example, here’s a query in Kibana that shows logs for the last 30 days that have the peer_id field and also the word ip in the message:
https://kibana.status.im/goto/693d9fe91261b7e2e30b00d21198033a
The result is zero logs.

You can also filter the source code for peerID and words like logger or debug:
https://github.com/status-im/status-go/search?q=debug+peerID&unscoped_q=debug+peerID
I found no examples of us doing this.