Weekly Status Attack Day?

Bruno · November 26, 2018, 8:14am

My personal feeling is that we’re taking the stability of our current back-end infrastructure for granted. I would like to propose a weekly or bi-weekly attack day on Status where we find ways to bring the app and the servers behind it to a halt or down completely. Now is the ideal time to do it: a lot of people internal to the org use it, few from outside do, and we’re in public pre-beta, so things are meant to break with no consequences. I’ve briefly discussed this with @oskarth in Prague, but I’m wondering what others think and how viable this is.

Approaches

The attacks could be thematic, which would require a bit of preparation and openness from the dev team. For example, it would be hard to simulate an attack in the form of the removal of Status from the app store(s), so such an attack would need to be an organized event - i.e. “Let’s all uninstall Status and try to reinstall it but without using app stores - identify problems, let’s reduce friction” etc. At the same time, another team could be attacking the sites hosting binaries that are alternatives to app stores, so we have a “problem with actually getting Status” type of attack.

Others would have varying degrees of complexity, but I think those are all things we can work around. Attacks that need more planning could be proper events scheduled a month in advance, giving the dev team time to prepare and the attackers time to identify holes through which to nuke the system in a targeted way. Other ideas:

simulate a destruction of Infura. For this, the communication channel should be opened up somewhat (perhaps a network-wide /etc/hosts emulator) which would let attackers intercept requests, simulating unavailability of Infura. Result: this would break a whole lot of things.
simulate a destruction of Etherscan. This would break the wallet.
spam attack. Self-explanatory: it would make communication in the app impossible if the channels were flooded with spam.
DDoS attack. Related to the above, a simple mass spam attack can also DDoS the network. Additionally, the network can be DoSed by just pinging the endpoints of any mailserver nodes.
attack the mailservers. DoS them into misbehaving. This would also break all communication.
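
A minimal sketch of the "/etc/hosts emulator" idea from the list above, assuming requests can be routed through a small wrapper before they leave the app. The class name, the function names and the example hostnames are all hypothetical and not taken from the Status codebase; the point is only to show how a blocklist can turn a healthy service into a simulated outage.

```python
from urllib.parse import urlparse


class AttackDayBlocklist:
    """Hosts declared unreachable for the duration of an attack day (hypothetical helper)."""

    def __init__(self, blocked_hosts):
        # Hostnames are case-insensitive, so store them lowercased.
        self.blocked = {h.lower() for h in blocked_hosts}

    def is_blocked(self, url):
        host = (urlparse(url).hostname or "").lower()
        # Block the listed host itself and any subdomain of it.
        return any(host == b or host.endswith("." + b) for b in self.blocked)


def guarded_request(url, blocklist, send):
    """Call send(url) unless the target host is on the blocklist.

    `send` stands in for whatever function really performs the request.
    """
    if blocklist.is_blocked(url):
        # Simulate the service being down instead of contacting it.
        raise ConnectionError("simulated outage: " + str(urlparse(url).hostname))
    return send(url)
```

On attack day the blocklist would be filled with the third-party hosts under test (for example `AttackDayBlocklist(["infura.io"])`), and every failure the app then shows is a candidate issue for the backlog.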

Benefits

identification and verification of known attack vectors
threat assessment of attack vectors and realistic prioritization of various outstanding back end issues
organization-wide increased familiarity with inner workings of the app and its systems
resilience and increased world-war-three resistance, in line with the overall Eth 2.0 message
not patching things as we go, but actually being prepared for when censorship comes. Competition is ramping up.

Incentives

The attacks could be incentivized with SNT. I feel like this would make incredibly appealing bounties on Gitcoin and would likely make a splash in media (“company pays people to have its servers crashed”), and I feel like we could draw in lots of external contributors that way (i.e. someone who studies our code enough to attack it is a viable external contributor long-term).

igor · November 26, 2018, 9:58am

Yeah, using Chaos Engineering practices is always a great idea!

Bruno · November 26, 2018, 9:59am

Oh that’s really cool, #TIL

cryptowanderer · November 27, 2018, 11:24am

re breaking Infura - would be good to see some more discussion here: “Add Alchemy RPC Providers for Ropsten, Rinkeby and Mainnet” (Issue #6822, status-im/status-mobile on GitHub). I think it’s quite a high severity problem with a very easy short-term fix until ULC is available.

Otherwise, hell yeah, let’s break everything

naghdy · November 27, 2018, 11:55am

This is an awesome idea Bruno! CC @petty, who I spoke to briefly about this at ETHBerlin.

I’d also like to add corpsec to the above list (e.g. attacking core contributor devices, accounts, etc.)

petty · November 27, 2018, 1:31pm

This. Very much this.

I have budgeted for 3rd party red teaming, as bandwidth is a potential problem here, and implementing something like this would lead to drastic culture changes.

Here are some thoughts along this line:

We are entering into a contract with HackerOne to have dedicated people attack and attempt to break various parts of our codebase (whatever I determine is in scope for them). This will put us in front of people who regularly attempt to break things and who disclose what they find in a professional manner, so that we can mitigate it. AFAIK this does not put things like our back-end infrastructure (which we should be actively attempting to get rid of) in scope.
Since they cannot properly reason about things we rely on outside our codebase, doing something along the lines of this proposal will drastically increase our awareness of our weaknesses, but it will also slow down the development of new features, as it’s clear everyone who works here is already quite busy. This is not necessarily a bad thing, but something we have to come to terms with. For instance, if we do a day of attacking some part of Status and find 2 severe issues, how long does it take to coordinate, reason about, and fix said issues? I’d argue it doesn’t matter, as we shouldn’t be moving forward without fixing them. Some might feel otherwise.
Is it possible to get to a state of release if we spend all of our time finding and fixing things, and not thinking about / implementing new and novel features? I’d argue that if we don’t have a solid foundation, it doesn’t matter, but I think we’re all feeling the time crunch at this point in the crypto winter.
Something like this will allow us to really flesh out the details of our dependencies, and build strong confidence around what works and is resistant to attack (within the confines of our own expertise). This then helps us reason about what we should be asking others to look at to break.
With regard to us doing the attacking: there is an aspect of doing this that is completely artificial, as we know we are doing it. Social attacks work because people don’t know they are under attack, as do other forms of malicious behavior. Internal attacking goes a long way towards exposing our weaknesses, but at some point outsiders have to do it, without most of the internal knowledge, for us to get a better picture of our readiness and resilience to attack.

Bruno · November 27, 2018, 2:13pm

All good points. It would indeed be ideal if external people were doing this. There could be “spies” in the organization who could feed external teams some of these known vulnerabilities and attack vectors.

However, most of the deadliest attacks cannot be done without internal support, because we do need to kill and work around the most egregious examples of third party reliance: Etherscan and Infura. Those can only be knocked out either by preventing the app from talking to them (internal /etc/hosts emulation), or by talking these services into collaborating with us (or a third party) on an attack and blocking our app’s requests. Warrants some thinking, I guess, but I don’t think social attacks should fit into this category - those are out of scope. You cannot effectively immunize people against attacks, but you can make the system near invulnerable. I say leave people security for another day.

Anyway, another attack surface is dependencies. Status being very complicated to get started with does not help with identifying those. As an example, when I tried to extract the code for generating random Status names for reuse in Nim, I found that the generation hops through something like 10 files and 3 dependencies to pull in a simple Mersenne Twister from the JavaScript-based Chance.js package, which is ultimately a seeded randomizer. Incidentally, those 10 hops could have been avoided, because there is a Clojure-native implementation of Mersenne Twister (granted, this is Clojure not ClojureScript, so I’m out of my depth and might be wrong). Why are dependency chains bad? Well, you never know what you’re going to get, especially in JavaScript land. Case in point 2 - related by @arnetheduck: attack explainer here
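
To illustrate how little a seeded name generator actually needs, here is a rough sketch using only Python's standard library, whose `random` module is itself a Mersenne Twister. The word lists and the function name are made up for the example, and the output will not match the names Status or Chance.js produce, since seeding and selection differ between implementations.

```python
import random

# Hypothetical word lists; the real Status lists are not reproduced here.
ADJECTIVES = ["brisk", "quiet", "amber", "lucky"]
ANIMALS = ["otter", "heron", "lynx", "gecko"]


def name_from_seed(seed):
    """Derive a three-word name deterministically from a seed string."""
    # A private generator: the same seed always yields the same sequence,
    # and the global random state is left untouched.
    rng = random.Random(seed)
    words = [rng.choice(ADJECTIVES), rng.choice(ADJECTIVES), rng.choice(ANIMALS)]
    return " ".join(w.capitalize() for w in words)
```

Calling `name_from_seed` twice with the same seed (say, a public key string) returns the same name, which is the only property the three-dependency chain was ultimately providing.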

Dramatically reducing dependencies would help a lot. Until then they make for a nice attack surface, but again, support from the team is needed. For example, during attack day only, the package resolver could be modified to pull packages from an internal package repository first when building Status, falling back to NPM if a package is not found there. This would let attackers replace a dependency to simulate an invalid or infected one, which would let us test resilience on that front.
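
The resolver idea could look roughly like the sketch below. It models both repositories as plain dictionaries and adds a pinned-hash check as one possible way to measure "resilience"; none of this reflects how the Status build or NPM actually resolve packages, and all names here are assumptions made for illustration.

```python
import hashlib


def sha256_hex(data):
    """Hex digest used to pin a known-good artifact."""
    return hashlib.sha256(data).hexdigest()


def resolve(name, internal_repo, public_repo, attack_day):
    """Return (source, artifact) for a package name.

    On attack day the internal repository is consulted first, so attackers
    can shadow a public package with a tampered copy.
    """
    if attack_day and name in internal_repo:
        return ("internal", internal_repo[name])
    if name in public_repo:
        return ("public", public_repo[name])
    raise LookupError("package not found: " + name)


def is_trusted(name, artifact, pinned_hashes):
    """True only if the artifact matches the hash pinned for this package."""
    return pinned_hashes.get(name) == sha256_hex(artifact)
```

If the build still succeeds after `resolve` hands back a shadowed package that fails `is_trusted`, that is a finding: nothing in the pipeline noticed the swapped dependency.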