Technical Debt our Best Frenemy Forever

bhauman · November 5, 2018, 2:47pm

All applications grow. As their codebase grows, their complexity
grows.

You can think of code as nodes on a graph of functionality where the
edges represent dependencies between the nodes. The problem is that
each node on the graph was implemented by a human who more than likely
missed several important things, and made several other limiting
assumptions when they implemented it. As an application grows we
continue to add more imperfect nodes to the graph. The bugs and
limitations in these nodes are together what we call technical debt
and it can compound very quickly up until the point where it is very
difficult to progress. When a codebase has a lot of technical debt,
introducing new features will constantly break old features, and every
release brings as much breakage as it does positive change. Working on
code with a large amount of technical debt is like playing
whack-a-mole: you knock one down and the other buggers keep popping
their heads back up in places that you never expect.

Code with a lot of technical debt is hard to change, because when a
developer makes a change they need to chase the ramifications down
through the complexity of the whole system. So while a developer may
have been able to successfully make the change without introducing any
new problems, the process of making the change was much harder than it
needed to be.

The experience of working on a project with a high degree of technical
debt is rarely a good one. Product managers become frustrated and the
developers start to dislike working on the codebase. In general,
everyone starts to become a bit demoralized as they drastically scale
back their visions for the number of changes that can be made to the
software in a given period of time. This overall drag of technical
debt is further complicated because developers unconsciously avoid the
pain of changing things that have become too complex to reason
about. Instead it becomes easier to add more code and more debt or
simply move to another project that doesn’t have as much debt. Product
managers start to avoid the pain as well, unconsciously steering away
from the hairball areas that seem to gobble up time and produce little
tangible results.

One could characterize technical debt as a bad thing. It feels bad,
but it happens every single time you develop an application. We need
to accumulate technical debt to learn about our domain, full
stop. There is no way around accumulating technical debt. Yes,
hard-won experience can help you avoid some tech debt, but the
hindsight you acquire while gaining domain knowledge is a much more
accurate lens than trying to see the future and prevent tech debt at
the outset. In fact, many attempts at preventing tech debt actually
manifest as causes of technical debt. So technical debt is absolutely
unavoidable. Technical debt is our frenemy; it’s just an active part
of the process that we can’t live without.

When we get to the point where we accumulate so much tech debt that we
can’t cover the interest, and can’t progress, there is no choice but
to go back to the base nodes and start firming things up. We need to
separate these nodes, tease them apart and understand their full needs
and effects on the greater application. In this process, we
integrate the information that we gained while initially sprinting
forward, brutally get rid of the things we thought we might need,
and destroy abstractions that we imagined would be helpful but on
reflection provide little actual utility. We reduce and distill the
code until we arrive at its essence, its stupid-simple and
boring essence.

We need to make the foundation nodes on the graph solid as bedrock and
then continue up the nodes in layers to shore things up. You know that
you are refactoring correctly when the number of lines in your
codebase drastically shrinks, and your code is re-rendered as
simple, understandable, boring code. This shoring up and elimination
of tech debt restores forward momentum on software projects and
basically makes them fun to work on again. It unlocks their ability to
reach their potential.

It’s important to realize that this process is a deliberate and
careful process and not a reactionary process. We let the code that
exists guide us, we don’t just dump it and start over. I’ve seen teams
take wild swings and come up with simple-sounding but drastic
solutions so that they can do something now and get the ball
rolling. This is an understandable reaction as folks are
frustrated. But trust me, don’t elect a crazy orange-faced leader and
think that is going to fix everything. The solution is to use diligent
reflection, careful refactoring, and reduction of the codebase.

During the process of reducing technical debt, forward movement in
terms of additional external features will slow and perhaps stop, and
this is frustrating for product managers and other stakeholders as
they cannot see and feel the improvements. But this situation is no
less frustrating than the actual situation where each release results
in little forward motion anyway. Product managers do have a role in
the refactoring process; refactoring goals and milestones are just
different, but no less rewarding. While refactoring, improvements do
come and they come much more quickly than one might think. The backlog
of bugs starts to shrink, strange application behaviors begin to
vanish, and the application starts behaving the way it is intended to.

Status

At Status, I see folks struggling to move the codebase forward. The
status-react codebase has become quite complex and is hard to
effectively reason about. Looking through the code I see several
patterns that have decreased the ability for individual developers to
understand the full implications of the code they are currently
looking at, and increased the likelihood that making changes locally
could have unknown ramifications in other parts of the application. I
also see examples of code that creates tech debt in order to solve
problems created by older tech debt. The codebase looks exactly like a
codebase where coders have been sprinting ahead accruing features for
quite some time. To me, the status-react codebase appears to have
reached a limit in the amount of technical debt it can accumulate.

What Status is doing is challenging. It is challenging both at a
technical level, and at an organizational level. It’s a bold
experiment in autonomy and it’s a worthy one. I’d like to offer some
practical suggestions on code patterns that will support the autonomy
of individual developers and small teams to work on a large codebase
and have a high degree of confidence about the code they are working
on and its relationship to the greater codebase. We can structure the
code to support autonomy by creating a contract between the local code
and the vastly complex world of the larger application so that the
relationship between the two is explicit and understood. We want to
support developers so that they can act in confidence locally and know
that they are not causing spooky action at a distance and wreaking
havoc on other parts of the application.

Encapsulation, components, and strict API boundaries have served
developers as time-proven tools that help reduce the scope of things
developers need to consider when working on a piece of code. As
developers we benefit tremendously from encapsulation and division of
responsibility. We benefit when we use components and libraries that
hide their internals from us and present us with a stable API as a
contract between us and them. This is what allows us as developers to
coordinate with a myriad of completely independent entities across the
FOSS landscape.

Let’s think about that: at Status we effortlessly cooperate with 1000s
and 1000s of other Open Source programmers. The open source ecosystem
is an absolutely incredible distributed team. Cooperating with this
immense distributed team is practically effortless. We can do it
because we aren’t all up in other people’s business and they aren’t
messing in ours. The defined API boundary between external code and
our code is a contract between them and us that supports our
independence.

Because of the nature of how we work at Status, I believe that using
patterns that support code independence will not only provide more
reliability to the codebase but will also structure the code in a way
that supports and encourages Status’s ideals.

Refactoring and strict boundaries don’t guarantee solid,
well-functioning code, but they do increase the likelihood that it will
emerge. They create an environment where a developer can reason much
more effectively about what they are working on because the scope and
ramifications of the code at hand are much smaller.

Re-frame and scale

Re-frame was designed for and is used successfully on Web applications
of a certain scale. It is normally used as a central controller that
synthesizes many individual parts into a complete whole. When the team
at day8 (the authors of Re-frame) create event handlers they end up
being very shallow and explicit. The event handlers they write are
shallow because when you are developing a normal Web application it’s
only one or two event hops to a REST API call or a third party library
call. The event handlers are shallow because they are either just
altering the app-db or talking to stateful encapsulated systems
like databases on HTTP servers.

Most of the work they do at day8 is data and display oriented, and
they benefit from the Re-frame pattern because it does data and
display really well. The vast majority (90%) of their event
handlers are simple and alter the database only.

Beyond a certain scale Re-frame ceases to be helpful and starts to
display the common symptoms of mutable global state.

I would say that Status.im is currently experiencing the shortcomings
of taking the Re-frame pattern too far. The Re-frame pattern has
deeply entangled itself into different parts of the codebase that
would be much better off as separate entities.

Breaking the Re-frame rules

There truly is no silver bullet. There are good guidelines, principles
and trade-offs but there is no one pattern that you can simply lean on
and expect everything to turn out OK.

In support of reliability, simplicity, and understandability via code
isolation with strict boundaries, I am going to suggest that
developers do things like create namespaces and components that manage
their own local state. I’m going to recommend this because I strongly
believe it is the correct trade-off to make. Some folks may resist
this, but in many cases it is what will allow the creation of local
components that don’t leak local information into the global
context. Again, this trade-off is the very same trade-off that we
benefit from whenever we use a third party library like web3 or a React
Native component. It is much better to compromise and use local
mutable state and in return get the concrete benefits of simple code
that isn’t complected with the greater application than to hold on to a
notion of doing something the “right” way that isn’t actually
offering any real utility.

Could you imagine if all of the internal state of the web3 library,
the HTTP server (threads, request state), all the third party React
Native components and the Ethereum node were all present in app-db?
It would be a nightmare because it becomes very hard to tell where
layers of functionality begin and end. Would we be able to see the
boundaries of the HTTP server versus Ethereum? As these things mingle
together they start to become complected into one big mass and the
surrounding application code begins to depend on them being one single
thing.

So at times, I’m going to suggest deviating from the Re-frame way, and
in its place, I’m going to suggest some very straightforward,
stateful, boring, yet time-proven patterns.

To be clear, I’m not suggesting ditching Re-frame but rather
relegating it to the UI and the high-level data that’s needed for the
UI. Let Re-frame do Re-frame at a scale that makes sense. I want
Re-frame to be the brain that coordinates the application but doesn’t
get overzealous and micro-manage the parts of the body that already
know locally how to do what needs to be done.

Recommendations

Isolate chat/wallet functionality into a well-tested library

Isolate all code that talks to the blockchain and mail server for chat
and wallet functionality into a high-level library (or several
libraries) with a well-defined API and set up an integration test
suite that thoroughly tests this code (in Node) against geth (or
testrpc) and status-go.

Think in terms of a separate library that has an API that you could
build a completely different chat application on. Imagine you wanted
to make a command-line curses-based chat client in Node and you wanted
to use this library.

The library API should be minimal and provide the consumer a level of
abstraction that leaks as few internal implementation
details as possible. In an ideal world, you would be able to
dramatically alter how the API is implemented (say moving to Swarm PSS
or Signal) and it would make zero difference to the UI of the
application that utilizes it.

This library would need to maintain local state much the way that the
web3 environment underneath it does.
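
As a rough illustration of that shape (not code from status-react or status-go), here is a minimal TypeScript sketch. Every name in it (`Transport`, `ChatClient`, `LoopbackTransport`, `ChatMessage`) is hypothetical. It assumes the consumer only ever sees chats and messages, while the transport and the message history stay inside the library.

```typescript
// Hypothetical sketch: none of these names come from status-react or status-go.

// The only thing the library asks of the network layer. Any transport that
// satisfies this interface can be swapped in without touching consumers.
interface Transport {
  send(topic: string, payload: string): void;
  onReceive(handler: (topic: string, payload: string) => void): void;
}

// The shape of data the consumer sees. No transport details leak through.
interface ChatMessage {
  chatId: string;
  text: string;
  outgoing: boolean;
}

class ChatClient {
  // Local state owned by the library rather than by a global app database.
  private history: { [chatId: string]: ChatMessage[] } = {};
  private listeners: Array<(msg: ChatMessage) => void> = [];
  private transport: Transport;

  constructor(transport: Transport) {
    this.transport = transport;
    transport.onReceive((topic, payload) => {
      const msg: ChatMessage = { chatId: topic, text: payload, outgoing: false };
      this.remember(msg);
      this.listeners.forEach((listener) => listener(msg));
    });
  }

  // Record the outgoing message locally, then hand it to the transport.
  sendMessage(chatId: string, text: string): void {
    this.remember({ chatId: chatId, text: text, outgoing: true });
    this.transport.send(chatId, text);
  }

  // Register a callback for incoming messages only.
  onMessage(listener: (msg: ChatMessage) => void): void {
    this.listeners.push(listener);
  }

  // Return a copy so callers cannot mutate the library's internal state.
  messages(chatId: string): ChatMessage[] {
    return (this.history[chatId] || []).slice();
  }

  private remember(msg: ChatMessage): void {
    const existing = this.history[msg.chatId] || [];
    this.history[msg.chatId] = existing.concat([msg]);
  }
}

// A stand-in transport for tests: records what was sent and lets a test
// simulate an incoming message with deliver().
class LoopbackTransport implements Transport {
  sent: Array<{ topic: string; payload: string }> = [];
  private handler: (topic: string, payload: string) => void = () => {};

  send(topic: string, payload: string): void {
    this.sent.push({ topic: topic, payload: payload });
  }

  onReceive(handler: (topic: string, payload: string) => void): void {
    this.handler = handler;
  }

  deliver(topic: string, payload: string): void {
    this.handler(topic, payload);
  }
}
```

A UI, a command-line client, or an integration test would all construct a `ChatClient` the same way and differ only in the `Transport` they pass in, which is the property the paragraph above is asking for.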

The code in the core library would be written in a way that makes it very
explicit how chat communication on the Status platform works. It might
be a good idea to write the core code in a straightforward literate
(well commented) style to help all current and future developers
understand the internals.

The integration testing of this library should be extensive and
rigorous and reflect the importance of this core library to the
functioning and success of Status. Erroneous situations should be
exercised and well understood.

This library should be the rock-solid foundation that the rest of the
application springs from.

Careful refactoring is the way to get there

As mentioned above I recommend careful refactoring and distilling the
code down to its essence. Refactoring is the tool that will help
produce the core Status chat/wallet library mentioned above.

I am against starting an initiative where a team or a developer goes
off on their own and writes this high-level library from scratch. We
can keep the application functioning and whole while we clean things
up.

Break up UI code into components

Currently, our view code benefits from encapsulation when we use
components that are provided to us by React Native and third-party
libraries.

IMHO Status’s view code would benefit further if we broke up the UI
into small, medium, and large-scale reusable components that do not
rely internally on Re-frame subscriptions and events. Again these
components would have a high-level well-defined API and would
encapsulate their behavior such that the internals of the components
would know nothing of Re-frame and the structure of app-db. We would
then assemble these independent components and wire them into the
larger application by providing data from the appropriate
subscriptions etc.
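
To make the split concrete, here is a small TypeScript sketch under stated assumptions. The `ViewNode` type is a hypothetical stand-in for a React element, and `AppDb`, `contactList`, `contactListContainer`, and the `"contact/select"` event name are invented for illustration. The component only knows its props; the container is the single place that knows the shape of the global state and how to dispatch events.

```typescript
// Hypothetical sketch: ViewNode stands in for a React element, and the
// AppDb shape and event name are invented for illustration.
interface ViewNode {
  tag: string;
  text: string;
  children: ViewNode[];
  onPress?: () => void;
}

// The component's whole contract: plain data in, callbacks out.
interface ContactListProps {
  contacts: Array<{ name: string; online: boolean }>;
  onSelect: (name: string) => void;
}

// Knows nothing about app-db, subscriptions, or events.
function contactList(props: ContactListProps): ViewNode {
  return {
    tag: "list",
    text: "",
    children: props.contacts.map((c) => ({
      tag: "row",
      text: c.name + (c.online ? " (online)" : ""),
      children: [],
      onPress: () => props.onSelect(c.name),
    })),
  };
}

// The global state shape, visible only to the wiring layer below.
interface AppDb {
  contactsById: { [id: string]: { name: string; lastSeenMs: number } };
  nowMs: number;
}

// Wiring layer: derives props from global state (the role a subscription
// plays) and turns callbacks into dispatched events.
function contactListContainer(
  db: AppDb,
  dispatch: (event: [string, string]) => void
): ViewNode {
  const contacts = Object.keys(db.contactsById).map((id) => {
    const c = db.contactsById[id];
    return { name: c.name, online: db.nowMs - c.lastSeenMs < 60000 };
  });
  return contactList({
    contacts: contacts,
    onSelect: (name) => dispatch(["contact/select", name]),
  });
}
```

With this arrangement `contactList` can be developed and tested with hand-written props, and a change to the layout of the global state only touches `contactListContainer`.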

This isolation would, yet again, free developers from the need to
understand more of the application than necessary. It would give
developers independence from the larger picture and allow them to
concentrate on the task at hand, which is to create great UX.

These components should feel free to deviate from the Re-frame pattern
and use local state, and even talk directly to libraries and services. As
a motivating example, think about the HTML Video tag and all the
functionality it provides: it talks directly to browser services,
fetches the video, and keeps track of all the internal playback state.

I could imagine a wallet React component that manages all of its own
navigation, and talks directly to a high-level wallet API, without
using Re-frame as an intermediary.
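
A minimal TypeScript sketch of what that could look like, assuming a hypothetical synchronous `WalletApi` (a real one would almost certainly be asynchronous). `WalletComponent` and `InMemoryWallet` are invented names, and `render` returns a string purely to keep the example self-contained. The point is that the current screen is local state and the component calls the wallet API itself.

```typescript
// Hypothetical sketch: WalletApi, WalletComponent and InMemoryWallet are
// invented names, and the API is synchronous only to keep the example short.
interface WalletApi {
  balance(): number;
  send(to: string, amount: number): boolean; // true when the transfer succeeds
}

type Screen = "overview" | "send" | "sent" | "failed";

class WalletComponent {
  // Navigation is local state; nothing outside the component needs it.
  private screen: Screen = "overview";
  private api: WalletApi;

  constructor(api: WalletApi) {
    this.api = api;
  }

  openSend(): void {
    this.screen = "send";
  }

  back(): void {
    this.screen = "overview";
  }

  // Talks to the wallet API directly and navigates based on the result.
  // Ignored unless the send screen is showing.
  submit(to: string, amount: number): void {
    if (this.screen !== "send") {
      return;
    }
    this.screen = this.api.send(to, amount) ? "sent" : "failed";
  }

  render(): string {
    if (this.screen === "send") {
      return "Send funds";
    }
    if (this.screen === "sent") {
      return "Transfer sent";
    }
    if (this.screen === "failed") {
      return "Transfer failed";
    }
    return "Balance: " + this.api.balance();
  }
}

// Stand-in wallet for demos and tests.
class InMemoryWallet implements WalletApi {
  private funds: number;

  constructor(funds: number) {
    this.funds = funds;
  }

  balance(): number {
    return this.funds;
  }

  send(to: string, amount: number): boolean {
    if (to.length === 0 || amount <= 0 || amount > this.funds) {
      return false;
    }
    this.funds -= amount;
    return true;
  }
}
```

The surrounding application would only need to mount the component and hand it a `WalletApi`; which screen is showing never has to appear in app-db.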

Specifics

I’m aware that this post is general. I address subjects like technical
debt and refactoring, while not providing specific examples. This is
something that I’d like to have a chance to do.

I think communicating specific examples of technical debt and
demonstrating what I mean when I talk about refactoring is going to
require a wider bandwidth than a post on discuss.

If people are open to it, I’d like to pair program with some folks and
share concrete examples of reducing technical debt in the status-react
codebase. We could also experiment with some mob programming sessions
where we have lots of developers participate in a single programming
session.

I’m also happy to talk with anyone to answer questions.

Conclusion

IMHO the status-react codebase has accumulated enough technical debt
to cause significant friction in moving the codebase forward and
allowing Status to reach its stated goals. Technical debt is not a bad
thing but rather a normal feedback signal on the road to learning how
to build what you are building. There is no way to solve technical
debt other than careful consideration of the code and refactoring it
down to its essence. Drastic measures (Write tests for everything!
Write specs for everything! Re-write it completely!) normally just
produce more code and are a clumsy way to address the fundamental
problems.

Code isolation patterns will allow a distributed team to coordinate
much more easily and allow folks to focus on the problem in front of
them with confidence.

I fully believe that these things will help restore forward momentum
in the project and bring much more joy to everyone working on Status.

Big thanks to folks for taking the time to read this!

Bruce Hauman
bruce.stateofus.eth

pedro · November 5, 2018, 5:56pm

Thanks for writing down your thoughts with such clarity Bruce. I agree with the points you’ve made, especially the suggestion that refactoring should not be any one person’s undertaking, but needs to be viewed as a team effort, where people buy into the vision and actively help get to the desired next checkpoint state. Also, mob programming sounds like a good way to get on the same page (even for non-Clojure devs like me who are only occasionally touching the status-react codebase).

rachel · November 6, 2018, 12:15pm

I’ve added this as an objective to the Q4/Q1 core priorities doc to make sure that it’s widely considered.

If you’d like to add details there about potential solutions, please do @bhauman

Original version (see last bullet under core): Status OKR Brainstorm - 18Q4/19Q1 - CodiMD
Version being edited for voting DApp: Copy of Core Prios - HackMD

jan.herich · November 7, 2018, 8:20am

Hi Bruce, thank you for the awesome writeup! Needless to say, I had almost the exact same feelings about code complexity when I joined Status more than 1 year ago. I have to say a lot was done to address the issue, but obviously, it’s still not enough and some fundamental problems persist.
I would love to chat about our architecture, pain-points and your proposed solutions, so I will contact you on Status and we can arrange for chat/pair-programming session.

goranjovic · November 7, 2018, 12:37pm

Thanks for the awesome post, Bruce. I think you accurately captured the current state of affairs and even some counter measures we’ve either done so far or at least discussed.

cc-ing @cammellos with regard to having Chat, Wallet, Browser and other modules as encapsulated entities with clear public interface (as in exposed to other modules).

I’ve spent the last couple of months on Wallet bugfixing, and the only way to get out of the vicious whackamole cycle was indeed to take a step back, try and see a general solution and do that, not just apply band-aid. The running score is that different internal parts of Wallet are now consistent and the interfaces to/from Chat and Browser greatly simplified. Given all that, I’m all in with the effort to modularize it all the way down. Let’s continue this discussion.

petty · November 7, 2018, 10:08pm

The protocol meeting we had at Prague discussed a lot of the same ideas you’ve put forth in this, in short:

- we could use our current codebase as a reference for what Status should do as an application
- enumerate this list and map them to various levels of the ideal technical stack
- look at existing protocols at each stack layer and evaluate thoroughly if they are sufficient for our needs, and what tradeoffs they may give
- potentially create new protocols (hard) if nothing is sufficient

I’m not positive of the current state / decision process of moving this forward, but you are not alone in seeing the issues.

oskarth · November 8, 2018, 4:48am

This is a great post, and I agree with a lot of it.

Re protocol work: I’d like to keep this as separate as possible from the current app work. This ensures mental clarity and allows us to question fundamental assumptions. This doesn’t mean there aren’t a lot of lessons protocol can learn from app, and vice versa. But both pieces of work can proceed in isolation and without being blocked. Personally, I’m at an early research stage for this, essentially organizing my brain and trying to understand origins of certain base layers (tcp/ip stack, p2p, Chaum, Tor, mixnetworks), and how others are thinking about these things (Briar, etc). Hoping to post something soonish about rough plan. This shouldn’t stop people from thinking about current specification of the app/wallet in a layered and rigorous way. In fact, that’d be really useful for everyone’s mental clarity.

Re separation of concerns and wallet stuff in particular: I’d love to see this. FWIW, I’ve also been a bit skeptical about the re-frame fundamentalism, but haven’t had the bandwidth/been quite sure about how to best attack it. Glad to see someone vocally sharing these concerns!

Here are a few benefits and avenues which I think are worth exploring:

- thinking about the wallet as a dapp, where the base layer is transaction signing
  - one way of doing this is through a basic API/component/library setup
  - this also enables people to have different mental models, i.e. one might not want to put NFTs in a “wallet” (you don’t put a cat in a wallet, as Goran mentioned some time ago)
  - it fits more with Status as an OS, where our wallet is just the default app that you can swap out
  - how this works with extensions is unclear to me right now, but that could be an interesting avenue
  - there is a ton of stuff that can be done with transaction signing in general, e.g. multisig wallet, subscriptions, group chat integration (coordinating multisig actions), etc.

- separating out key handling completely to get defense in depth
  - i.e. be two mistakes away from leaking sensitive info, see NASA engineering practices
  - right now we are one mistake away from leaking stuff (and it happened with TestFairy recently), which is unacceptable for real-world significant usage
  - one way of doing this is through keycard, but there are probably more ways of doing this (enclave, isolation)
  - recreating/using kernel/userland abstraction at app layer to the extent this is possible - unclear how feasible this is, but we haven’t done enough research on it

- By separating out the wallet API completely, this would lower barriers to entry for people to contribute and allow for more interop
  - it could also perhaps be used in other wallets, there are a lot of talks about wallet standards etc, e.g. at https://ethereum-magicians.org/ and EIPs
  - if we could decouple a lot of wallet stuff, this would mean bounty hunters could use something like devcards to rapidly prototype UI features without the overhead of our complex toolchain
  - I also like the idea of creating more of a contract/mocking with what the code is expected to do, e.g. this would make things like Infura lacking solid getLogs abstraction more evident vs LES, which results in a fucked up transaction history
  - it would be a fun, and less overwhelming, project for Clojure devs who are a bit curious about crypto - imagine going to a conf/meetup and giving some nice bounties, then people can come home, hack for a weekend without knowing all the complexities of the current toolchain, and become part of our community of contributors

Same principle for a lot of this obviously applies to other parts than the wallet, as you already alluded to.

Also PS: did you draft this post in Emacs by any chance?

roman · November 9, 2018, 10:13am

Hey Bruce, thanks for your thoughts.
First of all, I completely agree that modularisation will help make the codebase/project structure better, more readable, and easier to comprehend, at least given that the project already has a relatively big codebase.

When I joined the project, the messaging part of the app was a separate lib, with its own state and API. In a nutshell, the complexity of supporting a separate lib with a lot of moving parts and a constantly changing API was quite big compared to the advantages provided by modularization. It might have been easier with better-defined boundaries (API) between the whole app and the lib and with a better architecture design of the lib in general. But it wasn’t clear if such an effort at that point would be really useful considering available resources and other constraints.
What I’m trying to say is that we might think that the decision made right now is the best option, but there is no guarantee that it will look so from an N-years-after perspective.
That leads to a recurring situation when a new member joins the team: me, Jan and now you, Bruce (well, other members of the team as well) were not satisfied with the quality of the codebase when we joined. We already did a lot to make it better, but it still sucks. And probably some efforts only made it worse.
What I also observed (from my own example, but not only): over time everyone who suggested and contributed changes which aimed to reduce technical debt or to improve the quality of the code was relatively satisfied with the state of things after some period of time, without achieving any ultimate goal or making it indisputably perfect. It might also happen that while you are getting more familiar with the codebase you just start to ignore some not perfectly written things and consider them acceptable.

So… our metric for these improvements might be the impression of a newbie a year later, or probably just a review from a person who is not familiar with the code. Anyway, this time we need to think about how to make it look good years after, not just now.

A few more general statements. I think we would need to define some finite number of modules which we want to support with a clear justification for the creation of each of them. Splitting up might also become an endless effort and add the same amount of complexity just on a different level.

Speaking of specifics. Particularly about Chat. Right now I would suggest moving everything except UI to status-go, so that the golang side provides messages history/contacts/chats, notifies us about incoming messages, performs all work related to de/serialization, encryption, mailservers, etc. The reason for having some parts of messaging on the go side (for instance, FPS stuff) was discussed in the past: we want to have crypto related code written in go. Right now messaging code is scattered between the RN app and status-go, which makes it really hard to read and comprehend as a whole thing (that’s why we have comments like this one https://github.com/status-im/status-react/blob/693eae9cf9793185193c486bce350d28951f5cc0/src/status_im/group_chats/core.cljs#L22 ). High-level tests where you could pass some message’s text and check if it was received are a nightmare. All the struggling with mailservers might have the same root, also because it’s so hard to debug the whole thing. If we do this separation it’s still an open question if we need to extract chat’s UI stuff from the rest of clojurescript.
I don’t have a strong opinion regarding the Wallet atm.

upd: moving chat to golang side has been already discussed here Status.app

rob · November 16, 2018, 12:15am

@roman, just curious. what were the specific pain points around managing the complexity of messaging as a separate lib when you first joined? versioning and dependency management or something else? cheers!

roman · November 24, 2018, 8:08am

Hey @rob. The main pain point was that most of the changes were added ad-hoc. There were no solid design decisions made for that library and, as a consequence, literally any change on the library side required changes on the client side. So practically, although we had two separate repos for both app and “protocol”, they were maintained as a single thing, which actually eliminates the whole advantage of an API as a tool for separation of concerns. And also adding changes to two separate repos and maintaining versioning when you just want to solve/implement each single issue/feature feels like an extra overhead.

rob · November 25, 2018, 3:28am

@roman, thanks for going into more depth on the backstory here. yea, I could see how you would not want to repeat that experience!

stefantalpalaru · November 25, 2018, 3:57pm

Imagine you want to make a chat client in C/C++/Go/Java/etc. - the ecosystem you need to connect with is much bigger than the JavaScript world.

Case in point: making a Status chat plugin for Pidgin. Is that possible right now?

bhauman · November 26, 2018, 2:11pm

The point of the post was really around internal design and separating concerns. Once those concerns are separated, the underlying implementation is absolutely malleable.

cryptowanderer · November 28, 2018, 8:07am

Perhaps we could update our Clojure style guide and conventions and include it in our new docs based on the work done here? https://github.com/status-im/legacy-wiki/blob/master/archive/Clojure_Style_Guide.md and https://github.com/status-im/legacy-wiki/blob/master/archive/Coding_conventions.md

Perhaps we could also update the Code Review, especially if bounties are going to be a bigger thing again: https://github.com/status-im/legacy-wiki/blob/master/archive/Code_Review.md