Wallet improvement log

yenda · May 13, 2019, 11:06am

In a previous post, I proposed an approach to improve the transaction list in the wallet.

TLDR: we are making progress. In this first post I propose a bunch of additional optimizations based on experience gained from implementation, and an algorithm to solve the problem of fetching transaction history.

The good

We started the work last week and here are the parts that are already done:

implementing a subscription mechanism through signals to avoid the pull loops of web3 and in the wallet (done by @igor)
use subscriptions for updating transactions confirmations dynamically (done in https://github.com/status-im/status-react/pull/8158)
use subscriptions for adding new token transactions dynamically (done in https://github.com/status-im/status-react/pull/8184)

The next step is to fetch transactions once at login from etherscan and then check in each new block if there is some eth transaction to get rid of the last wallet loop.

The bad

Transaction history for regular ethereum transactions is not a solved problem and the only solutions I could find were either to use a third party service like etherscan or iterate over each block looking for transactions.
eth_newBlockFilter returns only the hash of the last block, which requires an extra call to get the actual block
@igor : a possible optimization would be to make a subscription that returns the last block number instead of the hash. Since this is also the only way to get the last ethereum transactions, we could also make a subscription that only signals transactions to and from an address.
Token transaction history can be fetched through eth logs but the rpc call can timeout if the range is too high so it needs to be done iteratively.
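As an illustration of fetching logs iteratively, here is a minimal Python sketch that splits a block range into fixed-size windows and builds one eth_getLogs filter per window. The helper names (`block_windows`, `inbound_transfer_filter`) and the 100 000 block window are assumptions for the example, not the wallet's actual code; the topic constant is the standard ERC20 Transfer event signature hash.

```python
from typing import Dict, Iterator, Tuple

# keccak256("Transfer(address,address,uint256)"), the standard ERC20 event topic
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def block_windows(first: int, last: int, step: int = 100_000) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (from_block, to_block) windows covering [first, last], newest first."""
    if step < 1:
        raise ValueError("step must be at least 1")
    to_block = last
    while to_block >= first:
        from_block = max(first, to_block - step + 1)
        yield from_block, to_block
        to_block = from_block - 1


def inbound_transfer_filter(wallet: str, from_block: int, to_block: int) -> Dict[str, object]:
    """Build the filter object for one eth_getLogs call (hypothetical helper)."""
    # Indexed address arguments are left-padded to 32 bytes in log topics.
    padded = "0x" + wallet[2:].lower().rjust(64, "0")
    return {
        "fromBlock": hex(from_block),
        "toBlock": hex(to_block),
        # topics: [event signature, sender (any), recipient (our wallet)]
        "topics": [TRANSFER_TOPIC, None, padded],
    }
```

Each window gives one bounded request, so a timeout only costs a retry of that window (possibly with a smaller step) rather than of the whole history.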

The ugly

We have two problems to solve:

fetching the complete token transaction history (currently limited to 2 weeks, not available from the etherscan api, though they have an ethlog equivalent that could be checked out)
fetching ethereum transaction history without etherscan

I propose the following algorithm composed of 2 steps to solve both issues

Find all ethereum transactions without etherscan

This part is optional, we could use it only when etherscan is down, for instance during chaos unicorn days.

The JSON RPC API contains the following calls:

eth_getBalance which returns the eth balance of an address at a certain block
eth_getTransactionCount which returns the number of transactions sent from an address up to a certain block

Unless I am mistaken, we should be able to find all inbound and outbound eth transactions from block 0 with a recursive dichotomic search that would be at most log2(last-block) (roughly 23) levels deep.

check the number of transactions and balance at current block (block x)
check the number of transactions and balance at last checked block (block y) (genesis for the first run)

Algorithm: input balance diff and transaction count diff:

if balance and number of transactions have not changed: there were no transactions, we stop
if balance has increased but number of transactions has not changed: we are looking for inbound transactions (the transaction count only covers outbound transactions), we check balance at block (x+y)/2 and repeat the algorithm for both [x (x+y)/2] and [(x+y)/2 y] ranges with transaction count diff 0.
if both have changed: we are looking for outbound transactions (even a 0 eth transaction costs gas) and possibly inbound ones, we check balance and number of transactions at block (x+y)/2 and repeat the algorithm for both [x (x+y)/2] and [(x+y)/2 y] ranges with the diffs recomputed for each range

Every time we hit a range of 1 block we use eth_getBlockByNumber to get the block and retrieve the transactions from and to the account address.
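The search described above can be sketched in Python as follows. This is only an illustration under stated assumptions: `state_at` is a hypothetical callable standing in for the pair of eth_getBalance and eth_getTransactionCount calls at a given block, and the sketch treats both cases (inbound only, or outbound) the same way by comparing the whole (balance, count) pair. Zero-value inbound transactions change neither number and would be missed.

```python
from typing import Callable, List, Optional, Tuple

State = Tuple[int, int]  # (balance in wei, outbound transaction count)


def changed_blocks(state_at: Callable[[int], State], lo: int, hi: int,
                   lo_state: Optional[State] = None,
                   hi_state: Optional[State] = None) -> List[int]:
    """Return the blocks b in (lo, hi] at which the account state changed.

    `state_at(block)` is a hypothetical stand-in for eth_getBalance plus
    eth_getTransactionCount at that block. States already fetched by the
    caller are passed down so each block is queried at most once.
    """
    if lo_state is None:
        lo_state = state_at(lo)
    if hi_state is None:
        hi_state = state_at(hi)
    if lo_state == hi_state:
        return []            # nothing happened in this range, stop
    if hi - lo == 1:
        return [hi]          # range of 1 block: fetch this block's transactions
    mid = (lo + hi) // 2
    mid_state = state_at(mid)
    return (changed_blocks(state_at, lo, mid, lo_state, mid_state)
            + changed_blocks(state_at, mid, hi, mid_state, hi_state))
```

Each returned block number is a candidate for eth_getBlockByNumber. The recursion depth is bounded by the base-2 logarithm of the range size, which is where the figure of roughly 23 levels comes from for a chain of a few million blocks.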

The implementation of this part of the algorithm could be bountied, and eventually implemented in status-go entirely for efficiency.

Find all token transactions

Infura has some difficulties with huge ethlog ranges which is why this algorithm tries to minimize the amount of calls required to find all token transactions.

Some of the transactions found by the first step are outbound token transactions. It can be determined by fetching their transaction receipts.
At this point all we are missing are the token inbound transactions. For those we are first going to check the balance of the shown tokens.

From there we have 2 options:

BalanceOf

We can then use a similar approach as for regular transaction with the balanceOf method of the token contract, which takes an address parameter. eth_call takes an additional block parameter.
For each token transferred we do a dichotomic search using balanceOf to find the first inbound transfer between block 0 and block(first transfer)
For each outgoing token transfer, we check the balanceOf at the block where the transfer happens for the token transferred. If balanceOf at block(transfer n) is different from balanceOf at block(transfer n-1) + amount transferred, we do a dichotomic search between the 2 blocks (or call ethlog if it is estimated faster, less than 100 000 blocks)
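To make the balanceOf idea concrete, here is a small Python sketch that builds the JSON-RPC payload for an eth_call to balanceOf at a given block and decodes the result. The helper names are hypothetical; the selector is the first four bytes of keccak256("balanceOf(address)"). It assumes the node still holds state for the requested block, which is not guaranteed on nodes that prune old state.

```python
from typing import Dict

# First 4 bytes of keccak256("balanceOf(address)")
BALANCE_OF_SELECTOR = "0x70a08231"


def balance_of_request(token: str, holder: str, block: int, request_id: int = 1) -> Dict[str, object]:
    """Build an eth_call payload asking `token` for `holder`'s balance at `block`."""
    # The address argument is ABI-encoded as a 32-byte, left-padded word.
    data = BALANCE_OF_SELECTOR + holder[2:].lower().rjust(64, "0")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_call",
        # Second param is the block: a hex block number instead of "latest".
        "params": [{"to": token, "data": data}, hex(block)],
    }


def decode_uint256(result_hex: str) -> int:
    """Decode the hex string returned by eth_call; an empty result counts as 0."""
    return int(result_hex, 16) if result_hex not in ("", "0x") else 0
```

A dichotomic search over blocks can then compare `decode_uint256` results at the two ends of a range exactly as for eth balances.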

Ethlogs

We will then use that balance and the known outbound transactions to find the ranges in which the inbound transactions have occurred.
We will then scan these ranges using ethlogs. After this step only the inbound transactions that occurred before the first outbound transactions for each token will be missing.
For those we will call ethlogs iteratively going backward from the block of the first known outbound transaction back to the genesis block, except we stop once the balance has reached 0.
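The backward scan with an early stop can be sketched like this in Python. `fetch_inbound` is a hypothetical callable wrapping one ethlogs request for an inclusive block range and returning transfers as dicts with a `value` field; `balance_before_first_outbound` is the token balance just before the first known outbound transfer. Both names and the window size are assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

Transfer = Dict[str, int]


def scan_back(fetch_inbound: Callable[[int, int], List[Transfer]],
              start_block: int,
              balance_before_first_outbound: int,
              step: int = 100_000) -> Tuple[List[Transfer], int]:
    """Walk back from `start_block` towards genesis, one window at a time.

    Stops as soon as the inbound transfers found account for the whole
    balance, so blocks older than the first inbound transfer are never queried.
    Returns the transfers found and the part of the balance left unexplained.
    """
    found: List[Transfer] = []
    remaining = balance_before_first_outbound
    to_block = start_block
    while to_block >= 0 and remaining > 0:
        from_block = max(0, to_block - step + 1)
        transfers = fetch_inbound(from_block, to_block)
        found.extend(transfers)
        remaining -= sum(t["value"] for t in transfers)
        to_block = from_block - 1
    return found, remaining
```

A non-zero remainder after reaching genesis would indicate a balance change that did not come through a Transfer log the scan could see.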

jarradhope · May 14, 2019, 2:49am

awesome to see yet another log, love it! keep it coming.
All the ideas here seem well reasoned and I don’t have too much to add.

iirc there have been some other efforts in terms of collecting transaction history around ultra light clients, but I cannot remember the details.

There was also this project that is attempting to do this in a decentralized way, but I don’t think it’s appropriate for us at this point in time.

yenda · May 15, 2019, 9:35pm

Progress so far

The final touches of https://github.com/status-im/status-react/pull/8221 are being worked on:

the PR is following up on https://github.com/status-im/status-react/pull/8184 which introduced subscriptions to ethlog filters to add new token transfers as they are added to a new block on chain,
it gets rid of the wallet loop entirely, and only fetches token transfers and transactions when logging in
it gets all of the token transfer history and not just the last 100 000 blocks

Todo:

fix tests
handle case of missing blocks between new block and last current block (when app is in background for instance)

@igor any idea what happens to subscriptions if status-go is offline for a while? Is it catching up and sending signals when back online? (I’ll investigate tomorrow)

Next step

On status-react side:

clean up transaction model, subscriptions and view before adding persistence, with transaction errors in mind (status-im/status-mobile issue #8153, "[Wallet] Ensure failed token transactions are shown as such")
implement persistence of transaction history (status-im/status-mobile issue #8152, "Persist wallet transactions")
POC of algorithm proposed above

On status-go side:
As described in OP the ethereum JSON RPC API doesn’t offer an efficient way to fetch transaction history, and a POC of the proposed algorithm will be quite inefficient unless some better endpoints are provided by status-go.

igor · May 16, 2019, 6:32am

@yenda internally subscriptions just poll filters, so as soon as connection is restored, filters should be able to be filled and you should get new data.

Bruno · May 16, 2019, 7:51am

Consider using eth.events as fallback, their postgre db is open to remote direct connections and you can just copy needed data locally. I use them for a Uniswap project. The idea would be to download all transactions related to the current user’s account from them in a single query, store those on the phone, and use them to present tx data. Then when the user requests a refresh or periodically, refetch set from last fetched block. Works like a charm and is no big deal considering there are other, slower means of getting the same data if eth.events fails. Convenience with a persistence fallback.

yenda · May 16, 2019, 5:01pm

That is what we currently do with etherscan API. The next goal is not to have more centralized fallback but to be able to fetch transaction history without third parties (or at most an rpc gateway).

We do better than that, we get new transactions as new blocks are added to the chain directly from the geth node (or rpc).

Bruno · May 16, 2019, 5:50pm

Right, but I’m advocating for copying the data here, not querying it when needed. So query it all in a single statement when the address is detected, and store it in the local storage. Because this data is structured and parsed and is the result of a single call as opposed to several with other providers, suddenly you don’t need etherscan or eth.events for past information and you couldn’t care less about a geth node disappearing (which cannot show you past transactions anyway unless it’s an archive node, which you will never decentralize).

yenda · May 16, 2019, 8:29pm

The data is also structured with etherscan and it’s one API call for transactions and one for token transfers. Persisting them so that the whole history is only requested once is also part of the plan.

yenda · May 18, 2019, 6:41pm

Progress so far

All wallet related PRs that have been peer reviewed have been merged in a meta PR for QA:

github.com/status-im/status-mobile: DO NOT SQUASH Meta wallet transaction history
status-im:develop ← status-im:meta/wallet-transaction-history
opened 05:42PM - 17 May 19 UTC by yenda, +2256 -2526

This PR is a compilation of the reviewed wallet PRs so that they can be tested a…t once

fixes #8151

## #8184 Feature/token transaction signals

part of #8151

* removes fetching of last 100000 blocks of token transfers from the wallet pull loop
* fetches the last 100000 blocks of token transfers at startup
* replaces pulling by subscriptions to ethlogs for token transfers

## #8221 Feature/transaction signals

* remove the transaction fetching loop entirely to rely only on subscriptions for live transactions and token transfer updates
* fetch token transfers history via etherscan API to lift the 100000 blocks limit on token transfers history
* inbound token transfers are caught via a filter on ethlogs
* outbound token transfers and other transactions are caught by filtering transactions in the current block that have the wallet address as `to` or `from` field

## #8224 Refactor/transaction details

* removing computations from transaction details view and simplifying related subscriptions

## #8230 Refactor transaction history

* Remove computations from views
* Optimize computations in subs
* Fix missing amount and label for unknown ERC20 tokens in transfer history
* clean up transaction history and transaction filters subscriptions
* move ui.screens.wallet.transactions.events to events
* move ui.screens.wallet.db to db

## #8231 Improve wallet update performances

* a few files are moved around because of circular dependencies, this continues migration of wallet and ethereum code into their own module (previously in ui.screens.wallet and utils.ethereums respectively)

`wallet-autoconfig-token` is a very expensive call on mainnet because it checks the balance of every known token. It is called:

* when wallet is refreshed by pulling
* when user goes on any wallet screen

This PR changes that by:

* calling it only when the wallet is initialized and there is no visible-token configuration
* it only calls update-wallet when a new transaction arrives

## #8232 Remove web3 and clean up wallet effects

* introduce json-rpc namespace, which provides `call` and `eth-call`, a generic way of calling a json-rpc method taking care of conversions and error handling
* remove web3 usage from wallet
* clean up effects, reducing the amount of computations when logging in

## #8233 Remove call-params usage

* use `json-rpc/eth-call` and `wallet/eth-transaction-call` everywhere
* move all conversions to abi-spec

The goal of this PR is to unify the way we interact with contracts, so that future improvements can be made and impact all the callees. `json-rpc/eth-call` is for read only calls and `wallet/eth-transaction-call` is for transactional calls.

# Testing

When testing it is important to compare to what the situation is in develop. If things are not perfect but still equal or slightly improved compared to develop, we should treat them as bugs that I will fix asap in separate PRs, but this one is already very big and sets up the basis for faster future improvements of the wallet by cleaning up all the crap.

I tested all of the following points on Android emulator and One Plus 5T over the different PRs, I will test again once a build is available for the global rebase and merge.

## Areas to be tested for improvements:

* wallet much faster on slow devices (usual lag on wallet after login should be gone)
* bandwidth consumption (see point below)
* transaction history responsiveness (new transactions show up to 20 sec faster)
* better gas estimation (before it would always show the same number, now it is precisely what the transaction should cost)

## Areas to be tested for regressions:

* anything that is related to transactions:
  * wallet
  * transaction history
  * in chat transactions
* anything that is related to ens resolution:
  * add chat by ens name
  * browse dapp by ens name
  * stickers

## Bandwidth consumption

I made the following experiment on Android:

* use mobile data and disable sync to reduce the noise from whisper
* recover `satoshi document engage inflict goddess auction rule unfair bid next buddy shy`
* go on ropsten network
* clear history in GlassWire (https://play.google.com/store/apps/details?id=com.glasswire.android&hl=en&rdid=com.glasswire.android&pli=1)
* restart app
* go to wallet history, check transaction details of first transaction, go back to wallet, pull to refresh, go to chat, kill app
* reopen app, go to transaction history, wait 2 minutes, pull to refresh, kill app
* switch 5 times between wallet and chat and wallet and transaction history

status: ready

The meta PR will also include the following PRs once they are reviewed

As well as the PRs that will implement the changes discussed in What's next section.

The combination of all these PRs brings the following improvements:

complete ERC20 token transfers history
live updates of the transaction and transfers, as soon as they are added to the chain
much faster wallet initialization after the first run, removal of a lot of unnecessary network consumption
subscriptions and views for transaction history have been cleaned up and optimized a bit, some code reorg for wallet was done and there is still quite some to go because all of the wallet code was in the ui module.

Network consumption will be improved even further once persistence is added, because there are a lot of pending blocks returned by the currentBlock subscription and it requires re-fetching the complete history from etherscan to stay consistent (which was done every 20 sec previously anyway)

What’s next

finish cleaning up wallet module and replace web3 by status-go
check status of custom erc20 token implementation with @andrey because there is lots of potential for optimization there as well but I don’t want to clash with pending changes
persistence of transaction

Persistence of transaction

Working on the wallet code before implementing the persistence gave me a lot of insight and I think I got to the point where I have the safest solution for persistence while minimizing bandwidth consumption and computations.

The idea is to only persist confirmed transactions (more than 12 blocks old).
Every new transaction is added to the unconfirmed-transactions map.
Every time there is a new block event:

we save last-persisted-block which is current-block - 12, and
we check if we have unconfirmed-transactions
if we do we persist those that are more than 12 blocks old, after checking them again with their hash.
Whenever we skip a block or get a redundant block number, we flush the unconfirmed transactions and fetch the last 12 blocks again.

This allows us:

to avoid persisting transactions that were included in a block that turned out to not be in the main chain
to heal transaction history when this happens
to cache confirmed transactions and avoid having to recompute anything about them when new transactions arrive
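A minimal Python sketch of this bookkeeping, to make the flow easier to follow. Everything here is hypothetical: the class, the in-memory dict standing in for real storage, and the `still_on_chain` callback standing in for the re-check by hash. The sketch persists a transaction once its block is at or below current-block - 12, which is one possible reading of "more than 12 blocks old" that stays consistent with last-persisted-block.

```python
from typing import Callable, Dict, Optional

CONFIRMATIONS = 12


class TransactionStore:
    """In-memory model of the proposed persistence scheme (illustration only)."""

    def __init__(self) -> None:
        self.persisted: Dict[str, dict] = {}     # stands in for on-disk storage
        self.unconfirmed: Dict[str, dict] = {}   # hash -> transaction
        self.last_block: Optional[int] = None
        self.last_persisted_block: Optional[int] = None

    def add_transaction(self, tx: dict) -> None:
        self.unconfirmed[tx["hash"]] = tx

    def on_new_block(self, number: int, still_on_chain: Callable[[str], bool]) -> bool:
        """Handle a new block event.

        Returns True when a block was skipped or repeated: unconfirmed
        transactions are flushed and the caller should fetch the last
        12 blocks again.
        """
        if self.last_block is not None and number != self.last_block + 1:
            self.unconfirmed.clear()
            self.last_block = number
            return True
        self.last_block = number
        self.last_persisted_block = number - CONFIRMATIONS
        for tx_hash, tx in list(self.unconfirmed.items()):
            if tx["block"] <= self.last_persisted_block:
                del self.unconfirmed[tx_hash]
                # Re-check by hash; a transaction that left the chain is dropped.
                if still_on_chain(tx_hash):
                    self.persisted[tx_hash] = tx
        return False
```

Only the small unconfirmed map is revisited on each block, so confirmed history never needs to be recomputed when new transactions arrive.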

Unknown and custom ERC20 tokens

With the current state of the new wallet implementation, ERC20 transfers from unknown contracts are marked as ERC20 tokens with 18 decimals; when the history is fetched from etherscan they might get their actual name.
The idea would be to fetch the data from the contract when a transfer from an unknown token is found.
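As a rough Python sketch of what fetching the data from the contract could involve: the optional ERC20 metadata methods are called through eth_call with the selectors below, `decimals()` returns a single 32-byte word, and a `string` return value is ABI-encoded as an offset, a length and the padded bytes. The helper names are hypothetical, and tokens that return something other than a standard `string` for name or symbol are outside the scope of this sketch.

```python
# First 4 bytes of keccak256 of each method signature
SELECTORS = {
    "name": "0x06fdde03",      # name()
    "symbol": "0x95d89b41",    # symbol()
    "decimals": "0x313ce567",  # decimals()
}


def decode_decimals(result_hex: str) -> int:
    """Decode the single uint word returned by decimals()."""
    return int(result_hex, 16)


def decode_abi_string(result_hex: str) -> str:
    """Decode an ABI-encoded `string` return value (offset, length, data)."""
    raw = bytes.fromhex(result_hex[2:])
    offset = int.from_bytes(raw[0:32], "big")
    length = int.from_bytes(raw[offset:offset + 32], "big")
    start = offset + 32
    return raw[start:start + length].decode("utf-8")
```

With these three values the transfer can be labelled and its amount scaled correctly instead of assuming 18 decimals.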

yenda · May 19, 2019, 8:12pm

Oops I did it again

I made an important refactoring of the wallet over the weekend, focused on removing web3 usage and using generic json-rpc/call and json-rpc/eth-call methods. These can be used directly in the code, which will help make it more obvious where we can optimize instead of being hidden behind X layers of code as it was before.

Here is an example with the inbound-token-transfer-handler. As you can see it is quite a ride, that’s because we need 3 levels of callback to get all the information we want for a transaction. Later on, this kind of function will be optimized by gathering this information in one rpc method on status-go side, so that we only need one call for it.

This one is actually a handler called upon reception of a signal from eth_newFilter subscription and I’ll present a better alternative in the next point.

(defn inbound-token-transfer-handler
  "The handler gets a list of inbound token transfer events and parses each
   transfer. Transfers are grouped by block, then the following chain of
   callbacks is run:
   - get block by hash is called to get the `timestamp` and `number` of each
   block
   - get transaction by hash is called on each transaction to get the
   `gasPrice`, `gas` limit, `input` data and `nonce` of each transaction
   - get transaction receipt is called to get the `gasUsed`
   - finally everything is merged into one map that is dispatched in an
   `:ethereum.transactions/new` event for each transfer"
  [chain-tokens]
  (fn [transfers]
    (let [transfers-by-block
          (group-by :block-hash
                    (keep #(parse-token-transfer
                            chain-tokens
                            :inbound
                            %)
                          transfers))]
      ;; TODO: remove this callback chain by implementing a better status-go api
      ;; This function takes the map of supported tokens as params and returns a
      ;; handler for token transfer events
      (doseq [[block-hash block-transfers] transfers-by-block]
        (json-rpc/call
         {:method "eth_getBlockByHash"
          ;; second param false: only transaction hashes, not full objects
          :params [block-hash false]
          :on-success
          (fn [{:keys [timestamp number]}]
            (let [timestamp (str (* timestamp 1000))]
              (doseq [{:keys [hash] :as transfer} block-transfers]
                (json-rpc/call
                 {:method "eth_getTransactionByHash"
                  :params [hash]
                  :on-success
                  (fn [{:keys [gasPrice gas input nonce]}]
                    (json-rpc/call
                     {:method "eth_getTransactionReceipt"
                      :params [hash]
                      :on-success
                      (fn [{:keys [gasUsed]}]
                        (re-frame/dispatch
                         [:ethereum.transactions/new
                          (-> transfer
                              (dissoc :block-hash)
                              (assoc :timestamp timestamp
                                     :block     (str number)
                                     :gas-used  (str (decode/uint gasUsed))
                                     :gas-price (str (decode/uint gasPrice))
                                     :gas-limit (str (decode/uint gas))
                                     :data      input
                                     :nonce     (str (decode/uint nonce))))]))}))}))))})))))

Learning by rewriting

eth_call hasn’t got much love in the code base for a while. Initially it had a poor man’s implementation of parameter conversion, which prevented us from using it with more complex contract method calls.

We then introduced the abi-spec namespace, which based on the signature of the method can encode all the params accordingly https://github.com/status-im/status-react/blob/develop/src/status_im/utils/ethereum/abi_spec.cljs (I will move it into the ethereum.abi-spec namespace soon btw). Later on we added decoding based on a list of output params types as well.
But this wasn’t used much and a lot of our codebase, mostly in the wallet, kept using the legacy call-params or worse, web3. Now this is mostly gone, at least for the wallet part, with the new json-rpc/call and json-rpc/eth-call methods.

As an interesting side effect, I discovered that all previous implementations of an eth_call helper function were passing latest as the second parameter, the first one being a map containing the address of the contract and the data. If I understand correctly, this means we can call any contract method at any block height.

Maybe some limitations apply, tbd, for instance what is returned if you go for a block when the contract didn’t exist yet? If that is the case we could also use that to find the creation block of our known token contracts to limit how far in history we have to look back for each token.
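If a contract can indeed be probed at past blocks, locating its creation block is a plain binary search. The Python sketch below assumes a hypothetical predicate `exists_at(block)` that answers whether the contract responds at that block (how exactly to probe it, for example through an eth_call or a code lookup at that block, is left open), and it assumes the answer never flips back to false, meaning the contract was not destroyed later.

```python
from typing import Callable, Optional


def first_block_where(exists_at: Callable[[int], bool], lo: int, hi: int) -> Optional[int]:
    """Return the smallest block in [lo, hi] for which `exists_at` is true.

    Assumes `exists_at` is false before some block and true from then on.
    Returns None when it is still false at `hi`.
    """
    if not exists_at(hi):
        return None
    while lo < hi:
        mid = (lo + hi) // 2
        if exists_at(mid):
            hi = mid          # creation block is at mid or earlier
        else:
            lo = mid + 1      # creation block is after mid
    return lo
```

The result can then serve as the lower bound for that token's history lookups, instead of going all the way back to the genesis block.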

This was my missing link for a better version of the no-etherscan algorithm to find all transactions. So I fixed it:

yenda:

From there we have 2 options:

BalanceOf

We can then use a similar approach as for regular transaction with the balanceOf method of the token contract, which takes an address parameter. eth_call takes an additional block parameter.

For each token transferred we do a dichotomic search using balanceOf to find the first inbound transfer between block 0 and block(first transfer)

For each outgoing token transfer, we check the balanceOf at the block where the transfer happens for the token transferred. If balanceOf at block(transfer n) is different from balanceOf at block(transfer n-1) + amount transferred, we do a dichotomic search between the 2 blocks (or call ethlog if it is estimated faster, less than 100 000 blocks)

Ethlogs

We will then use that balance and the known outbound transactions to find the ranges in which the inbound transactions have occurred.

We will then scan these ranges using ethlogs. After this step only the inbound transactions that occurred before the first outbound transactions for each token will be missing.

For those we will call ethlogs iteratively going backward from the block of the first known outbound transaction back to the genesis block, except we stop once the balance has reached 0.

Better status-go signals

@igor @dmitrys
For the wallet the ideal solution would be that status-go automatically starts sending signals after user logs in for:

new block: should contain the block number, and if for some reason we got on the wrong chain for a few blocks, a rewind flag with the hashes of the transactions to remove from history
new transaction: should contain timestamp and number from block, merged info from eth_getTransactionByHash and eth_getTransactionReceipt
new token transfer: same as new transaction

On status-react side I use eth_filter to find inbound erc20 transfers and I use eth_getBlockByHash on each new block with true second parameter to get the list of transactions, which I filter for transactions with user address as :to or :from parameter, which captures regular eth transactions as well as outbound erc20 transfers (but not inbound because that field is encoded in the data which I don’t want to decode for each block transaction)
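The per-block filtering described here amounts to a simple scan over the block's full transaction objects. A minimal Python sketch, assuming `block` is the decoded JSON result of eth_getBlockByHash called with true as second parameter (the function name is hypothetical):

```python
from typing import List


def wallet_transactions(block: dict, address: str) -> List[dict]:
    """Keep the transactions of `block` sent from or to `address`.

    Addresses are compared case-insensitively. `to` is null for contract
    creation transactions, hence the `or ""` guards.
    """
    wanted = address.lower()
    return [
        tx for tx in block.get("transactions", [])
        if (tx.get("from") or "").lower() == wanted
        or (tx.get("to") or "").lower() == wanted
    ]
```

An inbound ERC20 transfer does not match, because its `to` field is the token contract and the recipient only appears inside the input data, which is why those are caught through the log filter instead.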

yenda · May 21, 2019, 10:43am

TLDR

Yesterday was mostly focused on rebasing the series on custom tokens, testing and fixing bugs on my PRs.

I now consider the first stone of the wallet improvement series ready for testing. A summary and test description is available in the PR: "DO NOT SQUASH Meta wallet transaction history" by yenda (status-im/status-mobile Pull Request #8228)

This first iteration was focusing on code and performance improvements before adding persistence. While it initially targeted the transaction history only, I ended up rewriting a big chunk of the wallet and ethereum transaction related code.

This effort has made the codebase more accessible and facilitates future wallet and ethereum related improvements.

the way tokens are represented in memory is suboptimal, they are in a list which requires going through it every single time we look for one, this needs to be changed to a map and the crazy code that goes with that needs to be flushed
I made some utility functions such as ethereum/current-address and ethereum/chain-keyword that need to be used instead of the multiple let bindings we need to do in every event that needs the current address or chain keyword
move utils.ethereum to ethereum

yenda · May 25, 2019, 9:08am

Swarm Summit Update

A short update because I am currently at the Swarm Summit and working on other things at the moment.

My meta PR got merged so a lot of the improvements described in this thread are now part of the develop branch. The main visible effects are the real-time feedback on transactions, which are now added as soon as they are on chain, and the time to load the wallet on slow devices, which is now on par with other screens.

kudos to @dmitrys who started implementing the go parts of the improvement which will make a huge impact in terms of performance and bandwidth usage of the wallet (but still negligible compared to the whisper monster) https://github.com/status-im/status-go/pull/1467

My next step is to integrate @dmitrys's work once it reaches an MVP stage.

rachel · May 29, 2019, 10:43am

Well done on all the above @yenda.

It sounds like #8152—persisting tokens—is the last remaining critical issue for this scope of work.

There’s also #8153, for displaying token failures. How much work will this be to fix?

Lastly, we have this bug/regression—#8254—but I think it can be backlogged for now.

Are there any other loose ends not covered by GitHub issues?