Wallet improvement log

In a previous post, I proposed an approach to improve the transaction list in the wallet.

TLDR: we are making progress. In this first post I propose a bunch of additional optimizations based on experience gained from the implementation, and an algorithm to solve the problem of fetching the transaction history.

The good

We started the work last week and here are the parts that are already done:

The next step is to fetch transactions once at login from etherscan and then check in each new block if there is some eth transaction to get rid of the last wallet loop.

The bad

  • Transaction history for regular ethereum transactions is not a solved problem and the only solutions I could find were either to use a third party service like etherscan or iterate over each block looking for transactions.

  • eth_newBlockFilter returns only the hash of the last block, which requires an extra call to get the actual block

  • @igor : a possible optimization would be to make a subscription that returns the last block number instead of the hash. Since this is also the only way to get the latest ethereum transactions, we could also make a subscription that only signals transactions to and from an address.

  • Token transaction history can be fetched through eth logs, but the rpc call can time out if the block range is too large, so it needs to be done iteratively.
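
As a rough illustration of fetching logs iteratively, the block range can be cut into fixed-size chunks that are queried one after the other. This is only a sketch: `fetch-logs` is a hypothetical function of `[from to]` standing in for the actual log query, it is synchronous here while the real rpc call is callback based, and the chunk size is an arbitrary choice.

```clojure
(defn block-ranges
  "Splits the inclusive block range [from to] into consecutive inclusive
   sub-ranges of at most `step` blocks."
  [from to step]
  (for [start (range from (inc to) step)]
    [start (min to (+ start (dec step)))]))

(defn fetch-logs-in-chunks
  "Calls `fetch-logs` (hypothetical: a function of [from to] returning a
   sequence of logs) once per sub-range and concatenates the results."
  [fetch-logs from to step]
  (into []
        (mapcat (fn [[a b]] (fetch-logs a b)))
        (block-ranges from to step)))
```

For instance `(block-ranges 0 250000 100000)` yields the three ranges `[0 99999]`, `[100000 199999]` and `[200000 250000]`.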

The ugly

We have two problems to solve:

  • fetching the complete token transaction history (currently limited to 2 weeks, not available from the etherscan api, though they have an ethlog equivalent that could be checked out)
  • fetching ethereum transaction history without etherscan

I propose the following algorithm composed of 2 steps to solve both issues

Find all ethereum transactions without etherscan

This part is optional. We could use it only when etherscan is down, for instance during chaos unicorn days.

The JSON RPC API contains the following calls:

Unless I am mistaken, we should be able to find all inbound and outbound eth transactions from block 0 with a recursive dichotomic search that would be at most log2(last-block) (roughly 23) levels deep.

  • check the number of transactions and balance at current block (block x)
  • check the number of transactions and balance at last checked block (block y) (genesis for the first run)

Algorithm: input balance diff and transaction count diff:

  • if neither the balance nor the number of transactions has changed: there were no transactions, we stop
  • if the balance has increased but the number of transactions has not changed: we are looking for inbound transactions (the transaction count only covers outbound transactions), we check the balance at block (x+y)/2 and repeat the algorithm for both the [x (x+y)/2] and [(x+y)/2 y] ranges with transaction count diff 0.
  • if both have changed: we are looking for outbound transactions (even a 0 eth transaction costs gas), we check the balance and the number of transactions at block (x+y)/2 and repeat the algorithm for both the [x (x+y)/2] and [(x+y)/2 y] ranges, each with its own balance and transaction count diffs

Every time we hit a range of 1 block, we use eth_getBlockByNumber to get the block and retrieve the transactions from and to the account address.
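
The search described above could be sketched as follows. This is a minimal sketch under assumptions, not the actual implementation: `state-at` is a hypothetical function that returns `[balance transaction-count]` for a block number (in practice it would wrap the balance and transaction count rpc calls with a block parameter), and the code is synchronous for readability. It returns the blocks in which the account changed, which are the blocks that then have to be fetched and scanned.

```clojure
(defn changed-blocks
  "Returns, in ascending order, the blocks in (lo, hi] in which the state
   returned by `state-at` differs from the state at the previous block.
   `state-at` is a hypothetical function of a block number returning
   [balance transaction-count]."
  [state-at lo hi]
  (letfn [(search [lo lo-state hi hi-state]
            (cond
              ;; nothing changed over the whole range: stop
              (= lo-state hi-state) []
              ;; range of 1 block: this is a block to fetch
              (= 1 (- hi lo))       [hi]
              ;; otherwise split the range in two and search both halves
              :else
              (let [mid       (quot (+ lo hi) 2)
                    mid-state (state-at mid)]
                (into (search lo lo-state mid mid-state)
                      (search mid mid-state hi hi-state)))))]
    (search lo (state-at lo) hi (state-at hi))))
```

Each state is fetched once and passed down to both halves, so a range whose two ends are identical costs no further call.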

The implementation of this part of the algorithm could be bountied, and eventually implemented in status-go entirely for efficiency.

Find all token transactions

Infura has some difficulties with huge ethlog ranges, which is why this algorithm tries to minimize the number of calls required to find all token transactions.

  • Some of the transactions found by the first step are outbound token transactions. It can be determined by fetching their transaction receipts.

  • At this point all we are missing are the token inbound transactions. For those we are first going to check the balance of the shown tokens.

From there we have 2 options:

BalanceOf

  • We can then use an approach similar to the one for regular transactions, with the balanceOf method of the token contract, which takes an address parameter. eth_call takes an additional block parameter.
  • For each token transferred, we do a dichotomic search using balanceOf to find the first inbound transfer between block 0 and block(first transfer)
  • For each outgoing token transfer, we check the balanceOf at the block where the transfer happens for the token transferred. If balanceOf at block(transfer n) is different from balanceOf at block(transfer n-1) + amount transferred, we do a dichotomic search between the 2 blocks (or call ethlog if it is estimated to be faster, less than 100 000 blocks)
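
To make the balanceOf-at-a-block idea concrete, here is a sketch of how the eth_call parameters could be built by hand. The function names are made up for this example. The selector `0x70a08231` is the first 4 bytes of the keccak256 hash of `balanceOf(address)`, the address is left-padded to a 32-byte word, and the block number is passed as a hex quantity in the second parameter.

```clojure
(require '[clojure.string :as string])

;; first 4 bytes of keccak256("balanceOf(address)")
(def balance-of-selector "0x70a08231")

(defn to-hex
  "Encodes a non-negative integer as a 0x-prefixed hex quantity."
  [n]
  (loop [n n acc ""]
    (let [acc (str (nth "0123456789abcdef" (rem n 16)) acc)
          n   (quot n 16)]
      (if (zero? n)
        (str "0x" acc)
        (recur n acc)))))

(defn pad-address
  "Left-pads a 20-byte hex address to a 32-byte ABI word (no 0x prefix)."
  [address]
  (let [hex (string/lower-case (string/replace address #"^0x" ""))]
    (str (apply str (repeat (- 64 (count hex)) "0")) hex)))

(defn balance-of-params
  "Builds the eth_call params for balanceOf(owner) on `token-address`,
   evaluated at `block-number`."
  [token-address owner-address block-number]
  [{:to   token-address
    :data (str balance-of-selector (pad-address owner-address))}
   (to-hex block-number)])
```

The returned vector is meant to be used as the `params` of an eth_call request, so the same call can be repeated at any block visited by the dichotomic search.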

Ethlogs

  • We will then use that balance and the known outbound transactions to find the ranges in which the inbound transactions have occurred.
  • We will then scan these ranges using ethlogs. After this step only the inbound transactions that occurred before the first outbound transactions for each token will be missing.
  • For those we will call ethlogs iteratively going backward from the block of the first known outbound transaction back to the genesis block, except we stop once the balance has reached 0.
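
The backward scan of the last step could look like the sketch below. It assumes a hypothetical `fetch-inbound` function of `[from to]` that returns the inbound transfers of one token in that block range as maps with a `:value` key, and it takes the token balance the account held just before the first known outbound transfer. The chunk size `step` is arbitrary.

```clojure
(defn scan-backward
  "Scans backward from the block before `start` towards the genesis block, in
   chunks of `step` blocks, collecting inbound transfers. Stops as soon as the
   collected transfers account for the whole `balance`.
   `fetch-inbound` is a hypothetical function of [from to]."
  [fetch-inbound start balance step]
  (loop [to        (dec start)
         remaining balance
         found     []]
    (if (or (neg? to) (not (pos? remaining)))
      found
      (let [from      (max 0 (- to (dec step)))
            transfers (fetch-inbound from to)]
        (recur (dec from)
               (- remaining (reduce + (map :value transfers)))
               (into found transfers))))))
```

Stopping once the remaining balance reaches 0 is what avoids querying all the way back to the genesis block for most accounts.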

awesome to see yet another log, love it! keep it coming.
All the ideas here seem well reasoned and I don't have too much to add.

iirc there have been some other efforts in terms of collecting transaction history around ultra light clients, but I cannot remember the details.

There was also this project that is attempting to do this in a decentralized way, but I don't think it's appropriate for us at this point in time.

Progress so far

The final touches of https://github.com/status-im/status-react/pull/8221 are being worked on:

  • the PR is following up on https://github.com/status-im/status-react/pull/8184 which introduced subscriptions to ethlog filters to add new token transfers as they are added to a new block on chain,
  • it gets rid of the wallet loop entirely, and only fetches token transfers and transactions when logging in
  • it gets all of the token transfer history and not just the last 100 000 blocks

Todo:

  • fix tests
  • handle case of missing blocks between new block and last current block (when app is in background for instance)

@igor any idea what happens to subscriptions if status-go is offline for a while? Is it catching up and sending signals when back online? (I'll investigate tomorrow)

Next step

On status-react side:

On status-go side:
As described in the OP, the ethereum JSON RPC API doesn't offer an efficient way to fetch transaction history, and a POC of the proposed algorithm will be quite inefficient unless some better endpoints are provided by status-go.

@yenda internally subscriptions just poll filters, so as soon as connection is restored, filters should be able to be filled and you should get new data.

Consider using eth.events as a fallback: their Postgres db is open to remote direct connections and you can just copy the needed data locally. I use them for a Uniswap project. The idea would be to download all transactions related to the current user's account from them in a single query, store those on the phone, and use them to present tx data. Then, when the user requests a refresh or periodically, refetch the set from the last fetched block. Works like a charm and is no big deal considering there are other, slower means of getting the same data if eth.events fails. Convenience with a persistence fallback.

That is what we currently do with etherscan API. The next goal is not to have more centralized fallback but to be able to fetch transaction history without third parties (or at most an rpc gateway).

We do better than that, we get new transactions as new blocks are added to the chain directly from the geth node (or rpc).

Right, but I'm advocating for copying the data here, not querying it when needed. So query it all in a single statement when the address is detected, and store it in the local storage. Because this data is structured and parsed, and is the result of a single call as opposed to several with other providers, suddenly you don't need etherscan or eth.events for past information and you couldn't care less about a geth node disappearing (which cannot show you past transactions anyway unless it's an archive node, which you will never decentralize).

The data is also structured with etherscan, and it's one API call for transactions and one for token transfers. Persisting them so that the whole history is only requested once is also part of the plan.


Progress so far

All wallet related PRs that have been peer reviewed have been merged in a meta PR for QA:

The meta PR will also include the following PRs once they are reviewed

As well as the PRs that will implement the changes discussed in the What's next section.

The combination of all these PRs brings the following improvements:

  • complete ERC20 token transfers history
  • live updates of transactions and transfers, as soon as they are added to the chain
  • much faster wallet initialization after the first run, removal of a lot of unnecessary network consumption
  • subscriptions and views for transaction history have been cleaned up and optimized a bit, some code reorg for wallet was done and there is still quite some to go because all of the wallet code was in the ui module.

Network consumption will be improved even further once persistence is added, because there are a lot of pending blocks returned by the currentBlock subscription, and it requires re-fetching the complete history from etherscan to stay consistent (which was done every 20 sec previously anyway).

What's next

  • finish cleaning up wallet module and replace web3 by status-go
  • check the status of the custom erc20 token implementation with @andrey, because there is lots of potential for optimization there as well but I don't want to clash with pending changes
  • persistence of transaction

Persistence of transaction

Working on the wallet code before implementing the persistence gave me a lot of insight and I think I got to the point where I have the safest solution for persistence while minimizing bandwidth consumption and computations.

The idea is to only persist confirmed transactions (more than 12 blocks old).
Every new transaction is added to an unconfirmed-transactions map.
Every time there is a new block event:

  • we save last-persisted-block, which is current-block - 12, and
  • we check if we have unconfirmed-transactions
  • if we do, we persist those that are more than 12 blocks old, after checking them again by their hash.
  • whenever we skipped a block or got a redundant block number, we flush the unconfirmed transactions and fetch the last 12 blocks again.

This allows us:

  • to avoid persisting transactions that were included in a block that turned out not to be in the main chain
  • to heal transaction history when this happens
  • to cache confirmed transactions and avoid having to recompute anything about them when new transactions arrive
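
The new block handling described above could be sketched as a pure function over the wallet state. Everything here is illustrative: the state keys, the `:refetch` marker and the `still-on-chain?` predicate (a stand-in for checking a transaction again by its hash) are assumptions, not existing code.

```clojure
(def confirmations 12)

(defn on-new-block
  "Returns the updated state after a new block event.
   state: {:last-block n, :unconfirmed {hash tx}, :persisted [tx ...]}
   still-on-chain?: hypothetical predicate on a transaction hash."
  [{:keys [last-block unconfirmed] :as state} block-number still-on-chain?]
  (if (and last-block (not= block-number (inc last-block)))
    ;; skipped or redundant block: flush the unconfirmed transactions and
    ;; mark the last 12 blocks as needing to be fetched again
    (assoc state
           :last-block  block-number
           :unconfirmed {}
           :refetch     [(max 0 (inc (- block-number confirmations)))
                         block-number])
    (let [;; more than 12 blocks old
          old?      (fn [[_ tx]]
                      (> (- block-number (:block tx)) confirmations))
          ready     (filter old? unconfirmed)
          ;; only persist the ones that are still found by their hash
          confirmed (filter (fn [[tx-hash _]] (still-on-chain? tx-hash))
                            ready)]
      (-> state
          (dissoc :refetch)
          (assoc :last-block           block-number
                 :last-persisted-block (- block-number confirmations))
          (update :persisted (fnil into []) (map second confirmed))
          (update :unconfirmed #(apply dissoc % (map first ready)))))))
```

Keeping the function pure leaves the actual persistence and the refetch as side effects for the caller, which makes the rule itself easy to test.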

Unknown and custom ERC20 tokens

With the current state of the new wallet implementation, ERC20 transfers from unknown contracts are marked as ERC20 tokens with 18 decimals. When the history is fetched from etherscan they might get their actual name.
The idea would be to fetch the data from the contract when a transfer from an unknown token is found.


Oops I did it again

I did an important refactoring of the wallet over the weekend, focused on removing web3 usage and using generic json-rpc/call and json-rpc/eth-call methods. These can be used directly in the code, which will help make it more obvious where we can optimize, instead of being hidden behind X layers of code as it was before.

Here is an example with the inbound-token-transfer-handler. As you can see it is quite a ride, that's because we need 3 levels of callbacks to get all the information we want for a transaction. Later on, this kind of function will be optimized by gathering this information in one rpc method on the status-go side, so that we only need one call for it.

This one is actually a handler called upon reception of a signal from the eth_newFilter subscription and I'll present a better alternative in the next point.

(defn inbound-token-transfer-handler
  "Takes the map of supported tokens and returns a handler for inbound token
   transfer events. The handler parses each transfer and groups the transfers
   by block, then the following chain of callbacks runs:
   - get block by hash is called to get the `timestamp` and `number` of each
   block
   - get transaction by hash is called on each transfer to get the `gasPrice`,
   `gas` limit, `input` data and `nonce` of its transaction
   - get transaction receipt is called to get the `gasUsed`
   - finally everything is merged into one map that is dispatched in an
   `:ethereum.transactions/new` event for each transfer"
  [chain-tokens]
  (fn [transfers]
    (let [transfers-by-block
          (group-by :block-hash
                    (keep #(parse-token-transfer
                            chain-tokens
                            :inbound
                            %)
                          transfers))]
      ;; TODO: remove this callback chain by implementing a better status-go api
      (doseq [[block-hash block-transfers] transfers-by-block]
        (json-rpc/call
         {:method "eth_getBlockByHash"
          :params [block-hash false] ; false: return transaction hashes only
          :on-success
          (fn [{:keys [timestamp number]}]
            (let [timestamp (str (* timestamp 1000))]
              (doseq [{:keys [hash] :as transfer} block-transfers]
                (json-rpc/call
                 {:method "eth_getTransactionByHash"
                  :params [hash]
                  :on-success
                  (fn [{:keys [gasPrice gas input nonce]}]
                    (json-rpc/call
                     {:method "eth_getTransactionReceipt"
                      :params [hash]
                      :on-success
                      (fn [{:keys [gasUsed]}]
                        (re-frame/dispatch
                         [:ethereum.transactions/new
                          (-> transfer
                              (dissoc :block-hash)
                              (assoc :timestamp timestamp
                                     :block     (str number)
                                     :gas-used  (str (decode/uint gasUsed))
                                     :gas-price (str (decode/uint gasPrice))
                                     :gas-limit (str (decode/uint gas))
                                     :data      input
                                     :nonce     (str (decode/uint nonce))))]))}))}))))})))))

Learning by rewriting

eth_call hasn't got much love in the code base for a while. Initially it had a poor man's implementation of parameter conversion, which prevented us from using it with more complex contract method calls.

We then introduced the abi-spec namespace which, based on the signature of the method, can encode all the params accordingly https://github.com/status-im/status-react/blob/develop/src/status_im/utils/ethereum/abi_spec.cljs (I will move it into the ethereum.abi-spec namespace soon btw). Later on we added decoding based on a list of output param types as well.
But this wasn't used much and a lot of our codebase, mostly in the wallet, kept using the legacy call-params or worse, web3. Now this is mostly gone, at least for the wallet part, with the new json-rpc/call and json-rpc/eth-call methods.

As an interesting side effect, I discovered that all previous implementations of an eth_call helper function were passing latest as the second parameter, the first one being a map containing the address of the contract and the data. If I understand correctly, this means we can call any contract method at any block height.

Maybe some limitations apply, tbd. For instance, what is returned if you go for a block when the contract didn't exist yet? If the answer lets us tell that the contract is missing, we could also use that to find the creation block of our known token contracts, to limit how far back in history we have to look for each token.
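
If such a check turns out to be possible, finding the creation block is a plain binary search. The sketch below assumes a hypothetical `exists-at?` predicate on a block number (however the "contract exists at this block" check ends up being done) that is false before the creation block and true from there on. It is synchronous for readability, and it also assumes the node can still answer queries about old blocks.

```clojure
(defn first-block-where
  "Returns the smallest block in [lo hi] for which `pred` is true, assuming
   `pred` is false up to some block and true afterwards. Returns nil when
   `pred` is false at `hi`."
  [pred lo hi]
  (when (pred hi)
    (loop [lo lo
           hi hi]
      (if (= lo hi)
        lo
        (let [mid (quot (+ lo hi) 2)]
          (if (pred mid)
            (recur lo mid)
            (recur (inc mid) hi)))))))
```

For a hypothetical contract created at block 4000000, `(first-block-where exists-at? 0 8000000)` would need about 23 calls to `exists-at?`.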

This was my missing link for a better version of the no-etherscan algorithm to find all transactions. So I fixed it:

Better status-go signals

@igor @dmitrys
For the wallet the ideal solution would be that status-go automatically starts sending signals after user logs in for:

  • new block: should contain the block number and, if for some reason we got on the wrong chain for a few blocks, a rewind flag with the hashes of the transactions to remove from history
  • new transaction: should contain timestamp and number from block, merged info from eth_getTransactionByHash and eth_getTransactionReceipt
  • new token transfer: same as new transaction

On the status-react side I use eth_filter to find inbound erc20 transfers, and I use eth_getBlockByHash on each new block with true as the second parameter to get the list of transactions. I filter that list for transactions with the user address as :to or :from parameter, which captures regular eth transactions as well as outbound erc20 transfers (but not inbound ones, because that field is encoded in the data, which I don't want to decode for each block transaction).
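
The filtering step could be sketched like this. The function name is made up, the transactions are assumed to be maps with `:to` and `:from` keys as in the description above, and addresses are compared case-insensitively because the same address can show up with different casing.

```clojure
(require '[clojure.string :as string])

(defn relevant-transactions
  "Keeps the transactions of a block that are sent to or from `address`.
   `:to` can be nil (contract creation), so it is checked before comparing."
  [address transactions]
  (let [address (string/lower-case address)
        match?  (fn [a] (and a (= address (string/lower-case a))))]
    (filter (fn [{:keys [to from]}]
              (or (match? to) (match? from)))
            transactions)))
```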


TLDR

Yesterday was mostly focused on rebasing the series on custom tokens, testing and fixing bugs on my PRs.

I now consider the first stone of the wallet improvement series ready for testing. A summary and test description is available in the PR: DO NOT SQUASH Meta wallet transaction history by yenda · Pull Request #8228 · status-im/status-mobile · GitHub

This first iteration was focusing on code and performance improvements before adding persistence. While it initially targeted the transaction history only, I ended up rewriting a big chunk of the wallet and ethereum transaction related code.

This effort has made the codebase more accessible and should facilitate future wallet and ethereum related improvements.

Next

There are already visible improvements in terms of performance (especially on slow devices) and bandwidth consumption in the wallet, but persistence is still the ultimate goal here. So the final step is to implement the algorithm described here Status.app

Before that there is still a bit of refactoring I want to deal with that shouldn't take more than a day:

  • the way tokens are represented in memory is suboptimal, they are in a list which requires going through it every single time we look for one, this needs to be changed to a map and the crazy code that goes with that needs to be flushed
  • I made some utility functions such as ethereum/current-address and ethereum/chain-keyword that need to be used instead of the multiple let bindings we currently write in every event that needs the current address or chain keyword
  • move utils.ethereum to ethereum
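
For the first point, going from a list to a map is a one-liner. The sketch below indexes tokens by contract address. The key choice and the token fields (`:address`, `:symbol`, `:decimals`) are assumptions made for the example, not the actual data model.

```clojure
(defn index-tokens
  "Turns a list of token maps into a map keyed by contract address, so that a
   lookup no longer has to walk the whole list."
  [tokens]
  (into {} (map (juxt :address identity)) tokens))
```

With `(def tokens-by-address (index-tokens tokens))`, finding a token becomes `(get tokens-by-address address)`.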

Swarm Summit Update

A short update because I am currently at the Swarm Summit and working on other things at the moment.

My meta PR got merged, so a lot of the improvements described in this thread are now part of the develop branch. The main visible effects are the real-time feedback on transactions, which are now added as soon as they are on chain, and the load time of the wallet on slow devices, which is somewhat on par with other screens.

kudos to @dmitrys who started implementing the go parts of the improvement which will make a huge impact in terms of performance and bandwidth usage of the wallet (but still negligible compared to the whisper monster) https://github.com/status-im/status-go/pull/1467

My next step is to integrate @dmitrys' work once it reaches an MVP stage.


Well done on all the above @yenda. 👏

It sounds like #8152 (persisting tokens) is the last remaining critical issue for this scope of work.

There's also #8153, for displaying token failures. How much work will this be to fix?

Lastly, we have this bug/regression (#8254), but I think it can be backlogged for now.

Are there any other loose ends not covered by GitHub issues?