WRC20 and Nim - the eWASM token challenge

arnetheduck · April 11, 2019, 8:47pm

eWASM is a flavor of WebAssembly that’s being proposed to replace the EVM as the execution engine in ETH2, and possibly also in Eth1.x. Basically, it’s a virtual machine like the JVM but with some properties that make it attractive to use in low-trust scenarios like web browsers, most of which support it by now.

The general flow to get from source code to running code is to compile the program with a compiler, then run the result in a WebAssembly runtime - your browser can most likely do it, and there are standalone environments as well, like Wasmer.

The compiler translates the code from the source language, for example Rust, C or Nim, to WASM VM instructions, optimizing along the way. The runtime then translates the WASM VM instructions to the CPU instructions of the machine it’s executing on, allowing the same WASM code to run on most hardware, fairly efficiently.

While developing eWASM, a challenge was posted to create some smart contract code that would run on eWASM, but instead of being written in the EVM-specific languages like Solidity or Vyper, any language could be used that supported WASM. Rust is at the forefront here thanks to Mozilla and Parity, but support for other languages is steadily growing.

From our brave Nimbus team, @yuriy stepped up and wrote a Nim version of the challenge back when it was posted - Nim compiles to C, and by using a specially compiled version of clang we could get support for eWASM going.

I’ve been playing around with a compiler for Nim that’s based on LLVM, and with LLVM recently adding a complete WASM tool chain, I went ahead and tried it on the WRC20 contract - results are promising for a first pass. In a single step, we can get from Nim to some fairly compact WASM!

Let’s have a look at how it works.

First of all, you need a copy of the nim-eth-contracts repo, and nlvm itself - it’s Linux only, but if someone is interested in learning about compilers, I’m happy to mentor a port to Mac/Windows/ARM etc.

# Clone repo
git clone https://github.com/status-im/nim-eth-contracts.git
cd nim-eth-contracts/examples
# Remove build config that's used for the special clang version
rm config.nims
# Grab latest nlvm
curl  -L https://github.com/arnetheduck/nlvm/releases/download/continuous/nlvm-x86_64.AppImage -o nlvm; chmod +x nlvm

Next up, we compile the Nim code to WASM. We need to add a few flags: turn off garbage collection, make sure we run the optimized version, and allow symbols that come from the eWASM runtime environment to remain undefined at link time:

# compile to 32-bit wasm
./nlvm c -d:release --nlvm.target=wasm32 --gc:none -l:--no-entry -l:--allow-undefined wrc20

# Is it there??
[arnetheduck@tempus examples]$ ls -l wrc20.wasm wrc20.nim
-rw-rw-r--. 1 arnetheduck arnetheduck 1537 11 apr 13.09 wrc20.nim
-rwxrwxr-x. 1 arnetheduck arnetheduck 2097 11 apr 13.49 wrc20.wasm
# Yay!

# optionally, convert to text format: wasm2wat wrc20.wasm > wrc20.wat

There’s an online tool to convert binary wasm files to their text representation, or you can get a converter from wabt. Let’s have a look at a few pieces - Wasm code is divided into modules, and we start with a few type definitions that we’ll use later - full code also available:

(module
  (type (;0;) (func (result i32)))
  (type (;1;) (func (param i32 i32)))
  (type (;2;) (func (param i32 i32 i32)))
  (type (;3;) (func (param i32)))
  (type (;4;) (func))
  (type (;5;) (func (param i32 i32) (result i32)))

Next up, we have imports - these are functions that the runtime environment provides, so that the wasm code can interact with the outside world. eWASM, the Ethereum flavor of WASM, specifies what this environment should look like.

  (import "env" "getCallDataSize" (func $getCallDataSize (type 0)))
  (import "env" "revert" (func $revert (type 1)))
  (import "env" "callDataCopy" (func $callDataCopy (type 2)))
  (import "env" "finish" (func $finish (type 1)))
  (import "env" "storageLoad" (func $storageLoad (type 1)))
  (import "env" "getCaller" (func $getCaller (type 3)))
  (import "env" "storageStore" (func $storageStore (type 1)))

Finally, we have the code itself. The WASM VM is a fairly simple, stack-based machine, but I’m no WASM expert, so I’m mostly guessing what’s going on. This is the main function that selects which operation to perform - we see here some calls to the external interface, and calls into some of the functions we defined - first in Nim:

proc main() {.exportwasm.} =
  if getCallDataSize() < 4:
    revert(nil, 0)
  var selector: uint32
  callDataCopy(selector, 0)
  case selector
  of 0x9993021a'u32:
    do_balance()
  of 0x5d359fbd'u32:
    do_transfer()
  else:
    revert(nil, 0)

… and the corresponding WASM code:

  (func $main.1 (type 4)
    (local i32 i32)
    global.get 0
    i32.const 16
    i32.sub
    local.tee 0
    global.set 0
    block  ;; label = @1
      block  ;; label = @2
        call $getCallDataSize
        i32.const 3
        i32.le_s
        br_if 0 (;@2;)
        local.get 0
        i32.const 0
        i32.store offset=12
        local.get 0
        i32.const 12
        i32.add
        call $callDataCopy_2O1SXmzMBKQV9cWGnElsimg
        block  ;; label = @3
          local.get 0
          i32.load offset=12
          local.tee 1
          i32.const 1563795389
          i32.ne
          br_if 0 (;@3;)
          call $do_transfer_E82kPpU5OcEfVOGiDsEd5g_2
          local.get 0
          i32.const 16
          i32.add
          global.set 0
          return
        end
        local.get 1
        i32.const -1718418918
        i32.ne
        br_if 1 (;@1;)
        call $do_balance_E82kPpU5OcEfVOGiDsEd5g
        unreachable
      end
      i32.const 0
      i32.const 0
      call $revert
      unreachable
    end
    i32.const 0
    i32.const 0
    call $revert
    unreachable)

Turns out the WebAssembly folks are not lying - it looks stack-based indeed - operations like add and le_s (compare) generally lack explicit operands - their inputs are popped off a stack, with the result being pushed back.
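To make the stack behaviour concrete, here is a minimal Python sketch of a toy stack machine that handles only the handful of instructions used in the size guard of main. It is not a real WASM interpreter: the `run` function, the tuple encoding of instructions and the `host` dictionary are all hypothetical names made up for illustration. Note how the Nim test `getCallDataSize() < 4` shows up in the listing as `i32.const 3` followed by `i32.le_s`.

```python
MASK = 0xFFFFFFFF  # i32 values are kept as 32-bit patterns


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x


def run(program, host):
    """Execute a list of (opcode, *immediates) tuples on an operand stack."""
    stack = []
    for op, *args in program:
        if op == "i32.const":
            stack.append(args[0] & MASK)
        elif op == "call":
            # Hypothetical host call: takes no arguments, returns one i32.
            stack.append(host[args[0]]() & MASK)
        elif op == "i32.add":
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) & MASK)
        elif op == "i32.le_s":
            b, a = stack.pop(), stack.pop()
            stack.append(1 if to_signed(a) <= to_signed(b) else 0)
        else:
            raise ValueError("unsupported opcode: " + op)
    return stack


# The guard at the top of main: call $getCallDataSize / i32.const 3 / i32.le_s
GUARD = [("call", "getCallDataSize"), ("i32.const", 3), ("i32.le_s",)]


def too_short(call_data_size):
    """True when the guard would branch to the revert path."""
    return run(GUARD, {"getCallDataSize": lambda: call_data_size}) == [1]
```

Neither `i32.le_s` nor `i32.add` names its inputs; both simply consume the two topmost stack entries and leave one result behind, which is what the `br_if` that follows then tests.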

In generating WASM code, nlvm will first generate LLVM IR, which is a similar, but slightly more high-level representation of the same code. Notable differences include the LLVM IR being target-dependent (compiling the same Nim code for x86_64 would look different) and register-based:

# Add -c to produce LLVM IR:
./nlvm c -d:release --nlvm.target=wasm32 --gc:none -l:--no-entry -l:--allow-undefined -c wrc20

define void @main.1() local_unnamed_addr #1 {
secAlloca:
  %selector = alloca i32, align 4
  %call.res.wrc20.53.20 = tail call i32 @getCallDataSize()
  %icmp.IntSLT.wrc20.53.23 = icmp slt i32 %call.res.wrc20.53.20, 4
  br i1 %icmp.IntSLT.wrc20.53.23, label %if.true.wrc20.53.2, label %if.end.wrc20.53.2

if.true.wrc20.53.2:                               ; preds = %secAlloca
  tail call void @revert(i8* null, i32 0)
  unreachable

if.end.wrc20.53.2:                                ; preds = %secAlloca
  store i32 0, i32* %selector, align 4
  call fastcc void @callDataCopy_2O1SXmzMBKQV9cWGnElsimg(i32* nonnull %selector)
  %load.selector = load i32, i32* %selector, align 4
  switch i32 %load.selector, label %case.else.do.wrc20.57.2 [
    i32 -1718418918, label %case.of.1.do.wrc20.57.2
    i32 1563795389, label %secReturn
  ]

case.of.1.do.wrc20.57.2:                          ; preds = %if.end.wrc20.53.2
  call fastcc void @do_balance_E82kPpU5OcEfVOGiDsEd5g()
  unreachable

case.else.do.wrc20.57.2:                          ; preds = %if.end.wrc20.53.2
  call void @revert(i8* null, i32 0)
  unreachable

secReturn:                                        ; preds = %if.end.wrc20.53.2
  call fastcc void @do_transfer_E82kPpU5OcEfVOGiDsEd5g_2()
  ret void
}

In register-based VMs, operations take their arguments in the form of registers or memory locations - like the icmp above.

We can see that the optimizer has made a pass over the code already (-d:release flag) - cases are reordered and simplified a little - one of the advantages of using WASM is that we can reuse the tooling that’s developed for WASM, including compilers, debuggers etc.
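The constants in the WASM and LLVM IR listings look nothing like the hex selectors in the Nim source, but they are the same 32-bit patterns printed as signed integers. A small Python sketch of that reinterpretation follows; the helper name `as_signed_i32` is made up for illustration.

```python
def as_signed_i32(value):
    """Reinterpret an unsigned 32-bit value as two's-complement signed."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


BALANCE_SELECTOR = 0x9993021A   # `of 0x9993021a'u32` in the Nim source
TRANSFER_SELECTOR = 0x5D359FBD  # `of 0x5d359fbd'u32` in the Nim source

print(as_signed_i32(BALANCE_SELECTOR))   # -1718418918
print(as_signed_i32(TRANSFER_SELECTOR))  # 1563795389
```

The balance selector has its top bit set, so it prints as a negative number, while the transfer selector fits in 31 bits and prints unchanged.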

Of course, the support above is very bare-bones and primitive - there are parts missing and more optimizations could be done. During the Eth2 meetup, Vitalik for example raised a concern that WASM bytecode might be less compact than EVM.

Presently, nlvm leaves some cruft around that could be removed to produce a smaller file, but take the numbers with a grain of salt - this is not production ready by any means. A simple optimization is to remove some of the debugging information that normally gets added:

[arnetheduck@tempus examples]$ nlvm c -d:release --nlvm.target=wasm32 --gc:none -l:--no-entry -l:--allow-undefined -d:clang -l:--strip-all  wrc20.nim 
[arnetheduck@tempus examples]$ ls -l wrc20.wasm 
-rwxrwxr-x. 1 arnetheduck arnetheduck 1593 11 apr 14.35 wrc20.wasm
# Yay, about 24% less!

We’ll see how it goes, but WASM has plenty of things going for it right now - WASM engines are popping up everywhere - in browsers (Nim/WASM web example), phones - and will likely make their way into embedded systems also.

Nim on the blockchain next? Who knows …

julien · April 12, 2019, 8:15am

Nice! Have you seen the result of my buidl week project? Related to Wasm.

julien · April 12, 2019, 8:16am

@arnetheduck Next challenge for you: how far are we from compiling nimbus to Wasm?

jacqueswww · April 12, 2019, 9:24am

Very cool @arnetheduck!

@julien the problem with running an eth client in the browser is really that the P2P protocol needs to be run over WebRTC, which, as far as I know, has only been partially solved.

jacqueswww · April 12, 2019, 9:27am

Interesting discussion relating to porting libp2p to WebRTC: Support for WebRTC transport (issue #188 in libp2p/go-libp2p on GitHub)

julien · April 12, 2019, 9:30am

Yeah I was more thinking running inside status actually. AFAIK ipfs/libp2p have nice JS ports.

jacqueswww · April 12, 2019, 10:51am

Well inside status one would just use RPC right? No reason to do all the difficult wasm stuff

julien · April 12, 2019, 11:00am

That’s an easy way to dynamically load chain. Plus nimbus is eth2

arnetheduck · April 12, 2019, 2:59pm

we could actually do this with nim today, just like you’ve done it with rust - exporting simple functions like that already works

poemm · April 19, 2019, 8:27pm

Here is how your wrc20.wasm compares with others in size:
wrc20_handwritten.wasm - 570 bytes
wrc20_AssemblyScript.wasm - 700 bytes
wrc20_C.wasm - 1 kB
wrc20_Nim.wasm - 1.3 kB
wrc20_Rust.wasm - 1.7 kB

I know that this is a first try and you haven’t had a chance to tune and experiment yet, so I expect that it may be like the C version in size.

I tried to pass tests with wrc20_Nim.wasm. But there are problems with my test infrastructure which I don’t want to debug now since I have a backlog of other work. So I hereby delay this test debugging until I am pinged to try again.

poemm · April 20, 2019, 1:23am

I kept thinking about it, went back to work, and finally got it to pass the four tests at the bottom of Axic’s WRC20 spec. The WRC20 spec does not specify how to internally store addresses or balances in storage, so I found out that wrc20_Nim.wasm stores them in the following address:balance format: 0xeD09375DC6B20050d242d1611af97eE4A6E93CAd000000000000000000000000:0x00000000000f4240000000000000000000000000000000000000000000000000. There was also some awkwardness with endianness, which you guys solved most of.
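The address:balance format quoted above can be read as a 32-byte key holding the 20-byte address followed by zero padding, and a 32-byte value holding an 8-byte big-endian balance followed by zero padding. The Python sketch below reconstructs that layout under this reading; the function names are hypothetical, and the layout is inferred from the one example in the post rather than from any specification.

```python
import struct


def storage_key(address_hex):
    """20-byte address followed by 12 zero bytes (32 bytes total)."""
    digits = address_hex[2:] if address_hex.startswith("0x") else address_hex
    raw = bytes.fromhex(digits)
    if len(raw) != 20:
        raise ValueError("expected a 20-byte address")
    return raw + bytes(12)


def storage_value(balance):
    """8-byte big-endian balance followed by 24 zero bytes (32 bytes total)."""
    return struct.pack(">Q", balance) + bytes(24)


def decode_balance(value):
    """Read the balance back from the first 8 bytes of a stored value."""
    return struct.unpack(">Q", value[:8])[0]


key = storage_key("0xeD09375DC6B20050d242d1611af97eE4A6E93CAd")
value = storage_value(1_000_000)
print(key.hex())
print(value.hex())
```

The value in the post starts with 00000000000f4240, and 0xf4240 is 1,000,000, which is why the sketch uses that balance.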

Below is the code that is confirmed to pass tests. The size of the final wasm is 1204 bytes, which may be reducible, but I don’t have time. There is interest in comparing sizes against erc20.evm; is there any interest in extending this Nim example to the full ERC20 spec?

## ewasm “WRC20” token contract coding challenge
## https://gist.github.com/axic/16158c5c88fbc7b1d09dfa8c658bc363

import ../eth_contracts, endians

proc do_balance() =
  if getCallDataSize() != 24:
    revert(nil, 0)

  var address{.noinit.}: array[32, byte]
  callDataCopy(addr address, 4, 20)

  var balance{.noinit.}: array[32, byte]
  storageLoad(address, addr balance)
  finish(addr balance, 8)

proc do_transfer() =
  if getCallDataSize() != 32:
    revert(nil, 0)

  var sender: array[32, byte]
  getCaller(addr sender)
  var recipient: array[32, byte]
  callDataCopy(addr recipient, 4, 20)
  var value: array[8, byte]
  callDataCopy(value, 24)

  var senderBalance: array[32, byte]
  storageLoad(sender, addr senderBalance)
  var recipientBalance: array[32, byte]
  storageLoad(recipient, addr recipientBalance)

  var sb, rb, v: uint64

  bigEndian64(addr v, addr value)
  bigEndian64(addr sb, addr senderBalance[0])

  if sb < v:
    revert(nil, 0)

  bigEndian64(addr rb, addr recipientBalance[0])

  sb -= v
  rb += v # TODO there's an overflow possible here..

  bigEndian64(addr senderBalance[0], addr sb)
  bigEndian64(addr recipientBalance[0], addr rb)

  storageStore(sender, addr senderBalance)
  storageStore(recipient, addr recipientBalance)

proc main() {.exportwasm.} =
  if getCallDataSize() < 4:
    revert(nil, 0)
  var selector: uint32
  callDataCopy(selector, 0)
  case selector
  of 0x1a029399'u32:
    do_balance()
  of 0xbd9f355d'u32:
    do_transfer()
  else:
    revert(nil, 0)

## Original code from: https://github.com/status-im/nim-eth-contracts/blob/master/examples/wrc20.nim
## Copyright (c) 2018 Status Research & Development GmbH
## Licensed and distributed under either of
##    MIT license: LICENSE-MIT or http://opensource.org/licenses/MIT
## or
##    Apache License, Version 2.0, (LICENSE-APACHEv2 or http://www.apache.org/licenses/LICENSE-2.0)
## at your option. This file may not be copied, modified, or distributed except according to those terms.
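The selector constants in this version of main (0x1a029399 and 0xbd9f355d) are the byte-swapped forms of the ones in the first post (0x9993021a and 0x5d359fbd). A plausible reading, stated here as an assumption, is that the selector bytes arrive in call data in the order the spec writes them, while a 32-bit load from WASM memory is little-endian. The Python sketch below illustrates that reading; `selector_as_loaded` is a made-up helper name.

```python
import struct


def selector_as_loaded(selector):
    """Value a little-endian u32 load yields for a selector written big-endian."""
    wire = struct.pack(">I", selector)   # bytes in the order they sit in call data
    return struct.unpack("<I", wire)[0]  # WASM memory loads are little-endian


print(hex(selector_as_loaded(0x9993021A)))  # 0x1a029399
print(hex(selector_as_loaded(0x5D359FBD)))  # 0xbd9f355d
```

Comparing against pre-swapped constants avoids a byte swap at run time, which matters in a contract where every byte of code is being counted.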

Edit: Forgot to mention I manually changed import names from env to ethereum and removed all exports except for memory and main. One subtlety, here is how export main was changed:

Before:
  (export "main" (func $main))
  (export "main.1" (func $main.1))
After:
  (export "main" (func $main.1))

For C, I automate this post-processing with a script which uses pywebassembly. To post-process Nim with this script, something would have to change because wrc20_Nim.wasm currently exports main as function labelled $main and not $main.1 which is the real main. So either I change my post-processing script or you tune the compiler output.

arnetheduck · April 22, 2019, 4:17am

Cool! Thanks for the fixes - I took the liberty to push them to git, along with a few low-hanging simplifications and size-reducers.

Turns out llvm aggressively inlines bswap but fails to recognize that WASM has no bswap instruction, causing a bit of damage, along with a few other minor things that could be improved.

I also fixed nlvm to avoid generating the duplicate main function - make sure to download a fresh copy before moving on.

I also played around a bit with the available tooling, notably binaryen - instructions below assume you have it in your PATH.

Long story short, a native no-sweat compile of wrc20 now lands at 846 bytes - we can strip it further to 755 using some standard tooling - pretty close to AssemblyScript. Where can I find that code btw? I found @lrettig’s version but I’m not sure where to go from there - looks like it’s doing 32-bit balances, and I can’t find where it does the byte-swapping. Is the compiled version available somewhere?

Anyway, without further ado:

# Grab latest nlvm
curl  -L https://github.com/arnetheduck/nlvm/releases/download/continuous/nlvm-x86_64.AppImage -o nlvm; chmod +x nlvm
# Note the new flags:
# --noMain removes the pesky main symbol
# --compress-relocations shaves a few bytes for free - wonder what else it does or why it's not enabled by default?
# we're down to 846 bytes!
./nlvm c -d:release --nlvm.target=wasm32 --gc:none -l:--compress-relocations -l:--no-entry -l:--allow-undefined -l:--strip-all --noMain wrc20

# let's strip out what poemm mentioned, in a less sophisticated way :)
# first, convert to text
wasm2wat --generate-names wrc20.wasm > wrc20.wat

# remove some cruft, change env->ethereum
# * nim_program_result is easy to fix in nlvm, the others a bit harder - need llvm patches
sed -e '/nim_program_result\|__heap_base\|__data_end\|funcref\|"memory"/d' -e 's/env/ethereum/g' wrc20.wat > wrc20tmp.wat
wat2wasm -o wrc20strip.wasm wrc20tmp.wat
wasm2wat --generate-names wrc20strip.wasm > wrc20strip.wat
# Pretty cool - we're down to 799 bytes!
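The sed step above works on plain text, so `s/env/ethereum/g` would also rewrite any other occurrence of "env" in the file. As a slightly more targeted alternative, here is a Python sketch of the same clean-up that only renames the import module. The function name `strip_wat` and its `keep_memory` switch are hypothetical; the switch is there because a later post in this thread notes that current Ewasm requires the memory export.

```python
import re

# Substrings whose lines the sed command deletes.
DROP = re.compile(r'nim_program_result|__heap_base|__data_end|funcref|"memory"')


def strip_wat(wat_text, keep_memory=False):
    """Drop cruft lines from WAT text and rename the "env" import module."""
    kept = []
    for line in wat_text.splitlines():
        is_memory_line = '"memory"' in line
        # With keep_memory set, a line mentioning "memory" always survives.
        if DROP.search(line) and not (keep_memory and is_memory_line):
            continue
        # Only touch the module name of imports, not every "env" substring.
        kept.append(line.replace('(import "env"', '(import "ethereum"'))
    return "\n".join(kept) + "\n"
```

Like the sed version, this assumes every entry to be removed sits on a line of its own, which holds for wasm2wat output but not for arbitrary hand-written WAT.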

After this, the next step is a wasm-to-wasm optimizer. We can shave another few bytes off there by running the binaryen size optimizer - looks like it complements llvm nicely by doing the inlining I wanted llvm to do but couldn’t because of bswap - perfect!

wasm-opt -Os -o wrc20binaryen.wasm wrc20strip.wasm
wasm2wat --generate-names wrc20binaryen.wasm > wrc20binaryen.wat

With that, we land at 755 bytes by my count - happy to hear about other ways to easily strip it.

On extending this to the full ERC20 spec: that would be very cool to see actually, though it won’t happen quickly without help. I did note that @yuriy exported a fair bit of the ETH interface - I didn’t have time to dive into it really, but noted some oddities around data type sizes such as u128 that I wasn’t sure what to make of - we have a nice fixed-size int library to use that does lots of compile-time tricks to keep things nimble if that’s needed. The nim version of wrc20 uses 64-bit balances for calculations to make it more easy to compare with other versions but that seems like it could maybe overflow in ETH1, no? Are there any implementations already?

poemm · April 22, 2019, 11:21am

It is very exciting that you size-optimized wrc20_Nim.wasm so far! When I get a chance, I will look for more size optimizations in wrc20_Nim.wasm. I will also test it at the smaller size to make sure that the behaviour did not change. On my todo list.

Just thinking about size, I am working on general tools in pywebassembly to automate ewasm-specific size optimizations. For example, import names like “ethereum” “storageStore” can be represented with just one character. I am also considering compression, but for network transfer, we may want to use uncompressed Wasm format because it allows streaming instantiation.

Where can I find that code btw? I found @lrettig’s version but I’m not sure where to go from there - looks like it’s doing 32-bit balances, and I can’t find where it does the byte-swapping. Is the compiled version available somewhere?

Yes, that is it. The compiled .wat is in the build dir of that repo. He claims to be passing the tests. I have not tried it, but was also wondering about the endianness. It is on my todo list to investigate this, but it is now a lower priority since we now have more competitive options.

but noted some oddities around data type sizes such as u128 that I wasn’t sure what to make of - we have a nice fixed-size int library to use that does lots of compile-time tricks to keep things nimble if that’s needed.

Thanks for the tip. This is one of the problems that I have. I currently can’t handle uint128_t and this is the reason my ecrecover_from_libsecp256k1.wasm doesn’t pass tests. (Also relevant to u128, I spent the weekend trying to speed-optimize mul256.wasm (multiplication modulo 2^256, just like EVM MUL), and got it down to 28 multiplications, four u64xu64->u64 and the rest u32xu32->u64, not sure whether I can get it lower. At least now, when compiled, it competes with Geth and Parity’s EVM opcode.)

The nim version of wrc20 uses 64-bit balances for calculations to make it more easy to compare with other versions but that seems like it could maybe overflow in ETH1, no?

Good point. I saw the comment in the Nim code. If the total number of tokens is under 2^64, then no overflow.
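The TODO in the Nim code (`rb += v`) is the spot where a 64-bit balance could wrap around. A guard against that can be sketched in Python as below. This models storage as a plain dictionary and uses a hypothetical `transfer` function, so it only illustrates the check and is not the contract itself.

```python
U64_MAX = (1 << 64) - 1


def transfer(balances, sender, recipient, value):
    """Move `value` tokens, refusing underflow and 64-bit overflow."""
    sender_balance = balances.get(sender, 0)
    recipient_balance = balances.get(recipient, 0)
    if sender_balance < value:
        # Corresponds to the `if sb < v: revert(nil, 0)` check in the Nim code.
        raise ValueError("insufficient balance")
    if recipient_balance > U64_MAX - value:
        # The case the TODO warns about: rb + v would not fit in a uint64.
        raise OverflowError("recipient balance would exceed 2^64 - 1")
    balances[sender] = sender_balance - value
    # Re-read the recipient so that a self-transfer nets to zero.
    balances[recipient] = balances.get(recipient, 0) + value
```

The comparison `recipient_balance > U64_MAX - value` is written so that it never needs a value larger than 64 bits, which is the form a contract working with uint64 would have to use. If the total supply is below 2^64, the second check can never fire, which matches the point made above.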

Are there any implementations already?

Someone might be working on it in Rust. If I were to implement it, I would do it in C. But C is unaesthetic with all of the pointers, and I am afraid that it will scare away contract developers, so maybe Nim is better suited for Ewasm.

BTW, this weekend, after looking at Nim, I turned my attention to wrc20_C.wasm and also independently noticed the aggressive inlining done by LLVM, and got wrc20_C.wasm down to 685 bytes. (This is not uploaded yet; it will be part of a big update for the whole C ewasm toolchain, including a 60-byte malloc, but I still need to test it.) I have not yet tried wasm-opt -Os, but have been meaning to. Although the C version is now slightly smaller than Nim, I expect that the Nim and C versions will help each other and end up at the same size. We will approach the handwritten size together.

I will respond here when I revisit wrc20_Nim.wasm, hopefully within a week.

jacqueswww · May 9, 2019, 10:30am

Hi @poemm !

How are you currently testing the WRC20 implementation? I am currently in need of a test suite for these nim/wasm contracts, and would love some way to test the wasm I generate.

poemm · May 13, 2019, 12:28am

@jacqueswww You motivated me to publish my ewasm testing toolchain here. I use testeth and hera.

I tried the updated version of the Nim wrc20. One obvious bug: the sed command removes the memory export, so I changed it to keep that export, since current Ewasm requires a memory export. In my testing toolchain (in the test directory), I did the following:

cp path/to/nim/wrc20.wasm .
make fill_wrc20 PROJECT=wrc20 WRC20_FOOTER=wrc20_tester/footer2.txt

For the version of wrc20.nim which I posted above, this command shows that it passes. For the updated wrc20.nim version, it does not pass. It is likely something small. Hopefully I will have time next weekend to return to this. It may be difficult for anyone other than me to do this debugging work because I know the tricks related to testeth, hera, and wasm which allow me to troubleshoot.

FYI Ewasm design is still evolving. Hopefully wrc20 will only need tiny input/output changes to meet the final Ewasm specification.

jacqueswww · May 13, 2019, 8:37am

@poemm Great thanks, will check it out

arnetheduck · June 30, 2019, 3:55pm

I was mucking around with nlvm and stressing it with wrc20 - results are pretty nice - down to 659 bytes.

Found a bug as well which might be the failure that you were seeing in the test, @poemm - the wrong balance variable was being updated.

Changes include:

selectively enabling the inliner by excluding the expensive ones explicitly
fewer pointer arguments across function calls
fewer variables “live” across EE calls, meaning the compiler can reuse stack space

I suspect one would have to teach the compiler a bit more about the nature of the EE functions and their parameters to squeeze more out of the optimizer - which memory locations get read, written etc.

One could also argue that some of the safeguards shouldn’t be necessary - for example, calling getCallDataSize before callDataCopy is redundant - the EE would have to revert on invalid/out-of-range accesses anyway, so no need to waste space on it here. Removing them lets us go all the way to 605 bytes!
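For a quick overview, the sizes reported over the course of this thread can be put side by side. The Python sketch below only does arithmetic on numbers quoted in the posts above; the step labels are paraphrases, and `reduction` is a made-up helper.

```python
# (step, size in bytes) as reported in the thread, in chronological order.
SIZES = [
    ("first nlvm build", 2097),
    ("with --strip-all", 1593),
    ("poemm's test-passing version", 1204),
    ("--noMain, --compress-relocations", 846),
    ("after sed clean-up", 799),
    ("after wasm-opt -Os", 755),
    ("inliner and stack tweaks", 659),
    ("redundant guards removed", 605),
]


def reduction(before, after):
    """Size reduction from `before` to `after`, in percent, one decimal."""
    return round(100 * (before - after) / before, 1)


baseline = SIZES[0][1]
for label, size in SIZES:
    print(f"{label:34} {size:5d} bytes  {reduction(baseline, size):5.1f}% smaller")
```

The first stripping step comes out at 24.0%, and the final 605 bytes are 71.1% smaller than the first build, leaving a gap of 35 bytes to the 570-byte handwritten version.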