Under the hood of Ethereum Virtual Machine: Solidity basics


The words ‘cryptocurrency’ and ‘blockchain’ have been mentioned in the news increasingly often of late. This has drawn many newcomers to these technologies and, with them, a great number of new products. Quite often, to implement a project's internal logic or to collect funds, these products use ‘smart contracts’: special programmes created on the Ethereum platform and living inside its blockchain. The Web is full of material on creating simple smart contracts and on the basic principles, but there is practically no description of how the Ethereum Virtual Machine (hereinafter the EVM) works at a lower level. This is why, in this article, I would like to analyse in more detail how the EVM operates.

Solidity, the language created for developing smart contracts, is relatively young: its development only started in 2014 and, as a consequence, it is still somewhat ‘half-baked’ in places. In this article, we will start with a general description of how the EVM works and with some distinctive features of Solidity that are needed to understand the lower-level material.

P.S. The article presupposes basic knowledge of writing smart contracts and of the Ethereum blockchain as a whole, so if you are hearing about all this for the first time, we recommend exploring the basics first.

 

Table of Contents

  1. Memory types
  • Storage
  • Memory
  • Stack
  2. Data location of complex types
  3. Transactions and message calls
  4. Visibility
  5. Links

Memory types

Before delving into the subtleties of how the EVM works, we should understand one of the most important points: where and how all the data is stored. This matters a great deal because the EVM's memory areas are organised very differently from one another, so not only the price of reading and writing data but also the ways of working with it differ.

Storage

Storage is the first and most expensive memory type. Every contract has its own storage, which holds all the state variables; their values persist between function calls. Storage can be compared to a hard drive: once the code has been executed, everything is recorded in the blockchain, and the next time the contract is called we have access to all the previously stored data.

contract Test {
  // this variable is stored in storage
  uint some_data; // has default value for uint type (0)
  
  function set(uint arg1) {
      some_data = arg1; // some_data is changed and saved in storage
  }
}

Structurally, Storage is a key-value store in which every cell is 32 bytes large, strongly reminiscent of a hash table. As a result, this memory is quite sparse, and we gain nothing by storing data in two neighbouring cells: storing one variable in cell #1 and another in cell #1,000 costs as much gas as storing them in cells #1 and #2.

[32 bytes][32 bytes][32 bytes]...
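
To make the point about neighbouring cells concrete, here is a minimal sketch that writes to raw Storage cells with inline assembly. The contract and function names are hypothetical, and a 0.4.x compiler is assumed, in line with the other examples in this article. Both write functions occupy two fresh cells, so the Storage part of their cost should be the same.

```solidity
pragma solidity ^0.4.16;

// Hypothetical contract, for illustration only.
contract SparseStorage {
    // Writes to two neighbouring cells, #1 and #2.
    function writeNeighbours(uint a, uint b) public {
        assembly {
            sstore(1, a)
            sstore(2, b)
        }
    }

    // Writes to two distant cells, #1 and #1,000.
    function writeFarApart(uint a, uint b) public {
        assembly {
            sstore(1, a)
            sstore(1000, b)
        }
    }

    // Reads an arbitrary cell by its key.
    function readCell(uint key) public view returns (uint value) {
        assembly {
            value := sload(key)
        }
    }
}
```

For a fair comparison, each write function should be called with non-zero arguments on a freshly deployed contract, since writing to an already occupied cell is priced differently.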

As I have already said, this type of memory is the most expensive one: occupying a new Storage cell costs 20,000 gas, modifying an occupied cell costs 5,000 gas and reading a cell costs 200 gas (these are the figures at the time of writing; the gas schedule is changed by protocol upgrades). Why is it so expensive? The reason is simple: the data kept in the contract storage is recorded in the blockchain and stays there forever.

It is also easy to calculate the total volume of information a contract can hold: there are 2^256 cells of 32 bytes (2^5 bytes) each, so 2^256 × 2^5 = 2^261 bytes can be stored in total! In effect, it is a kind of Turing machine, with recursive calls, jumps and practically unlimited memory. It is more than enough to simulate one more Ethereum inside, which would simulate one more Ethereum 🙂
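
As a rough illustration of where these three prices show up, consider the sketch below. The contract is hypothetical, a 0.4.x compiler is assumed, and the gas figures in the comments are simply the ones quoted above; they cover only the Storage operations, not the base cost of the transaction or the rest of the execution.

```solidity
pragma solidity ^0.4.16;

// Hypothetical contract, for illustration only.
contract StorageCost {
    uint counter; // state variable, kept in Storage; starts at 0

    // First write: the cell goes from empty to occupied
    // (20,000 gas for the Storage write, per the figures above).
    function initialise() public {
        counter = 1;
    }

    // Later writes: one read (200 gas) plus one modification of an
    // already occupied cell (5,000 gas), assuming initialise() ran first.
    function increment() public {
        counter = counter + 1;
    }

    // Reading only: one cell read (200 gas) when called from a transaction.
    function current() public view returns (uint) {
        return counter;
    }
}
```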

Memory

The second type of memory is Memory. It is much cheaper than Storage, it is cleared between external function calls (you can read about the different function types in the following chapters) and it is used for temporary data: for instance, function arguments, local variables and return values. It can be compared to RAM: when the computer (in our case the EVM) is switched off, it is erased.

contract Test {
  // ...
  function add(uint a, uint b) returns (uint) {
    // a and b are stored in memory
    uint c = a + b;
    // c has been written to memory too
    return c;
  }
}

Structurally, Memory is a byte array. Its initial size is zero, but it can be expanded in 32-byte portions. Unlike Storage, Memory is contiguous and therefore well packed: an array of length 2 holding 2 variables is much cheaper to store than an array of length 1,000 holding the same variables at its ends and zeroes in the middle.

Reading or writing one machine word (a reminder: in the EVM it is 256 bits) costs only 3 gas. Memory expansion, however, becomes more expensive the larger the memory already is. Storing several kilobytes is cheap enough, but 1 MB already costs millions of gas because the price grows quadratically.

// fee for expanding memory to SZ, where SZ is counted in 32-byte words
TOTALFEE(SZ) = SZ * 3 + floor(SZ**2 / 512)
// if we need to expand memory from x to y, it would be
// TOTALFEE(y) - TOTALFEE(x)
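
The formula can be tried out with a small helper like the one below. This is only a sketch: the contract and function names are hypothetical, sizes are assumed to be given in 32-byte words, and a 0.4.x compiler is assumed, so the arithmetic is unchecked and would overflow for absurdly large inputs.

```solidity
pragma solidity ^0.4.16;

// Hypothetical helper contract, for illustration only.
contract MemoryFee {
    // Total fee for a memory of `words` 32-byte words.
    // Integer division truncates, which plays the role of floor().
    function totalFee(uint words) public pure returns (uint) {
        return words * 3 + (words * words) / 512;
    }

    // Extra gas charged when memory grows from `fromWords` to `toWords`.
    function expansionFee(uint fromWords, uint toWords) public pure returns (uint) {
        require(toWords >= fromWords);
        return totalFee(toWords) - totalFee(fromWords);
    }
}
```

For 1 KB (32 words) this gives 32 * 3 + 1,024 / 512 = 98 gas, while for 1 MB (32,768 words) it gives 98,304 + 2,097,152 = 2,195,456 gas, which is where the ‘millions of gas’ come from.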

 

Stack

As the EVM is stack-based, it is hardly surprising that the last memory area is the Stack: it is used for all EVM computations and costs about as much to use as Memory. Its maximum size is 1,024 elements of 256 bits each, but only the top 16 elements are directly accessible. Of course, Stack elements can be moved to Memory or Storage, but a deeper element cannot be accessed without first removing what lies above it. If the Stack overflows, contract execution is aborted, so I advise you to leave all the Stack work to the compiler 😉

Data location of complex types

In Solidity, ‘complex’ types such as structures and arrays, which can exceed 256 bits, have to be handled more carefully. As copying them can be quite expensive, we should think about where to store them: in Memory (which is not persistent) or in Storage (where all the state variables are kept). For this purpose, arrays and structures in Solidity have an additional parameter, the ‘data location’. Depending on the context, this parameter always has a default value, but it can be changed with the storage and memory keywords. The default is memory for function arguments, storage for local variables (it is still memory for simple types) and always storage for state variables.

There is also a third data location, calldata. The data stored there is immutable, and otherwise it is handled much like Memory. The arguments of external functions are always stored in calldata.

The data location also matters because it affects how the assignment operator behaves: an assignment between Storage and Memory always creates an independent copy, while an assignment to a local Storage variable only creates a reference to a state variable. An assignment from memory to memory does not create a copy either.

contract C {
    uint[] x; // the data location of x is storage

    // the data location of memoryArray is memory
    function f(uint[] memoryArray) {
        x = memoryArray; // works, copies the whole array to storage
      
        // var is just a shortcut, that allows us automatically detect a type
        // you can replace it with uint[]
        var y = x; // works, assigns a pointer, data location of y is storage
        y[7]; // fine, returns the 8th element of x
        y.length = 2; // fine, modifies x through y
        delete x; // fine, clears the array, also modifies y
      
        // a dynamic memory array, so that it matches the parameter type of h
        uint[] memory tmpArr = new uint[](3); // tmpArr is located in memory
        var z = tmpArr; // works, assigns a pointer, data location of z is memory

        // The following does not work; it would need to create a new temporary /
        // unnamed array in storage, but storage is "statically" allocated:
        // y = memoryArray;

        // This does not work either, since it would "reset" the pointer, but there
        // is no sensible location it could point to.
        // delete y;

        g(x); // calls g, handing over a reference to x
        h(x); // calls h and creates an independent, temporary copy of x in memory
        h(tmpArr); // calls h, handing over a reference to tmpArr
    }

    function g(uint[] storage storageArray) internal {}
    function h(uint[] memoryArray) internal {}
}


Transactions and message calls

Ethereum has two types of accounts that share the same address space: external accounts, i.e. ordinary accounts controlled by public-private key pairs (put simply, the accounts of humans), and contract accounts, controlled by the code stored together with the account (smart contracts). A transaction is a message from one account to another (which can be the same account or the special zero-account, see below), carrying some data (the payload) and ether.

Transactions between ordinary accounts are straightforward: they only transfer value. When the target is the zero-account (strictly speaking, when the recipient field of the transaction is left empty rather than set to a literal address 0), the transaction creates a new contract, whose address is derived from the sender's address and the number of transactions the sender has sent (the account's ‘nonce’). The payload of such a transaction is interpreted by the EVM as bytecode and executed, and its output is permanently stored as the code of the contract.

If the target account is a contract account, the code it contains is executed and the payload is provided as input data. Contract accounts cannot send transactions on their own, but they can trigger them in response to transactions received (both from external and from contract accounts). In this way, contracts interact with each other through internal transactions (message calls). Message calls are similar to ordinary transactions in that they also have a source, a target, ether, gas and so on, and the calling contract can set their gas limit. The only difference from transactions created by ordinary accounts is that message calls exist only within the Ethereum runtime environment.
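
A message call with an explicit gas limit might look like the sketch below. Both contracts are hypothetical, and the 50,000 gas cap is an arbitrary value chosen for illustration. The .gas(...) call syntax belongs to the 0.4.x compilers assumed throughout this article; newer compilers spell it differently.

```solidity
pragma solidity ^0.4.16;

// Hypothetical target contract.
contract Callee {
    uint public value;

    function set(uint v) public {
        value = v;
    }
}

// Hypothetical calling contract.
contract Caller {
    // Must itself be triggered by a transaction or another message call.
    function forward(Callee target, uint v) public {
        // External call on another contract: performed as a message call.
        // .gas(...) caps the gas passed on to the callee.
        target.set.gas(50000)(v);
    }
}
```

Here an external account sends a transaction to Caller.forward, and Caller in turn issues a message call to Callee.set; the second step never appears as a separate transaction in a block.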

Visibility

Solidity has four kinds of ‘visibility’ for functions and variables: external, public, internal and private, with public being the default for functions. For state variables, the default is internal and external is not possible. So, let us examine all the options:

  • External: functions of this type are part of the contract interface, meaning they can be called from other contracts by means of a message call. The called contract receives a fresh, empty Memory and access to the payload data, which is located in a separate area, calldata. Once execution is complete, the returned data is placed in the caller's Memory, at a location preallocated by the calling contract. An external function cannot be called directly from inside the contract (meaning we cannot use func(), but a call such as this.func() is still possible). When a lot of data is received, such functions can be more efficient than public ones (I will write about it below).
  • Internal: both functions and state variables of this type can only be used inside the contract itself and inside contracts derived from it. Unlike external functions, they do not use message calls but work through code ‘jumps’ (the JUMP instruction). Thanks to this, Memory is not cleared when such a function is called, which makes it possible to pass complex types stored in Memory by reference (remember the example from the Data location chapter, where tmpArr is passed to the function h by reference).
  • Public: public functions are universal: they can be called both from outside (being part of the contract interface) and from inside the contract. For public state variables, a special getter function is generated that has external visibility and returns the value of the variable.
  • Private: private functions and variables are no different from internal ones, except that they are not visible in derived contracts.
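
Two details from the list above, the generated getter and the this.func() form of calling an external function, can be sketched as follows. The contract is hypothetical and assumes a 0.4.x compiler.

```solidity
pragma solidity ^0.4.16;

// Hypothetical contract, for illustration only.
contract Visibility {
    // The compiler generates an external getter named total() for this variable.
    uint public total;

    function add(uint amount) external {
        total += amount;
    }

    function addTwice(uint amount) public {
        // add(amount);   // would not compile: add is external
        this.add(amount); // allowed: performed as a message call to this contract
        this.add(amount);
    }
}
```

Calling addTwice(5) on a fresh contract should leave total() returning 10, at the price of two message calls instead of two cheap internal jumps.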

By way of illustration, let us see a small example.

contract C {
    uint private data;

    function f(uint a) private returns(uint b) { return a + 1; }
    function setData(uint a) { data = a; } // default to public
    function getData() public returns(uint) { return data; }
    function compute(uint a, uint b) internal returns (uint) { return a+b; }
}


contract D {
    uint local;
    
    function readData() {
        C c = new C();
        uint local = c.f(7); // error: member "f" is not visible
        c.setData(3);
        local = c.getData();
        local = c.compute(3, 5); // error: member "compute" is not visible
    }
}


contract E is C {
    function g() {
        C c = new C();
        uint val = compute(3, 5);  // access to internal member (from derived to parent contract)
        uint tmp = f(8); // error: member "f" is not visible in derived contracts
    }
}

One of the most frequent questions is “why do we need external functions if we can always use public ones?” Indeed, there is no case in which an external function cannot be replaced by a public one; however, as I have already written, in some cases an external function is more efficient. Let us consider the following concrete example:

contract Test {
    function test(uint[3] a) public returns (uint) {
         // a is copied to memory
         return a[2]*2;
    }

    function test2(uint[3] a) external returns (uint) {
         // a is located in calldata
         return a[2]*2;
    }
}

Calling the public function costs 413 gas, while calling the external function costs only 281 gas. The difference arises because in the public function the array is copied to Memory, while in the external function it is read directly from calldata. Allocating memory is, of course, more expensive than reading from calldata.

Public functions have to copy all their arguments to memory because they can also be called from inside the contract, which is a completely different process: as I have written above, such calls work through code jumps, and arrays are passed as pointers to memory. So, when the compiler generates code for an internal function, it expects to find the arguments in Memory.

For external functions, the compiler does not need to allow internal access, so it lets the data be read directly from calldata, skipping the step of copying it to memory.

We can see that an informed choice of ‘visibility’ is needed not only to restrict access to functions but also to use them more efficiently.

Links
