Simple Subroutines for the EVM
相关视频
正文
Abstract
This proposal provides a complete, efficient, safe and static control-flow facility.
It introduces two new opcodes to support calling and returning from subroutines:
RJUMPSUB relative_offset
-- relative jump to subroutineRETURNSUB
-- return toPC
after most recentRJUMPSUB
.
It depends on the two new opcodes proposed by EIP-4200 to support static jumps:
RJUMP relative_offset
— relative jump toPC + relative_offset
RJUMPI relative_offset
— conditional relative jump
It deprecates JUMP
and JUMPI
, allowing valid code to support streaming, one-pass, and other near-linear compilers.
In concert with EIP-3540 and EIP-3670 it ensures, at initialization time, that valid code will not execute invalid instructions or jump to invalid locations, will not underflow stack, will maintain consistent numbers of inputs and outputs for subroutines, and will have bounded stack height in the absence of recursion.
This is among the simplest possible proposals that meets these requirements.
Motivation
A complete control-flow facility.
Jumps, conditional jumps and subroutines were proposed by Alan Turing in 1945 as a means of organizing the logic of the code and the design of the memory crystals for his Automatic Computing Engine:
We wish to be able to arrange that sequences of orders can divide at various points, continuing in different ways according to the outcome of the calculations to date... We also wish to be able to arrange for the splitting up of operations into subsidiary operations... To start on a subsidiary operation we need only make a note of where we left off the major operation and then apply the first instruction of the subsidiary. When the subsidiary is over we look up the note and continue with the major operation.
— Alan Turing — in B.E. Carpenter, R.W. Doran, "The other Turing machine." The Computer Journal, Volume 20, Issue 3, January 1977.
In more contemporary terms, we have sequences of instructions, jumps and conditional jumps that divide sequences into blocks, subroutine calls, and a stack of addresses to return to. The details vary, but similar facilities have proven their value across a long line of important machines over the last 75 years, including most all of the machines we have programmed or implemented -- physical machines including the Burroughs 5000, CDC 7600, IBM 360, DEC PDP-11 and VAX, Motorola 68000, Sun SPARC, and Intel x86s, as well as virtual machines for Scheme, Forth, Pascal, Java, Wasm, and others.
Unlike these machines, the Ethereum Virtual Machine does not provide subroutine operations. Instead, they must be synthesized using the dynamic JUMP
instruction, which takes its destination on the stack. Further, the EVM provides only dynamic jumps, impeding the static analysis we need.
Efficient control-flow.
Efficient to write by hand, compile from high level labguages, validate at deploy time, interpret by VMs, and compile to machine code.
Static jumps, conditional jumps, and subroutines are sufficient and efficient in space and time, as shown by historical experience and as we will show for the EVM below.
Safe control-flow.
The EVM has unusually high requirements for safety. Not only do many smart contracts control inordinately large amounts of valuable Ether, but once placed on the blockchain any defects are visible to attackers and cannot be repaired. We propose to statically validate important safety constraints on code at initialization time.
Static control-flow.
The EVM's dynamic jumps cause two major problems. First, the need to synthesize static jumps and subroutines with dynamic jumps wastes space and gas with needlessly complex code, as we will show below.
Worse, jumps that can dynamically branch to any destination in the code can cause quadratic "path explosions" when traversing the flow of control. For Ethereum, this is a denial-of-service vulnerability that prevents us, at initialization time, from validating the safe use of EVM code and from compiling EVM code to machine code.
We need static control-flow to validate program safety and to compile EVM bytecode to machine code -- in time and space linear in the size of the code.
Specification
Opcodes
RJUMPSUB (0x5f) relative_offset
Transfers control to a subroutine.
- Decode the
relative_offset
from the immediate data atPC
. - Push the current
PC + 3
to thereturn stack
. - Set
PC
toPC + relative_offset
.
The relative_offset
is relative to the current PC
. The offset is encoded as a two-byte, twos-complement signed integer, stored MSB-first.
The gas cost is low.
RETURNSUB (0x5e)
Returns control to the caller of a subroutine.
- Pop the
return stack
toPC
.
The gas cost is verylow.
Notes:
- Values popped off the
return stack
do not need to be validated, since they are alterable only byRJUMPSUB
andRETURNSUB
. - The description above lays out the semantics of these instructions in terms of a
return stack
. But the actual state of thereturn stack
is not observable by EVM code or consensus-critical to the protocol. (For example, a node implementer may codeRJUMPSUB
to unobservably pushPC
on thereturn stack
rather thanPC + 1
, which is allowed so long asRETURNSUB
observably returns control to thePC + 3
location.)
Validity
Execution is defined in the Yellow Paper as a sequence of changes in the EVM state. The conditions on valid code are preserved by state changes. At runtime, if execution of an instruction would violate a condition the execution is in an exceptional halting state. The Yellow Paper defines six such states.
- Insufficient gas
- More than 1024 stack items
- State modification during a static call
- Insufficient stack items
- Invalid jump destination
- Invalid instruction
We would like to consider EVM code valid iff no execution of the program can lead to an exceptional halting state. In practice, we must test at runtime for the first three conditions. We don’t know how much gas there will be, we don’t know how deep a recursion may go, analysis of stack depth even for non-recursive programs is nontrivial, and we don't know whether a call will be static. All of the remaining conditions MUST be validated statically, in time and space quasi-linear in the size of the code.
Static Constraints on Valid Code
- Every instruction MUST be valid:
- The
JUMP
andJUMPI
instructions ARE NOT valid.
- The
- Every jump MUST be valid:
- The
RJUMP
,RJUMPI
, orRJUMPSUB
instructions MUST NOT address immediate data or addresses outside of their code section.
- The
- The stacks MUST be valid:
- The number of items on the
data stack
MUST always be positive. - The number of items on the
return stack
MUST always be positive.
- The number of items on the
- The data stack MUST be consistently aligned:
- The data stack height is
- the absolute difference between the current
stack pointer
and thestack pointer
on entry to the current subroutine.
- the absolute difference between the current
- It MUST be the same for every reachable path through a given
PC
and - MUST NOT exceed 1024.
- The data stack height is
Rationale
This is a purely semantic specification, placing no constraints on the syntax of code sections beyond being a sequence of opcodes and immediate data – a subroutine is not a contiguous sequence of bytecode, it is a subgraph of the bytecode's control-flow graph. The EVM is a simple state machine. We only promise that valid code will not, as it were, jam up the gears of the machine.
By avoiding syntactic constraints we allow for optimizations like tail call elimination, multiple-entry subroutines, moving "cold" code out of line, and other ways of reducing redundancy and keeping "hot" code in cache. Since we wish to support one-pass compilation of EVM code to machine code it is crucial that the EVM code be as well optimized as possible up front.
Validation
Rather than enforce constraints via syntax, we enforce them via validation.
The constraints on valid code cover all of the exceptional halting states that we can validate and allow code to be validated and compiled in time and space quasi-linear in the size of the code.
The RJUMP
, RJUMPI
and RJUMPSUB
instructions take their relative_offset as immediate arguments, which cannot change at runtime. Having constant destinations for all jumps means that all jump destinations can be validated at initialization time, not runtime. Dynamic jumps can branch to any destination in the code, so exploitable quadratic "path explosions" are possible when traversing the control flow graph. Deprecating JUMP
and JUMPI
prevents this.
Requiring a consistently aligned data stack
- prevents stack underflow
- ensures that all calls to a subroutine have the same number of inputs and the same number of outputs and
- ensures that stack height is bounded in the absence of recursion.
Requiring a consistently aligned data stack
also allows some algorithms that traverse the control-flow graph -- including code validation and compilation -- to break cycles at joins, again preventing quadratic path explosion. When a traversal gets to a PC
it has visited before it is either at the beginning of a loop or the entry to a function. Since the stack height at that PC
is constant we know that loops will not grow stack, and that the number of arguments to a subroutine will always be the same -- there may be no need to traverse that path again.
Note: The JVM and Wasm enforce similar constraints for similar reasons.
Alternative Designs
There are a few major designs for a subroutine facility, two of which are considered here. The others are mostly not appropriate for the EVM, such as the Wheeler Jump -- self-modifying code that writes return addresses into called subroutines.
1. Keep return addresses on a dedicated return stack. Turing's design is often used by stack machines, including those for Forth, Java, Wasm, and others. The data stack is used for computation, with a dedicated stack for return addresses. A single instruction suffices to call, and another to return from a routine.
2. Keep return addresses on the data stack. This design is often used by register machines, including those from CDC, IBM, DEC, Intel, and ARM. The registers are used primarily for computation, and the stack maintains call frames for return addresses, arguments, and local variables. On the EVM there are no registers for computation, so using the stack for both purposes can cause the sort of inefficiencies we see below. Pascal p-code does use this design, but as part of a complex calling convention with dedicated registers.
We prefer the dedicated return stack.
- It maintains a clear separation between calculation and flow of control:
- the data stack is free of vulnerable return addresses and
- it's impossible to overwrite the return stack.
- It improves efficiency:
- it uses native arithmetic rather than 256-bit EVM instructions for the return address,
- doesn't use up a
data stack
slot for the return address and - needs less motion of 256-bit data on the stack.
Efficiency
We illustrate here how subroutine instructions can be used to reduce the complexity and gas costs of both ordinary and optimized subroutine calls compared to using JUMP
.
Simple Subroutine Call
Consider these examples of a fairly minimal subroutine, including the code to call it.
Subroutine call, using RJUMPSUB
:
SQUARE:
dup1 ; 3 gas
mul ; 5 gas
returnsub ; 3 gas
CALL_SQUARE:
push 0x02 ; 3 gas
rjumpsub SQUARE ; 5 gas
Total gas: 19
Subroutine call, using JUMP
:
SQUARE:
jumpdest ; 1 gas
swap1 ; 3 gas
dup1 ; 3 gas
mul ; 5 gas
swap1 ; 3 gas
jump ; 8 gas
CALL_SQUARE:
jumpdest ; 1 gas
push 0x02 ; 3 gas
push RTN_CALL: ; 3 gas
push SQUARE ; 3 gas
jump ; 8 gas
RTN_CALL:
jumpdest ; 1 gas
Total: 41 gas.
Using RJUMPSUB
versus JUMP
saves 41 - 19 = 22 gas — a 54% improvement.
Tail Call Optimization
Of course in cases like this one we can optimize the tail call, so that the return from SQUARE
actually returns from TEST_SQUARE
.
Tail call optimization, using RJUMPSUB
and RETURNSUB
:
dup1 ; 3 gas mul ; 5 gas returnsub ; 3 gas CALL_SQUARE: push 0x02 ; 3 gas rjump SQUARE ; 3 gas
Total: 17 gas
Tail call optimization, using JUMP
:
SQUARE:
jumpdest ; 1 gas
swap1 ; 3 gas
dup1 ; 3 gas
mul ; 5 gas
swap2 ; 3 gas
jump ; 8 gas
CALL_SQUARE:
jumpdest ; 1 gas
push 0x02 ; 3 gas
push SQUARE ; 3 gas
jump ; 8 gas
Total: 38 gas
Using RJUMPSUB
versus JUMP
saves 38 - 17 = 21 gas — a 55% improvement.
Efficiency Caveats
We can see that these instructions provide a simpler and more gas-efficient subroutine mechanism than using JUMP
— in our examples they cut gas use by about half.
Clearly, the benefits of this efficiency are greater for programs that have been factored into smaller subroutines. How small? Wrapping code in a subroutine costs only 8 gas using RJUMPSUB
and RETURNSUB
versus 30 gas using JUMP
, PUSH
and SWAP
as above.
Costs
The low cost of RJUMPSUB
versus the mid cost of JUMP
is justified by needing only to decode the immediate two byte destination to the PC
and push the return address on the return stack
, all using native arithmetic, versus using the data stack with emulated 256-bit instructions.
The verylow cost of RETURNSUB
is justified by needing only to pop the return stack
into the PC
. Benchmarking will be needed to tell if the costs are well-balanced.
Backwards Compatibility
These changes affect the semantics of existing EVM code: bytes that would have been interpreted as valid jump destinations may now be interpreted as immediate data. Since this proposal depends on the Ethereum Object Format to signal the change this is not a practical issue.
Test Cases
Simple routine
This should jump into a subroutine, back out and stop.
Bytecode: 0x5f0003005e
(RJUMPSUB 3, RETURNSUB, STOP
)
Pc | Op | Cost | Stack | RStack |
---|---|---|---|---|
0 | RJUMPSUB | 5 | [] | [] |
2 | STOP | 0 | [] | [] |
3 | RETURNSUB | 3 | [] | [] |
Output: 0x
Consumed gas: 10
Two levels of subroutines
This should execute fine, going into one two depths of subroutines
Bytecode: 0x5f00045F00025200
(RJUMPSUB 4, RJUMPSUB 2, RETURNSUB, RETURNSUB, STOP
)
Pc | Op | Cost | Stack | RStack |
---|---|---|---|---|
0 | RJUMPSUB | 5 | [] | [] |
3 | RJUMPSUB | 5 | [] | [] |
4 | RETURNSUB | 5 | [] | [] |
5 | RETURNSUB | 5 | [] | [] |
6 | STOP | 0 | [] | [] |
Consumed gas: 20
Failure 1: invalid jump
This should fail, since the given location is outside of the code-range.
Bytecode: 0X5fff
(RJUMPSUB -1
)
Pc | Op | Cost | Stack | RStack |
---|---|---|---|---|
0 | RJUMPSUB | 10 | [] | [] |
Error: at pc=0, op=RJUMPSUB: invalid jump destination
Failure 2: shallow return stack
This should fail at first opcode, due to shallow return_stack
Bytecode: 0x5e
(RETURNSUB
)
Pc | Op | Cost | Stack | RStack |
---|---|---|---|---|
0 | RETURNSUB | 5 | [] | [] |
Error: at pc=0, op=RETURNSUB: invalid retsub
Subroutine at end of code
In this example the RJUMPSUB is on the last byte of code. When the subroutine returns, it should hit the 'virtual stop' after the bytecode, and not exit with error
Bytecode: 0x5c00045e5fffff
(RJUMP 4, RETURNSUB, RJUMPSUB -1
)
Pc | Op | Cost | Stack | RStack |
---|---|---|---|---|
0 | RJUMP | 5 | [] | [] |
3 | RETURNSUB | 5 | [] | [] |
4 | RJUMPSUB | 5 | [] | [] |
7 | STOP | 0 | [] | [] |
Consumed gas: 15
Reference Implementation
The following is a pseudo-Python implementation of an algorithm for predicating code validity. An equivalent algorithm must be run at initialization time.
This algorithm performs a symbolic execution of the program that recursively traverses the code, emulating its control flow and stack use and checking for violations of the rules above.
It runs in time equal to O(vertices + edges)
in the program's control-flow graph, where edges represent control flow and the vertices represent basic blocks — thus the algorithm takes time proportional to the size of the code. It maintains a stack of continuations for conditional jumps, the size of which is at most proportional to the size of the code.
Validation Function
** Note: this function is known to be incomplete and incorrect. **
For simplicity's sake we assume that all jumpdest analysis and prior validation has been done, including EIP-3540, EIP-3670, and EIP-4200, so EOF headers and sections are well-formed, and there are no invalid instructions or jumps. In practice, all passes of validation can be folded into a single loop no recursion.
We also assume some helper functions.
is_valid(opcode)
returns true iff opcode is valid.is_terminator(opcode)
returns true iff opcode is terminator.is_valid_jumpdest(pc)
returns true iffpc
is a valid jump destination.immediate_data(pc)
returns the immediate data for the instruction atpc
.immediate_size(opcode)
returns the size of the immediate data for an opcode.removed_items(opcode)
returns the number of items removed from thedata_stack
by theopcode
.added_items(opcode)
returns the number of items added to thedata_stack
by theopcode
.
# returns true iff code is valid
def validate_code(code: bytes, pc: int, sp: int, bp: int) -> boolean:
continuations = []
do
while pc < len(code):
opcode = code[pc]
if !is_valid(opcode):
return false
if is_terminator(opcode):
return true
# check stack height and return if we have been here before
stack_height = sp - bp
if stack_height > 1024
return false
if pos in stack_heights:
if stack_height != stack_heights[pos]:
return false
return true
else:
stack_heights[pos] = stack_height
if opcode == RJUMP:
# reset pc to destination of jump
jumpdest = immediate_data(pc)
pc += jumpdest
if !is_valid_jumpdest(pc)
return false
elif opcode == RJUMPI:
jumpdest = pc + immediate_data(pc)
if !is_valid_jumpdest(pc)
return false
# continue true side of conditional later
continations.push((jumpdest, sp, bp))
# continue false side of conditional now
elif opcode == RJUMPSUB:
# will enter subroutine at destination
bp = sp
# push return address and reset pc to destination
jumpdest = pc + immediate_data(pc)
if !is_valid_jumpdest(pc)
return false
push(return_stack, pc + 3)
pc = jumpdest
continue
elif opcode == RETURNSUB:
# will return to subroutine at destination
bp = sp
# pop return address and check for preceding call
pc = pop(return_stack)
if code[pc - 3] != RJUMPSUB:
return false
# apply instructions to stack
sp -= removed_items(opcode)
if sp < 0
return false
sp += added_items(opcode)
# Skip opcode and immediate data
pc += 1 + immediate_size(opcode)
while (pc, sp, bp) = continuations.pop()
return true
Security Considerations
These changes introduce new flow control instructions. They do not introduce any new security considerations. This EIP is intended to improve security by validating a higher level of safety for EVM code deployed on the blockchain. The validation algorithm must be quasi-linear in time and space to not be a denial of service vulnerability. The algorithm here makes one linear-time pass of the bytecode, and uses a stack of continuations that cannot exceed the number of RJUMPI
instructions in the code.
Copyright
Copyright and related rights waived via CC0.