EOF - Stack Validation
EIP-5450 proposes the introduction of extended validation of EOF code sections to ensure that no stack underflow or overflow can occur during the execution of validated contracts. This is achieved by verifying that no exceptional conditions can happen and preventing the execution and deployment of any invalid code. The validation provides several benefits, including removing the need for run-time stack underflow checks for all instructions, removing the need for run-time stack overflow checks for all instructions except CALLF, ensuring that execution terminates with one of the terminating instructions, and preventing the deployment of code with unreachable instructions. However, it also adds constraints to the code structure and requires a second validation pass. The proposal poses no risk to backwards compatibility as it is introduced only for EOF1 contracts, for which deploying undefined instructions is not allowed, and the new instructions are not introduced for legacy bytecode. Security considerations are still being discussed, and copyright and related rights have been waived via CC0.
Video
Original
Abstract
Introduce extended validation of EOF code sections to guarantee that neither stack underflow nor overflow can happen during execution of validated contracts.
Motivation
The current EVM performs a number of validity checks for each executed instruction, such as checking for instruction being defined, stack overflow and underflow, and enough amount of gas remaining.
This EIP minimizes the number of such checks required at run-time by verifying that no exceptional conditions can happen and preventing the execution and deployment of any invalid code.
The operand stack validation provides several benefits:
- removes the run-time stack underflow check for all instructions,
- removes the run-time stack overflow check for all instructions except
CALLF
andJUMPF
(JUMPF
introduced in a separate EIP) , - ensures that execution terminates with one of the terminating instructions,
- prevents deployment of code with unreachable instructions, thereby discouraging the use of code sections for data storage.
It also has some disadvantages:
- adds constraints to the code structure (similar to JVM, CPython bytecode, WebAssembly and others); however, these constraints can be lifted in a backward-compatible manner if they are shown to be user-unfriendly,
- it is natural to implement stack validation as a second validation pass; however, it is not strictly required and validation's computational and space complexity remains linear in any implementation variant.
The guarantees created by these validation rules also improve the feasibility of Ahead-Of-Time and Just-In-Time compilation of EVM code. Single pass transpilation passes can be safely executed with the code validation and advanced stack/register handling can be applied with the stack height validations. While not as impactful to a mainnet validator node that is bound mostly by storage state sizes, these can significantly speed up witness validation and other non-mainnet use cases.
Specification
Code validation
Remark: We rely on the notions of operand stack and type section as defined by EIP-4750.
Each code section is validated independently.
Instructions validation
In the first validation phase defined in EIP-3670 (and extended by EIP-4200 and EIP-4750) instructions are inspected independently to check if their opcodes and immediate values are valid.
Operand stack validation
In the second validation phase control-flow analysis is performed on the code.
Operand stack height here refers to the number of stack values accessible by this function, i.e. it does not take into account values of caller functions' frames (but does include this function's inputs). Note that validation procedure does not require actual operand stack implementation, but only to keep track of its height.
Terminating instructions refers to the instructions either:
- ending function execution:
RETF
,JUMPF
, or - ending call frame execution:
STOP
,RETURN
,RETURNCONTRACT
,REVERT
,INVALID
.
note: JUMPF
and RETURNCONTRACT
are introduced in separate EIPs.
Forward jump refers to any of RJUMP
/RJUMPI
/RJUMPV
instruction with relative offset greater than or equal to 0. Backwards jump refers to any of RJUMP
/RJUMPI
/RJUMPV
instruction with relative offset less than 0, including jumps to the same jump instruction.
Instructions in the code are scanned in a single linear pass over the code. For each instruction the operand stack height bounds are recorded as stack_height_min
and stack_height_max
.
The first instruction's recorded stack height bounds are initialized to be equal to the number of inputs to the function type matching the code (stack_height_min = stack_height_max = type[code_section_index].inputs
).
For each instruction:
- Check if this instruction has recorded stack height bounds. If it does not, it means it was neither referenced by previous forward jump, nor is part of sequential instruction flow, and this code fails validation.
- It is a prerequisite to validation algorithm, and code generators are required to order code basic blocks in a way that no block is referenced only by backwards jump.
- Determine the effect the instruction has on the operand stack:
- Check if the recorded stack height bounds satisfy the instruction requirements. Specifically:
- for
CALLF
instruction the recorded stack height lower bound must be at least the number of inputs of the called function according to its type defined in the type section, - for
RETF
instruction both the recorded lower and upper bound must be equal and must be exactly the number of outputs of the function matching the code, - for
JUMPF
into returning function both the recorded lower and upper bound must equal exactlytype[current_section_index].outputs + type[target_section_index].inputs - type[target_section_index].outputs
, - for
JUMPF
into non-returning function the recorded stack height lower bound must be at least the number of inputs of the target function according to its type defined in the type section, - for any other instruction the recorded stack height lower bound must be at least the number of inputs required by instruction,
- there is no additional check for terminating instructions other than
RETF
andJUMPF
, this implies that extra items left on stack at instruction ending EVM execution are allowed.
- for
- For
CALLF
andJUMPF
check for possible stack overflow: if recorded stack height upper bound is greater than1024 - types[target_section_index].max_stack_height + types[target_section_index].inputs
, validation fails. - Compute new stack height bounds after the instruction execution. Upper and lower bound are updated by the same value:
- after
CALLF
stack height bounds are adjusted by addingtypes[target_section_index].outputs - types[target_section_index].inputs
, - after any other non-terminating instruction stack height bounds are adjusted by subtracting the number of instruction inputs and adding the number of instruction outputs,
- terminating instructions do not need to update stack height bounds.
- after
- Check if the recorded stack height bounds satisfy the instruction requirements. Specifically:
- Determine the list of successor instructions that can follow the current instructions:
- The next instruction for all instructions other than terminating instructions and unconditional jump.
- All targets of a conditional or unconditional jump.
- For each successor instruction:
- Check if the instruction is present in the code (i.e. execution must not "fall off" the code).
- If the successor is reached via forwards jump or sequential flow from previous instruction:
- If the instruction does not have stack height bounds recorded (visited for the first time), record the instruction stack height bound as the value computed in 2.3.
- Otherwise instruction was already visited (by previously seen forward jump). Update this instruction's recorded stack height bounds so that they contain the bounds computed in 2.3, i.e.
target_stack_min = min(target_stack_min, current_stack_min)
andtarget_stack_max = max(target_stack_max, current_stack_max)
, where(target_stack_min, target_stack_max)
are successor bounds and(current_stack_min, current_stack_max)
are bounds computed in 2.3.
- If the successor is reached via backwards jump, check if the recorded stack height bounds equal the value computed in 2.3. Validation fails if they are not equal, i.e. we see backwards jump to a different stack height.
After all instructions are visited, determine the function maximum operand stack height:
- Compute the maximum stack height as the maximum of all recorded stack height upper bounds.
- Check if the maximum stack height does not exceed the limit of 1023 (
0x3FF
). - Check if the maximum stack height matches the corresponding code section's
max_stack_height
within the type section.
The computational and space complexity of this pass is O(len(code)). Each instruction is visited at most once.
Execution
Given the deploy-time validation guarantees, an EVM implementation is not required anymore to have run-time stack underflow nor overflow checks for each executed instruction. The exception is the CALLF
and JUMPF
performing operand stack overflow check for the entire called function.
Rationale
Properties of validated code
Any code section validated according to operand stack validation has the following properties:
- There are no unreachable instructions
- There are no instructions reachable only via backwards jump
- Operand stack underflow cannot happen.
- Operand stack overflow can only happen at
CALLF
orJUMPF
instruction. - Multiple forward jump instructions executing at different stack heights may target the same instruction; the stack of target basic block is validated for all possible heights.
- Any backwards jump instruction can only target an instruction that is executed with equal stack height; this prevents deployment of the loops with unbounded stack pushing or popping.
- Final instruction in the code section is either terminating instruction or
RJUMP
.
Stack overflow check only in CALLF/JUMPF
In this EIP, we provide a more efficient variant of the EVM where stack overflow check is performed only in CALLF
and JUMPF
instructions using the called function's max_stack_height
information. This decreases flexibility of an EVM program because max_stack_height
corresponds to the worst-case control-flow path in the function.
Unreachable code
The operand stack validation algorithm rejects any code having any unreachable instructions. This check can be performed very cheaply. It prevents deploying degenerated code. Moreover, it enables combining instruction validation and operand stack validation into single pass.
Clean stack upon termination
It is currently required that the operand stack is empty (in the current function context) after the RETF
instruction.
Otherwise, the RETF
semantic would be more complicated. For n
function outputs and s
the stack height at RETF
the EVM must erase s-n
non-top stack items and move the n
stack items to the place of erased ones. Cost of such operation may be relatively cheap but is not constant.
However, lifting the requirement and modifying the RETF
semantic as described above is backward
compatible and can be easily introduced in the future.
More restrictive stack validation
Originally another variant of stack validation was proposed, where instead of linear scan of the code section, all code paths were examined by following the target(s) of every jump instruction in a breadth-first-search manner, tracking stack height for each visited instruction and checking that for every possible code path to a particular instruction its stack height remains constant.
The advantage of this variant would be somewhat simpler algorithm (we would not need to track stack height bounds, but only a single stack height value for each instruction) and no extra requirement for ordering of code basic blocks (see below).
However compiler teams opposed to such restrictive stack height requirements. One prominent pattern used by compilers which wouldn't be possible is jumping to terminating helpers (code blocks ending with RETURN
or REVERT
) from different stack heights. This is common for example for a series of assert
statements, each one compiled to a RJUMPI
into a shared terminating helper. Enforcing constant stack requirement would mean that before jumping to such helper, extra items on the stack have to be popped, and this noticeably increases code size and consumed gas, and would defeat the purpose of extracting these common terminating sequences into a helper.
Ordering of basic blocks
The prerequisite to stack validation algorithm is ordering of code basic blocks in a way that no block is referenced only by backwards jump.
This is required to make it possible to examine each instruction in one linear pass over the code section. Forward pass over the code section allows for the algorithm to "expand" each forward jump target's stack height bounds and still keep the complexity linear. Trying to do jump target stack bounds expansion while scanning the code in the breadth-first-search manner would require to re-examine entire code path after its stack height bounds are expanded, which would result in quadratic complexity.
Backwards Compatibility
This change requires a "network upgrade," since it modifies consensus rules.
It poses no risk to backwards compatibility, as it is introduced only for EOF1 contracts, for which deploying undefined instructions is not allowed, therefore there are no existing contracts using these instructions. The new instructions are not introduced for legacy bytecode (code which is not EOF formatted).
Security Considerations
Needs discussion.
Copyright
Copyright and related rights waived via CC0.
Not miss a beat of EIPs' update?
Subscribe EIPs Fun to receive the latest updates of EIPs Good for Buidlers to follow up.
View all