Unlike the Java Virtual Machine (JVM) and the .NET CLR, which are stack-based, the Lua VM (LVM) is register-based. This distinction is crucial for decompilation.
Register-based bytecode generally needs fewer instructions than stack-based bytecode, but each instruction carries more: it names its source and destination registers explicitly, so it reads more like a hardware CPU instruction than like typical stack-VM bytecode.
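To make the difference concrete, here is a minimal sketch that evaluates c = a + b on a toy stack machine and on a toy register machine. The opcode names and table encodings are invented for illustration; they are not real JVM, CLR, or Lua opcodes.

```lua
-- Toy comparison: c = a + b on a stack VM versus a register VM.
-- All opcodes and encodings here are hypothetical.

local function run_stack(program, vars)
  local stack = {}
  for _, ins in ipairs(program) do
    local op, arg = ins[1], ins[2]
    if op == "LOAD" then
      stack[#stack + 1] = vars[arg]        -- push a variable
    elseif op == "ADD" then
      local rhs = table.remove(stack)      -- pop both operands
      local lhs = table.remove(stack)
      stack[#stack + 1] = lhs + rhs        -- push the result
    elseif op == "STORE" then
      vars[arg] = table.remove(stack)      -- pop into a variable
    end
  end
  return vars
end

local function run_register(program, regs)
  for _, ins in ipairs(program) do
    local op, a, b, c = ins[1], ins[2], ins[3], ins[4]
    if op == "ADD" then
      regs[a] = regs[b] + regs[c]          -- operands are named explicitly
    end
  end
  return regs
end

local stack_program = {
  { "LOAD", "a" }, { "LOAD", "b" }, { "ADD" }, { "STORE", "c" },
}
local register_program = {
  { "ADD", 2, 0, 1 },                      -- R2 = R0 + R1
}

local vars = run_stack(stack_program, { a = 2, b = 3 })
local regs = run_register(register_program, { [0] = 2, [1] = 3 })

print(#stack_program, vars.c)      -- four instructions for the stack form
print(#register_program, regs[2])  -- one instruction for the register form
```

For a decompiler, the register form is the friendlier one: the single ADD already states which slots feed which result, whereas the stack form has to be simulated to find out.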
-- Source:
function max(a, b)
  if a > b then return a else return b end
end
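Compiled bytecode (disassembled; a simplified, Lua 5.1-style listing, so exact opcodes and operands vary by Lua version) looks roughly like this:
function <max.lua:1,3> (2 registers, 0 constants)
1 [2] LT 0 1 0 ; test b < a (i.e. a > b); if true, skip the next instruction
2 [2] JMP 2 ; to 5 (else branch)
3 [2] RETURN 0 2 ; return a
4 [2] JMP 1 ; to 6
5 [2] RETURN 1 2 ; return b
6 [3] RETURN 0 1 ; implicit return at end of function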
A decompiler must recognize the LT + JMP pair as a conditional branch and use the jump targets to reconstruct the if-then-else.
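The pattern match can be sketched as follows. This is a minimal illustration, not how unluac or any other tool is implemented. The instruction tables are a hypothetical, simplified encoding, and the jump arithmetic assumes Lua 5.1-style relative offsets (target = pc + 1 + sbx).

```lua
-- Sketch: recognise a comparison + JMP pair and recover the block bounds.
-- The instruction encoding below is hypothetical and simplified.

local COMPARISONS = { EQ = true, LT = true, LE = true }

local function jump_target(pc, ins)
  return pc + 1 + ins.sbx
end

-- Returns a description of the conditional starting at `pc`, or nil.
local function match_if(code, pc)
  local test, jmp = code[pc], code[pc + 1]
  if not (test and jmp and COMPARISONS[test.op] and jmp.op == "JMP") then
    return nil
  end
  local else_start = jump_target(pc + 1, jmp)
  local result = { kind = "if", then_first = pc + 2, then_last = else_start - 1 }
  -- A forward JMP closing the then-block means an else-block follows it.
  local tail = code[else_start - 1]
  if tail and tail.op == "JMP" and tail.sbx >= 0 and else_start - 1 > pc + 1 then
    result.kind = "if-else"
    result.then_last = else_start - 2      -- exclude the closing JMP
    result.else_first = else_start
    result.else_last = jump_target(else_start - 1, tail) - 1
  end
  return result
end

local code = {
  { op = "LT", a = 0, b = 1, c = 0 },  -- 1: test for `a > b`
  { op = "JMP", sbx = 2 },             -- 2: test failed, go to 5
  { op = "RETURN", a = 0, b = 2 },     -- 3: return a
  { op = "JMP", sbx = 1 },             -- 4: skip the else-block, go to 6
  { op = "RETURN", a = 1, b = 2 },     -- 5: return b
  { op = "RETURN", a = 0, b = 1 },     -- 6: implicit final return
}

local found = match_if(code, 1)
print(found.kind, found.then_first, found.then_last,
      found.else_first, found.else_last)
```

A real decompiler also has to tell this shape apart from loops (backward jumps) and from short-circuit and/or chains, which produce similar test-and-jump pairs.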
Let’s decompile a simple script.
Original source (hello.lua):
local function greet(name) print("Hello, " .. name) end
for i = 1, 3 do greet("user") end
Compile: luac -o hello.luac hello.lua (Lua 5.4)
Decompile using unluac:
java -jar unluac.jar hello.luac
Output (approximate):
local function greet(name) print("Hello, " .. name) end
for i = 1, 3 do greet("user") end
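Notice that the local names (greet, name, i) survived because the compiler stores debug info by default. The global print survives either way, since global names are kept as string constants. If you strip debug symbols (luac -s), every local name is lost, including greet, and the output becomes something like this (the exact placeholder names depend on the decompiler):
local function var_0(var_1) print("Hello, " .. var_1) end
for var_2 = 1, 3 do var_0("user") end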
The logic is identical; the names are generic.
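The fallback naming can be sketched in a few lines. This is an illustration under stated assumptions: debug_names stands in for a function's local-variable debug table, the slot numbers are made up rather than a real register layout, and the var_N scheme simply mirrors the listing above and is not the naming of any particular tool.

```lua
-- Sketch: name each local slot, preferring debug info when it is present.
-- `debug_names` is a hypothetical stand-in for a function's debug table;
-- it is empty when the chunk was compiled with debug info stripped.

local function name_locals(slot_count, debug_names)
  local names = {}
  for slot = 0, slot_count - 1 do
    names[slot] = debug_names[slot] or ("var_" .. slot)
  end
  return names
end

local with_debug = name_locals(2, { [0] = "greet", [1] = "i" })
local stripped   = name_locals(2, {})

print(with_debug[0], with_debug[1])  -- names taken from debug info
print(stripped[0], stripped[1])      -- generated placeholders
```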
Recent research (2022–2024) has applied large language models (e.g., CodeBERT, GPT) to decompilation. For Lua, this is promising because:
Experiment: Train a transformer to map Lua bytecode tokens → source tokens. Early results show ~85% structural accuracy for short functions, but variable naming remains stochastic.
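As a rough idea of what "bytecode tokens" could mean as model input, the sketch below turns one disassembly line into a flat token list. The token format (an opcode followed by one ARG_-prefixed token per operand) is an assumption made for illustration, not the scheme of any published system. The line format it parses is the simplified "pc [line] OPCODE operands ; comment" layout used in this article.

```lua
-- Sketch: tokenize one disassembly line for a sequence model.
-- The token vocabulary (OPCODE, ARG_n) is a hypothetical choice.

local function tokenize(line)
  local body = line:gsub("%s*;.*$", "")    -- drop the trailing comment
  local opcode, operands =
    body:match("^%s*%d+%s+%[%d+%]%s+(%u[%u%d]*)%s*(.*)$")
  if not opcode then
    return nil                             -- not an instruction line
  end
  local tokens = { opcode }
  for operand in operands:gmatch("%-?%d+") do
    tokens[#tokens + 1] = "ARG_" .. operand
  end
  return tokens
end

local tokens = tokenize("1 [2] LT 0 1 0 ; if a > b then")
print(table.concat(tokens, " "))
```

Dropping the program counter and source-line columns keeps the vocabulary small. Whether to also keep the comments, which carry useful hints when debug info is present, is a design choice left open here.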
Prediction: By 2027, we may see a neural Lua decompiler that can recover meaningful variable names (e.g., renaming local a to local playerHealth using context).