Here is an example IJVM binary, click on the bytes
During your first week you'll have to write a program that parses these files in the most basic way.
To make this process somewhat easier, here's a visual guide for an example program. It's spelled out as much as possible. Press the bytes to view their role.
Note: The entire file is big-endian, while your computers work in little-endian. TL;DR: endianness is about byte ordering, i.e. whether the bytes [0x01, 0x00, 0x00, 0x00] should be read as 0x01000000 (big-endian) or as 1 (little-endian). Big-endian means the first byte is the most significant; little-endian means the first byte is the least significant.
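To make the difference concrete, here is a small sketch that reads the same four bytes in both byte orders. Python is used purely for illustration, and the helper name `read_u32_be` is made up for this example.

```python
def read_u32_be(data: bytes, offset: int = 0) -> int:
    """Read an unsigned 32-bit big-endian integer starting at offset."""
    chunk = data[offset:offset + 4]
    if len(chunk) != 4:
        raise ValueError("need 4 bytes to read a 32-bit word")
    return int.from_bytes(chunk, byteorder="big")


raw = bytes([0x01, 0x00, 0x00, 0x00])

# Big-endian: the first byte is the most significant one.
as_big = read_u32_be(raw)
# Little-endian: the first byte is the least significant one.
as_little = int.from_bytes(raw, byteorder="little")

print(hex(as_big), as_little)  # 0x1000000 1
```

Every multi-byte number in the file has to be read the big-endian way, whatever the byte order of the machine you run on.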
Every file starts off with a small file header. For binaries in particular, these are often used to specify the size and position of the parts of the binary.
The IJVM header is rather simple and, unlike that of ELF binaries, contains no real metadata.
The first 4 bytes of the file are the constant value 0x1DEADFAD.
This identifies the file as an IJVM executable.
You'll find that all sorts of files, like pictures, start with a constant sequence of bytes.
Such a constant is called a magic number (or file signature). It is used both to identify the exact file type and to ensure you're not trying to run a file of the wrong type.
Note: If you want to avoid a bunch of headaches, check this value before continuing, or you might end up in a rather weird debugging session after trying to execute a jas file.
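A minimal sketch of such a check, assuming the whole file has already been read into a `bytes` object. The names `IJVM_MAGIC` and `has_ijvm_magic` are hypothetical; only the value 0x1DEADFAD comes from the description above.

```python
import struct

IJVM_MAGIC = 0x1DEADFAD  # magic value from the guide


def has_ijvm_magic(data: bytes) -> bool:
    """Return True if data starts with the big-endian IJVM magic number."""
    if len(data) < 4:
        return False
    (magic,) = struct.unpack(">I", data[:4])  # ">I": big-endian unsigned 32-bit
    return magic == IJVM_MAGIC


binary_like = bytes.fromhex("1deadfad") + b"\x00" * 8
source_like = b".main\nBIPUSH 0x70\n"  # what the start of a jas file might look like

print(has_ijvm_magic(binary_like))  # True
print(has_ijvm_magic(source_like))  # False
```

Rejecting the file this early gives a clear error instead of a parser that happily interprets assembly text as block sizes.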
The constant pool is where the IJVM stores all of its constants, similar to read-only memory. The idea is to store large or very distinct numbers here as constants.
The first 4 bytes of the constant pool block represent the address at which the constant pool has to be "mapped in" to memory. In this case that's the address 0x00010000.
Binaries need to know where to find the constant they want, which for most assembly languages is given as an absolute address, e.g. 0x00010004. The computer will then read directly from there.
IJVM instructions that deal with constants actually work with an offset from the start of the constant pool, rather than absolute addresses. As such you don't really need to take this number into account.
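The two addressing styles can be compared in a short sketch. It uses the origin and the three constants of this example file, and it assumes the offset counts 4-byte words (a word index); check the instruction specification for the exact meaning. The function names are made up for illustration.

```python
POOL_ORIGIN = 0x00010000  # origin of the example constant pool
CONSTANTS = [-1, 2, 3]    # constants of the example constant pool
WORD_SIZE = 4             # each constant is 4 bytes


def constant_at_address(address: int) -> int:
    """Absolute addressing: translate the address back into a pool index."""
    offset = address - POOL_ORIGIN
    if offset < 0 or offset % WORD_SIZE != 0:
        raise ValueError("address does not point at a constant")
    return CONSTANTS[offset // WORD_SIZE]


def constant_at_index(index: int) -> int:
    """Relative addressing: the origin never enters the picture."""
    return CONSTANTS[index]


print(constant_at_address(0x00010004))  # 2
print(constant_at_index(1))             # 2
```

Both calls reach the same constant, which is why the origin can be read and then mostly ignored.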
The size of the constant pool is simply the number of bytes of data in this block.
In this case, that is 0xc, or 12 in decimal.
The actual constants are 32-bit signed integers (two's complement). Since these are 4 bytes apiece, there are 12/4 = 3 constants here:
0xffffffff = -1
0x00000002 = 2
0x00000003 = 3
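Putting origin, size and data together, a block could be parsed roughly as follows. This is a sketch, not a reference implementation: it assumes the size field is a 4-byte big-endian word like the origin, and the names `parse_block` and `decode_constants` are hypothetical.

```python
import struct


def parse_block(data: bytes, offset: int):
    """Parse one block: 4-byte origin, 4-byte size, then `size` bytes of data.

    Returns (origin, payload, offset of the first byte after the block).
    """
    origin, size = struct.unpack_from(">II", data, offset)
    start = offset + 8
    payload = data[start:start + size]
    if len(payload) != size:
        raise ValueError("block is truncated")
    return origin, payload, start + size


def decode_constants(payload: bytes):
    """Decode the payload as big-endian signed 32-bit integers."""
    if len(payload) % 4 != 0:
        raise ValueError("constant pool size is not a multiple of 4")
    count = len(payload) // 4
    return list(struct.unpack(f">{count}i", payload))


# The constant pool block of the example: origin, size, three constants.
pool_block = bytes.fromhex("00010000 0000000c ffffffff 00000002 00000003")

origin, payload, next_offset = parse_block(pool_block, 0)
constants = decode_constants(payload)
print(hex(origin), len(payload), constants)  # 0x10000 12 [-1, 2, 3]
```

The format character `i` (signed) rather than `I` (unsigned) is what turns 0xffffffff into -1.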
Text is the section in the binary with the actual machine code, the translated assembly.
The origin of the text block is really just the same story as the constant pool's origin.
Note: For the debugger bonus, you set a breakpoint on the virtual address, an absolute value, for which you'll need the origin you read here.
IJVM instructions are variable-sized: some instructions take arguments and so take up more space than others. As such, the size of the text block in bytes is not the same as the total number of instructions.
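The difference between byte count and instruction count can be checked with a short sketch. The operand table only covers the five opcodes used in this guide's example program, and the names `OPERAND_BYTES` and `count_instructions` are made up for illustration.

```python
# Operand bytes per opcode, limited to the opcodes of the example program.
OPERAND_BYTES = {
    0x10: 1,  # BIPUSH takes one argument byte
    0x59: 0,  # DUP
    0x60: 0,  # IADD
    0x64: 0,  # ISUB
    0xFD: 0,  # OUT
}


def count_instructions(text: bytes) -> int:
    """Walk the text block, skipping each instruction's operand bytes."""
    pc = 0
    count = 0
    while pc < len(text):
        opcode = text[pc]
        if opcode not in OPERAND_BYTES:
            raise ValueError(f"unknown opcode 0x{opcode:02x} at offset {pc}")
        pc += 1 + OPERAND_BYTES[opcode]
        count += 1
    if pc != len(text):
        raise ValueError("last instruction is missing operand bytes")
    return count


example_text = bytes.fromhex("10 70 59 10 ff 60 59 59 10 ff 64 fd fd fd fd")
print(len(example_text), count_instructions(example_text))  # 15 12
```

The three BIPUSH instructions account for the gap: 15 bytes, but only 12 instructions.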
The data of the text block is the code your program will execute. We've written out the instructions that these bytes represent, with their opcodes, below:
.main
    BIPUSH 0x70  // 10 70
    DUP          // 59
    BIPUSH 0xff  // 10 ff
    IADD         // 60
    DUP          // 59
    DUP          // 59
    BIPUSH 0xff  // 10 ff
    ISUB         // 64
    OUT          // fd
    OUT          // fd
    OUT          // fd
    OUT          // fd
.end-main
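Going the other way, from bytes back to mnemonics, might look like the sketch below. The opcode table is taken from the listing above and covers only those five instructions; the names `OPCODES` and `disassemble` are hypothetical.

```python
# opcode -> (mnemonic, number of operand bytes), from the listing above
OPCODES = {
    0x10: ("BIPUSH", 1),
    0x59: ("DUP", 0),
    0x60: ("IADD", 0),
    0x64: ("ISUB", 0),
    0xFD: ("OUT", 0),
}


def disassemble(text: bytes):
    """Turn the raw text bytes back into a list of mnemonic strings."""
    lines = []
    pc = 0
    while pc < len(text):
        opcode = text[pc]
        if opcode not in OPCODES:
            raise ValueError(f"unknown opcode 0x{opcode:02x} at offset {pc}")
        mnemonic, n_operands = OPCODES[opcode]
        operands = text[pc + 1:pc + 1 + n_operands]
        if len(operands) != n_operands:
            raise ValueError(f"{mnemonic} at offset {pc} is truncated")
        if n_operands:
            lines.append(f"{mnemonic} 0x{operands.hex()}")
        else:
            lines.append(mnemonic)
        pc += 1 + n_operands
    return lines


listing = disassemble(bytes.fromhex("10 70 59 10 ff 60 59 59 10 ff 64 fd fd fd fd"))
print("\n".join(listing))
```

Comparing this output against the listing is a quick sanity check that the text block was read from the right position in the file.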