Binary


Here is an example IJVM binary:

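   1D EA DF AD    magic bytes
   00 01 00 00    constant pool origin
   00 00 00 0C    constant pool size
   FF FF FF FF    constant pool data
   00 00 00 02
   00 00 00 03
   00 00 00 00    text origin
   00 00 00 0F    text size
   10 70 59 10    text data
   FF 60 59 59
   10 FF 64 FD
   FD FD FD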

Explanation


During your first week you'll have to write a program that parses these files in the most basic way.

To make this process somewhat easier, here's a guide for an example program, spelled out as much as possible.

Note:

The entire file is big-endian, while your computer most likely works in little-endian. TL;DR: endianness is about byte ordering, i.e. whether the bytes [0x01, 0x00, 0x00, 0x00] should be read as 0x01000000 or as 1. Big-endian means the first byte is the most significant, so the value is 0x01000000; little-endian means the first byte is the least significant, so the value is 1.
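As a small illustration of the difference, the sketch below reads the same four bytes both ways using Python's built-in `int.from_bytes`. The variable names are just for this example.

```python
# The same four bytes, interpreted with both byte orders.
raw = bytes([0x01, 0x00, 0x00, 0x00])

big = int.from_bytes(raw, byteorder="big")        # first byte is most significant
little = int.from_bytes(raw, byteorder="little")  # first byte is least significant

print(hex(big))  # 0x1000000
print(little)    # 1
```

Since the IJVM file is big-endian, every 4-byte field in it should be read the first way.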

File header


Every file starts off with a small file header. For binaries in particular, headers are often used to specify the size and position of the parts of the binary.

The IJVM header is rather simple: unlike the header of an ELF binary, it contains no real metadata.

Magic bytes


The first 4 bytes of the file are the constant value 0x1DEADFAD. This identifies the file as an IJVM executable. You'll find that all sorts of files, like pictures, start with a fixed sequence of bytes.

Such a sequence is commonly called a magic number (or file signature). It is used both to identify the exact file type and to ensure you're not trying to run a file of the wrong type.

Note:

If you want to avoid a bunch of headaches, check this value before continuing, or you might end up in a rather weird debugging session because you tried to execute a jas file.
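A minimal sketch of such a check, assuming the whole file has already been read into a `bytes` object. The names `MAGIC` and `has_ijvm_magic` are made up for this example.

```python
MAGIC = 0x1DEADFAD  # the constant value from the first 4 bytes of the file

def has_ijvm_magic(data: bytes) -> bool:
    """Return True if data starts with the 4 big-endian IJVM magic bytes."""
    return len(data) >= 4 and int.from_bytes(data[:4], "big") == MAGIC

print(has_ijvm_magic(bytes.fromhex("1DEADFAD00010000")))  # True
print(has_ijvm_magic(b".main"))                           # False
```

Rejecting the file early, before reading any block, keeps the error close to its cause.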

Constant pool


This is where the IJVM stores all of its constants, similar to read-only memory. The idea is to store large or very distinct numbers here as constants.

Constant pool origin


The first 4 bytes of the constant pool block give the address at which the constant pool has to be "mapped in" to memory. In this case that's the address 0x00010000.

Binaries need to know where to find the constant they want, which for most assembly languages is given as an absolute address, e.g. 0x00010004. The computer then reads directly from there.

Note:

IJVM instructions that deal with constants actually work with an offset from the start of the constant pool, rather than absolute addresses. As such you don't really need to take this number into account.

Constant pool size


This is simply the number of bytes in this block excluding the origin and size.

In this case, that is 0x0C, or 12 in decimal.

Constant pool data


The actual constants are 32-bit signed integers (two's complement). Since each one takes 4 bytes, there are 12/4 = 3 constants here:

   0xffffffff = -1
   0x00000002 =  2
   0x00000003 =  3
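The constant pool block of the example can be decoded as in the sketch below. It assumes the block layout described above (4-byte origin, 4-byte size, then the data); `read_block` is a hypothetical helper, not part of any provided framework.

```python
import struct

# The constant pool block of the example binary (everything after the magic
# bytes up to the text block): origin, size, then three constants.
block = bytes.fromhex("00010000" "0000000C" "FFFFFFFF" "00000002" "00000003")

def read_block(data: bytes, offset: int = 0):
    """Read one block: 4-byte origin, 4-byte size, then `size` bytes of data."""
    origin, size = struct.unpack_from(">II", data, offset)  # ">" = big-endian
    start = offset + 8
    return origin, data[start:start + size]

origin, payload = read_block(block)

# Each constant is a big-endian signed 32-bit integer ("i" in struct notation).
constants = list(struct.unpack(f">{len(payload) // 4}i", payload))

print(hex(origin), constants)  # 0x10000 [-1, 2, 3]
```

The same helper would work for the text block, since it uses the same origin/size/data layout.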

Text


Text is the section in the binary with the actual machine code, the translated assembly.

Text origin


This is really just the same story as the constant pool's origin: the address at which the text block is mapped into memory. In this case that's 0x00000000.

Note:

For the debugger bonus, breakpoints are set on a virtual address, which is an absolute value, so you'll need the origin you read here.

Text size


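This is the number of bytes in the text block, excluding the origin and size. In this case, that is 0x0F, or 15 in decimal.

IJVM instructions are variable-sized: some take arguments and therefore take up more space than others. As such, this number is not the same as the total number of instructions.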
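To see the difference between bytes and instructions, the sketch below walks over the text bytes of the example binary. The table `OPERAND_BYTES` is an assumption for this example and covers only the opcodes that occur in this program (see the listing under "Actual program"); a full implementation would need every IJVM opcode.

```python
# Number of argument bytes following each opcode (only this example's opcodes).
OPERAND_BYTES = {
    0x10: 1,  # BIPUSH takes one byte argument
    0x59: 0,  # DUP
    0x60: 0,  # IADD
    0x64: 0,  # ISUB
    0xFD: 0,  # OUT
}

# The text data of the example binary.
text = bytes.fromhex("10 70 59 10 FF 60 59 59 10 FF 64 FD FD FD FD")

def count_instructions(code: bytes) -> int:
    """Count instructions by skipping over each opcode and its arguments."""
    count = 0
    pc = 0
    while pc < len(code):
        pc += 1 + OPERAND_BYTES[code[pc]]
        count += 1
    return count

print(len(text), count_instructions(text))  # 15 12
```

So the text size of 15 bytes corresponds to 12 instructions, because each of the three BIPUSH instructions occupies two bytes.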

Actual program


This is the code your program will execute. We've written out the instructions that these bytes represent, with their opcodes, below:

 .main
   BIPUSH 0x70  // 10 70
   DUP          // 59
   BIPUSH 0xff  // 10 ff
   IADD         // 60
   DUP          // 59
   DUP          // 59
   BIPUSH 0xff  // 10 ff
   ISUB         // 64
   OUT          // fd
   OUT          // fd
   OUT          // fd
   OUT          // fd
 .end-main
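To get a feel for what this program does, here is a rough sketch of an interpreter for just these five instructions. It assumes the usual IJVM semantics: BIPUSH pushes its argument as a signed byte (so 0xff is -1), ISUB subtracts the top of the stack from the word below it, and OUT pops a word and prints it as a character. The function `run` is hypothetical, it ignores 32-bit overflow, and it collects the output in a string instead of printing directly.

```python
def run(text: bytes) -> str:
    """Execute the text bytes and return everything written by OUT."""
    stack = []
    output = []
    pc = 0
    while pc < len(text):
        op = text[pc]
        pc += 1
        if op == 0x10:    # BIPUSH: push the next byte, sign-extended
            stack.append(int.from_bytes(text[pc:pc + 1], "big", signed=True))
            pc += 1
        elif op == 0x59:  # DUP: copy the top of the stack
            stack.append(stack[-1])
        elif op == 0x60:  # IADD: pop two words, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == 0x64:  # ISUB: pop two words, push (second - top)
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)
        elif op == 0xFD:  # OUT: pop a word and emit it as a character
            output.append(chr(stack.pop()))
        else:
            raise ValueError(f"unknown opcode 0x{op:02X}")
    return "".join(output)

text = bytes.fromhex("10 70 59 10 FF 60 59 59 10 FF 64 FD FD FD FD")
print([ord(c) for c in run(text)])  # [112, 111, 111, 112]
```

The stack holds 0x70, 0x6f, 0x6f, 0x70 (bottom to top) before the first OUT, so the four OUT instructions emit those values from the top down.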