SI 2.2 Format

Reverse engineering LEGO Island's primary asset packaging format

Post by MattKC »

This is an ongoing post that tries to document the SI format in as much detail as possible. While libweaver technically serves as documentation in code form, there are plenty of uses for a searchable, readable document that stores this information too.

Without further ado, let's begin:

1.1 RIFF

Interleaf files are an extension of an existing file format known as the Resource Interchange File Format or RIFF. This format was introduced in 1991 by Microsoft and IBM and is heavily used in a number of Microsoft formats including WAV, AVI, and ANI files. Google also uses it in the WebP format.

RIFF is a very simple specification based around chunks. A chunk starts with a 4-byte identifier (usually a 4-character ASCII string, though this is not enforced), followed by a 4-byte unsigned integer giving the size of the chunk, excluding the 8 bytes that comprise the identifier and size value. The remainder of the chunk (i.e. the data that the "size" value pertains to) is for the most part arbitrary and application-specific.
Type | Offset | Size
Chunk Identifier | 0x0 | 0x4
Chunk Size | 0x4 | 0x4
Chunk Data | 0x8 | Chunk Size
Pad Byte (if data size is not even) | 0x8 + Chunk Size | 0x1
All chunks must follow this basic format, but otherwise the chunks can be named almost anything and contain any data (up to 4GB, which is the limit of the value stored in the unsigned 32-bit chunk size value). The chunk size must be little-endian, which is standard for x86 and most modern ARM CPUs.

RIFF chunks are aligned to 2-byte boundaries, meaning every chunk must start at an even offset. The chunk size value itself can be an odd number, but if so, a single pad byte (not counted in the chunk size) is added after the data to ensure the next chunk starts at an even offset.
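
The chunk layout and padding rule above can be sketched in a few lines of Python. This is only an illustration, not code from libweaver: the function name read_chunk_header and the "test" chunk are made up for the example.

```python
import struct

def read_chunk_header(data: bytes, offset: int = 0):
    """Return (identifier, size, data_offset, next_offset) for the chunk at offset."""
    # 4-byte identifier followed by a little-endian unsigned 32-bit size
    identifier, size = struct.unpack_from("<4sI", data, offset)
    data_offset = offset + 8
    # An odd-sized chunk is followed by one pad byte that is not counted in size
    next_offset = data_offset + size + (size & 1)
    return identifier, size, data_offset, next_offset

# Hypothetical chunk: identifier "test", 3 bytes of data, then one pad byte
sample = b"test" + struct.pack("<I", 3) + b"abc" + b"\x00"
print(read_chunk_header(sample))  # (b'test', 3, 8, 12)
```

Note that the size field says 3, but the next chunk would start at offset 12 rather than 11 because of the pad byte.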

The RIFF standard only defines two chunks itself: "RIFF" and "LIST". Both of these chunks are "container" chunks, designed to contain one or more subchunks:
  • The "RIFF" chunk must be the root chunk, i.e. all chunks in the file must be subchunks of a "RIFF" chunk at the beginning of the file. This makes the "RIFF" chunk size effectively the size of the whole file, minus 8 bytes for the ID and size.
  • The "LIST" chunk is very similar to the "RIFF" chunk in that its role is to contain subchunks; however, the implication is that the subchunks form a cohesive list or array. In practice, a "LIST" is basically a generic container chunk like RIFF, except that RIFF implies the start and end of a file and LIST does not.
Both of these container chunks add a third field:
Type | Offset | Size
Chunk Identifier | 0x0 | 0x4
Chunk Size | 0x4 | 0x4
Type Identifier | 0x8 | 0x4
Subchunks | 0xC | Chunk Size - 0x4 (the chunk size includes the type identifier)
Pad Byte (if data size is not even) | 0x8 + Chunk Size | 0x1
While the "RIFF" header identifies the file as following the RIFF specification, the "type identifier" identifies what kind of data is stored in the RIFF container. Like the chunk identifier, it's usually a 4-character ASCII string, but this is not enforced. In WAVE files, the type identifier is, naturally, "WAVE". For AVIs, it's "AVI " (note the space at the end to fill out all 4 bytes).
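
As a rough sketch of how the container layout nests, the following Python walks a plain RIFF byte string and descends into "RIFF" and "LIST" chunks. It assumes a strictly standard RIFF file, so it does not account for any Interleaf-specific exceptions. The names walk_chunks and make_chunk, the "DEMO" type identifier, and the "aaaa"/"bbbb" chunks are all invented for the example.

```python
import struct
from typing import Iterator, Optional, Tuple

CONTAINER_IDS = (b"RIFF", b"LIST")

def walk_chunks(data: bytes, start: int = 0, end: Optional[int] = None,
                depth: int = 0) -> Iterator[Tuple[int, bytes, Optional[bytes], int]]:
    """Yield (depth, identifier, type_identifier_or_None, size) for each chunk."""
    if end is None:
        end = len(data)
    offset = start
    while offset + 8 <= end:
        identifier, size = struct.unpack_from("<4sI", data, offset)
        body = offset + 8
        if identifier in CONTAINER_IDS:
            # Containers store a 4-byte type identifier before their subchunks
            type_id = data[body:body + 4]
            yield depth, identifier, type_id, size
            # Subchunks occupy the rest of the chunk: (size - 4) bytes from body + 4
            yield from walk_chunks(data, body + 4, body + size, depth + 1)
        else:
            yield depth, identifier, None, size
        # Skip the data plus the pad byte that follows odd-sized data
        offset = body + size + (size & 1)

def make_chunk(identifier: bytes, payload: bytes) -> bytes:
    """Build a chunk for the demo, appending a pad byte when the payload is odd."""
    pad = b"\x00" if len(payload) & 1 else b""
    return identifier + struct.pack("<I", len(payload)) + payload + pad

# Hypothetical file: a RIFF container of type "DEMO" holding two small chunks
demo = make_chunk(b"RIFF", b"DEMO" + make_chunk(b"aaaa", b"xyz") + make_chunk(b"bbbb", b"12"))
for entry in walk_chunks(demo):
    print(entry)
```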

1.2 Interleaf/SI

For the most part, Interleaf files closely follow the RIFF format, with a handful of notable exceptions that will be listed later.

Interleaf files are identified by the type identifier "OMNI". As such, the first 12 bytes of an Interleaf file are always the following:
Type | Offset | Size
"RIFF" | 0x0 | 0x4
File Size - 8 | 0x4 | 0x4
"OMNI" | 0x8 | 0x4
Interleaf files define a number of application-specific chunks, all with the prefix Mx:

1.2.1 MxHd

The header information for the Interleaf file. This is always the first chunk in the file (after the RIFF identifier/size of course). For version 2.2 Interleaf files, it is 12 bytes in size:
Type | Offset | Size | Description
Version | 0x0 | 0x4 | Major and minor version numbers of the Interleaf file, stored as two 16-bit values packed into one little-endian 32-bit value. For 2.2, this is 0x00020002.
Buffer Size | 0x4 | 0x4 | The size of the in-memory buffer that data from this file will be read into. Since Interleaf files are designed to be streamed, this specifies exactly how much data to stream at any given time. In practice, no chunks can cross a "buffer size" boundary, i.e. if the buffer size is 0x10000, all chunks must end before offset 0x10000, 0x20000, 0x30000, etc. Interleaf files provide a "padding chunk" that can be used to fill the remaining space if a chunk ends early, as well as ways to split chunks across multiple boundaries where necessary.
Buffer Count | 0x8 | 0x4 | UNKNOWN: It is not yet certain what this value does, but it can be assumed from the name (taken from source files inadvertently included on the Korean version) that it's the number of buffers to keep in memory at a time, possibly to aid smooth transitions between buffer reads when streaming music or video.
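
A minimal sketch of reading these three fields in Python, plus a helper that checks the "no chunk crosses a buffer boundary" rule. The names MxHd, parse_mxhd and crosses_buffer_boundary are made up here. The helper assumes boundaries are measured from the start of the file and that the chunk end is exclusive; neither is confirmed above. The buffer count of 2 in the sample is an arbitrary placeholder, not a value taken from a real file.

```python
import struct
from typing import NamedTuple

class MxHd(NamedTuple):
    version: int
    buffer_size: int
    buffer_count: int

def parse_mxhd(payload: bytes) -> MxHd:
    """Parse the 12 bytes of MxHd chunk data (the part after the 8-byte chunk header)."""
    if len(payload) < 12:
        raise ValueError("MxHd data must be at least 12 bytes for version 2.2")
    # Three little-endian unsigned 32-bit values: version, buffer size, buffer count
    return MxHd(*struct.unpack_from("<III", payload, 0))

def crosses_buffer_boundary(chunk_start: int, chunk_end: int, buffer_size: int) -> bool:
    """True if the bytes [chunk_start, chunk_end) span more than one buffer-sized block.

    Assumption: offsets are file offsets and chunk_end is exclusive.
    """
    return chunk_start // buffer_size != (chunk_end - 1) // buffer_size

# Hypothetical header data: version 2.2, 0x10000-byte buffers, placeholder buffer count
header = parse_mxhd(struct.pack("<III", 0x00020002, 0x10000, 2))
print(hex(header.version), hex(header.buffer_size), header.buffer_count)
# The two packed 16-bit halves of the version; both are 2 for version 2.2
print(header.version >> 16, header.version & 0xFFFF)
```

With a buffer size of 0x10000, a chunk occupying 0xFFF0 to 0x10000 (exclusive) fits inside the first buffer, while one running to 0x10008 would cross the boundary and need to be split or preceded by padding.
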
1.2.2 MxOf

The offset table. This acts as a table of contents where each entry is the file offset of a streamable action. This structure is as follows:
Type | Offset | Size
Number of actions | 0x0 | 0x4
Offsets | 0x4 | Chunk Size - 0x4
Not every entry in the table has to have a valid offset. Actions are often deliberately placed at certain indexes even though several entries before them are empty. Notably, most dialogue in room scenes starts from entry 500 onwards.

The "number of actions" is also not equal to the number of entries in the table. If an action has subactions, these appear to get counted too. For instance, in NOCD.SI, which contains one video, this value is 3 - presumably because the video, audio, and combined "movie" are all counted.
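
As a sketch of how this table might be read, the Python below takes the MxOf chunk data (everything after the 8-byte chunk header) and derives the entry count from the data length rather than from the "number of actions" field, since the two can differ. Two things are assumptions not stated above: that each entry is a 4-byte little-endian unsigned integer, and that an empty entry is stored as 0. The name parse_mxof and the sample offsets are invented for the example.

```python
import struct

def parse_mxof(payload: bytes):
    """Return (action_count, offsets) from MxOf chunk data."""
    (action_count,) = struct.unpack_from("<I", payload, 0)
    # Entries fill whatever remains after the 4-byte count (assumed 4 bytes each)
    entry_count = (len(payload) - 4) // 4
    offsets = list(struct.unpack_from(f"<{entry_count}I", payload, 4))
    return action_count, offsets

# Hypothetical table: a count of 3, four entries, two of them empty
sample_mxof = struct.pack("<I", 3) + struct.pack("<4I", 0x100, 0, 0, 0x2400)
action_count, offsets = parse_mxof(sample_mxof)

# Assumption: an offset of 0 marks an empty entry
used = [(index, offset) for index, offset in enumerate(offsets) if offset != 0]
print(action_count, used)  # 3 [(0, 256), (3, 9216)]
```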


***

To be continued...