For a long time, it was speculated that the phonemes were in a completely proprietary format, especially given the complex way they were generated and the fact that no tool could reliably extract them for several years (lierip got close, but could only extract them as a series of garbled bitmaps). However, it turns out the phonemes are simply in FLIC, a format similar to GIF which the game uses for all short 2D videos/animations (long movies are Smackers instead).
FLICs default to 10 FPS, and phonemes at first appear to be no different. However, if you extract the FLICs as-is and try to play them back at 10 FPS, they run way too fast. It turns out the game uses metadata in the SI files to repeat certain frames during playback so they stay on screen longer. This allows the game to hold certain expressions (e.g. stretching out the "O" phoneme frame when INFOMAN says "Helloooo!" in the opening) without wasting space by storing the same frame multiple times. There was actually a FLIC way of doing this (each frame could have a "Delay" member that overrode the frame rate). However, that was a nonstandard extension, and it's likely LEGO Island's FLIC parser didn't support it.
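As a rough illustration of the frame-holding idea, here is a minimal Python sketch. It assumes the SI metadata has already been decoded into a plain list of per-frame repeat counts; the real SI layout is not described here, so `expand_frames` and `repeats` are hypothetical names, not anything taken from the game or an existing tool.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def expand_frames(frames: Sequence[T], repeats: Sequence[int]) -> List[T]:
    """Repeat each stored frame so the result plays correctly at a fixed rate.

    `frames` are the frames as stored in the FLIC, in order.
    `repeats` is a hypothetical, already-decoded list saying how many
    playback ticks each stored frame should stay on screen.
    """
    if len(frames) != len(repeats):
        raise ValueError("need exactly one repeat count per stored frame")

    expanded: List[T] = []
    for frame, count in zip(frames, repeats):
        if count < 1:
            raise ValueError("each frame must be shown at least once")
        # Duplicate at playback/export time rather than in the file itself.
        expanded.extend([frame] * count)
    return expanded


# Three stored frames, with the middle "O" held for four ticks.
held = expand_frames(["A", "O", "M"], [1, 4, 1])
```

With three stored frames and the middle one held for four ticks, the expanded sequence is six frames long, which at 10 FPS plays for 0.6 seconds instead of 0.3.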
Knowing this, we can finally produce clean extractions of the phonemes: