What's in a GIF?

This was originally a multi-part post on cohost.

I wanted to inspect the structure of an animated GIF, so I created a 1 px × 1 px monochrome animation that switches from white to black every 1024 milliseconds in GNU IMP, exported it as an animated GIF with disposal method “combine”, and hexdumped it. Here’s the GIF below, at the center of this 1rem box so you can see it.

And here’s the output from hexdump -C:

00000000  47 49 46 38 39 61 01 00  01 00 a1 01 00 00 00 00  |GIF89a..........|
00000010  ff ff ff ff ff ff ff ff  ff 21 ff 0b 4e 45 54 53  |.........!..NETS|
00000020  43 41 50 45 32 2e 30 03  01 00 00 00 21 f9 04 05  |CAPE2.0.....!...|
00000030  66 00 00 00 2c 00 00 00  00 01 00 01 00 00 02 02  |f...,...........|
00000040  4c 01 00 21 f9 04 05 66  00 01 00 2c 00 00 00 00  |L..!...f...,....|
00000050  01 00 01 00 00 02 02 44  01 00 3b                 |.......D..;|

It says NETSCAPE2.0! That’s so cute. To inspect what the rest of it means, I’ve used the following resources to dissect it:

The table I’ve created to analyze the values and their meanings is a little long, so it’s tucked under an expandable box.

Open for hexdump dissection

Note that the values of 16-bit ints are in little endian (e.g. the delay time is encoded as `0x66 0x00`, or `0x0066`, which is 104 in decimal), but not the ASCII strings (e.g. each byte for "GIF89a" is laid out left to right).

Hex (binary) Value Meaning
47 49 46 38 39 61 GIF89a
01 00 01 00 (1, 1) Width and height in pixels
A1
(10100001)
1
010
0
001
There is a Global Colour Table (GCT)
2-bit colour channels
Unsorted GCT
Size of GCT = 21 + 1 = 4 colours
01 1 Index (0-based) of background colour in GCT
00 0 No default aspect ratio (otherwise (value + 15)/64)
00 00 00 rgb(0, 0, 0) First colour in GCT (black)
FF FF FF rgb(255, 255, 255) Second colour in GCT (white)
FF FF FF rgb(255, 255, 255) Third colour in GCT (white)
FF FF FF rgb(255, 255, 255) Fourth colour in GCT (white)
21 FF ! 0xFF Begin application extension block
0B 11 Bytes in application extension name
4E 45 54 53 43 41 50 45 32 2E 30 NETSCAPE2.0 Netscape Application Block — a looping animation!
03 3 Remaining bytes in block
01 1 Index of current data block (fixed)
00 00 0 Number of loops (0 = loops forever)
00 End of block
21 F9 ! 0xF9 Begin graphic control extension block
04 4 Bytes in block
05
(00000101)
000
001
0
1
Reserved
Don't dispose
No user input
Use transparent colour index
66 00 104 Delay time in centiseconds
00 0 Transparent colour index (black)
00 End of block
2C , Begin image descriptor block
00 00 00 00 (0, 0) Position of top-left corner
01 00 01 00 (1, 1) Width and height of frame in pixels
00 (00000000) 0
0
0
00
000
No Local Colour Table
No interlacing
Unsorted LCT
Reserved
Size of LCT
02 2 LZW code size
02 2 LZW data size in next subblock
4C 01 ? LZW encoded data
00 End of block
21 F9 ! 0xF9 Begin graphic control extension block
04 4 Bytes in block
01
(00000001)
000
001
0
1
Reserved
Don't dispose
No user input
Use transparent colour index
66 00 104 Delay time in centiseconds
01 1 Transparent colour index (white)
00 End of block
2C , Begin image descriptor block
00 00 00 00 (0, 0) Position of top-left corner
01 00 01 00 (1, 1) Width and height of frame in pixels
00 (00000000) 0
0
0
00
000
No Local Colour Table
No interlacing
Unsorted LCT
Reserved
Size of LCT
02 2 LZW code size
02 2 LZW data size in next subblock
44 01 ? LZW encoded data
00 End of block
3B ; End of image

Some interesting notes:

  • The delay time is in fact in centiseconds and not milliseconds, despite the fact that most of the image editors I know (including GNU IMP) will use milliseconds as the delay time unit in the frame name. Super annoying if you’re doing some sort of animation with millisecond-precision timing and your delays are all off…
  • The two frames have the same LZW encoded data in the image descriptor block; I think that data is actually a single transparent pixel, and its colour is determined by the transparent colour index in the preceding graphic control extension block, which is black for the first (top) layer and white for the second (bottom) layer.

The Graphic Control Extension Block

For the graphic control extension block, the spec says the following about the disposal method:

iv) Disposal Method - Indicates the way in which the graphic is to
be treated after being displayed.

Values :    0 -   No disposal specified. The decoder is
                  not required to take any action.
            1 -   Do not dispose. The graphic is to be left
                  in place.
            2 -   Restore to background color. The area used by the
                  graphic must be restored to the background color.
            3 -   Restore to previous. The decoder is required to
                  restore the area overwritten by the graphic with
                  what was there prior to rendering the graphic.
         4-7 -    To be defined.

There being four different disposal methods (none, combine, replace, ??) explains why three bits are needed. I’ve never seen the last disposal method in any image editing program, though, and from the description alone I have no idea what that behaviour means. If they had omitted that method, they could’ve gotten away with using just two bits, but I guess the reason those reserved bits exist at all is just to pad the data to a whole byte.

On the user input flag, it says:

v) User Input Flag - Indicates whether or not user input is expected before continuing. If the flag is set, processing will continue when user input is entered. The nature of the User input is determined by the application (Carriage Return, Mouse Button Click, etc.).

None of my local image viewers appear to implement this behaviour. However, I found this online GIF viewer gifiddle which is fully spec-compliant and, according to the README, indeed implements the user input flag. So I’ve taken my original single blinking pixel, set the delay to maximum, and turned on the user input flag; here it is below again.

If you go to the loaded gifiddle, (you may need to enable CORS), there’ll be a “Continue” button with the delay that advances the frame when clicked. Unfortunately, it seems that the interaction between user input and looping is unspecified (looping isn’t part of GIF89a anyway), so looping doesn’t occur here, and the animation terminates after having drawn both frames.

The Netscape Application Block

Since the NAB isn’t documented in GIF89a, I was trying to find an original source for its specification. Unfortunately, the oldest thing I could fine was Royal E. Frasier Jr.’s All About GIF89a, which does however describe the NAB pretty precisely.

byte   1       : 33 (hex 0x21) GIF Extension code
byte   2       : 255 (hex 0xFF) Application Extension Label
byte   3       : 11 (hex (0x0B) Length of Application Block 
                 (eleven bytes of data to follow)
bytes  4 to 11 : "NETSCAPE"
bytes 12 to 14 : "2.0"
byte  15       : 3 (hex 0x03) Length of Data Sub-Block 
                 (three bytes of data to follow)
byte  16       : 1 (hex 0x01)
bytes 17 to 18 : 0 to 65535, an unsigned integer in 
                 lo-hi byte format. This indicate the 
                 number of iterations the loop should 
                 be executed.
bytes 19       : 0 (hex 0x00) a Data Sub-block Terminator. 

Based on the story told in Frasier’s About this Site, it sounds like he found this directly from someone at Netscape:

Netscape had nothing on the GIF ability in their site. The Netscape programmer contacted me, and even told me about the looping mechanism in 2.0beta4. I decided to experiment and promote it through the site’s pages.

In fact, later in 1997, Netscape’s website has a page on animated GIFs that links to Frasier’s page! I guess the internet was a smaller place back then.

What’s the smallest possible GIF?

It’s the following, whose hexdump is a total of 26 bytes.

47 49 46 38 39 61 00 00 00 00 00 00 00 2C 00 00 00 00 00 00 00 00 00 00 00 3B

Dissecting it as above:

Hex Value
47 49 46 38 39 61 “GIF89a”
00 00 00 00 Size (0, 0)
00 00 00 No global colour table
No background colour index
Default aspect ratio
2C Begin image descriptor block
00 00 00 00 00 00 00 00 Position (0, 0), size (0, 0)
00 No local colour table
00 LZW code size 0
00 End of block
3B End of image

It seems like there has to be at least one image descriptor block for Firefox to even open the image. My local image viewer refuses to even open a 0 px × 0 px image so the smallest GIF there would be 1 px × 1 px, which is no fun. It further refuses to open an image with neither a global nor a local colour table, despite the fact that the spec itself says that’s perfectly valid.

The Grammar.

<GIF Data Stream>         ::= Header <Logical Screen> <Data>* Trailer

<Logical Screen>          ::= Logical Screen Descriptor [Global Color Table]

<Data>                    ::= <Graphic Block>
                            | <Special-Purpose Block>

<Graphic Block>           ::= [Graphic Control Extension] <Graphic-Rendering Block>

<Graphic-Rendering Block> ::= <Table-Based Image>
                            | Plain Text Extension

<Table-Based Image>       ::= Image Descriptor [Local Color Table] Image Data

<Special-Purpose Block>   ::= Application Extension
                            | Comment Extension

You’ll notice that <Data> is allowed 0 or more times, not 1 or more times, which means that a valid GIF, according to the spec, can consist of only a header, a logical screen descriptor, and a trailer, like so:

47 49 46 38 39 61 00 00 00 00 00 00 00 3B

The only image viewer I’ve found which reliably displays (so to speak) such a GIF is again gifiddle.

What’s the largest possible GIF?

In dimensions, it’s 65536 px × 65536 px, limited by the 16-bit integers that specify each dimension. In the GIF you specify the size of the image as a whole (the canvas, I suppose), and each frame has its own size and position on the canvas. Inside the below expandable box, I’ve included the GIF whose hexdump is the below, which has a canvas 65536 px in both dimensions, and contains a single pixel in its frame. Depending on how your browser renders this GIF, you may either see a very large blank square with nothing in it (Firefox), or it may fail to render anything at all (Chrome).

47 49 46 38 39 61 FF FF FF FF                 // header and size
80 00 00 FF FF FF FF FF FF                    // global colour table
2C 00 00 00 00 01 00 01 00 00 02 02 4C 01 00  // image descriptor block
3B                                            // end
Open for a 65536 px × 65536 px GIF at your own risk

In size by bytes, it’s technically unlimited, since there’s nothing restricting how many frames you can add. Most browsers like to add minimum delay of 1 centisecond to prevent an infinitely-looping GIF from sucking all resources, so with “merely” 2^16 = 65536 frames it’ll still take 10 minutes to load entirely.

A natural question to ask is then: What’s the size of a GIF with a 65536 px × 65536 px frame? I believe the data will have to match the size for the frame to be valid, so there has to be data for all 4 294 967 296 pixels. Supposing all pixels are black, and our global colour table has two colours only, how many bytes does the data take?

I’ve been referencing What’s in a GIF to figure this out, as well as this rather helpful StackOverflow answer. I won’t attempt to reëxplain everything here, so this won’t make much sense unless you’ve first figured out how the encoding works. Jumping right in, the minimum code size is 2, yielding 22 colour entries + 2 control codes, so the initial code table looks like this:

Code Value
#0 black
#1 white
#2
#3
#4 clear
#5 end

In brief, the way LZW compression works is it searches for the previous value appended with the next pixel in the code table, and if it doesn’t exist, a new entry for it is created, and that code is output instead. Since our input is a long sequence of black pixels, the table continues on like so:

Code Value
#6 black black
#7 black black black
#4095 black (× 4091)

The code table ends at #4095 because codes are limited to 12 bits, or 212-1. The code sequence so far then looks like this:

#4 #0 #6 #7 ... #4094 #4095

This encodes 1 + 2 + … + 4090 + 4091 pixels, which is 8 370 186 pixels. After this, the clear code is sent, and the process repeats again. Since we want 4 294 967 296 pixels, the number of rows of the full sequence we need is 513, with 1 061 878 pixels left over. Solving for n in ½n(n+1) = 1 061 878 and rounding down we get n = 1456, with 1 061 878 - ½(1456 * 1457) = 1182 pixels left over. So the full image can be represented in terms of codes as:

#4 #0 #6 #7 ... #4094 #4095
... (× 513 rows total)
#4 #0 #6 #7 ... #1459 #1460 #1182 #5

Now how is this turned into bytes? For each row, the code size starts out as 3 bits. Whenever all bits are filled, the number of bits increases by one. So when we first encounter #7, which is 0b111, the code size becomes 4 bits, which accommodates #8, or 0b1000. The first row in a sequence of binary bits is then:

100 000 110 111 1000 ... 111111111110 111111111111

However, this is encoded in little endian, so to speak, so we append from leftward, dividing into bytes. Here’s the first few bytes of the first row:

... #8   #7  #6  #0  #4
... 1000 111 110 000 100

... 10001111 10000100
... 0x8F     0x84

Of course, all of the rows are appended together, so the bytes aren’t necessarily neatly divided into rows. However, we can count the number of bits used in each row: there are four 3-bit codes, then 23 4-bit codes, and so on, all the way up to 211 12-bit codes, for a total of 45 052 bits (thanks WolframAlpha!). The final row only goes up to #1460, or 0b101_1011_0100, so that’s summing up to 29 10-bit codes, followed by 0b1_1011_0100 + 2 = 438 11-bit codes, for a total of 14 030 bits. Then the total number of bits of data is 45 052 × 513 + 14 030 bits = 23 125 706 bits, or 2 890 714 bytes, or 2.75 MiB.

A dimensionally-maximal GIF can fit under 3 MB! However, actually constructing the GIF is another thing: the whole bit-appending little endian byte-encoding process looks like a big pain, and although I’m confident I could write the code to do it, or read and understand and use a C library that does it (GIFLIB, for instance), it seems like a really arduous and dull task.