Digital image structure
Pixels
A digital image is made up of a rectangular array (or matrix) of equal-sized picture elements, usually referred to as "pixels". If you repeatedly enlarge a digital photo (without the smoothing that photo programs often apply) you will see the pixels as squares of constant colour, as in this gross enlargement of a gull's eye:
Notice how the edge of the yellow rim of the eye appears quite smooth in the original photo despite being jagged at the pixel level. This is because the edge is not a sudden step from yellow to another colour: adjacent pixels have intermediate colours. The image of the yellow edge fell partially across some pixels of the electronic detector chip in the camera, so those pixels detected an intermediate level rather than full yellow or black. Edges in images are typically like this, not sudden steps, so it is not as easy to measure where they are as you might have thought.
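One way to see this numerically is to print the pixel values along a row that crosses an edge. Here is a minimal Java sketch using the standard BufferedImage class; the file name and coordinates are invented for illustration:

    import java.awt.image.BufferedImage;
    import java.io.File;
    import javax.imageio.ImageIO;

    public class EdgePixels {
        public static void main(String[] args) throws Exception {
            // Read a photo and print the red, green and blue values of a
            // short row of pixels crossing an edge. The file name and
            // coordinates are invented for this sketch.
            BufferedImage img = ImageIO.read(new File("gull.jpg"));
            for (int x = 100; x < 110; x++) {
                int rgb = img.getRGB(x, 50); // packed as 0xAARGGB... i.e. 0xAARRGGBB; alpha ignored here
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                System.out.println(x + ": " + r + ", " + g + ", " + b);
            }
        }
    }

Across a smooth-looking edge the printed values change gradually over several pixels rather than jumping in a single step.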
Colour
The colour of a pixel is represented numerically as the relative amounts of 3 primary colours: red, green and blue. Ultimately this is because our eyes work in a similar way. Cone cells in the centre of the retina of a normal human eye contain three pigments, sensitive to red, green and blue, so we detect colour as the relative amounts of these 3 components.
There is a discussion on one of our other web sites of how colours are built up from the primary colours: link.
Here are some examples of colours, showing the amount of red, green and blue in each. The brightness scale for each value is from 0 (black) to 255 (fully saturated colour).
Saturated primary and secondary colours:

red: 255, 0, 0
yellow: 255, 255, 0
green: 0, 255, 0
cyan: 0, 255, 255
blue: 0, 0, 255
magenta: 255, 0, 255
Less saturated versions of the same colours:
From black to white in shades of grey (equal amounts of red, green and blue):
Pixel samples from the gull's eye above:
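In a Java program such triples map directly onto the standard java.awt.Color class; the particular values below are just illustrative:

    import java.awt.Color;

    public class ColourTriples {
        public static void main(String[] args) {
            Color yellow = new Color(255, 255, 0);    // full red + full green
            Color midGrey = new Color(128, 128, 128); // equal amounts of all 3
            Color softRed = new Color(255, 128, 128); // a less saturated red
            System.out.println(yellow.getRed() + ", "
                    + yellow.getGreen() + ", " + yellow.getBlue());
        }
    }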
The layout of colour detectors in a typical camera is a "Bayer" mosaic, in which rows of alternating red and green detectors are interleaved with rows of alternating green and blue detectors.
Notice that there are twice as many green detectors as red or blue ones. Green is in the middle of the daylight spectrum and our eyes are most sensitive to it, so it is the most important colour for the camera to record.
The detected values have to be processed in order to make an array of pixels such that every pixel has 3 values. Some cameras have a non-rectangular array of detectors, so their processing is more complex. The processing can either happen in the camera, to make a JPEG file, or afterwards (on the PC) if the camera produces a RAW file.
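As an illustration of what that interpolation involves, here is a minimal Java sketch of the simplest possible approach (bilinear averaging), for the green channel only, assuming an RGGB Bayer layout and skipping the border pixels. Real converters use more sophisticated methods:

    import java.awt.image.Raster;
    import java.awt.image.WritableRaster;

    public class Demosaic {
        // Fill a full-resolution green channel from Bayer data holding one
        // detected value per photosite. Assumes red at even/even positions,
        // blue at odd/odd, green elsewhere; borders are left unfilled.
        static void interpolateGreen(Raster bayer, WritableRaster green) {
            int w = bayer.getWidth(), h = bayer.getHeight();
            for (int y = 1; y < h - 1; y++) {
                for (int x = 1; x < w - 1; x++) {
                    if ((x + y) % 2 == 1) {
                        // This photosite detected green: copy it directly.
                        green.setSample(x, y, 0, bayer.getSample(x, y, 0));
                    } else {
                        // Red or blue site: average the 4 green neighbours.
                        int g = (bayer.getSample(x - 1, y, 0)
                               + bayer.getSample(x + 1, y, 0)
                               + bayer.getSample(x, y - 1, 0)
                               + bayer.getSample(x, y + 1, 0)) / 4;
                        green.setSample(x, y, 0, g);
                    }
                }
            }
        }
    }

The red and blue channels are filled in the same way, averaging the 2 or 4 nearest detectors of the wanted colour.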
This is what the RAW file for the gull's eye above looks like, as shown by GRIP:
Here we have to come clean and admit that the version shown before had already been scaled down, to exaggerate the appearance of the pixels. This RAW version shows the pixel arrangement exactly as detected in the camera.
GRIP is able to load RAW images by using the jrawio plug-in for javax.imageio. This loads them as shown, without the interpolation between pixels and the contrast stretching needed to make a normal image. When GRIP loads a RAW image it lets you see the original data and offers the "Interpret RAW" option on the levels menu of the image frame. This option processes the image in much the same way as the camera does when making a JPEG version, or as other applications such as Adobe Camera Raw do.
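Because jrawio registers itself as a standard javax.imageio plug-in, loading the raw data needs no special code. A hedged sketch (the file name is invented):

    import java.awt.image.BufferedImage;
    import java.io.File;
    import javax.imageio.ImageIO;

    public class LoadRaw {
        public static void main(String[] args) throws Exception {
            // With jrawio on the classpath, ImageIO can decode a RAW file
            // through the same call that reads a JPEG. File name invented.
            BufferedImage img = ImageIO.read(new File("gull.nef"));
            System.out.println(img.getWidth() + " x " + img.getHeight()
                    + ", " + img.getColorModel().getPixelSize()
                    + " bits per pixel");
        }
    }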
Channels
The result of processing the raw data to interpolate the missing values (either in the camera or computer) is a rectangular array of pixels, each of which has three values. The image can be envisaged as comprising 3 layers, one for each of the primary colours red, green and blue. The layers are often called "channels" and sometimes "bands". So our photos are made up of 3 channels. The channels can be split and viewed as separate (uncoloured) images. Photoshop is able to do that and so is our Java application, GRIP.
A monochrome, or grey-scale, image is one in which all 3 channels are identical: each pixel has only a brightness, or grey, level. It is sufficient to store such an image as a single channel, requiring a third of the memory.
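Here is a sketch of the idea of splitting one channel out as a grey-scale image, using standard Java classes (an illustration of the technique, not GRIP's actual code):

    import java.awt.image.BufferedImage;

    public class Channels {
        // Extract one channel (band 0 = red, 1 = green, 2 = blue for the
        // usual RGB image types) as a separate grey-scale image.
        static BufferedImage extractChannel(BufferedImage src, int band) {
            int w = src.getWidth(), h = src.getHeight();
            BufferedImage grey =
                    new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
            for (int y = 0; y < h; y++) {
                for (int x = 0; x < w; x++) {
                    int v = src.getRaster().getSample(x, y, band);
                    grey.getRaster().setSample(x, y, 0, v);
                }
            }
            return grey;
        }
    }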
Bits per channel
The colour examples above used a scale from 0 to 255 for each channel. That would be the case for 8-bit images, such as would be stored in JPEG files. That is because 2⁸ = 256, so 8 bits (= 1 byte) is sufficient for storing 256 distinct values. The maximum number of possible colours in such an image is about 16 million (256³).
As discussed on the image file formats page, more can be done with RAW images, which contain 4096 possible values for each channel. 4096 = 2¹², so that is 12 bits per channel. Consumer digital cameras usually detect this number of bits per channel when a photograph is taken. 12 bits is one and a half bytes, but files are saved with data values in whole numbers of bytes, so to accommodate such images storage is allocated as 2 bytes, or 16 bits, per channel per pixel. This means that some storage is wasted but it gives us room to do things when we process images.
(As of mid-2007 some top-of-the-range digital SLR cameras, e.g. Canon's EOS 1Ds Mk 3, capture 14 bits per channel.)
TIFF files (see the image file formats page) can store images with 8 or 16 bits per channel and any number of channels (though we usually need 3 for photos).
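For completeness, here is a hedged sketch of how a 16-bits-per-channel RGB image can be allocated in Java. There is no predefined BufferedImage type for 48-bit RGB, so a custom colour model is needed; this is one possible approach, not necessarily how GRIP does it:

    import java.awt.Transparency;
    import java.awt.color.ColorSpace;
    import java.awt.image.BufferedImage;
    import java.awt.image.ColorModel;
    import java.awt.image.ComponentColorModel;
    import java.awt.image.DataBuffer;
    import java.awt.image.WritableRaster;

    public class SixteenBit {
        // Create an RGB image with unsigned 16-bit samples, so each channel
        // can hold 0..65535: ample room for 12- or 14-bit camera data.
        static BufferedImage create16BitRgb(int width, int height) {
            ColorSpace cs = ColorSpace.getInstance(ColorSpace.CS_sRGB);
            ColorModel cm = new ComponentColorModel(cs, false, false,
                    Transparency.OPAQUE, DataBuffer.TYPE_USHORT);
            WritableRaster raster =
                    cm.createCompatibleWritableRaster(width, height);
            return new BufferedImage(cm, raster, false, null);
        }
    }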
Image sizes
It is important to realise that regardless of how image data may be compressed to store them on disc, an image loaded in memory for processing occupies a large number of bytes. If the image has N pixels (where N is image width x image height, in pixels), it occupies 3N bytes if it has 8 bits per channel and 6N bytes if it has 16 bits per channel.
So a 12-megapixel image loaded from a JPEG file (which can only be 8 bits per channel) requires 36 megabytes of memory for processing. If it is loaded from a RAW file or from a 16-bit TIFF file it requires twice that: 72 megabytes.
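The arithmetic can be captured in a few lines of Java (the class and method names are invented for this sketch):

    public class ImageMemory {
        // Bytes needed to hold an uncompressed 3-channel image in memory.
        static long bytesInMemory(long pixels, int bitsPerChannel) {
            int bytesPerChannel = (bitsPerChannel <= 8) ? 1 : 2;
            return pixels * 3 * bytesPerChannel;
        }

        public static void main(String[] args) {
            // 12 megapixels at 8 bits per channel -> 36,000,000 bytes.
            System.out.println(bytesInMemory(12000000L, 8));
            // The same image at 16 bits per channel -> 72,000,000 bytes.
            System.out.println(bytesInMemory(12000000L, 16));
        }
    }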
In today's typical PC with 1 or 2 gigabytes of RAM this is not a problem. A dozen such images can be held quite comfortably for processing in a gigabyte. However, it is necessary to design software carefully to avoid having unnecessary copies of images in memory at once.