Issues for digital artifact photo archiving [Part 2]

In Part 1 of this series, I discussed the issue of "resolution" of digital photographs of historical artifacts. In this second part, I continue with discussions of color, of image formats, and of image sources...

Color: There has never really been a market for quality black-and-white digital cameras. As a result, digital images are likely to be in color. Recall that the way digital cameras get color is a bit odd: there's a square array of sensors inside the camera, and some or (usually) all of those sensors are covered with colored filters that let in only one color of light. The usual colors are red, green, and blue. In a local area, for illustration, you might have sensors that look like

  RGB
  GBR
  BRG

Thus, a color image automatically has issues of "resolution": only about 1/3 of the sensors in a given area record any one color, and it's hard to predict how the filter pattern will interact with the detail in the image.
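To make the 1/3 figure concrete, here is a minimal sketch that tiles the illustrative 3x3 pattern above across a small patch of sensors and counts how many sit behind each filter color. The names TILE, filter_color, and color_fractions are my own, and a real camera's filter layout may well differ from this pattern.

```python
# Illustrative 3x3 filter tile, copied from the example pattern in the text.
# A real camera's layout may differ; this is only a sketch.
TILE = ["RGB", "GBR", "BRG"]

def filter_color(row, col):
    """Return the filter color over the sensor at (row, col)."""
    return TILE[row % 3][col % 3]

def color_fractions(height, width):
    """Fraction of sensors in a height x width patch behind each filter color."""
    counts = {"R": 0, "G": 0, "B": 0}
    for row in range(height):
        for col in range(width):
            counts[filter_color(row, col)] += 1
    total = height * width
    return {color: n / total for color, n in counts.items()}

# A 6x6 patch holds 36 sensors; each color gets 12 of them.
fractions = color_fractions(6, 6)
print(fractions)
```

For any patch whose sides are multiples of 3, each color comes out at exactly one third of the sensors.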

Image Formats: Once the sensor inside a digital camera has recorded an image, the image must be exported from the camera to a computer or elsewhere to be viewed or printed. The details of how this is done are complex; the important thing to understand is that an image file notionally contains a "map" of intensities, one for each sensor. Recall that a modern digital camera might have 12,000,000 sensors or more. If each sensor records at least one "character" or byte of information, you're looking at a raw image file of 12 megabytes or more. This is a lot for one picture, even given modern storage and communication capacities. However, the raw image represents "best evidence" of what the sensors saw.
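The size estimate is just a multiplication, which can be sketched as follows. The function name raw_size_bytes is my own, and the two-bytes-per-sensor case is an assumption added only to show how quickly the total grows when a sensor records more than one byte.

```python
def raw_size_bytes(sensor_count, bytes_per_sensor=1):
    """Rough size of a raw image: one intensity value per sensor."""
    return sensor_count * bytes_per_sensor

one_byte = raw_size_bytes(12_000_000)       # the case described in the text
two_bytes = raw_size_bytes(12_000_000, 2)   # assumed: two bytes per sensor

print(one_byte / 1_000_000, "MB")
print(two_bytes / 1_000_000, "MB")
```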

Typically, software inside or outside the camera immediately post-processes the image in several ways. The software averages sensors around a given pixel to try to get a better estimate of what the light and color levels were there. It also applies corrections for defects of various kinds in the sensors and optics of the camera. Finally, and perhaps most importantly, it compresses the image to save storage space and communication bandwidth.
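The averaging step can be sketched roughly as follows. This is not what any particular camera does: it is a deliberately naive version that assumes the illustrative RGB/GBR/BRG filter pattern from the Color discussion, and that simply averages the sensors of each color in the 3x3 neighborhood around a pixel. The names TILE and estimate_color are my own.

```python
# Illustrative filter pattern (an assumption; real cameras may differ).
TILE = ["RGB", "GBR", "BRG"]

def estimate_color(raw, row, col):
    """Estimate (R, G, B) at one pixel from raw sensor intensities.

    raw is a list of rows of numbers, one number per sensor. Each color
    is estimated as the average of the sensors behind that color of
    filter in the 3x3 neighborhood, clipped at the image edges.
    A color with no sensor in the neighborhood is reported as None.
    """
    sums = {"R": 0, "G": 0, "B": 0}
    counts = {"R": 0, "G": 0, "B": 0}
    for r in range(max(0, row - 1), min(len(raw), row + 2)):
        for c in range(max(0, col - 1), min(len(raw[0]), col + 2)):
            color = TILE[r % 3][c % 3]
            sums[color] += raw[r][c]
            counts[color] += 1
    return tuple(sums[k] / counts[k] if counts[k] else None for k in "RGB")

# A scene lit with pure red light: only sensors behind red filters respond.
red_scene = [[200 if TILE[r % 3][c % 3] == "R" else 0 for c in range(3)]
             for r in range(3)]
print(estimate_color(red_scene, 1, 1))
```

Note that the estimate at each pixel borrows from its neighbors, which is one reason a processed image is no longer "best evidence" of what any individual sensor saw.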

What is compression? Perhaps it is best illustrated by a simple example. Imagine that you're trying to describe the nonsense string of characters

  aaaaaaaaaaaaaaabaaaaaaaaaaaabaa

It seems like a more sensible description would be

  15ab12ab2a

if only because it is much shorter: about 1/3 the size of the original (10 characters instead of 31). Note that no information is lost: you could reconstruct the original string perfectly from the compressed description. Perhaps, though, you don't need to be able to perfectly reconstruct the original. Writing

  31a

is almost right, and is about 1/3 the size of even the first compressed version. We call the first kind of compression lossless, and the second kind lossy, for obvious reasons. Both kinds are used for digital images, depending on the purpose to which they will be put.
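The string example can be sketched in a few lines of code. This is a toy version of the idea (usually called run-length encoding), not the method any real image format uses; the function names are my own, and the decoder assumes the original text contains no digits.

```python
import re
from collections import Counter
from itertools import groupby

def rle_encode(text):
    """Lossless: write each run as count + character (count omitted for 1)."""
    parts = []
    for char, run in groupby(text):
        n = len(list(run))
        parts.append(f"{n}{char}" if n > 1 else char)
    return "".join(parts)

def rle_decode(encoded):
    """Invert rle_encode. Assumes the original text contained no digits."""
    return "".join(char * int(count or 1)
                   for count, char in re.findall(r"(\d*)(\D)", encoded))

def lossy_encode(text):
    """Lossy: pretend the whole string is its most common character."""
    if not text:
        return ""
    char, _ = Counter(text).most_common(1)[0]
    return f"{len(text)}{char}"

# The string from the text: 15 a's, b, 12 a's, b, 2 a's.
original = "a" * 15 + "b" + "a" * 12 + "b" + "a" * 2

lossless = rle_encode(original)   # "15ab12ab2a"
lossy = lossy_encode(original)    # "31a"

print(lossless, rle_decode(lossless) == original)
print(lossy, rle_decode(lossy) == original)
```

Decoding the lossless version gives back the original exactly; decoding the lossy version gives 31 a's, with both b's gone for good.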

There are many common lossless compressed image formats; the most popular, far and away, are currently gif, png, and tiff. The file formats have slightly different properties, and most of the differences aren't that interesting. One that does matter for photographs is that gif can store at most 256 distinct colors in an image, so it is lossless only for images that already fit within that limit. A preprocessed and losslessly compressed image from a 4 megapixel camera is probably going to be somewhere between 200KB and 1MB in size.
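As a small illustration of lossless compression on image-like data, the sketch below compresses a synthetic block of pixel values with Python's standard zlib module, which implements DEFLATE, the compression method png uses. The "image" here is an invented 64x64 ramp rather than a real photograph, and it compresses far better than a real photo would, so the sizes it prints say nothing about real cameras.

```python
import zlib

width, height = 64, 64
# Synthetic stand-in for image data: one byte per pixel, a smooth
# left-to-right ramp repeated on every row.
pixels = bytes((x * 4) % 256 for y in range(height) for x in range(width))

compressed = zlib.compress(pixels, 9)   # 9 = strongest compression level
restored = zlib.decompress(compressed)

# Lossless means the round trip gives back every byte exactly.
print(restored == pixels)
print(len(pixels), "bytes raw ->", len(compressed), "bytes compressed")
```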

There is really only one lossy compressed image format in current use: jpeg (AKA jpg). This format trades substantial information loss for substantial space savings, but in a way that makes the loss of image information not very human-visible. For image analysis, zooming, and the like, though, jpeg is not an ideal image format.
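To give a feel for the trade, here is a rough sketch of one lossy idea: throw away fine gradations of intensity before compressing. This is not jpeg's actual algorithm, which is considerably more elaborate (it quantizes frequency coefficients rather than raw pixel values). The synthetic "photo", the quantize function, and the step size of 16 are all assumptions of mine, chosen only to make the effect visible.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the sketch is repeatable
width, height = 64, 64
# Synthetic stand-in for a photo: a smooth ramp plus a little sensor noise.
pixels = bytes(min(255, x * 3 + rng.randrange(8))
               for y in range(height) for x in range(width))

def quantize(data, step=16):
    """Lossy step: round each value down to a multiple of `step`."""
    return bytes(v - v % step for v in data)

lossless_size = len(zlib.compress(pixels, 9))
lossy_pixels = quantize(pixels)
lossy_size = len(zlib.compress(lossy_pixels, 9))

# How far off is the worst pixel after the lossy step?
max_error = max(abs(a - b) for a, b in zip(pixels, lossy_pixels))

print("lossless:", lossless_size, "bytes")
print("lossy:   ", lossy_size, "bytes, worst pixel off by", max_error)
```

The quantized data compresses to fewer bytes, but the discarded gradations cannot be recovered afterwards, which is exactly the kind of loss that hurts later analysis or zooming.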

Image Sources: There are many places that a digital image could come from. In this document, I have assumed that it will come from a camera. Digital camera quality varies widely in several ways. Perhaps the most important one not yet discussed here is the quality of the optics. A series of lenses is used to place light on the camera's sensors; if these lenses are mis-designed, mis-aligned, or dirty, then the images produced will be awful. A good "prosumer" camera for digital image work might cost $800-$2000 currently; higher-cost commercial cameras have even better optics, sensors, software, etc.

The other obvious source of digital images, after digital cameras, is the digitization of photographic prints or negatives. A film or flatbed scanner is the usual tool here. Like a digital camera, it produces digital images in a standard format. The quality of these images is often very high, depending mainly on the quality of the original being digitized. Once the original has been digitized, the issues are the same as with any other digital image.

In the next part of this series I will develop a sample worksheet for digital images.