This project never got much beyond the specification stage ;-(.

General

The Image Server is a CGI script that serves pictures in various sizes.

Features

initial version

  • requests with an arbitrary bounding box
  • the original aspect ratio of the picture is maintained
  • pictures are delivered and cached in sizes that follow a configured raster
  • the delivered picture fits into the requested bounding box and the configured raster
  • pictures are only scaled down, never up
  • scaled images are served via an HTTP redirect to the location of a static image file

future versions

  • support external storage of scaled images to allow mirroring (redirection map)
  • support explicit precaching (the results can be copied to a mirror)

Design Decisions

Raster

We introduced a raster to which the sizes of scaled images are snapped. This prevents the cache from filling up with many finely graded versions of the same image. The raster width is a configuration option belonging to the caching group.

Specification

Behaviour

Request

A picture is requested with a picture path and a bounding box. The bounding box specifies the maximum size of the image. The Image Server has a configured raster in which it provides pictures. Let's assume the raster is set to 50 pixels. The picture is scaled down to fit into the bounding box, and then its width is rounded down to a multiple of 50 (the height is scaled down accordingly). The needed version of the picture is looked up in the cache and generated if necessary. Its file name contains the dimensions, and the request is answered with a redirect to that file. This allows browser-side caching even for slightly different bounding boxes.
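The size computation can be sketched in Haskell, the implementation language chosen further down. `rasterSize` is a hypothetical name, sizes are (width, height) pairs of positive integers, and because the spec does not say what happens when the fitted width is smaller than one raster step, the sketch assumes the fitted width is then used unrounded.

```haskell
-- | Size of the cached version to serve: fit the original into the
-- bounding box without scaling up, then round the width down to the raster.
-- Hypothetical sketch; all sizes are (width, height) in pixels, assumed positive.
rasterSize :: Int -> (Int, Int) -> (Int, Int) -> (Int, Int)
rasterSize raster (origW, origH) (boxW, boxH) = (w, h)
  where
    -- widest size that fits the box, keeps the ratio and never scales up
    fittedW  = minimum [boxW, origW, (boxH * origW) `div` origH]
    -- round down to a multiple of the raster ("div" is integer division)
    rastered = (fittedW `div` raster) * raster
    -- assumption: below one raster step, use the fitted width unrounded
    w        = if rastered > 0 then rastered else fittedW
    -- the height follows from the original ratio, rounded down
    h        = (w * origH) `div` origW
```

With a raster of 50, a 640x480 picture requested with an 800x600 box comes out as (600, 450). With a 1000x300 box the height is the limiting side, and the result is (400, 300).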

Raster

The raster value must stay fixed for a reasonably long time, because a client shouldn't get differently scaled images for the same requested dimensions. In particular, the server must not serve a “good matching” version of an image just because it is a cache hit.

Cache

The cache is an important part of the Image Server since it reduces both the server-side load and the latency. The cache has a space limit, and its replacement strategy is LRU (least recently used). The cached files should have the same modification date as the original file. Why? A file may be deleted from the cache because of the space limit. If it is requested again and the browser still has it in its own cache, the regenerated file carries the same modification date, so the browser's copy can be used instead of being downloaded again. Furthermore, a cached file needs a minimum lifetime, since the client still has to download it after being redirected to it.
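Both rules can be sketched in Haskell using `getModificationTime` and `setModificationTime` from the `directory` package. The function names `syncModificationTime` and `oldEnough` are hypothetical. Since a copied modification time says nothing about when the scaled file was actually created, the sketch assumes the creation time is recorded elsewhere, for instance in the info file described under Implementation.

```haskell
import Data.Time.Clock
import System.Directory (getModificationTime, setModificationTime)

-- | Give a freshly generated scaled file the modification time of its
-- source image, so a regenerated copy looks unchanged to browsers.
syncModificationTime :: FilePath -> FilePath -> IO ()
syncModificationTime source scaled =
  getModificationTime source >>= setModificationTime scaled

-- | Minimum-lifetime check. The scaled file's own mtime is copied from the
-- source, so its creation time has to be tracked separately (assumption:
-- e.g. in the info file).
oldEnough :: NominalDiffTime -> UTCTime -> UTCTime -> Bool
oldEnough minLifetime createdAt now = diffUTCTime now createdAt >= minLifetime
```

A file for which `oldEnough` is still `False` would simply be skipped when the cache is cleaned up.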

Mirrors

A mirror keeps pictures in various sizes to save bandwidth or to provide faster access. It is also handy to be able to store the pictures on cheap web space without CGI support. The Image Server has a mirror map in which every mirror has a list of the files it holds. Additionally, the map could record the bandwidth of each mirror. If a file is available on more than one mirror, the request is redirected to the one with the most bandwidth. Load balancing is not reasonable, since the same picture would then be served from different URLs and we would lose client-side caching.
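A minimal sketch of the mirror lookup, assuming the mirror map is held as a list of records with a base URL, a bandwidth figure and the list of files. All names, the bandwidth unit and the URL layout are hypothetical.

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

-- | One entry of the mirror map (hypothetical representation).
data Mirror = Mirror
  { mirrorBaseUrl   :: String      -- e.g. "http://mirror.example.org/img"
  , mirrorBandwidth :: Int         -- relative bandwidth; the unit is an assumption
  , mirrorFiles     :: [FilePath]  -- scaled files this mirror holds
  }

-- | Choose the mirror with the most bandwidth that holds the file.
-- Nothing means no mirror has it, so the Image Server serves it itself.
pickMirror :: [Mirror] -> FilePath -> Maybe Mirror
pickMirror mirrors file =
  case filter (elem file . mirrorFiles) mirrors of
    []         -> Nothing
    candidates -> Just (maximumBy (comparing mirrorBandwidth) candidates)

-- | URL to redirect to when a mirror holds the file.
mirrorUrl :: [Mirror] -> FilePath -> Maybe String
mirrorUrl mirrors file =
  fmap (\m -> mirrorBaseUrl m ++ "/" ++ file) (pickMirror mirrors file)
```

The choice is deterministic: as long as the mirror map does not change, a given file always maps to the same URL, which is exactly what keeps browser caches useful.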

Interface

CGI

The request URL is:

http://<host>/<path>/imgserv.cgi?src=<image src>&width=<width>&height=<height>

where <image src> is the path of the image, absolute with respect to the document root. Paths relative to the referring page are not possible: such paths are normally resolved by the browser, but inside a query parameter they are passed on unresolved, and the Image Server does not know which page they are relative to. <width> and <height> are integers greater than zero and give the size of the bounding box in pixels. Other units are not planned.

Examples:

http://<host>/<path>/imgserv.cgi?src=images/1.jpg&width=800&height=600
http://<host>/<path>/imgserv.cgi?src=gallery/holiday/1.jpg&width=1048&height=879
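A sketch of validating these parameters, assuming they have already been URL-decoded into name/value pairs (for example by a CGI library). `ImageRequest` and `parseImageRequest` are hypothetical names; the six-digit cap and the rejection of `..` path segments are added assumptions, not part of the spec.

```haskell
import Control.Monad (when)
import Data.Char (isDigit)
import System.FilePath (splitDirectories)

-- | A validated request (hypothetical type).
data ImageRequest = ImageRequest
  { reqSrc    :: FilePath  -- image path, absolute from the document root
  , reqWidth  :: Int       -- bounding box width in pixels
  , reqHeight :: Int       -- bounding box height in pixels
  } deriving (Show, Eq)

-- | Validate already URL-decoded query parameters.
parseImageRequest :: [(String, String)] -> Either String ImageRequest
parseImageRequest params = do
  src <- field "src"
  -- added assumption: refuse paths that climb out of the document root
  when (".." `elem` splitDirectories src) $
    Left "src must not leave the document root"
  w <- field "width" >>= positive "width"
  h <- field "height" >>= positive "height"
  return (ImageRequest src w h)
  where
    field name = maybe (Left ("missing parameter: " ++ name)) Right (lookup name params)
    -- digits only, at most 6 of them (assumed sanity bound), greater than zero
    positive name s
      | not (null s), length s <= 6, all isDigit s, n > 0 = Right n
      | otherwise = Left (name ++ " must be an integer greater than zero")
      where
        n = read s :: Int
```

For the first example URL this yields `Right (ImageRequest "images/1.jpg" 800 600)`; a width of `0` or `80px` is rejected with a `Left` message.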

Configuration

TODO: describe the configuration file syntax and options
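Until the syntax is written down, the options mentioned on this page (raster width, cache location, cache size limit, minimum lifetime of a cached file) could be collected in a record like the following. The field names and default values are hypothetical; only the raster of 50 pixels comes from the example in the Request section.

```haskell
-- | Hypothetical configuration record; field names and defaults are assumptions.
data Config = Config
  { cfgRaster      :: Int       -- raster width in pixels (caching group)
  , cfgCacheDir    :: FilePath  -- root of the cache folder
  , cfgCacheLimit  :: Integer   -- cache space limit in bytes
  , cfgMinLifetime :: Integer   -- minimum lifetime of a cached file, in seconds
  } deriving Show

defaultConfig :: Config
defaultConfig = Config
  { cfgRaster      = 50
  , cfgCacheDir    = "cache"
  , cfgCacheLimit  = 100 * 1024 * 1024
  , cfgMinLifetime = 3600
  }

-- | Basic sanity checks before the server starts using a configuration.
validateConfig :: Config -> Either String Config
validateConfig c
  | cfgRaster c <= 0     = Left "raster must be positive"
  | cfgCacheLimit c <= 0 = Left "cache limit must be positive"
  | cfgMinLifetime c < 0 = Left "minimum lifetime must not be negative"
  | otherwise            = Right c
```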

Implementation

Language

Haskell is a great language for CGI scripts since it is purely functional and has an easy-to-use CGI library. It is not that great at IO, but you can still write quite nice code with it.

Cache

The OS provides us with a filesystem. Let's use it! The complete directory structure of the source image files is replicated in the cache folder. For every source image there is an info file holding information about it, such as its dimensions, its file size, and all scaled versions with their sizes and last access times.
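A sketch of how cache paths and the redirect answer could be formed. The `name_WIDTHxHEIGHT.ext` naming scheme, the `.info` suffix and the function names are assumptions; the spec only says that the file name contains the dimensions. The redirect follows CGI/1.1 (RFC 3875): a `Location` header with an absolute URL makes the web server send a 302 to the browser, whereas a bare local path would be resolved internally by the server, and the browser would never learn the cacheable URL.

```haskell
import System.FilePath ((</>), (<.>), dropExtension, takeExtension)

-- | Location of a scaled version inside the cache, mirroring the source
-- tree. The "name_WIDTHxHEIGHT.ext" scheme is a hypothetical choice.
scaledPath :: FilePath -> FilePath -> (Int, Int) -> FilePath
scaledPath cacheRoot src (w, h) =
  cacheRoot </> (dropExtension rel ++ "_" ++ show w ++ "x" ++ show h ++ takeExtension rel)
  where
    rel = dropWhile (== '/') src  -- src is given from the document root

-- | Location of the info file for a source image (also hypothetical).
infoPath :: FilePath -> FilePath -> FilePath
infoPath cacheRoot src = cacheRoot </> (dropWhile (== '/') src <.> "info")

-- | CGI client redirect to the cached file; the URL must be absolute.
redirectResponse :: String -> String
redirectResponse url = "Location: " ++ url ++ "\r\n\r\n"
```

For `images/1.jpg` served at 600x450 with a cache root of `cache`, this gives `cache/images/1_600x450.jpg` and the info file `cache/images/1.jpg.info`.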

When an image is requested, its info file is read. If the info file doesn't exist, it is generated (including directories if necessary). The source file size stored in it is compared with the current file size of the source image; if they differ, all scaled versions are deleted and the info file is rebuilt. Otherwise the requested dimension is looked up: if that version exists, it is delivered and its access time is recorded; if it doesn't exist, it is generated (see next section).
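The decision logic of this paragraph can be sketched as a pure function. The `InfoFile` layout (scaled versions keyed by width, since the height follows from the aspect ratio and the raster) and all names are hypothetical; reading and writing the info file are left out.

```haskell
import qualified Data.Map as Map

-- | Hypothetical in-memory form of an info file.
data InfoFile = InfoFile
  { infoSourceBytes :: Integer              -- source size when the info was built
  , infoDimensions  :: (Int, Int)           -- width and height of the source
  , infoVersions    :: Map.Map Int Version  -- scaled versions, keyed by width
  } deriving Show

data Version = Version
  { versionBytes      :: Integer
  , versionLastAccess :: Integer  -- assumption: seconds since the epoch
  } deriving Show

data CacheAction
  = CreateInfo    -- no info file yet: create it (and missing directories)
  | RebuildInfo   -- source size changed: delete all versions, rebuild the info
  | Serve Int     -- version exists: record the access time and redirect
  | Generate Int  -- version missing: scale the image and add it to the info
  deriving (Show, Eq)

-- | Decide what to do, given the current size of the source image and the
-- rastered width. After CreateInfo or RebuildInfo the caller runs 'decide'
-- again on the fresh info file.
decide :: Integer -> Int -> Maybe InfoFile -> CacheAction
decide _ _ Nothing = CreateInfo
decide sourceBytes width (Just info)
  | infoSourceBytes info /= sourceBytes  = RebuildInfo
  | Map.member width (infoVersions info) = Serve width
  | otherwise                            = Generate width
```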

After each creation of a scaled version the cache size is checked. If it exceeds the configured limit, the least recently used files (by access time) are deleted until only 2/3 of the size limit is in use. This keeps the maintenance run from being needed too often.
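A sketch of the eviction step: it only computes which files to delete, oldest access first, until the cache is back at two thirds of the limit. `CacheEntry`, `evictionPlan` and the integer timestamps are assumptions; the minimum lifetime from the Cache section could be honoured by filtering young entries out before calling it.

```haskell
import Data.List (sortOn)

-- | One scaled file in the cache (hypothetical representation).
data CacheEntry = CacheEntry
  { entryPath       :: FilePath
  , entryBytes      :: Integer
  , entryLastAccess :: Integer  -- assumption: seconds since the epoch
  } deriving Show

-- | Files to delete, least recently used first. Nothing is deleted while
-- the cache is within the limit; otherwise deletion continues until at
-- most two thirds of the limit are used.
evictionPlan :: Integer -> [CacheEntry] -> [FilePath]
evictionPlan limit entries
  | total <= limit = []
  | otherwise      = go total (sortOn entryLastAccess entries)
  where
    total  = sum (map entryBytes entries)
    target = (limit * 2) `div` 3
    go _ [] = []
    go current (e : rest)
      | current <= target = []
      | otherwise         = entryPath e : go (current - entryBytes e) rest
```

With a limit of 900 bytes and three files of 400, 300 and 300 bytes (1000 in total), the two least recently used 300-byte files are deleted, leaving 400 bytes, which is below the target of 600.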

Image scaling and dimension retrieval

The ImageMagick tool identify is used to get the dimensions of an image, and the convert tool is used to scale it down.

identify

identify -format "%wx%h" source.image
> 640x480
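A sketch of calling identify from Haskell with `readProcess` from the `process` package and parsing its output. It asks for `%wx%h`, i.e. width first, matching ImageMagick's usual WIDTHxHEIGHT order. The function names are hypothetical, and only single-frame images are handled: for animated GIFs identify prints one size per frame without a separator, which this parser rejects.

```haskell
import Data.Char (isDigit)
import System.Process (readProcess)

-- | Parse identify's output for -format "%wx%h", e.g. "640x480".
-- Multi-frame output such as "100x100100x100" yields Nothing.
parseDimensions :: String -> Maybe (Int, Int)
parseDimensions out =
  case break (== 'x') (takeWhile (/= '\n') out) of
    (w, 'x' : h) | allDigits w, allDigits h -> Just (read w, read h)
    _                                       -> Nothing
  where
    allDigits s = not (null s) && all isDigit s

-- | Run identify on an image. readProcess throws an IOError if identify
-- is missing or exits with a failure code.
imageDimensions :: FilePath -> IO (Maybe (Int, Int))
imageDimensions path =
  parseDimensions <$> readProcess "identify" ["-format", "%wx%h", path] ""
```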

convert

convert -sample <width>x<height> source.image resized.image

produces an image that fits into the given bounding box, i.e. neither its width nor its height is larger than the specified values. ImageMagick writes geometries as WIDTHxHEIGHT without spaces, and convert preserves the aspect ratio of the image.
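A sketch of the scaling call using `callProcess` from the `process` package. The function names are hypothetical; the argument list mirrors the command line shown here.

```haskell
import System.Process (callProcess)

-- | Arguments for convert. An ImageMagick geometry is WIDTHxHEIGHT (no
-- spaces); without a "!" flag convert keeps the aspect ratio and fits the
-- result into that box.
convertArgs :: (Int, Int) -> FilePath -> FilePath -> [String]
convertArgs (w, h) source target =
  ["-sample", show w ++ "x" ++ show h, source, target]

-- | Scale an image; callProcess throws an IOError if convert fails.
scaleImage :: (Int, Int) -> FilePath -> FilePath -> IO ()
scaleImage size source target = callProcess "convert" (convertArgs size source target)
```

`-sample` picks pixels without interpolation, which is fast; `-resize` is the usual alternative when smoother results matter more than CPU time.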

Raster

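Since convert respects the aspect ratio, only the width has to be rasterised. This must be the width of the picture after it has been fitted into the bounding box, not the width of the bounding box itself: when the height is the limiting side, the fitted width is smaller than the box width.

rasteredWidth = (fittedWidth `div` raster) * raster  -- `div` is integer division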

Installation

  • Either compile a binary for the web server machine's OS or install Hugs
  • Check how to use CGI scripts with your web server
  • Haskell CGI scripts are used like Perl CGI scripts, except that runhugs is started instead of perl; a compiled binary is simply executed

archive/imageserver.txt · Last modified: 13.11.2008 22:20 (external edit)