General
The Image Server is a CGI script that serves pictures in various dimensions.
Features
Initial version
- requests with an arbitrary bounding box
- the original aspect ratio of the picture is maintained
- delivery and caching of pictures in dimensions that lie on a configured raster
- the picture fits into the requested bounding box and onto the configured raster
- pictures are only scaled down, never up
- scaled images are served via an HTTP redirect to the location of a static image file
Future versions
- support for external storage of scaled images to allow mirroring (redirection map)
- support for explicit precaching (the generated files can then be copied to a mirror)
Design Decisions
Raster
We introduced a raster onto which the images are scaled. This prevents the cache from filling up with many finely graded scaled versions of the same image. The raster width is a configuration option belonging to the caching group.
Specification
Behaviour
Request
A picture is requested with a picture path and a bounding box. The bounding box specifies the maximum size of the image. The Image Server has a configured raster on which it provides pictures. Let's assume the raster is set to 50 pixels. The picture is scaled down to fit into the bounding box, and then its width is rounded down to a multiple of 50 (the height is scaled down accordingly). The needed version of the picture is looked up in the cache and generated if necessary. Its file name contains the dimensions, and the request is answered with a redirect to that file. This allows browser-side caching even for slightly different bounding boxes: for a 1024x768 picture, for example, both 820x700 and 840x700 lead to the same 800x600 file.
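A minimal sketch of the last two steps, assuming a hypothetical naming scheme `<name>_<width>x<height>.<ext>` and a plain CGI response written without a library. The redirect uses an absolute URL on purpose: under the CGI specification (RFC 3875), a `Location` header containing only a local path makes the web server redirect internally, so the browser never sees the file's URL and cannot cache it.

```haskell
import System.FilePath (splitExtension)

-- | Hypothetical naming scheme for scaled versions:
-- "gallery/holiday/1.jpg" scaled to 800x600 becomes "gallery/holiday/1_800x600.jpg".
scaledFileName :: FilePath -> (Int, Int) -> FilePath
scaledFileName src (w, h) = base ++ "_" ++ show w ++ "x" ++ show h ++ ext
  where
    (base, ext) = splitExtension src

-- | CGI client redirect to the static file. The URL must be absolute, so the
-- browser follows (and caches) it itself instead of the server redirecting internally.
redirectResponse :: String -> String
redirectResponse url =
  "Status: 302 Found\r\n" ++ "Location: " ++ url ++ "\r\n" ++ "\r\n"
```

Once the scaled file exists, the script would only print `redirectResponse (cacheUrl ++ scaledFileName src size)` to stdout, where `cacheUrl` is a hypothetical configuration value.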
Raster
The raster value must remain fixed over a reasonable period of time, so that a client does not get differently scaled images for the same requested dimensions. In particular, the server must not serve a cached version that merely matches well (a "good matching" cache hit) instead of the exact rastered size.
Cache
The cache is an important part of the Image Server, since it reduces server-side load and latency. The cache has a space limit, and the caching strategy is LRU (least recently used). The cached files should have the same modification date as the original file. Why? A file may be deleted from the cache because of the space limit. If it is requested again, it is regenerated with the same modification date, so a browser that still has it in its own cache can reuse that copy instead of downloading it again. Furthermore, a cached file should have a minimum lifetime, since the client still needs to download it after being redirected to it.
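One way to apply the modification-date rule, sketched with `getModificationTime` and `setModificationTime` from the `directory` package (`setModificationTime` needs directory 1.2.3 or later). `copyModificationTime` is a hypothetical helper that would be called right after convert has written a scaled file.

```haskell
import System.Directory (getModificationTime, setModificationTime)

-- | Give a freshly generated scaled file the modification time of its source
-- image, so a regenerated copy looks unchanged to browsers that still cache it.
copyModificationTime :: FilePath -> FilePath -> IO ()
copyModificationTime source scaled =
  getModificationTime source >>= setModificationTime scaled
```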
Mirrors
A mirror keeps pictures in various sizes to save bandwidth or to speed up access. It is also handy to be able to store the pictures on cheap web space without CGI support. The Image Server has a mirror map that lists, for every mirror, the files it holds. Furthermore, the map could record the bandwidth of each mirror. If a file is available on more than one mirror, the request is redirected to the one with the most bandwidth. Load balancing is not reasonable, since we would lose client-side caching: the same image would be served from changing URLs.
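A sketch of the mirror selection, assuming a hypothetical `Mirror` record holding a base URL, a bandwidth figure and the set of files on that mirror. Ties are broken by base URL, so a given file always maps to the same URL and client-side caching keeps working; if no mirror holds the file, the local cache is used.

```haskell
import Data.List (sortBy)
import Data.Ord (Down (..), comparing)
import qualified Data.Set as Set

-- | Hypothetical entry of the mirror map.
data Mirror = Mirror
  { mirrorBaseUrl   :: String            -- e.g. "http://mirror.example.org"
  , mirrorBandwidth :: Int               -- relative bandwidth, higher is better
  , mirrorFiles     :: Set.Set FilePath  -- scaled files held by this mirror
  }

-- | The mirror with the most bandwidth that holds the file, if any.
-- Ties are broken by base URL, so the choice is deterministic.
chooseMirror :: [Mirror] -> FilePath -> Maybe Mirror
chooseMirror mirrors path =
  case sortBy (comparing rank) holders of
    best : _ -> Just best
    []       -> Nothing
  where
    holders = filter (Set.member path . mirrorFiles) mirrors
    rank m  = (Down (mirrorBandwidth m), mirrorBaseUrl m)

-- | URL to redirect to: the chosen mirror, or the local cache as a fallback.
redirectTarget :: String -> [Mirror] -> FilePath -> String
redirectTarget localBase mirrors path =
  maybe localBase mirrorBaseUrl (chooseMirror mirrors path) ++ path
```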
Interface
CGI
The request URL is:
http://<host>/<path>/imgserv.cgi?src=<image src>&width=<width>&height=<height>
Where
<image src> is the absolute path of the image, starting from the document root.
Relative paths are not possible, since they are normally resolved by the browser relative to the referring page; therefore the Image Server cannot determine the right path.
<width> and
<height> are integers greater than zero and denote the dimensions of the requested bounding box in pixels. Other units are not planned.
Examples:
http://<host>/<path>/imgserv.cgi?src=images/1.jpg&width=800&height=600
http://<host>/<path>/imgserv.cgi?src=gallery/holiday/1.jpg&width=1048&height=879
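A sketch of the parameter validation, assuming the query string has already been decoded into key/value pairs (for example by the CGI library). `ImageRequest` and `parseRequest` are hypothetical names, and rejecting `..` in `src` is an added assumption, since `src` is mapped onto the filesystem.

```haskell
import Data.List (isInfixOf)
import Text.Read (readMaybe)

-- | A validated request: image path from the document root and bounding box.
data ImageRequest = ImageRequest
  { reqSrc    :: FilePath
  , reqWidth  :: Int
  , reqHeight :: Int
  } deriving (Eq, Show)

-- | Validate decoded query parameters such as
-- [("src", "images/1.jpg"), ("width", "800"), ("height", "600")].
parseRequest :: [(String, String)] -> Either String ImageRequest
parseRequest params = do
  src <- field "src"
  if ".." `isInfixOf` src
    then Left "src must not contain '..'"
    else Right ()
  w <- positive "width"
  h <- positive "height"
  pure (ImageRequest src w h)
  where
    field :: String -> Either String String
    field k = maybe (Left ("missing parameter: " ++ k)) Right (lookup k params)

    -- width and height must parse as integers greater than zero
    positive :: String -> Either String Int
    positive k = do
      v <- field k
      case readMaybe v of
        Just n | n > 0 -> Right n
        _              -> Left (k ++ " must be an integer greater than zero")
```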
Configuration
TODO: describe the configuration file syntax and options
Implementation
Language
Haskell is a great language for CGI scripts, since it is purely functional and has an easy-to-use CGI library. It is not that great at IO, but you can still write quite nice code with it.
Cache
The OS provides us with a filesystem. Let's use it!
The complete directory structure of the source image files is recreated in the cache folder. For every source file there is an info file containing information about it, such as its dimensions, its file size, and all scaled versions with their last use and sizes.
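A sketch of the info file as a Haskell value written with `show` and read back with `readMaybe`; the record and field names, and storing the last use as POSIX seconds, are assumptions. The path mapping strips the leading `/` of `src`, because `</>` returns its right argument unchanged when that argument is absolute.

```haskell
import System.FilePath (isPathSeparator, (<.>), (</>))
import Text.Read (readMaybe)

-- | One scaled version of a source image (hypothetical layout).
data ScaledVersion = ScaledVersion
  { svWidth   :: Int      -- width in pixels
  , svHeight  :: Int      -- height in pixels
  , svBytes   :: Integer  -- file size in bytes
  , svLastUse :: Integer  -- last access, POSIX seconds
  } deriving (Eq, Show, Read)

-- | Contents of the info file stored in the mirrored directory structure.
data ImageInfo = ImageInfo
  { infoWidth    :: Int              -- dimensions of the source image
  , infoHeight   :: Int
  , infoBytes    :: Integer          -- source file size when the info was written
  , infoVersions :: [ScaledVersion]
  } deriving (Eq, Show, Read)

-- | "/gallery/1.jpg" with cache root "cache" maps to "cache/gallery/1.jpg.info".
infoFilePath :: FilePath -> FilePath -> FilePath
infoFilePath cacheRoot src = cacheRoot </> dropWhile isPathSeparator src <.> "info"

encodeInfo :: ImageInfo -> String
encodeInfo = show

-- | Nothing if the file content is corrupt; the caller can then rebuild it.
decodeInfo :: String -> Maybe ImageInfo
decodeInfo = readMaybe
```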
When an image is requested, its info file is read. If it does not exist, it is generated (including directories, if necessary). The file size stored in it is compared with the file size of the source image; if they differ, all scaled versions are deleted and the info file is rebuilt. Otherwise the requested dimensions are looked up: if that version exists, it is delivered and its access time is recorded. If it does not exist, it is generated (see next section).
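The decision part of this lookup can be kept pure. In this sketch with hypothetical names, the IO code would read the info file, carry out the returned steps in order and then answer with the redirect.

```haskell
-- | Hypothetical steps the IO layer carries out, in order.
data Step
  = WriteInfo        -- create (or rebuild) the info file, creating directories
  | DeleteScaled     -- delete all scaled versions of the source image
  | GenerateVersion  -- scale the source image to the requested size
  | TouchVersion     -- record the access time of the existing version
  deriving (Eq, Show)

-- | storedBytes:  source size recorded in the info file (Nothing: no info file yet)
--   currentBytes: current size of the source image
--   versions:     dimensions of the scaled versions listed in the info file
--   wanted:       rastered dimensions of the request
planRequest :: Maybe Integer -> Integer -> [(Int, Int)] -> (Int, Int) -> [Step]
planRequest Nothing _ _ _ = [WriteInfo, GenerateVersion]
planRequest (Just storedBytes) currentBytes versions wanted
  | storedBytes /= currentBytes = [DeleteScaled, WriteInfo, GenerateVersion]
  | wanted `elem` versions      = [TouchVersion]
  | otherwise                   = [GenerateVersion]
```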
After each creation of a scaled version, the cache size is checked. If it exceeds the configured limit, the least recently used files (by access time) are deleted until only 2/3 of the size limit is in use. This makes sure that the maintenance run is not needed too often.
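A sketch of the maintenance run as a pure selection of the files to delete. The `CachedFile` record is hypothetical, and skipping files younger than a minimum lifetime is an assumption that applies the minimum-lifetime requirement from the cache specification.

```haskell
import Data.List (sortOn)

-- | A scaled file in the cache (hypothetical record); times are POSIX seconds.
data CachedFile = CachedFile
  { cfPath    :: FilePath
  , cfBytes   :: Integer
  , cfLastUse :: Integer
  , cfCreated :: Integer
  } deriving (Eq, Show)

-- | Files to delete: none while the cache is within the limit; otherwise the
-- least recently used files until at most 2/3 of the limit is in use. Files
-- younger than minLifetime are never chosen, so a redirected client can still fetch them.
filesToEvict :: Integer -> Integer -> Integer -> [CachedFile] -> [CachedFile]
filesToEvict limit minLifetime now files
  | total <= limit = []
  | otherwise      = go total (sortOn cfLastUse evictable)
  where
    total     = sum (map cfBytes files)
    target    = limit * 2 `div` 3
    evictable = filter (\f -> now - cfCreated f >= minLifetime) files
    go size (f : fs)
      | size > target = f : go (size - cfBytes f) fs
    go _ _ = []
```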
Image scaling and dimension retrieval
The ImageMagick tool identify is used to get the dimensions of an image, and the convert tool is used to scale images down.
identify
identify -format "%wx%h" source.image
> 640x480
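Calling identify from Haskell, sketched with `readProcess` from the `process` package; passing the arguments as a list avoids shell quoting of file names. The function names are hypothetical.

```haskell
import System.Process (readProcess)
import Text.Read (readMaybe)

-- | Parse identify output such as "640x480" into (width, height).
-- Multi-frame images (several "WxH" runs glued together) yield Nothing.
parseDimensions :: String -> Maybe (Int, Int)
parseDimensions out =
  case break (== 'x') out of
    (w, 'x' : h) -> (,) <$> readMaybe w <*> readMaybe h
    _            -> Nothing

-- | Ask ImageMagick's identify for the dimensions of an image.
-- readProcess throws an IOError if identify exits with a non-zero code.
identifyDimensions :: FilePath -> IO (Maybe (Int, Int))
identifyDimensions path =
  parseDimensions <$> readProcess "identify" ["-format", "%wx%h", path] ""
```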
convert
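convert source.image -sample <width>x<height> resized.image
produces an image that fits into the given bounding box, i.e. neither its width nor its height is larger than the specified values. convert preserves the aspect ratio of the image. Without a shrink-only flag such as ">" in the geometry, convert also enlarges images that are smaller than the box, so the server must never request a size larger than the original.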
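The matching call of convert, again via the `process` package; `convertArgs` and `scaleImage` are hypothetical names. The geometry is passed as a single `WxH` argument after the input file.

```haskell
import System.Process (callProcess)

-- | Arguments for: convert <src> -sample <w>x<h> <dst>
convertArgs :: (Int, Int) -> FilePath -> FilePath -> [String]
convertArgs (w, h) src dst = [src, "-sample", show w ++ "x" ++ show h, dst]

-- | Scale src into the bounding box (w, h) and write the result to dst.
-- callProcess throws an IOError if convert exits with a non-zero code.
scaleImage :: (Int, Int) -> FilePath -> FilePath -> IO ()
scaleImage box src dst = callProcess "convert" (convertArgs box src dst)
```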
Raster
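Since convert preserves the aspect ratio, only the width has to be rastered. It must be the width of the picture after fitting it into the bounding box, however, not the bounding box's own width: if the height is the limiting side, rounding the box width has no effect on the result.
fittedWidth   = minimum [boundingBoxWidth, originalWidth, originalWidth * boundingBoxHeight `div` originalHeight]
rasteredWidth = (fittedWidth `div` raster) * raster  -- `div` = integer division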
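The raster rule from the Request section as a runnable function, including the "never scale up" requirement. `rasteredSize` is a hypothetical name. The height is computed with integer division only for naming the file and may differ by a pixel from what convert produces. Keeping the unrounded width for pictures narrower than one raster step is an assumption, since rounding down would give 0.

```haskell
-- | Target size of a picture (origW x origH) for a bounding box (boxW x boxH)
-- on the given raster: fit into the box, never enlarge, round the width down.
rasteredSize :: Int -> (Int, Int) -> (Int, Int) -> (Int, Int)
rasteredSize raster (origW, origH) (boxW, boxH) = (w, origH * w `div` origW)
  where
    -- width after fitting into the box; the last term is the width at which
    -- the height would just reach boxH
    fittedW  = minimum [boxW, origW, origW * boxH `div` origH]
    roundedW = (fittedW `div` raster) * raster
    -- assumption: pictures narrower than one raster step keep their width
    w        = if roundedW > 0 then roundedW else fittedW
```

convert can then be called with the geometry `w x boxH`: because `w` never exceeds the fitted width, the width is the limiting side, and because it never exceeds `origW`, convert does not enlarge the picture.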
Installation
- Either compile a binary for the OS of the web server machine or install Hugs.
- Check how CGI scripts are set up with your web server.
- Haskell CGI scripts are used like Perl CGI scripts, except that runhugs is started instead of perl or, in the case of a compiled binary, the binary is executed directly.