A Practical File Compression Methodology
Maximising File Compression of Black-and-White Line Drawings
by Luca Rossi and Paul Cambie,  rev.2, 02/04

Introduction

Amateur website operators typically have an especially strong incentive to maximise file compression. They usually rely on the "free" webspace provided by their ISPs, and such space is limited. Often VERY limited. Meanwhile the file compression methods available are varied, and each does some jobs better than others. So, which to use? To get the smallest possible files it pays to apply several of the available methods, choosing each according to the file content.

The most aggressive file compression is achieved by dividing a compound image and text page or article into parts . . .

  • text
  • black-and-white drawings, such as a schematic
  • colour and black-and-white photos
  • grey-scaled diagrams, such as a component overlay

. . . and applying different methods to each.

Luca Rossi
In early 2001, Luca Rossi explained to me his CCITT-4 compression methodology for schematic diagrams. Between his limited English and his cloth-eared novice of a pupil (ME!), Luca's patience was severely tested!

If you're working with a composite article, rather than a single-format single page, the resulting collection of compressed files may be bundled together in a single .zip file, to restore some sense of completeness, or linked as separate elements in a unifying web page.

An Example

By way of example, the electronics project magazine article on which the ETI466 section of this website is based was processed in this way. You can see the three files referred to elsewhere on this website. The 10-page magazine article included text, several line drawings, a grey-scaled component overlay, and a black-and-white photograph. The first scan of the article, a couple of years ago when it was first prepared for this website, was crude - the whole thing as .gifs came to 1650kb. More recently, by cropping and being selective about the images, this was reduced to 950kb, still as .gifs, with all pages processed in much the same way. Applying the selective methods above gives the results below. As you can see, the total file size is down to just 170kb, with image quality the same as or better than before:

File Name         Content            File Size   Format
ETI466Part1.zip   Text               21.7kb      Zipped .pdf
ETI466Part2.zip   Line Drawings      88.1kb      Zipped .pdf
ETI466Part3.zip   Greyscale Images   60.4kb      Zipped .jpg
Total:                               170.2kb

In my own case, I had 10Mb "free" from my ISP for my website. This type of processing, though tedious, and not so easy for the reader to "reassemble" either, meant a roughly 10-fold decrease in the usage of this limited amount of server space. Indeed the inappropriately compressed original amounted to 16.5% of that 10Mb space, and the optimised version just 1.7%. A hell of a difference, huh!
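
The percentages quoted above can be checked with a few lines of arithmetic. What follows is only a sketch in Python: it assumes 10Mb is counted as 10,000kb, which is what the 16.5% figure implies, and the function name is made up for illustration.

```python
# Check the server-space figures quoted in the article.
# Assumption: 10Mb is treated as 10,000kb, as the 16.5% figure implies.
QUOTA_KB = 10_000


def share_of_quota(size_kb, quota_kb=QUOTA_KB):
    """Return size_kb as a percentage of the available webspace."""
    return 100.0 * size_kb / quota_kb


# Sizes taken from the ETI466 example in the text.
sizes_kb = {
    "first scan, all .gif": 1650,
    "cropped .gif": 950,
    "split and optimised": 170.2,
}

for label, size in sizes_kb.items():
    print(f"{label}: {size}kb = {share_of_quota(size):.1f}% of quota")

# Overall reduction from the first scan to the optimised version.
reduction = sizes_kb["first scan, all .gif"] / sizes_kb["split and optimised"]
print(f"reduction: {reduction:.1f}-fold")
```

The reduction works out a little under 10-fold, which is close enough to the round figure used in the text.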

Most of the content of my website is screeds and screeds of schematic diagrams. These lend themselves particularly well to a form of compression originally designed for fax transmissions. Fax is about the only other application I've ever managed to think of that also combines (a) the desirability of severely minimising bandwidth, and (b) having the vast majority of content reducible to just 1-bit (i.e. only black and only white, no grey) images - usually folks' hand-written missives! (Other modes of compression are used for fax messages with greyscale images - these exist of course, but are not considered here.)

Application of the CCITT-4 Algorithm

So, this discussion focuses on a method for maximising compression of electronic schematic diagrams for website applications, by use of the CCITT-4 compression algorithm. It produces fierce compression - about the best you'll achieve with the tools discussed here - for 1-bit (i.e. black and white) line drawings. Electronic schematic diagram source material is typically just that: a black and white ("B&W") line drawing. Often, in practice, these are of less than ideal contrast and resolution. So the following procedure also relies on your dexterity in cleaning up such diagrams, and improving their contrast ratio by skilful use of your scanner and associated software. The scan will be saved as a 1-bit (black and white) compressed file, so it first needs to be manipulated to yield as high a contrast as possible between the black lines of the diagram and the white background.
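
The reason contrast matters so much is that every pixel eventually has to be forced to either black or white. In practice your scanner or paint software does this for you; the Python sketch below only illustrates the idea. The pixel values, the threshold of 128 and the function name are all assumptions made up for the example.

```python
# Turn a greyscale scan into a 1-bit (black/white) image by thresholding.
# Assumptions: pixel values run from 0 (black) to 255 (white), and the
# default threshold of 128 is an arbitrary choice for illustration.


def to_bilevel(grey_rows, threshold=128):
    """Return rows of 1 (black ink) and 0 (white paper)."""
    return [[1 if value < threshold else 0 for value in row]
            for row in grey_rows]


# A tiny made-up "scan": a faint grey line on slightly dirty paper.
scan = [
    [250, 240, 90, 245],
    [235, 110, 80, 250],
    [240, 230, 100, 200],
]

for row in to_bilevel(scan):
    print("".join("#" if bit else "." for bit in row))
```

Raising the threshold rescues fainter lines but also turns more of the dirt on the page black, which is why a clean, high-contrast scan makes the choice much easier.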

Method 1: The Hard Way

The procedure described below for using the CCITT-4 algorithm is tedious, long-winded and tiresome. Yet it works, and to this day many of the files on my website were processed this way. However, I have subsequently found a way of producing just as good a result much, much faster. This alternative method was available to me all along - I just never knew about it until more recently. Unless you're a masochist, therefore, you'll want to skip the next bit and jump to the heading "Method 2: The Simpler Way".

Acrobat Distiller

"Method 1" uses Acrobat Distiller 3.0 to apply the CCITT-4 compression algorithm, as detailed in the Application Procedure below. The procedure may seem long and twisted, but it is only a matter of practice. With experience, PDFs can be prepared in just a few minutes.

Using Adobe Acrobat Distiller 3.0, when you compress a B&W image with a little dithering present, the CCITT Group 4 or AWD compression algorithms produce greater compression than is seen in GIFs. CCITT and AWD (AWD isn't offered by the Acrobat 3.0 software package) were originally developed for fax transmission. When fax machines crawl along at modem speeds of only a few kilobits per second, the compression ratio of B&W faxed images is vital. The other side of this coin is that files coded with the CCITT-4 standard are more fragile than GIF images. This is why CCITT-4 PDFs sent via the web must be Zipped to be sure of arriving error-free.

Adobe PDF ("Portable Document Format")

CCITT-3 compresses B&W images better than GIF, and there is no problem putting the image on-line directly as a file.pdf. CCITT-4 compresses B&W images much better still, but the resulting files need to be Zipped when posted on-line. Posting the file.pdf directly can frequently cause download problems; compressing with CCITT-4, zipping the file.pdf and uploading it as file.zip avoids these problems.
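
Whatever the underlying cause of those download problems, one thing a zip wrapper certainly does provide is a CRC-32 checksum for every file in the archive, so a damaged download is at least detected when it is unzipped. The Python sketch below demonstrates just that one point and does not claim to explain the problems themselves. The file name and payload are made up, and the archive is written uncompressed (ZIP_STORED) purely so the payload bytes are easy to find and damage.

```python
# A zip archive records a CRC-32 for every member, so damage is detectable.
# Assumptions: the file name and payload are made up; ZIP_STORED is used
# only so the payload bytes are easy to locate and damage in the archive.
import io
import zipfile


def make_zip(name, payload):
    """Return the bytes of a zip archive holding one member."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
        archive.writestr(name, payload)
    return buffer.getvalue()


def first_bad_member(zip_bytes):
    """Return the name of the first member failing its CRC check, else None."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return archive.testzip()


payload = b"pretend this is a CCITT-4 compressed schematic" * 20
good = make_zip("schematic.pdf", payload)

# Simulate a transfer error by flipping one byte of the stored payload.
damaged = bytearray(good)
damaged[good.index(payload) + 10] ^= 0xFF
damaged = bytes(damaged)

print(first_bad_member(good))     # None: the archive is intact
print(first_bad_member(damaged))  # name of the damaged member
```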

CCITT-4, like GIF, is a NON-destructive (lossless) compression algorithm. JPG, by contrast, uses a "destructive" algorithm: it compresses images with some loss of data, whereas CCITT-3, CCITT-4 and GIF lose none. Note that GIF files can hold 1 to 8 bit images (up to 256 colours or grey levels). However, files compressed with the CCITT-4 algorithm can ONLY be 1-bit (i.e. B&W) images!
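
To see why a 1-bit line drawing squeezes down so well without losing anything, it helps to look at a single scan line as runs of white and black pixels. The Python sketch below is NOT the CCITT-4 codec, which goes further by coding each line relative to the line above it; it is only a simplified run-length illustration, and the function names and the sample scan line are invented for the example.

```python
# Simplified run-length coding of one scan line of a 1-bit image.
# This is NOT the CCITT codec, only an illustration of the principle.
from itertools import groupby


def encode_runs(row):
    """Return (first_pixel, [run lengths]) for a row of 0/1 pixels."""
    if not row:
        return 0, []
    return row[0], [len(list(group)) for _, group in groupby(row)]


def decode_runs(first_pixel, runs):
    """Rebuild the original row exactly from its run lengths."""
    row, pixel = [], first_pixel
    for length in runs:
        row.extend([pixel] * length)
        pixel = 1 - pixel  # runs alternate between white and black
    return row


# Made-up scan line: wide white margin, thin black line, white again.
line = [0] * 300 + [1] * 3 + [0] * 500
first, runs = encode_runs(line)
print(first, runs)  # 0 [300, 3, 500]

# Lossless round trip: decoding gives back every pixel unchanged.
assert decode_runs(first, runs) == line
```

Three numbers stand in for 803 pixels here, and nothing is thrown away. A greyscale or colour image rarely has such long runs of identical pixels, which is why this family of methods is restricted to 1-bit images.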

Application Procedure

Now either (A);

or (B):

Finally, WinZip

Method 2: The Simpler Way

CCITT-4 compression isn't supported by the usual paint and photo manipulation software. But it IS offered by the Kodak "Imaging for Windows" software that comes with Win98 2nd ed. It can be a bit tricky to find, but it's there, and anything the Kodak Imaging software can open (it seems pretty limited in the image file formats it supports!) it can convert to CCITT-4 compressed .TIF files. Subsequently Zipping these gives only a further 1-2% decrease in file size, which seems hardly worth it, and as far as I'm aware Zipping isn't necessary to preserve file integrity during on-line transmission.
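
That Zipping gains so little on top of CCITT-4 is what you would expect of any file that has already been compressed well. The Python sketch below shows the general effect rather than reproducing the 1-2% figure: it assumes zlib's DEFLATE as a stand-in for what a zip tool does, and the "drawing" is synthetic data (a mostly blank page with scattered marks), not a real scan.

```python
# Why zipping an already-compressed file gains little.
# Assumptions: zlib's DEFLATE stands in for what a zip tool would do,
# and the "drawing" is synthetic data, not a real scan.
import random
import zlib


def saving_percent(data):
    """Percentage by which DEFLATE shrinks data (can be negative)."""
    packed = zlib.compress(data, 9)
    return 100.0 * (len(data) - len(packed)) / len(data)


# Build a mostly blank "page" with marks at pseudo-random positions.
rng = random.Random(0)  # fixed seed so the run is repeatable
page = bytearray(200_000)
for _ in range(2_000):
    page[rng.randrange(len(page))] = rng.randrange(1, 256)
raw = bytes(page)

# Compress once, then see what a second pass can still squeeze out.
already_compressed = zlib.compress(raw, 9)

print(f"raw data:           {saving_percent(raw):.1f}% smaller")
print(f"already compressed: {saving_percent(already_compressed):.1f}% smaller")
```

The first pass removes nearly all the redundancy, so the second pass finds little or nothing left to remove and may even make the data slightly bigger.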

I gather that the Kodak "Imaging for Windows" software doesn't come with Windows XP and doesn't immediately work with it, but also that some sort of patch is available to get around that - I'm not at all familiar with this, however!

I found that I first had to use the software I run my scanner with to save the resultant image files as TIF files, THEN use Kodak Imaging for Windows to convert them (tho' they still remain "TIF" files, albeit with altered compression). Photoshop, for instance, would save as TIF all right, but only as one of several variants comparable to CCITT-3 (not -4).

Within the Kodak software select Page --> Properties. You should see the "Black and White" option already checked (the process ain't going to work if you don't already have a 1-bit, black-and-white image). Select "Compression", and force the change to "CCITT Group 4 (2d) Fax", --> OK --> File Save.

If you're using the Kodak Imaging for Windows software for the actual scanning process itself too, then you can select this option during the save process. (Perhaps I missed something, but the Kodak software didn't help much when it came to cropping and rotating raw scans, something I like to do first. So I tend not to use it for scanning.)

That's About It!

I wouldn't sail off into any of the above methodologies at all, unless you've good reason to! Otherwise it's a lot of work to little point - there remain few applications for such rigorous consideration of how to save file-size, these days, other than amateur websites!

Please, please email me if you've tried any of this - I'm always keen to both learn and advise!
