About BPFH
Author Simes
Created:  2001-01-02
Last changed:  2001-01-02
http://www.bpfh.net/about/generation.html

How this site is generated

By trade I'm a programmer, and as such I'm lazy; I prefer the computer to do the boring work for me if at all possible. This shows in how I maintain this web site: most of the eye-candy (i.e. the parts of a web page which aren't part of the content itself, but add value to the page (I don't believe I just wrote that...)) is automatically generated for me. This leaves me free to work on the actual content of the web site.

After all, the web is supposed to be about content, isn't it? Not that you'd know this from looking at certain pages on the web.

Anyway, I write basic pages using vi, sans tags like <html>...</html>, <body>...</body>, <title>...</title> and sans eye-candy like the navigation bar. I use a text editor because most of the tools out there which let you just point and drool web pages, a la DTP packages, produce horrendous HTML. That, and most of the stuff I'm doing is about textual content... which has little to do with a GUI DTP-a-like tool.

These pages are taken by a number of scripts and are munged in various ways to produce the web pages you see before you now. The pages start off in a source area on my home system, go via a processed area and finally end up on the web server which actually serves out the content.

Powered by Solaris x86

All of the work done to generate these web pages is performed on an aging PC running Solaris x86. Whilst it is not open source (not yet at least; there are rumors that Sun are thinking of opening Solaris to the world), I personally believe Solaris to be a better operating system on the Intel platform than the free Unix variants out there.

It does need a certain minimum of hardware to do the job, however; OSs like Linux certainly do very well on a limited amount of hardware. That said, I have run Solaris on a 486-33SX with 12MB of memory without too many problems.

Solaris does extremely well on multi-processor machines, either Sparc or Intel (the Solaris x86 HCL lists support for machines with 8 CPUs, and on Sparc hardware they've not yet run into any scaling problems within the kernel, even using 128 processors). IMHO Solaris has one of the best multi-threaded kernels in the industry. The two-level thread scheduling system is a little strange at first, though.

Now, if Sun would only improve the device support....

Wherever possible, the web pages produced are static in nature; i.e. I have not used server side includes, PHP3, CGI scripts or mod_perl to automate the addition of the eye-candy. The reason for this is simple: static content is generally friendlier to the web server serving up the content (web servers can easily serve up static files). It also means that the pages are decidedly more cacheable by browsers, web caches, etc.

I'm not against dynamic content mind you - I've created sites which are almost 100% dynamically generated. However those sites were dynamically generated for good reasons - often the content on the web pages came from databases (a good example of this is the displaying of bandwidth utilisation graphs to leased line customers at the ISP I work for). However eye-candy like the navigation bar et al can be done without resorting to dynamic methods.

I also try to ensure that these web pages are viewable in as many browsers as I have access to. In my case this means lynx, Netscape and IE (spit).

New HTML: Some of the new HTML I have defined within meta-HTML to make my life easier
This page before processing: What the HTML I actually write looks like
This page after processing: What the HTML you are seeing looks like once the processing has finished with the HTML I write


Written with vi. Processed with WML. Best viewed with any browser.


Anyway, the various scripts & programs which do this are:

# BuildSites

This perl script kicks off all of the work. It is essentially a wrapper around a set of commands which do things like produce galleries of images (see the Photos of landscapes and sunsets for an example of this); process the raw HTML I write and produce the version with added eye-candy; grab log files from the real web servers; and publish the processed web site.

# sitecopy

This is a program which is designed to maintain the content of a remote web site from a local copy. Only the changes which have occurred on the local system are actually transmitted, making it ideal for a dialup line.

See SiteCopy home page and the FreshMeat app entry for more details.

# wml

This is a program designed for off-line HTML generation for Unix. It provides such things as pre-processing, meta-HTML, embedded perl, m4 and much more. I abused it a lot to produce these web pages.

See the WML home page for more details.

# SourceSites

This perl script does the real work of taking the HTML I write and munging it into the form you see before you.

SourceSites is built around making the web site easier for me to maintain, and it goes through various stages to do this. Firstly, it goes through and finds all of the description files I've scattered around the system. These files contain information on every file and are used to hold things like the author of the document, document title, document header, the section the document is in and so on. Files can be recognised (or ignored) based upon the full pathname, partial file name or regular expression. It is also possible to run commands here so that description files and index files can be automatically generated. An example of this is the Essays, Rants and Raves section, which has an automatically generated index page. The description file and the index file are generated by a script which looks for tags within each of the files within the area. In this way new files can easily be added into the system: each one just has to be copied into place and have the correct tags put in.
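As a rough illustration of the three kinds of matching mentioned above (full pathname, partial file name, regular expression), here is a minimal sketch in Python rather than the Perl that SourceSites is written in. The `Rule` class, the `describe` function and the rule layout are all hypothetical; the real description file format is not shown on this page. Only the example title and author values are taken from the command line further down.

```python
import posixpath
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    """One hypothetical description-file entry (not the real format)."""
    kind: str       # "full", "partial" or "regex"
    pattern: str
    info: dict = field(default_factory=dict)

    def matches(self, path: str) -> bool:
        if self.kind == "full":
            # Full pathname: the whole path must be identical.
            return path == self.pattern
        if self.kind == "partial":
            # Partial file name: a substring of the last path component.
            return self.pattern in posixpath.basename(path)
        if self.kind == "regex":
            # Regular expression: searched anywhere in the path.
            return re.search(self.pattern, path) is not None
        raise ValueError(f"unknown rule kind: {self.kind}")


def describe(path, rules):
    """Return the info of the first rule matching path, or None."""
    for rule in rules:
        if rule.matches(path):
            return rule.info
    return None


rules = [
    Rule("full", "about/generation.html",
         {"title": "About BPFH - Site generation", "author": "simes"}),
    Rule("partial", "index", {"section": "Index pages"}),
    Rule("regex", r"\.html$", {"section": "About BPFH"}),
]

print(describe("about/generation.html", rules))
```

Taking the first matching rule is an assumption made to keep the sketch short; the real script may combine or prioritise rules differently.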

After finding the description files, the script scans the source area and the processed area, building up a list of the entries in both. Once done, the two lists are scanned, and at that point it knows what to delete from the processed area and what needs to be processed.
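The comparison of the two lists can be sketched as follows, again in Python and not as the script's actual code. The function names `scan` and `plan` are made up for illustration, and the rule for deciding what "needs to be processed" (the entry is missing from the processed area, or the source copy has a newer modification time) is an assumption; the page does not say how the real script decides.

```python
import os


def scan(root):
    """Map each path under root (relative to root) to its modification time."""
    entries = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            entries[os.path.relpath(full, root)] = os.path.getmtime(full)
    return entries


def plan(source, processed):
    """Compare two scan() results and return (to_delete, to_process)."""
    # Anything in the processed area with no source counterpart is stale.
    to_delete = sorted(p for p in processed if p not in source)
    # Assumption: process entries that are new or newer in the source area.
    to_process = sorted(
        p for p, mtime in source.items()
        if p not in processed or mtime > processed[p]
    )
    return to_delete, to_process


# Example with hand-written listings instead of real directories.
source = {"about": 1, "about/generation.html": 20, "about/new.html": 5}
processed = {"about": 1, "about/generation.html": 10, "about/old.html": 3}
print(plan(source, processed))
```

Keeping `plan` separate from `scan` means the comparison can be tried out on plain dictionaries without touching the file system.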

How each entry to be processed is handled depends on what that entry is. Directories are created in the processed area. Files not matched by a given rule within SourceSites are merely copied. HTML files are attacked by wml. However this is wml given a large number of defines and some pre-processing to begin with. Thus this file was processed using the following command line:

+---------------


% /usr/local/bin/perl -e 'print "#include <head.html>\n";
> while(<>) {
>   print "$_";
> }
> print "#include <foot.html>\n";'
   <Source>/about/generation.html
 | /usr/bin/sed 's/\*\//\&#42;\&#47;/g'
 | /usr/bin/sed 's/\/\*/\&#47;\&#42;/g'
 | /usr/bin/sed 's/\\n/\\@NEWLINE\\@/g'
 | /usr/local/bin/wml
     -v
     --norcfile
     --settime
     --pass="1239"
     --epilog=weblint
     -DROOTFILE=about/generation.html
     -DWEB_SITE=www.bpfh.net
     -DSOURCE_DIR=<Source>
     -DFINAL_PLACE=<Processed>/about/generation.html
     -DREMOTE_DIR=<Processed>
     -I<Source>/about
     -I<Includes>
     -I<Includes>/www.bpfh.net
     -I<Source>
     -DLOCAL_CTIME='932049965'
     -DLOCAL_HEAD='How this site is generated'
     -DLOCAL_TITLE='About BPFH - Site generation'
     -DLOCAL_AUTHOR='simes'
     -DLOCAL_MTIME='932049965'
     -DLOCAL_SECTION='About BPFH'
     -DPROG_DEF_LIST="LOCAL_CTIME:LOCAL_HEAD:LOCAL_TITLE
                      :LOCAL_AUTHOR:LOCAL_MTIME:LOCAL_SECTION"
     -o <Processed>/about/generation.html  ;
 /usr/local/bin/perl -e '
> my $prn=0;
> my $in=$ARGV[0];
> my $out="$ARGV[0].tmp";
> open(FILE,$in) || die "Failed to open $in for reading - $!\n";
> open(OUT,">$out") || die "Failed to open $out for writing - $!\n";
> while(<FILE>) {
>   ((!$prn)&&(!/^$/o))&&($prn=1);
>   ($prn)&&print OUT $_;
> }
> close(FILE);
> close(OUT);
> if (!rename($out,$in)) {
>   unlink($out);
>   die "Failed to rename(\"$out\",\"$in\") - $!\n";
> }
> exit 0;' <Processed>/about/generation.html
** WML:Verbose: Processing time (seconds):
** WML:Verbose: main |  ipp   mhc   epl  gm4  div asub hfix hstr slic |  TOTAL
** WML:Verbose: ---- | ---- ----- ----- ---- ---- ---- ---- ---- ---- | ------
** WML:Verbose: 6.01 | 1.86  1.50  3.25   --   --   --   --   -- 2.12 |  14.74
%

+---------------
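For readers who find the perl and sed one-liners above hard to follow, here is a Python restatement of just the text transformations around the wml call: wrapping the page in the head and foot includes, escaping the comment delimiters and literal `\n` sequences, and stripping leading blank lines from the result. The function names are invented for this sketch, and it leaves out wml itself along with all of the -D and -I options.

```python
def preprocess(body: str) -> str:
    """Mimic the perl wrapper and the three sed substitutions."""
    # The perl one-liner prints the head include, the page, then the foot include.
    text = "#include <head.html>\n" + body + "#include <foot.html>\n"
    # sed 1: replace "*/" with its HTML character references.
    text = text.replace("*/", "&#42;&#47;")
    # sed 2: replace "/*" likewise.
    text = text.replace("/*", "&#47;&#42;")
    # sed 3: replace a literal backslash-n (two characters) with \@NEWLINE\@.
    text = text.replace("\\n", "\\@NEWLINE\\@")
    return text


def strip_leading_blank_lines(text: str) -> str:
    """Mimic the final perl one-liner: drop empty lines before the first
    non-empty line and keep everything after it unchanged."""
    return text.lstrip("\n")


print(preprocess("a /* b */\n"), end="")
print(strip_leading_blank_lines("\n\n<html>\n\n</html>\n"), end="")
```

As in the original, only completely empty lines count as blank in the final step; a line containing nothing but spaces is kept.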

It should be noted that the final output is put through weblint to make sure that the HTML actually conforms to some sort of standard. My normal aim is to produce HTML which has no warnings when weblint is run over it.

Yes, it does take a long time to produce the final HTML file from the source I write. The above time (14.74 seconds) is a little longer than normal thanks to the length of this document. However, it is nothing when compared to the time it takes to produce the maps of the site from the hash of files in the source area.

However, we are producing static HTML here which is just seen as a plain file by the web server. Thus the above time does not factor into the time taken to serve the document out in any way shape or form.

# pic-index

This perl script does the work of taking the images I give it and generating the thumbnails and surrounding HTML. I've done quite a bit of work on it over the past couple of years. The result of this is that I can put an image into its correct directory, put a single line into an index file within that directory and then wait. The thumbnail, HTML for the main image and the index page are all automatically generated for me.

I'm quite happy with the thumbnail generation - even though it's automatically done for me it does quite well in shrinking the image. Looking over all of the images within the photo collection, the thumbnail images range from 14% down to 1.5% of the size of the full-sized image. Taking the largest and smallest percentages, along with the largest thumbnail, we see:

Thumbnail                        Thumbnail size  Full image size  Percentage
Smallest thumbnail               411 bytes       2891 bytes       14.2%
Largest thumbnail by percentage  858 bytes       56940 bytes      1.5%
Largest thumbnail by size        1570 bytes      24415 bytes      6.4%
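The percentages are simply the thumbnail size divided by the full image size. A small Python check using the byte counts quoted above (the `percentage` helper is just a name chosen for this example):

```python
# Byte counts as quoted above: (thumbnail, full image).
sizes = {
    "Smallest thumbnail": (411, 2891),
    "Largest thumbnail by percentage": (858, 56940),
    "Largest thumbnail by size": (1570, 24415),
}


def percentage(thumb_bytes, full_bytes):
    """Thumbnail size as a percentage of the full image, to one decimal place."""
    return round(100 * thumb_bytes / full_bytes, 1)


for label, (thumb, full) in sizes.items():
    print(f"{label}: {percentage(thumb, full)}%")
```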

It should be noted that the size of the thumbnail often depends on the complexity of the full size image.

In case you're wondering, the thumbnails are generated using the netpbm tools and (in the case of JPEGs) the JPEG commands which come with the JPEG library.

Command line by image type:

GIF:

giftopnm Image
  | pnmscale Scale
  | ppmnorm -bpercent ColorPercent -wpercent ColorPercent
  | ppmquant 256
  | ppmtogif
  > Thumbnail

JPEG:

djpeg -pnm Image
  | pnmscale Scale
  | cjpeg -quality Quality -optimize
  > Thumbnail

Where:

Image: The full-sized image.
Thumbnail: The resulting thumbnail.
Scale: What scale the thumbnail is to the full-size image. I normally use 0.3, i.e. the thumbnail is 3/10ths of the size of the full-size image.
ColorPercent: How much to normalise the colors in the thumbnail. This reduces the number of colors in the image and has a big effect on the size of GIF thumbnails. I normally use a value of 15 for this, i.e. the darkest 15% of pixels are mapped to black and the lightest 15% of pixels are mapped to white.
Quality: The scale of the quantization tables within the JPEG. I use a value of 25, which is quite a low quality. This value was chosen to produce a recognisable thumbnail without producing too large a JPEG.
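To show how the placeholders slot into the two pipelines, here is a small Python sketch that builds the command string for a given image. It is not how pic-index does it: the `thumbnail_command` function is made up, and picking the pipeline from the file extension is an assumption. The tools, flags and default values are the ones given above.

```python
import shlex


def thumbnail_command(image, thumbnail, scale=0.3, color_percent=15, quality=25):
    """Build the shell pipeline for one image (hypothetical helper)."""
    src = shlex.quote(image)
    dst = shlex.quote(thumbnail)
    lower = image.lower()
    if lower.endswith(".gif"):
        stages = [
            f"giftopnm {src}",
            f"pnmscale {scale}",
            f"ppmnorm -bpercent {color_percent} -wpercent {color_percent}",
            "ppmquant 256",
            "ppmtogif",
        ]
    elif lower.endswith((".jpg", ".jpeg")):
        stages = [
            f"djpeg -pnm {src}",
            f"pnmscale {scale}",
            f"cjpeg -quality {quality} -optimize",
        ]
    else:
        # Assumption: only the two image types listed above are handled.
        raise ValueError(f"unsupported image type: {image}")
    return " | ".join(stages) + f" > {dst}"


print(thumbnail_command("sunset.jpg", "sunset_t.jpg"))
print(thumbnail_command("map.gif", "map_t.gif"))
```

The function only returns the command as a string, so it can be inspected or logged before being handed to a shell.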

More by the same author Comments? EMail webmaster@bpfh.net
 
© Simes