| The source to the generation page before processing |
| 001 |
<p> |
| 002 |
By trade, I'm a programmer and as such I'm lazy; I prefer the computer to |
| 003 |
do the boring work for me if at all possible. This shows in how I maintain |
| 004 |
this web site - most of the eye-candy (<i>ie</i> parts of the web page which |
| 005 |
aren't a part of the content itself, but add value to the web page (I don't |
| 006 |
believe I just wrote that...)) is automatically generated for me. This then |
| 007 |
leaves me free to do things like work on the actual content of the web site. |
| 008 |
</p> |
| 009 |
|
| 010 |
<p> |
| 011 |
After all, the web is supposed to be about content, isn't it? Not that you'd |
| 012 |
know this from looking at certain pages on the web. |
| 013 |
</p> |
| 014 |
|
| 015 |
<p> |
| 016 |
Anyway, I write basic pages <i>sans</i> tags like |
| 017 |
<code>&lt;html&gt;...&lt;/html&gt;</code>, |
| 018 |
<code>&lt;body&gt;...&lt;/body&gt;</code>, |
| 019 |
<code>&lt;title&gt;...&lt;/title&gt;</code> |
| 020 |
and the eye-candy like the navigation bar using <tt>vi</tt>. I use a text |
| 021 |
editor as most of the tools out there which allow you to just point and drool |
| 022 |
web pages <i>a la</i> DTP packages produce horrendous HTML. That and most of |
| 023 |
the stuff I'm doing is about textual content... which has little to do with |
| 024 |
a GUI DTP-a-like tool. |
| 025 |
</p> |
| 026 |
|
| 027 |
<p> |
| 028 |
These pages are taken by a number of scripts and are munged in various ways |
| 029 |
to produce the web pages you see before you now. The pages start off in a |
| 030 |
source area on my home system, go via a processed area and finally end up |
| 031 |
on the web server which actually serves out the content. |
| 032 |
</p> |
| 033 |
|
| 034 |
<p><table border="0"> |
| 035 |
<tr><td valign="top" align="left"><powered-by-solaris-x86></td> |
| 036 |
<td valign="top" align="left"> |
| 037 |
<p> |
| 038 |
All of the work done to generate these web pages is performed on an |
| 039 |
aging PC running Solaris x86. Whilst it is not open source (<i>not yet at least, |
| 040 |
there are rumors that Sun are thinking of opening Solaris to the world</i>), I |
| 041 |
personally believe Solaris to be a better operating system on the Intel platform |
| 042 |
than the free Unix variants out there. |
| 043 |
</p> |
| 044 |
<p> |
| 045 |
It does take a minimum amount of hardware to do the job however - certainly |
| 046 |
OSs like Linux do very well on a limited amount of hardware. That said I |
| 047 |
have run Solaris on a 486-33SX with 12MB of memory without too many problems. |
| 048 |
</p> |
| 049 |
</td></tr> |
| 050 |
</table></p> |
| 051 |
<p> |
| 052 |
Solaris does <strong>extremely</strong> well on multi-processor machines, |
| 053 |
either Sparc or Intel (the <a href="http://soldc.sun.com/support/drivers/hcl/hcl.html">Solaris |
| 054 |
x86 HCL</a> lists support for machines with 8 CPUs and on Sparc hardware they've not yet |
| 055 |
run into any scaling problems within the kernel, even using 128 processors). IMHO |
| 056 |
Solaris has one of the best multi-threaded kernels in the industry. The two-level |
| 057 |
thread scheduling system <strong>is</strong> a little strange at first though. |
| 058 |
</p> |
| 059 |
<p> |
| 060 |
Now, if Sun would only improve the device support.... |
| 061 |
</p> |
| 062 |
|
| 063 |
<p> |
| 064 |
Wherever possible, the web pages produced are static in nature; <i>ie</i> |
| 065 |
I have not used server side includes, PHP3, CGI scripts or mod_perl to automate |
| 066 |
the addition of the eye-candy. The reason for this is simple - static content |
| 067 |
is generally more friendly to the web server serving up the content (web servers |
| 068 |
can easily serve up static files). It also means that the pages are decidedly |
| 069 |
more cacheable by browsers, web caches, <i>etc</i>. |
| 070 |
</p> |
| 071 |
|
| 072 |
<p> |
| 073 |
I'm not against dynamic content mind you - I've created sites which are almost 100% |
| 074 |
dynamically generated. However those sites were dynamically generated for good |
| 075 |
reasons - often the content on the web pages came from databases (a good example |
| 076 |
of this is the displaying of bandwidth utilisation graphs to leased line customers |
| 077 |
at the ISP I work for). However eye-candy like the navigation bar <i>et al</i> can |
| 078 |
be done without resorting to dynamic methods. |
| 079 |
</p> |
| 080 |
|
| 081 |
<p> |
| 082 |
I also try to ensure that these web pages are viewable in as many browsers as I |
| 083 |
have access to. In my case this means <tt>lynx</tt>, Netscape and |
| 084 |
<a href="/microsoft/microsoft/052698borgman_600x388.html">IE (<i>spit</i>)</a>. |
| 085 |
</p> |
| 086 |
|
| 087 |
<blockquote> |
| 088 |
<menu icon="arrow" size="medium" color="purple"> |
| 089 |
|
| 090 |
<menu-item name="New HTML" link="new-html/"> |
| 091 |
Some of the new HTML I have defined within meta-HTML to make my |
| 092 |
life easier |
| 093 |
</menu-item> |
| 094 |
|
| 095 |
<menu-item name="This page before processing" link="about-before-processing.html"> |
| 096 |
What the HTML I actually write looks like |
| 097 |
</menu-item> |
| 098 |
|
| 099 |
<menu-item name="This page after processing" link="about-after-processing.html"> |
| 100 |
What the HTML you are seeing looks like once the HTML I write has been |
| 101 |
through the processing |
| 102 |
</menu-item> |
| 103 |
|
| 104 |
</menu> |
| 105 |
</blockquote> |
| 106 |
|
| 107 |
<p><hr></p> |
| 108 |
<p><div align="center"><table width="100%" border="0"> |
| 109 |
<tr> |
| 110 |
<td align="left"><vi-anim></td> |
| 111 |
<td align="center"><wml-powered></td> |
| 112 |
<td align="right"><any-browser-common-sense></td> |
| 113 |
</tr> |
| 114 |
</table></div></p> |
| 115 |
<p><hr></p> |
| 116 |
|
| 117 |
<p> |
| 118 |
Anyway, the various scripts and programs which do this are: |
| 119 |
</p> |
| 120 |
|
| 121 |
<menu icon="ball" size="medium" color="purple"> |
| 122 |
|
| 123 |
<menu-item name="<tt>BuildSites</tt>"> |
| 124 |
<p> |
| 125 |
This <tt>perl</tt> script kicks off all of the work. It is |
| 126 |
essentially a wrapper around a set of commands which do things like |
| 127 |
produce galleries of images (see the <a href="/simes/photos/landscapes/">Photos |
| 128 |
of landscapes and sunsets</a> for an example of this); |
| 129 |
process the raw HTML I write and produce the version with added eye-candy; |
| 130 |
grab log files from the real web servers; and publish the processed web site. |
| 131 |
</p> |
| 132 |
</menu-item> |
| 133 |
|
| 134 |
<menu-item name="<tt>sitecopy</tt>"> |
| 135 |
<p> |
| 136 |
This is a program which is designed to maintain the content of a remote web site |
| 137 |
from a local copy. Only the changes which have occurred on the local system are |
| 138 |
actually transmitted, making it ideal for a dialup line. |
| 139 |
</p> |
| 140 |
<p> |
| 141 |
See <a href="http://www.lyra.org/sitecopy/">SiteCopy home page</a> and the |
| 142 |
<a href="http://freshmeat.net/appindex/1998/11/26/912108385.html">FreshMeat |
| 143 |
app entry</a> for more details. |
| 144 |
</p> |
| 145 |
</menu-item> |
| 146 |
|
| 147 |
<menu-item name="<tt>wml</tt>"> |
| 148 |
<p> |
| 149 |
This is a program designed for off-line HTML generation for Unix. It |
| 150 |
provides such things as pre-processing, meta-HTML, embedded <tt>perl</tt>, |
| 151 |
<tt>m4</tt> and much more. I abused it a lot to produce these web pages. |
| 152 |
</p> |
| 153 |
<p> |
| 154 |
See the <a href="http://www.engelschall.com/sw/wml/">WML home page</a> for |
| 155 |
more details. |
| 156 |
</p> |
| 157 |
</menu-item> |
| 158 |
|
| 159 |
<menu-item name="<tt>SourceSites</tt>"> |
| 160 |
<p> |
| 161 |
This <tt>perl</tt> script does the real work of taking the HTML I write |
| 162 |
and munging it into the form you see before you. |
| 163 |
</p> |
| 164 |
<p> |
| 165 |
<tt>SourceSites</tt> is built around making it easier for me to maintain |
| 166 |
the web site. <tt>SourceSites</tt> goes through various stages to do |
| 167 |
this. Firstly it goes through and finds all of the description files |
| 168 |
I've scattered around the system. These files contain information on |
| 169 |
every file and are used to hold information like the author of the |
| 170 |
document, document title, document header, section the document is in |
| 171 |
and so on. Files can be recognised (or ignored) based upon the full |
| 172 |
pathname, partial file name or regular expression. It is also possible |
| 173 |
to run commands here so that description files and index files can be |
| 174 |
automatically generated. An example of this is the <a href="/essay-rants/">Essays, |
| 175 |
Rants and Raves</a> section which has an automatically generated index page. The |
| 176 |
description file and the index file are generated by a script which looks for |
| 177 |
tags within each of the files within the area. In this way new files can easily |
| 178 |
be added into the system - each one just has to be copied into place and have the |
| 179 |
correct tags put into place. |
| 180 |
</p> |
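<p>
As a rough illustration of the three ways a file can be recognised (full
pathname, partial file name or regular expression), here is a minimal sketch
in Python. It is not taken from <tt>SourceSites</tt> (which is a <tt>perl</tt>
script); the rule layout, the <tt>matches</tt> and <tt>describe</tt> names and
the reading of "partial file name" as a substring of the base name are all
assumptions made for the example.
</p>

```python
import re


def matches(kind, pattern, path):
    """Return True when a hypothetical rule of the given kind matches path."""
    if kind == "path":
        # Full pathname: must match exactly.
        return path == pattern
    if kind == "partial":
        # Partial file name: assumed to mean a substring of the base name.
        return pattern in path.rsplit("/", 1)[-1]
    if kind == "regex":
        # Regular expression searched anywhere in the path.
        return re.search(pattern, path) is not None
    raise ValueError("unknown rule kind: %s" % kind)


def describe(path, rules):
    """Return the info dict of the first matching rule, or None if none match."""
    for kind, pattern, info in rules:
        if matches(kind, pattern, path):
            return info
    return None


# Hypothetical rules; the real description file format is not shown here.
RULES = [
    ("path", "about/generation.html", {"title": "About BPFH - Site generation"}),
    ("partial", "index", {"title": "Index page"}),
    ("regex", r"^essay-rants/.*\.html$", {"section": "Essays, Rants and Raves"}),
]
```

<p>
With these rules, <tt>describe("essay-rants/foo.html", RULES)</tt> returns the
entry for the essays section, and a path that matches nothing returns
<tt>None</tt>.
</p>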
| 181 |
<p> |
| 182 |
After finding the description files, the script scans the source area |
| 183 |
and processed area building up a list of the entries in both. Once done, the |
| 184 |
two lists are scanned and at that point it knows what to delete from the |
| 185 |
processed area and what needs to be processed. |
| 186 |
</p> |
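<p>
The comparison of the two lists can be sketched as a pair of set operations.
This is a Python illustration only, not the actual <tt>perl</tt> code: the
<tt>plan_sync</tt> name is made up, and the <tt>is_stale</tt> callback is an
assumption standing in for whatever test decides that an existing entry needs
processing again.
</p>

```python
def plan_sync(source_entries, processed_entries, is_stale=lambda path: False):
    """Return (to_delete, to_process) for two listings of relative paths.

    to_delete:  entries in the processed area with no counterpart in the source.
    to_process: source entries missing from the processed area, plus any
                entry the (hypothetical) is_stale callback flags.
    """
    source = set(source_entries)
    processed = set(processed_entries)
    to_delete = sorted(processed - source)
    to_process = sorted(
        path for path in source if path not in processed or is_stale(path)
    )
    return to_delete, to_process
```

<p>
For example, with a source area holding <tt>a.html</tt> and <tt>b.html</tt>
and a processed area holding <tt>a.html</tt> and <tt>old.html</tt>, the sketch
schedules <tt>old.html</tt> for deletion and <tt>b.html</tt> for processing.
</p>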
| 187 |
<p> |
| 188 |
How each entry to be processed is handled depends on what that entry is. Directories |
| 189 |
are created in the processed area. Files not matched by a given rule within |
| 190 |
<tt>SourceSites</tt> are merely copied. HTML files are attacked by <tt>wml</tt>. |
| 191 |
However this is <tt>wml</tt> given a large number of defines and some pre-processing |
| 192 |
to begin with. Thus this file was processed using the following command line: |
| 193 |
</p> |
| 194 |
<p><img src="/icons/top_divider.gif" alt="+---------------" width="120" height="10"></p> |
| 195 |
<blockquote><pre><font size="-1"> |
| 196 |
% /usr/local/bin/perl -e 'print "#include <head.html>\n"; |
| 197 |
> while(<>) { |
| 198 |
> print "$_"; |
| 199 |
> } |
| 200 |
> print "#include <foot.html>\n";' |
| 201 |
<strong><Source></strong>/about/generation.html |
| 202 |
| /usr/bin/sed 's/\*\//\&#42;\&#47;/g' |
| 203 |
| /usr/bin/sed 's/\/\*/\&#47;\&#42;/g' |
| 204 |
| /usr/bin/sed 's/\\n/\\@NEWLINE\\@/g' |
| 205 |
| /usr/local/bin/wml |
| 206 |
-v |
| 207 |
--norcfile |
| 208 |
--settime |
| 209 |
--pass="1239" |
| 210 |
--epilog=weblint |
| 211 |
-DROOTFILE=about/generation.html |
| 212 |
-DWEB_SITE=www.bpfh.net |
| 213 |
-DSOURCE_DIR=<strong><Source></strong> |
| 214 |
-DFINAL_PLACE=<strong><Processed></strong>/about/generation.html |
| 215 |
-DREMOTE_DIR=<strong><Processed></strong> |
| 216 |
-I<strong><Source></strong>/about |
| 217 |
-I<strong><Includes></strong> |
| 218 |
-I<strong><Includes></strong>/www.bpfh.net |
| 219 |
-I<strong><Source></strong> |
| 220 |
-DLOCAL_CTIME='932049965' |
| 221 |
-DLOCAL_HEAD='How this site is generated' |
| 222 |
-DLOCAL_TITLE='About BPFH - Site generation' |
| 223 |
-DLOCAL_AUTHOR='simes' |
| 224 |
-DLOCAL_MTIME='932049965' |
| 225 |
-DLOCAL_SECTION='About BPFH' |
| 226 |
-DPROG_DEF_LIST="LOCAL_CTIME:LOCAL_HEAD:LOCAL_TITLE |
| 227 |
:LOCAL_AUTHOR:LOCAL_MTIME:LOCAL_SECTION" |
| 228 |
-o <strong><Processed></strong>/about/generation.html ; |
| 229 |
/usr/local/bin/perl -e ' |
| 230 |
> my $prn=0; |
| 231 |
> my $in=$ARGV[0]; |
| 232 |
> my $out="$ARGV[0].tmp"; |
| 233 |
> open(FILE,$in) || die "Failed to open $in for reading - $!\n"; |
| 234 |
> open(OUT,">$out") || die "Failed to open $out for writing - $!\n"; |
| 235 |
> while(<FILE>) { |
| 236 |
> ((!$prn)&&(!/^$/o))&&($prn=1); |
| 237 |
> ($prn)&&print OUT $_; |
| 238 |
> } |
| 239 |
> close(FILE); |
| 240 |
> close(OUT); |
| 241 |
> if (!rename($out,$in)) { |
| 242 |
> unlink($out); |
| 243 |
> die "Failed to rename(\"$out\",\"$in\") - $!\n"; |
| 244 |
> } |
| 245 |
> exit 0;' <strong><Processed></strong>/about/generation.html |
| 246 |
** WML:Verbose: Processing time (seconds): |
| 247 |
** WML:Verbose: main | ipp mhc epl gm4 div asub hfix hstr slic | TOTAL |
| 248 |
** WML:Verbose: ---- | ---- ----- ----- ---- ---- ---- ---- ---- ---- | ------ |
| 249 |
** WML:Verbose: 6.01 | 1.86 1.50 3.25 -- -- -- -- -- 2.12 | 14.74 |
| 250 |
% |
| 251 |
</font></pre></blockquote> |
| 252 |
<p><img src="/icons/bottom_divider.gif" alt="+---------------" width="120" height="10"></p> |
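<p>
For readers who find the command line above hard to follow, the steps either
side of <tt>wml</tt> can be sketched in Python. This is an approximation under
stated assumptions, not the code actually used: the function names are made
up, and it only mirrors the first <tt>perl</tt> one-liner (wrap the page in the
head and foot includes), the three <tt>sed</tt> substitutions, and the final
<tt>perl</tt> one-liner (drop blank lines from the top of the output).
</p>

```python
def wrap_and_escape(text):
    """Wrap a page in head/foot includes, then escape awkward sequences."""
    wrapped = "#include <head.html>\n" + text + "#include <foot.html>\n"
    # Same order as the three sed commands: "*/", then "/*", then a
    # literal backslash followed by "n".
    wrapped = wrapped.replace("*/", "&#42;&#47;")
    wrapped = wrapped.replace("/*", "&#47;&#42;")
    wrapped = wrapped.replace("\\n", "\\@NEWLINE\\@")
    return wrapped


def strip_leading_blank_lines(lines):
    """Drop empty lines from the start; keep everything from the first
    non-empty line onwards, including later empty lines."""
    started = False
    kept = []
    for line in lines:
        if not started and line.rstrip("\n") != "":
            started = True
        if started:
            kept.append(line)
    return kept
```

<p>
Note that only completely empty lines count as blank here, which matches the
<tt>/^$/</tt> test in the original; a line holding just spaces is kept.
</p>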
| 253 |
<p> |
| 254 |
It should be noted that the final output is put through <tt>weblint</tt> to make |
| 255 |
sure that the HTML actually conforms to some sort of standard. My normal aim is |
| 256 |
to produce HTML which has no warnings when <tt>weblint</tt> is run over it. |
| 257 |
</p> |
| 258 |
<p> |
| 259 |
Yes, it <strong>does</strong> take a long time to produce the final HTML file |
| 260 |
from the source I write. The above time (14.74 seconds) is a little longer |
| 261 |
than normal thanks to the length of this document. However it is nothing |
| 262 |
when compared to the time it takes to produce the <a href="/sitemap/">maps |
| 263 |
of the site</a> from the hash of files in the source area. |
| 264 |
</p> |
| 265 |
<p> |
| 266 |
However, we are producing static HTML here which is just seen as a plain |
| 267 |
file by the web server. Thus the above time does not factor into the time |
| 268 |
taken to serve the document out in any way shape or form. |
| 269 |
</p> |
| 270 |
</menu-item> |
| 271 |
|
| 272 |
<menu-item name="<tt>pic-index</tt>"> |
| 273 |
<p> |
| 274 |
This <tt>perl</tt> script does the work of taking the images I give it |
| 275 |
and generating the thumbnails and surrounding HTML. I've done quite a |
| 276 |
bit of work on it over the past couple of years. The result of this is |
| 277 |
that I can put an image into its correct directory, put a single line |
| 278 |
into an index file within that directory and then wait. The thumbnail, |
| 279 |
HTML for the main image and the index page are all automatically |
| 280 |
generated for me. |
| 281 |
</p> |
| 282 |
<p> |
| 283 |
I'm quite happy with the thumbnail generation - even though it's automatically |
| 284 |
done for me it does quite well in shrinking the image. Looking over all of the |
| 285 |
images within the <a href="/simes/photos/">photo collection</a>, the thumbnail |
| 286 |
images range from 14% down to 1.5% of the size of the full-sized image. Taking |
| 287 |
the largest and smallest percentages, along with the largest thumbnail, we see: |
| 288 |
</p> |
| 289 |
<div align="center"><p><table border="2" width="80%"> |
| 290 |
<tr><th rowspan="2">Thumbnail<br><font size="-2">Click for full image</font></th> |
| 291 |
<th colspan="2">Size</th> |
| 292 |
<th rowspan="2">Percentage</th></tr> |
| 293 |
<tr><th>Thumbnail</th><th>Full image</th></tr> |
| 294 |
<tr><th><a href="/simes/photos/strange/fire3.html"><img src="/simes/photos/strange/mini/mini-fire3.jpg" alt="Smallest thumbnail" border="0" width="36" height="32"></a></th> |
| 295 |
<td align="center">411 bytes</td> |
| 296 |
<td align="center">2891 bytes</td> |
| 297 |
<th>14.2%</th></tr> |
| 298 |
<tr><th><a href="/simes/photos/landscapes/ivy-mike.html"><img src="/simes/photos/landscapes/mini/mini-ivy-mike.jpg" alt="Smallest thumbnail by percentage" border="0" width="77" height="96"></a></th> |
| 299 |
<td align="center">858 bytes</td> |
| 300 |
<td align="center">56940 bytes</td> |
| 301 |
<th>1.5%</th></tr> |
| 302 |
<tr><th><a href="/simes/photos/people/simes/simon1.html"><img src="/simes/photos/people/simes/mini/mini-simon1.jpg" alt="Largest thumbnail by size" border="0" width="119" height="88"></a></th> |
| 303 |
<td align="center">1570 bytes</td> |
| 304 |
<td align="center">24415 bytes</td> |
| 305 |
<th>6.4%</th></tr> |
| 306 |
</table></p></div> |
| 307 |
<p> |
| 308 |
It should be noted that the size of the thumbnail often depends on the complexity of |
| 309 |
the full size image. |
| 310 |
</p> |
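<p>
The percentages in the table are simply the thumbnail size divided by the
full image size. A small Python sketch of the arithmetic, using the byte
counts quoted above (the <tt>thumbnail_percentage</tt> name is made up for
the example):
</p>

```python
def thumbnail_percentage(thumb_bytes, full_bytes):
    """Thumbnail size as a percentage of the full image, to one decimal place."""
    return round(100.0 * thumb_bytes / full_bytes, 1)


# (thumbnail bytes, full image bytes) taken from the table above.
SIZES = {
    "fire3": (411, 2891),
    "ivy-mike": (858, 56940),
    "simon1": (1570, 24415),
}

for name, (thumb, full) in SIZES.items():
    print("%-8s %5.1f%%" % (name, thumbnail_percentage(thumb, full)))
```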
| 311 |
<p> |
| 312 |
In case you're wondering, the thumbnails are generated using the <tt>netpbm</tt> tools |
| 313 |
and (in the case of JPEGs) the JPEG commands which come with the JPEG library. |
| 314 |
</p> |
| 315 |
<div align="center"><p><table border="2" width="80%"> |
| 316 |
<tr><th>Image type</th><th>Command line</th></tr> |
| 317 |
<tr><th>GIF</th> |
| 318 |
<td><pre> |
| 319 |
giftopnm <strong>Image</strong> |
| 320 |
| pnmscale <strong>Scale</strong> |
| 321 |
| ppmnorm -bpercent <strong>ColorPercent</strong> -wpercent <strong>ColorPercent</strong> |
| 322 |
| ppmquant 256 |
| 323 |
| ppmtogif |
| 324 |
> <strong>Thumbnail</strong> |
| 325 |
</pre></td></tr> |
| 326 |
<tr><th>JPEG</th> |
| 327 |
<td><pre> |
| 328 |
djpeg -pnm <strong>Image</strong> |
| 329 |
| pnmscale <strong>Scale</strong> |
| 330 |
| cjpeg -quality <strong>Quality</strong> -optimize |
| 331 |
> <strong>Thumbnail</strong> |
| 332 |
</pre></td></tr> |
| 333 |
</table></p></div> |
| 334 |
<p> |
| 335 |
Where: |
| 336 |
</p> |
| 337 |
<div align="center"><p><table border="2" width="80%"> |
| 338 |
<tr><th>Image</th> |
| 339 |
<td>The full sized image</td></tr> |
| 340 |
<tr><th>Thumbnail</th> |
| 341 |
<td>The resulting thumbnail</td></tr> |
| 342 |
<tr><th>Scale</th> |
| 343 |
<td>What scale the thumbnail is to the full size image. I normally use <tt>0.3</tt>, |
| 344 |
<i>ie</i> the thumbnail is 3/10ths of the size of the full size image</td></tr> |
| 345 |
<tr><th>ColorPercent</th> |
| 346 |
<td>How much to normalise the colors in the thumbnail. This reduces the number of |
| 347 |
colors in the image and has a <strong>big</strong> effect on the size of GIF |
| 348 |
thumbnails. I normally use a value of <tt>15</tt> for this - <i>ie</i> the darkest |
| 349 |
15% of pixels are mapped to black and the lightest 15% of pixels are mapped to |
| 350 |
white.</td></tr> |
| 351 |
<tr><th>Quality</th> |
| 352 |
<td>The scale of the quantization tables within the JPEG. I use a value of <tt>25</tt> |
| 353 |
which is quite a low quality. This value was chosen to produce a recognisable |
| 354 |
thumbnail without producing too large a JPEG.</td></tr> |
| 355 |
</table></p></div> |
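<p>
To show how the placeholders slot into the two command lines, here is a
minimal Python sketch which builds the pipeline as a string. It is not how
<tt>pic-index</tt> itself works (that is a <tt>perl</tt> script); the
<tt>thumbnail_command</tt> name and the choice of picking the pipeline by
file extension are assumptions, and the defaults are just the values quoted
in the table above. The string is only built, not run.
</p>

```python
import shlex


def thumbnail_command(image, thumbnail, scale=0.3, color_percent=15, quality=25):
    """Build the shell pipeline which would turn image into thumbnail."""
    src = shlex.quote(image)
    dst = shlex.quote(thumbnail)
    lower = image.lower()
    if lower.endswith(".gif"):
        stages = [
            "giftopnm %s" % src,
            "pnmscale %s" % scale,
            "ppmnorm -bpercent %s -wpercent %s" % (color_percent, color_percent),
            "ppmquant 256",
            "ppmtogif",
        ]
    elif lower.endswith((".jpg", ".jpeg")):
        stages = [
            "djpeg -pnm %s" % src,
            "pnmscale %s" % scale,
            "cjpeg -quality %s -optimize" % quality,
        ]
    else:
        raise ValueError("no thumbnail pipeline for %s" % image)
    return " | ".join(stages) + " > " + dst


print(thumbnail_command("fire3.jpg", "mini/mini-fire3.jpg"))
```

<p>
The resulting string could be handed to a shell, provided the
<tt>netpbm</tt> tools and the JPEG commands are installed.
</p>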
| 356 |
</menu-item> |
| 357 |
|
| 358 |
</menu> |