W3C HTML Validator
The following are instructions for getting the W3C HTML Validator running on Microsoft Windows NT (XP Pro 2002). I've never seen instructions elsewhere for running the Validator on Windows, nor is Windows on W3C's list of supported platforms, so I figured this might be useful.
This is about the 4th attempt over the years to get the W3C HTML Validator running on Microsoft Windows. Previous attempts were on Windows 98SE. I haven't tried this on Windows 98SE yet.
These instructions are based on the following software versions:
Apache (win32 native build, NOT cygwin)
10:34pm [ord@chimera ~] > apache -v
Server version: Apache/1.3.29 (Win32)
Server built: Oct 29 2003 08:39:07
Cygwin
10:34pm [ord@chimera ~] > uname -a
CYGWIN_NT-5.1 chimera 1.5.5(0.94/3/2) 2003-09-20 16:31 i686 unknown unknown Cygwin
Perl
10:34pm [ord@chimera ~] > perl -v
This is perl, v5.8.0 built for cygwin-multi-64int
Note: On my first recent attempt at getting the W3C HTML Validator working on Windows NT, I was successful; however, it would not recognise any charset except utf-8. I don't know how I fixed this. I simply tried again from scratch, and had success.
Note: I typically run Apache as a service under the SYSTEM account. So far I can't get the SGML parser to run properly in that environment. These instructions are for Apache running from the command line (from my user account).
Perl requirements:
CGI.pm - Already installed, either with Perl itself or from an earlier
manual install. No specific comments.
CGI::Carp - Ditto.
File::Spec - Ditto.
HTML::Parser - Ditto; manually installed previously.
LWP::UserAgent - Ditto; manually installed previously.
Set::IntSpan - Used Set-IntSpan-1.07. Built OOTB.
Text::Iconv - Used Text-Iconv-1.2. This is where I've gotten stuck in
the past.
http://www.cygwin.com/ml/cygwin/2002-08/msg01384.html
solves the problem: patch the module's Makefile.PL to link against
the iconv library ( 'LIBS' => ['-liconv'], # e.g., '-lm').
URI::Escape - Previously installed.
Basically, the only two Perl modules I specifically built for the validator were
Set::IntSpan and Text::Iconv. Any others (in case I've missed some) were
previously built, or came with Perl.
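Before going further, it can save time to confirm that Perl can actually load the modules. The following is a minimal sketch for the Cygwin shell, not part of the validator distribution: check_module is a hypothetical helper name, and the loop covers only some of the modules listed above.

```shell
# Report whether the Perl on the PATH can load a given module.
# check_module is a hypothetical helper, not part of the validator.
check_module() {
    if perl -M"$1" -e 1 >/dev/null 2>&1; then
        echo "ok $1"
    else
        echo "MISSING $1"
        return 1
    fi
}

# Check a few of the modules the validator needs and collect the failures.
missing=""
for m in CGI CGI::Carp File::Spec LWP::UserAgent Set::IntSpan Text::Iconv URI::Escape; do
    check_module "$m" || missing="$missing $m"
done
echo "missing:${missing:- none}"
```

Anything reported as MISSING needs to be built and installed before check.pl will run.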
SGML Requirements
Used OpenSP 1.5 - compiles OOTB on cygwin without a hitch
- ./configure, make, make install
One note is that onsgmls will be installed to /usr/local/bin. For some really
strange reason, onsgmls can't find cygwin1.dll when run from Perl via the
web server, despite c:\cygwin\bin being in the %PATH%. (It runs fine from there
in a shell.) Either install onsgmls to /usr/bin (or /bin, since /usr/bin is
generally mounted as /bin on cygwin), or copy cygwin1.dll to /usr/local/bin
(which is a Bad Idea(TM)).
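One way to get the parser into /usr/bin from the start is to pass an install prefix to configure. This is a sketch only: I'm assuming OpenSP's configure script honours the standard autoconf --prefix option, and build_opensp is a hypothetical wrapper whose argument is wherever the source tree was unpacked.

```shell
# Build and install OpenSP under /usr so the parser binary lands in /usr/bin.
# Usage: build_opensp SOURCE_DIR
# build_opensp is a hypothetical wrapper; --prefix is the standard autoconf
# option and is assumed to be supported by OpenSP's configure script.
build_opensp() {
    (
        cd "$1" &&
        ./configure --prefix=/usr &&
        make &&
        make install
    )
}
```

The subshell keeps the cd from affecting the calling shell, and the && chain stops at the first step that fails.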
Apache Requirements:
I run the validator from a named virtual host, therefore no main Apache config
changes were required.
Note that some of these directives may be redundant. For example, I don't
have cgi-script handlers set in the main server configuration. Similarly,
note the rewrite directives: the rewrite module needs to be loaded
in the main server configuration, if it isn't already. Further comments are
listed in the virtual host configuration.
The virtual host looks like this:
<VirtualHost *>
    # Note, /vhosts/validator.home/ is an example path.
    # Fill in any instance of it with the correct path (which on Apache win32
    # won't start with a forward slash; it starts with a drive specification
    # or network path).

    ServerName www.validator.home
    ServerAdmin webmaster@www.validator.home
    DocumentRoot "/vhosts/validator.home/htdocs"
    ErrorLog logs/www.validator.home-error.log
    CustomLog logs/www.validator.home-access.log common

    AddHandler cgi-script .pl

    # Given that I'm using the ScriptInterpreterSource registry instead of
    # shebang, I can't find a way to make check execute. A quick fix is
    # to rename it to check.pl and rewrite requests to check to check.pl
    #
    # Also, to save messing around with validator.home/htdocs/ (document root)
    # and validator.home/httpd/cgi-bin/ (location of check script in distribution)
    # (because I can't get it to work, with the lack of Apache win32 understanding
    # symlinks), I've just copied httpd/cgi-bin/check to htdocs/check.pl. This
    # is the only file whose location needs to change.
    RewriteEngine on
    RewriteRule ^/check /check.pl

    # The following comes from httpd/conf/httpd.conf in
    # the distribution.
    #
    # ExecCGI at least is required.
    <Directory /vhosts/validator.home/htdocs>
        Options ExecCGI IncludesNOEXEC Indexes MultiViews
        # AllowOverride None
        AddHandler server-parsed .html
        AddCharset utf-8 .html
    </Directory>
</VirtualHost>
Validator Setup
Extract the source distribution of the validator to the directory specified in the virtual host configuration.
Copy httpd/cgi-bin/check to htdocs/check.pl.
The next step is to patch check.pl; it doesn't work OOTB on cygwin.
For a quick rundown of the changes required:
The -T switch in the shebang needs to be removed. Since things work by
simply removing it, I haven't looked into why (yet).
Pragmas need to be disabled. I haven't looked into it much; commenting them out
works, and that was good enough for me.
The location of the main configuration file is changed. This is a personal
preference: since I'm not using the cygwin build of Apache, I prefer to
keep all of the validator files outside of the cygwin directory tree.
The change loads the configuration file from htdocs/config.
The check that the SGML parser is executable needs to be turned
off. (Of course, when on an NTFS disk, ensure that it is executable by the
account that runs the SGML parser.) I'm not sure exactly why this check fails,
but it does.
The configuration files must have UNIX line endings (based on binary mounts)
for Perl to parse them correctly.
This certainly applies to check.cfg (or /etc/w3c/validator.conf if you install
it as such); otherwise you'll experience problems with the SGML library path,
which will contain a <CR>, which breaks things when filenames are appended to it.
I've converted all configuration files to UNIX line endings, whether
or not the others require it.
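The line-ending conversion can be done from the Cygwin shell with tr. This is a minimal sketch: to_unix_eol is a hypothetical helper name, and it removes every carriage return in the file, not only those at the end of a line, which is fine for plain-text configuration files.

```shell
# Strip carriage returns from each named file, in place.
# Usage: to_unix_eol FILE...
# to_unix_eol is a hypothetical helper name, not part of the validator.
to_unix_eol() {
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        if tr -d '\r' < "$f" > "$tmp"; then
            # Write back through the original name so permissions are kept.
            cat "$tmp" > "$f"
        else
            rm -f "$tmp"
            return 1
        fi
        rm -f "$tmp"
    done
}
```

With the layout used here, something like `to_unix_eol htdocs/config/check.cfg` would convert the main configuration file.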
The largest problem is that open3() doesn't seem to work on cygwin. At
least I can't get it to work. The solution is to use system() to run the
SGML parser, writing stdout and stderr to files via shell redirection, and
reading them back in after the SGML parser ends.
Currently I just use the fixed names /tmp/smgl.stdin, /tmp/smgl.stdout and
/tmp/smgl.stderr (spelled that way in the patch), though creating proper
temporary filenames for each stream would be better.
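Since the patch already hands a command line to the shell, the idea of unique temporary files can be sketched in shell terms. This is an illustration under my own assumptions, not what the patch does: run_with_temp_streams is a hypothetical helper, and it uses mktemp -d to create a private directory per run instead of fixed names under /tmp.

```shell
# Run a command with stdin taken from a file, capturing stdout and stderr
# in a private temporary directory, then replay them and clean up.
# Usage: run_with_temp_streams INPUT_FILE COMMAND [ARGS...]
# run_with_temp_streams is a hypothetical helper, not part of the validator.
run_with_temp_streams() {
    input=$1
    shift
    workdir=$(mktemp -d "${TMPDIR:-/tmp}/sgml.XXXXXX") || return 1
    rc=0
    "$@" < "$input" > "$workdir/stdout" 2> "$workdir/stderr" || rc=$?
    # Read the captured streams back in after the command has finished.
    cat "$workdir/stdout"
    cat "$workdir/stderr" >&2
    rm -rf "$workdir"
    return "$rc"
}
```

Because each run gets its own directory, two simultaneous validation requests can't overwrite each other's files, which the fixed names can't guarantee.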
--- check 2002-12-01 10:18:00.000000000 +1100
+++ check.pl 2004-01-04 19:47:20.265625000 +1100
@@ -1,4 +1,4 @@
-#!/usr/bin/perl -T
+#!/usr/bin/perl
 #
 # W3C MarkUp Validation Service
 # A CGI script to retrieve and validate a MarkUp file
@@ -25,8 +25,8 @@ use 5.006;
 
 #
 # Pragmas.
-use strict;
-use warnings;
+#use strict;
+#use warnings;
 
 #
 # Modules.
@@ -84,16 +84,19 @@ use constant O_DOCTYPE => 4; # 0000 010
 
 # Define global variables.
 use vars qw($DEBUG $CFG $VERSION);
-
 
 #
 # Things inside BEGIN don't happen on every request in persistent
 # environments, such as mod_perl. So let's do globals, eg. read config here.
 BEGIN {
+  # ord:
+  # Override the location of the configuration file.
+  $ENV{W3C_VALIDATOR_CFG} = "/vhosts/validator.home/htdocs/config/check.cfg";
+
   #
   # Read Config Files.
   $CFG = &read_cfg($ENV{W3C_VALIDATOR_CFG} || '/etc/w3c/validator.conf');
-  if (! -x $CFG->{'SGML Parser'}) {
+  if ( 0 && ! -x $CFG->{'SGML Parser'}) {
     die("Configured SGML Parser '$CFG->{'SGML Parser'}' not executable!");
   }
 
@@ -533,9 +536,13 @@ if ($DEBUG) {
 
 #
 # Temporary filehandles.
-my $spin = IO::File->new_tmpfile;
-my $spout = IO::File->new_tmpfile;
-my $sperr = IO::File->new_tmpfile;
+#my $spin = IO::File->new_tmpfile; # ord: We create a different one.
+#my $spout = IO::File->new_tmpfile;
+#my $sperr = IO::File->new_tmpfile;
+
+# ord:
+# Write to our own temp file.
+my $spin = new IO::File " > /tmp/smgl.stdin";
 
 #
 # Dump file to a temp file for parsing.
@@ -549,18 +556,44 @@ seek $spin, 0, 0;
 
 #
 # Run it through SP, redirecting output to temporary files.
-my $pid = do {
-  no warnings 'once';
-  local(*SPIN, *SPOUT, *SPERR) = ($spin, $spout, $sperr);
-  open3("<&SPIN", ">&SPOUT", ">&SPERR", @cmd);
-};
+# ord:
+# open3() doesn't work on cygwin
+#my $pid = do {
+#  no warnings 'once';
+#  local(*SPIN, *SPOUT, *SPERR) = ($spin, $spout, $sperr);
+#  open3("<&SPIN", ">&SPOUT", ">&SPERR", @cmd);
+#};
+
+# ord:
+# Create the command line, and run with system();
+#my $cmd;
+for(my $i = 0; $i < scalar(@cmd); $i++){
+  $cmd .= @cmd[$i] . " ";
+}
+
+$cmd = 'cat /tmp/smgl.stdin | ' . $cmd;
+$cmd .= ' > /tmp/smgl.stdout 2> /tmp/smgl.stderr';
+#$cmd .= '-b /tmp/smgl.stdout -f /tmp/smgl.stderr';
+# open(HANDLE, '|' . $cmd);
+system($cmd);
+
+# ord:
+# Open the temp files we created, and set them to be deleted
+# when closed.
+my $spout = new IO::File "< /tmp/smgl.stdout";
+my $sperr = new IO::File "< /tmp/smgl.stderr";
+# Delete all the files when closed.
+#unlink("/tmp/smgl.stdout");
+#unlink("/tmp/smgl.stderr");
+#unlink("/tmp/smgl.stdin");
 
 #
 # Close input file, reap the kid, and rewind temporary filehandles.
 undef $spin;
-waitpid $pid, 0;
-seek $_, 0, 0 for $spout, $sperr;
+#my $pid = 0;
+#waitpid $pid, 0; #ord: don't wait.
+seek $_,0, 0 for $spout,$sperr;
 
 $File = &parse_errors($File, $sperr); # Parse error output.
 undef $sperr; # Get rid of no longer needed filehandle.
Copy the SGML library to htdocs/sgml-lib.
Edit the validator configuration file so that all the file locations point to where the files actually are. How to do so should be pretty self-explanatory.
Optional Extras
That should be all that's required to get the W3C Validator working on Microsoft Windows XP.
When using the service locally, it's worthwhile changing some links to use the local copies, rather than the remote copies.
Many of the changes can simply be made in the config/check.cfg file.
The Msg FAQ URI option helps for checking errors when not connected to the internet.
It's also worthwhile modifying htdocs/footer.html to reference
http://<local_host>/images/vxhtm10.png rather than
http://www.w3.org/Icons/valid-xhtml10.
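The footer change can be made by hand, or with sed. This is a rough sketch: localize_badge is a hypothetical helper, the host name is passed in as an argument and must not contain | or & characters, and the image path is the one given above.

```shell
# Rewrite the remote "valid XHTML" badge URL to a local copy.
# Reads HTML on stdin and writes the result to stdout.
# Usage: localize_badge LOCAL_HOST < footer.html > footer.html.new
# localize_badge is a hypothetical helper, not part of the validator.
localize_badge() {
    sed "s|http://www\.w3\.org/Icons/valid-xhtml10|http://$1/images/vxhtm10.png|g"
}
```

Writing to a new file and then moving it over htdocs/footer.html avoids truncating the original while sed is still reading it.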