Tuesday, November 20, 2007

Using Tidy to keep your web pages valid

Use the proper tool for the job.

As a web developer in my second year, I've started advocating web standards at my workplace. It's an enormous undertaking, with over 3 GB of material on an Internet and Intranet site. Some pages were done with Dreamweaver (our currently supported tool), others in Adobe GoLive (our previously supported tool), and the rest in whatever tools someone had available-- Claris Home Page, Netscape Composer, or even text editors and FTP clients.

I've taken advantage of most of my clients being on leave during Thanksgiving week and started consolidating entire subfolders of web pages over to a Dreamweaver template. But I can't just copy and paste the content into a template and forget it-- I want to make sure each page is at least valid Transitional XHTML. The designers can keep their nested tables (for now) if they like, but the pages must run successfully through a validator for me to be happy with them.

So, this is what my workflow has been like:

1) Create a new file and apply the appropriate Dreamweaver template to it.

2) Select the "content" region of the original page, and then copy and paste it into the "content" region of the newly created file.

3) Select all of the HTML from the new page (i.e. the template shell and the content), then copy and paste it into the W3C's validator. (It's an Intranet page, so I can't just point the validator at the page's URL.)

4) Go back and forth in an iterative process between Dreamweaver and the validator's error messages-- fixing what I can understand, and getting closer to that green "This page is valid Transitional XHTML" banner.

This process is time- and labor-intensive. It may work for a small web site, but when you've got 15,000 pages, you need a fast way to validate and fix broken pages. Otherwise you'll never get the majority of your web pages into a valid state and enjoy the benefits of a standardized web site.

I've been underwhelmed with Dreamweaver's built-in validator. It was time to get serious and take a second look at Tidy. I'd looked at Tidy a long time ago, but never got far with it because the man page seemed impenetrable.

Tidy is a small utility program created by one of the big brains at the W3C. It doesn't just tell you what's wrong with your HTML code-- it tries to fix it. If most of the errors in your XHTML code are things like forgetting to close empty elements (img, br, hr, and so on), using deprecated attributes, or improperly nesting or overlapping tags, then Tidy can clean them up for you.

Is it perfect? No. It doesn't recognize Dreamweaver's non-editable template code, and it has made changes in parts of the page that are supposed to be off limits. Fortunately, since my templates are already valid HTML, that's not as much of a concern as you'd think. On the plus side, it's sped up Step #4 in my workflow considerably. I run Tidy against the local copy of a file, then FTP it up to the web server. Since Tidy can read from standard input and write to standard output, I bet there's some way to script it to do batch processing of files.
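
Here's a rough sketch of what that batch processing might look like in Python. I haven't run this against the real site, so treat it as a starting point rather than a finished tool. It assumes the tidy executable is on your PATH, and the function names (find_pages, tidy_command, tidy_tree) are ones I made up for illustration. The flags are Tidy's own: -q for quiet, -asxhtml to convert to XHTML, and -m to modify the file in place. Tidy exits with status 0 when all is well, 1 when there were warnings, and 2 when there were errors it couldn't fix, so the script collects the status-2 files for fixing by hand.

```python
import subprocess
from pathlib import Path

# Tidy command-line flags: -q (quiet), -asxhtml (convert to XHTML),
# -m (modify the original file in place).
TIDY_FLAGS = ["-q", "-asxhtml", "-m"]


def find_pages(root, suffixes=(".html", ".htm")):
    """Return every file under root whose extension looks like a web page."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in suffixes
    )


def tidy_command(page, tidy="tidy"):
    """Build the argument list for running Tidy on a single page."""
    return [tidy, *TIDY_FLAGS, str(page)]


def tidy_tree(root, run=subprocess.run):
    """Run Tidy on each page under root.

    Returns the pages where Tidy reported errors (exit status 2 or higher),
    which still need to be fixed by hand. The `run` parameter exists only
    so the function can be tried out without Tidy installed.
    """
    failed = []
    for page in find_pages(root):
        result = run(tidy_command(page), capture_output=True, text=True)
        if result.returncode >= 2:
            failed.append(page)
    return failed
```

Calling tidy_tree("some_subfolder") would then clean every page it finds and hand back the list of stragglers. Because -m overwrites the originals, I'd only point this at a local copy that's backed up, and I'd still spot-check the Dreamweaver template regions afterward.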