A blog post from danboe.net

A standard MSN homepage, part IV: onwards to XHTML 1.0 Strict

Posted Sep 16, 2004 at 11:58 AM

In the last set of posts, I discussed how the MSN home page could be taken from its current implementation of invalid markup and style to standard XHTML 1.0 Transitional and valid CSS. I explained the changes along the way, and discussed the benefits of getting to validity and the Transitional schema. I illustrated first hand the problems that are caused by authoring styles against quirks rendering mode, and documented the changes to fix this. Although the files we ended the last post with are better than where we started, they’re still far from ideal. They’re significantly heavier and more complex than is necessary. They tightly couple the appearance with the structure, and this tight coupling greatly limits the page’s reach, maintainability and flexibility.

This post begins the journey to valid XHTML 1.0 Strict, and sets out to reintroduce the homepage with valid markup that is intentionally chosen to be semantically meaningful and cleanly isolated from its desired appearance. I’ll demonstrate the significant savings of this approach, and illustrate that taking this path and achieving this simplicity and efficiency does not significantly limit the design possibilities.

Make no mistake: this will be a refactoring of what we had before. Although the end result may render similarly to the site today, its implementation will be very different. They say that a picture is worth a thousand words. In writing this post, I hope to show that the picture of the msn.com homepage can be rendered today in significantly less than that.

Step 1: Unlearning what we have learned

The easiest way to begin a Strict implementation from an invalid or Transitional page, like the one we ended the last post with, is to simply open the original page and strip away everything that’s not content. I would encourage every developer who works with HTML to go through this experience at least once, for the lessons learned are well worth it. So starting with the valid Transitional page I ended the last post with, here’s a blow-by-blow of the steps I took to get to my first valid XHTML 1.0 Strict homepage:

  • Update the doctype from Transitional to Strict.
  • Remove the link to the stylesheet, since the markup will change so significantly that the stylesheet will need to be completely redone.
  • Remove all script tags for similar reasons, being sure to also remove the script event attributes that may remain, including any javascript calls that may reside in unusual places like href attribute values. When possible, replace these script-only implementations with a no-script URL, thereby ensuring that scripting is used as an enhancement to the page, not a requirement.
  • Delete every single table, thead, tbody, tfoot, tr, th and td tag from the markup, since almost every one of them is used for laying out the page, rather than presenting tabular data. Of course, we’ll remove these tags, but keep their contents. Tip: an easy way to find and replace tags is to use regular expressions. For example, to find all opening table tags, with or without attributes, we can use this regular expression: \<table[^\>]*\>, while this one finds all closing table tags: \<\/table\>.
  • Delete all b and i tags, or replace them with strong and em respectively, if and only if they are meant for emphasizing their contents in a way that makes sense for assistive technologies like screen readers. If their use is purely visual, we will accomplish it in CSS.
  • Remove all br tags, since they are presentational in nature and do not add semantic meaning to the document.
  • Replace all &nbsp; entity references (and their numeric twin, &#160;) with a single space, since these non-breaking spaces are used for layout purposes. In fact, search the document for all entity references, and look at what they’re used for. If they’re used for presentation instead of content, delete them. For example, I found a bunch of &middot; entity references, used as bullets, which is silly since unordered lists can be used for this.
  • Do the same thing for characters hanging out in the document. For example, I also found pipe characters (“|”), commas and numbers that were used only for delimiting or numbering lists that were not implemented as lists in the first place.
  • Delete every class and id attribute from the markup, since most of these no longer make sense now that the tables surrounding them no longer exist.
  • Remove all style attributes, since we’ll be defining our styles in external CSS files.
  • Remove all border attributes from img tags, since they are not supported in the Strict schema.
  • Remove all target attributes from a tags, since they are not supported in the Strict schema.
  • Remove all name attributes from form tags, since they are not supported in the Strict schema.
  • Insert a div tag at the root of the body tag and all form tags (if not already present) to correct problems with incorrect nesting of block and inline elements. These tags require a block element at their root.
  • Remove all transparent images that only exist for purposes of layout, such as “spacer” images.
  • Remove all images that are not content, but rather decorative in their use.
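
A few of the mechanical steps above lend themselves to scripting. Here is a minimal sketch in Python, assuming a quick one-off cleanup pass where regular expressions are good enough (they are not a substitute for a real parser on arbitrary markup). The function name strip_presentation, the attribute subset and the sample markup are all hypothetical, made up for illustration.

```python
import re

# Layout-only tags whose contents we keep but whose tags we drop.
LAYOUT_TAGS = ("table", "thead", "tbody", "tfoot", "tr", "th", "td")

# Attributes removed in step 1 (an illustrative subset of the list above).
DROPPED_ATTRS = ("class", "id", "style", "border", "target")


def strip_presentation(markup: str) -> str:
    """Remove layout tags, br tags and presentational attributes, keeping content."""
    # Opening or closing layout tags, with or without attributes.
    tag_names = "|".join(LAYOUT_TAGS)
    markup = re.sub(rf"</?(?:{tag_names})\b[^>]*>", "", markup, flags=re.IGNORECASE)

    # <br>, <br/> and <br /> are presentational, so replace each with a space.
    markup = re.sub(r"<br\s*/?>", " ", markup, flags=re.IGNORECASE)

    # Drop attributes such as class="x" or border="0" (double-quoted values only).
    attr_names = "|".join(DROPPED_ATTRS)
    markup = re.sub(rf'\s+(?:{attr_names})="[^"]*"', "", markup, flags=re.IGNORECASE)

    # Non-breaking spaces used for layout become ordinary spaces.
    markup = re.sub(r"&nbsp;|&#160;", " ", markup)

    # Collapse the runs of whitespace left behind.
    return re.sub(r"\s+", " ", markup).strip()


# Hypothetical fragment in the style of a table-based layout.
sample = (
    '<table border="0"><tr><td class="hd"><b>News</b><br />'
    '<a href="/news" target="_top">Top&nbsp;stories</a></td></tr></table>'
)
print(strip_presentation(sample))
# <b>News</b> <a href="/news">Top stories</a>
```

The b tag survives on purpose: deciding whether it should become strong or disappear is a judgment call about meaning, which is exactly the part a script cannot make for you.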

Result

Performing these steps results in our first valid XHTML 1.0 Strict document which, though valid, looks vastly different from the homepage today (at least in a visual browser), and is certainly not very well structured. In fact, it’s not structured at all at this point. That’s the point: the intent at this stage is to remove all of the structure surrounding the content, because this structure was created with the intent to get the visual presentation right at all costs. By changing our approach from getting the presentation right from the start to getting the structure right from the start, I hope to show we can reach the same result with less work and weight. Our XHTML content-only document is about half the size (compressed) of the page we started with. In other words, for every byte of content we sent before, we sent a byte of presentation or structure along with it. That is expensive.

Step 2: structure the content

Starting from the content-only page we created in step 1, adding structure to the page is very straightforward. The first thing I did was change the stock market summary to a table markup structure, because the content in this case is tabular in nature. With that one change, we’re done adding tables – it’s the only one we need. We just avoided all of the pain and performance cost associated with nested tables.

Adding headings

The next step in adding structure forced me to approach the document as if it was content that I wanted to create as an outline. Those elementary school English courses finally paid off. Instead of using Roman numerals, however, I used heading tags. After a short while, I came up with the following page hierarchy:

  • h1 is used for the Welcome message, essentially the page title.
  • h2 is used for section or category headings, such as What’s New, Entertainment, Shopping and Also on MSN. While I was at it, I created headings for the site navigation, header and footer sections of the document. It made sense to do so for semantic reasons, and makes the page appear better organized when viewed without styles (which, of course, are not yet written). I recognized that CSS gives us the ability to hide some of these headings for CSS-enabled browsers, allowing me to achieve our desired visual design (which does not show all of these headings) without sacrificing the document’s structure or accessibility.
  • h3 is used for the headings associated with individual modules of content, such as Today on MSN or MSNBC News.
  • h4 is used for the headings of site navigation. I chose this because the sections of content (headed with h3) seemed higher in importance than the sections of navigation (headed with h4).
  • h5 is used for attribution or explanatory text associated with the content, such as the delay explanation for stock data. Hmmm. Perhaps this is the wrong approach in the h5 case, since these are not really headings, but more like sub-text. Perhaps we should revisit this and use something like sub for such content instead. It might be more semantically meaningful, which is really the point at this stage.

After wrapping the section titles in heading tags (and creating new ones where none existed), I walked through all of the headings and noticed that some contained links. I decided to remove these links in almost all cases, because they seemed to compete with the purpose of the heading. I think there are better approaches to directing the user to the front doors associated with these categories, and an implementation where a heading is sometimes a link and sometimes isn’t only makes the page confusing (especially when the visual treatment fails to distinguish headings that are links from headings that are not until the user mouses over them). I kept links that were task-oriented for that area of the document, such as the refresh link in the Money heading, and I’ll make sure they get a visual treatment that clearly distinguishes them from the heading itself.

Finally, I stepped back from my headings implementation and considered how the headings would work as a way to navigate through the document, since some assistive technologies provide this capability. The implementation appeared to be logical and functional in this context, so I was ready to move on. Note: the previous version of the page only used h4 headings, so where we’re at right now is already better, from a semantic perspective.
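
One way to step back and look at the headings the way heading-based navigation would see them is to pull them out of the markup and print them as an indented outline. The sketch below does that with Python’s standard html.parser module. The class name HeadingOutline, the helper outline_of and the sample headings (including the exact wording of the h1) are assumptions for illustration, not the actual page.

```python
from html.parser import HTMLParser


class HeadingOutline(HTMLParser):
    """Collect (level, text) pairs for h1-h6 elements in document order."""

    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.outline = []    # finished (level, text) pairs
        self._level = None   # level of the heading being read, if any
        self._text = []      # text fragments of that heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open heading.
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._level is not None:
            self.outline.append((self._level, "".join(self._text).strip()))
            self._level = None


def outline_of(markup):
    """Return the heading outline of a markup string."""
    parser = HeadingOutline()
    parser.feed(markup)
    parser.close()
    return parser.outline


# Hypothetical headings following the hierarchy described above.
sample = """
<h1>Welcome</h1>
<h2>What's New</h2>
<h3>Today on MSN</h3>
<h3>MSNBC News</h3>
<h2>Entertainment</h2>
"""
for level, text in outline_of(sample):
    print("  " * (level - 1) + text)
```

If the printed outline reads like a sensible table of contents, the headings are probably doing their job; a level that jumps from h2 straight to h5 would stand out immediately.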

Adding generic containers

In XHTML, the div and span tags are defined as generic containers that render their content by default without any special style consideration, other than treating div tags as block elements and span tags as inline elements. Armed with just this simple knowledge, I set out to further structure the document, using these generic containers to simply group markup (headings) and content together, indicating that I intend to treat the group as a single entity that I will target with specific styles or behavior. At a high level, I created div containers (and assigned them appropriate id values) for the following areas of the page, to essentially divide the page into these subdivisions:

  • header
  • site navigation
  • content
  • footer

So with the page now so divided, I looked at each of these divisions, and further subdivided them in a way that made sense. I started by finding the heading tags I created, and proceeded to create containers around each heading and the content it titled. For example, I wrapped an h3 and ul together in a div that I identified with a specific id that maps to MSNBC News. Similarly, I wrapped an h3 and form together in another div that I identified with a specific id that maps to the Stock Lookup.

I used div to achieve these divisions and subdivisions, since this is precisely what the div tag is for: div is just short for division. I continued subdividing the page in this same manner and mindset, looking for content or groups of content that I think of as having a single identity but that were not yet semantically structured as such. For example, I subdivided the content div into content1 and content2, since we tend to think of them as the primary content areas of the page, the middle and right column.

Treating lists as lists

As I worked my way through the markup, any time I encountered collections of links (or other similar content) that were not structured in XHTML lists, I refactored them as lists (mostly using ul, with one case for ol and a few cases for dl when it seemed to make sense). This greatly improved the semantic meaning for these sections of content (compared to their previous implementations), and brought a lot more consistency to the document’s structure. Additionally, this refactoring provides for a light and consistent way of binding styles to these collections, without really limiting the possibilities of that appearance.
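
As a rough illustration of this refactoring, the sketch below turns a run of links separated by pipes or middle dots into an unordered list. It is only a sketch under stated assumptions: the function name links_to_list and the sample URLs are hypothetical, and it assumes the delimiter characters never appear inside the links themselves.

```python
import re


def links_to_list(fragment: str) -> str:
    """Turn 'a | b | c' style delimited links into an unordered list."""
    # Split on the pipe or middle-dot delimiters that stood in for list markup.
    pieces = re.split(r"\s*(?:\||&middot;|·)\s*", fragment)
    # Ignore empty pieces left by a leading or trailing delimiter.
    items = [piece.strip() for piece in pieces if piece.strip()]
    lines = ["<ul>"]
    lines += [f"  <li>{item}</li>" for item in items]
    lines.append("</ul>")
    return "\n".join(lines)


# Hypothetical navigation links, delimited the old way.
sample = (
    '<a href="/autos">Autos</a> | '
    '<a href="/games">Games</a> | '
    '<a href="/travel">Travel</a>'
)
print(links_to_list(sample))
```

The delimiters disappear from the content entirely; if the design still calls for a separator between items, that becomes a job for the stylesheet.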

Identifying the remaining pieces

When I added the generic containers, I specified id values for the containers when I knew that I wanted to be able to target the container for special style treatment or script behavior later. Completing this for the containers, I then proceeded to scan the rest of the document, and assigned id values to tags that were not containers (such as the sign-in link or logo image) but still needed to be targeted for special treatment. In other words, I used the id attribute as a targeting device, essentially asking myself whether or not I will need to aim specific styles or script functionality at this element. When the answer was yes, I assigned a value to the id.

A few words about my id value choices

I chose generic id values such as “content1” and “content2” over more currently-recognizable values such as “middlecolumn” and “rightcolumn” because I want them to make sense regardless of the visual layout or presentational treatment that may be applied today and gone tomorrow. In other words, I want to take steps early on to ensure that my markup can survive redesigns of the document without needing revision. Using semantic markup that is modular (well-divided) and uses generic id values helps to accomplish this.

Result

What we get at this point is semantically structured content in a valid XHTML 1.0 Strict document. It doesn’t look much different than what we had in step one, since we haven’t done much visually, except that:

  • Our headings are now clear, giving our document better organization; and
  • The div and heading tags introduce some line breaks (without having to rely on the br tag).

Just a little more content

You may have noticed in the above referenced page that I snuck in some new content and moved a few links around in the process. The new content is targeted specifically at those users who receive the page without styling, and displays a message about why the page looks like it does. Of course, we can hide this for current browsers, using a style applied against the div’s id attribute. The second piece of content I added is what is often called a “skip nav” section, which gives users who see the document in this unstyled fashion a way to jump directly to specific sections of the document. This is accomplished by linking directly to the id of the element we want to navigate to. Of course, this works across pages as well: for example, you can get to the content section of that document from this document you’re reading.
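
Skip links only work when every fragment points at an id that actually exists in the document, which is easy to break during a refactoring like this one. The following is a small sanity check, sketched in Python with the standard xml.etree.ElementTree module. The function name unresolved_fragment_links and the sample ids are hypothetical, and the sample deliberately omits the XHTML namespace declaration so that tag names can be matched without a namespace prefix.

```python
import xml.etree.ElementTree as ET


def unresolved_fragment_links(xhtml: str) -> list:
    """Return href values of same-page links whose target id does not exist."""
    root = ET.fromstring(xhtml)
    # Every id value present anywhere in the fragment.
    ids = {el.get("id") for el in root.iter() if el.get("id")}
    missing = []
    for anchor in root.iter("a"):
        href = anchor.get("href", "")
        # Only same-page links (href="#something") are checked here.
        if href.startswith("#") and href[1:] not in ids:
            missing.append(href)
    return missing


# Hypothetical skip nav section; the "search" target is missing on purpose.
sample = """
<div id="page">
  <ul id="skipnav">
    <li><a href="#content">Skip to content</a></li>
    <li><a href="#footer">Skip to footer</a></li>
    <li><a href="#search">Skip to search</a></li>
  </ul>
  <div id="content"><h2>What's New</h2></div>
  <div id="footer"><h2>Footer</h2></div>
</div>
"""
print(unresolved_fragment_links(sample))  # ['#search']
```

Because the parser is a strict XML parser, it also rejects markup that is not well-formed, which is a useful side effect when the goal is valid XHTML.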

Step 3: style the structure

This step is pretty brief, because the only modification we need to make to the markup is to add a link tag that points to our CSS file. You should note that this stylesheet actually just imports a CSS file that contains the actual styles. This approach is taken for reasons outlined in my post on applying style to markup.

Result

Here’s the first peek at our styled document, with style hooks using only the structure itself and id attributes. About the styles themselves: a discussion of specifically how I built the stylesheet is outside the scope of this document. I can walk through how I did that, if there’s interest. If you’d like me to write that up, post a comment to that effect.

Step 4: refine the style (sparingly) with classes

Obviously a lot of style can be achieved by using just the structure and id values. However, another approach for binding styles to the markup is by using classes, which is done when the same styles are intended to be applied to a group of similar elements or sections. In looking first at how far I could get with just structure and identification, I added the following classes to the document:

  1. class="slotgroup" was used to consistently style category sections such as “Also on MSN” and “Entertainment”.
  2. class="slot" was used to consistently style individual modules, such as “Today on MSN” and “MSNBC News”.
  3. class="first" was used to style the first item in lists consistently. For example, it allows me to style a list to appear horizontally with a bullet separator between all list items, without showing the bullet for the first item.
  4. class="minorlist" was used to style the small list of links associated with the category.
  5. class="imagelinks" was used to style image/link list sets such as the gossip section in the Entertainment category.
  6. class="symbol" was used to style the stock symbol column in the Stock Market table.
  7. class="currency" was used to style the numeric columns in the Stock Market table.
  8. class="gain" was used to style content that suggests an increase or gain in value.
  9. class="loss" was used to style content that suggests a decrease or loss in value.
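
To make the last four classes concrete, here is a sketch of how a row of the stock table might be generated, with the class chosen from the data rather than from the desired color. It is written in Python purely for illustration. The function name stock_row, the ticker symbol and the numbers are made up, and combining currency with gain or loss on a single cell is my assumption about how the classes could be applied, not a description of the real page.

```python
from html import escape


def stock_row(symbol: str, price: float, change: float) -> str:
    """Render one table row using the symbol, currency, gain and loss classes."""
    # Pick the class from the sign of the change; no extra class when unchanged.
    if change > 0:
        change_class = "currency gain"
    elif change < 0:
        change_class = "currency loss"
    else:
        change_class = "currency"
    return (
        "<tr>"
        f'<td class="symbol">{escape(symbol)}</td>'
        f'<td class="currency">{price:.2f}</td>'
        f'<td class="{change_class}">{change:+.2f}</td>'
        "</tr>"
    )


# Made-up ticker and values, for illustration only.
print(stock_row("ABC", 27.25, 0.5))
print(stock_row("XYZ", 10.0, -1.2))
```

The class names say what the content means (a gain, a loss), so the stylesheet stays free to render them in green and red today and as arrows tomorrow without touching the markup.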

For purposes of comparison, here’s a list of the 95 distinct class values used on the homepage today: adp, alsomsn, bbsw, bdy, bg, bgl, bgr, bgs, big4, blft, bt, btn, ccol, ccola, ccolb, chan, clt, cm, ct, currency, date, dbr, dgr, dMSNME_1, dot, dpp, drd, dt, dtl, dtr, ent, en-us, fd, first, gain, gcl, gq, hd, help, hilite, hrd, il, ilrc, imagelinks, lbl, lbl, lcol, left, local, loss, lt, mbl, md, mhome, mhr, minorlist, mon, msn_promo, nip, nippar, numl, numlb, or, PassportSignIn, popsrch, pr, ql, qlink, r, rcol, rcolb, rcolc, rf, right, rt, rt, sb, sfbtm, sfc, sfdp, sfl, sfp, shop, slot, slotgroup, sm, srch_btn, srchd_inp, stk, symbol, terms, termsattr, txt, txtad, and up.

Now which would you rather use? Hard-to-decipher values like “ccola” (not a soda) and “lowcal” (not a diet) or something more meaningful? h3 or class="hd"? Targeting category colors by their id value, or by remembering the abbreviated class name for blue?

Tip: we all know that it’s important to use short values for things like class and id, but frankly, it makes understanding the document difficult. The good news is that when we take an approach that prefers identification over classes, we end up with far fewer classes, even to the point of tipping the reads-great/less-filling scale the other way! Instead of placing your emphasis on decreasing the length of your values, focus on decreasing your reliance on them in the first place.

Result

By just adding a very small number of classes, our styled document with classes gets very, very close to the desired visual presentation.

Epilogue: what about legacy browsers?

As I mentioned earlier, legacy browsers requesting the page will still get the entire page of content (they’re not blocked), but it will not be styled with our desired appearance or scripted with our desired experience. Recognizing that not everyone uses a visual browser, we can easily provide tools that make this core experience much better (such as the “skip nav” section). Recognizing that not everyone has a current browser, we can also provide information that explains the experience and gives the user options (such as the “why does the page look like this” section). The point here is that these enhancements (high-fidelity ones like style and script, as well as low-fidelity ones like skip nav and no-stylesheet sections) are all done in ways that are easily implemented and degrade gracefully. In an upcoming post, I’ll discuss ways of achieving this same type of easy fidelity for our scripts.

Next steps

So we’ve gone from invalid markup and style to being strict and valid, and lost a lot of weight, but not a lot of functionality or presentation flexibility in the process. We’ve also tremendously simplified our markup and style, and have a great cross-browser, cross-platform story without any conditional markup or targeted stylesheets. In part V, I’ll recap and provide a brief performance comparison between our new, light, Strict implementation and our earlier Transitional implementation (which is pretty close to the MSN.com site that is live today as I write this).


About this page

This page contains a single post from Daniel Boerner's blog, of which Boot Camp + Windows Vista = no more Airport Extreme reboots is the latest post.

Are there more posts like this one?

Possibly. Within this blog, this post is categorized under work and webdev and it was posted on September 16, 2004. Those would be good places to start looking for related posts.
