Writing webpages (Tim's webpage authoring guide)

Very basic HTML

Start off by writing what you want to say in a text editor (whether that be a plain text editor, a dedicated HTML editor, or a word processor that lets you save plain text files; though realise that most word processors save files in their own specialised format until you choose to “save as” something else), getting your “content” prepared before you try to do anything with it.

I strongly recommend using a plain text editor, and writing the HTML by yourself. You'll learn more that way, and you won't have to fight with a badly designed HTML editor (many are) that creates incorrect HTML, that you'll have to fix up.

Make things simple for yourself, and start off with simple paragraphs; nothing more. Don't do any formatting (styles, fonts, etc.), and avoid any special characters (i.e. stick to plain letters and numbers, no symbols other than what you find on ye olde typewriters), just type each paragraph as a solid block, with a blank line between each paragraph. (Later on, when you know more about what you're doing, how HTML works, and how many computers make a complete hash of it, you can use proper typographical symbols in documents, as described in a page I wrote as part of my desktop publishing guide—a completely separate guide from this web authoring guide.)

Only press “return” when you finish a paragraph. If you're using an HTML editor it should be automatically handling inserting the HTML for you, and you should only press return once at the end of each paragraph (assuming a sensibly working editor, some are quite bad and can't even manage to get the simple concept of paragraphs correct). If you're using a plain text editor then press return twice; this gives you extra space (a blank line) to hand insert HTML, later on.

Edit your material (correcting mistakes, etc.) now, when it's easy to do so, without having to worry about the technical details of HTML. And don't worry about how it looks, that'll be taken care of later; you have to realise that you can only loosely control how the page will end up looking, anyway.

It's also easier to spell check the document now, while it's still plain text. Unless you're using an HTML editor, most spell checkers will choke on all the HTML in the file.

Once you've typed a page, or a significant portion of a page, you can then insert HTML in the appropriate places (assuming that you're using a plain text editor, which I'm going to continue assuming for most of these instructions). Start by putting opening and closing “p” (paragraph) tags, at the start and end of each paragraph. This should be enough to be able to load the page in a web browser, which will assume that the other missing (as yet untyped) HTML tags are where they're supposed to be. Later on, or now, you can add in the rest. But even without them you've got a basic HTML page (albeit one that's not a specifically defined version of HTML), although it really needs a “title” element, too (the title won't be seen on the page, but elsewhere in the browser; such as the browser window title bar, and entries in your list of bookmarks, etc.).

For example, turn the following:

This is an example paragraph of text, it's not that long, because I
can't be bothered with typing a lot of text, and I don't want to make
this page very very long.  I'll put another paragraph of text below
it, so you can see at least a couple of paragraphs, with the surrounding
HTML tags.

This is a second paragraph, and I'll make this one really short, just for
the sake of it.

Into:

<title>An example page</title>

<p>This is an example paragraph of text, it's not that long,
because I can't be bothered with typing a lot of text, and I don't want
to make this page very very long.  I'll put another paragraph of
text below it, so you can see at least a couple of paragraphs, with the
surrounding HTML tags.</p>

<p>This is a second paragraph, and I'll make this one really short,
just for the sake of it.</p>

Then save it as a file with a .html suffix (e.g. example.html), and open the file in a web browser. You should now be able to see the page in a web browser, and be able to work out how to add more paragraphs by yourself. You should also be able to see that the page rendering is not laid out by how you typed the words, but where the HTML tags have been placed (where the ends of the lines of typed text are, and spaces between blocks of text, don't correlate between the source file and the rendered page).

Note that although some multi-format editors will choose the format you save a document in by the filename you save it as, simply “renaming” some file (e.g. example.doc) to a “.html” suffixed filename, does not “convert” the file to HTML, you must “save” it in the appropriate format.

Also note that certain characters are not allowed in web addresses (e.g. blank spaces are forbidden), or are difficult to use. To make life easy, stick to plain alphabetical and numerical symbols, as used on old fashioned typewriters (don't use accented, or special symbols and characters), when naming files and directories for the World Wide Web.

Do not forget that web addresses are case sensitive (upper and lower case letters are treated as different things, even if they're the same letter), so write all your filenames and link addresses identically. There's a convention of always using lower case letters to avoid accidents. It's a simple and effective approach, one that I'd recommend following, unless you've got a good reason to do otherwise.

Now, for the sake of removing any ambiguity about what's what in the document, and to ensure that the page is recognised as being an HTML document, we'll add in the other HTML tags that we previously omitted (purely for keeping the example simple, earlier on).

Insert the following around the HTML that you've already played with (you can cut and paste from this page, or type it yourself); noting that the “title” element is inside the “head” element, your paragraphs are inside the “body” element, and the whole lot's inside an HTML element (rather like the Russian “Babushka” dolls, with one inside the other). Like this:

<html>

<head>
<title>An example page</title>
</head>

<body>

<p>This is an example paragraph of text, it's not that long,
because I can't be bothered with typing a lot of text, and I don't want
to make this page very very long.  I'll put another paragraph of
text below it, so you can see at least a couple of paragraphs, with the
surrounding HTML tags.</p>

<p>This is a second paragraph, and I'll make this one really short,
just for the sake of it.</p>

</body>

</html>

Note that the extra blank spaces aren't necessary, but they make it clearer for you to read. Also note that removing the blank spaces does not make the page more efficient, the small decrease in file size is insignificant.

At this point you've got a very simple HTML document that should work in all browsers. It's customary for pages to have at least one heading (a main heading), and they may have several sub-headings. So, the next step will be to add a heading to the top of your page (inside the “body” element, just above your first paragraph).

A few “simple” HTML elements

Most people want to write more than just plain paragraphs, so I'll discuss a few simple things that you can easily add to your webpages. HTML is used to give “meaning” to the content, this is called “marking-up” the text. Some of the marking up isn't immediately obvious, though can still be beneficial (e.g. indicating abbreviated words). Though much of the marking up has very obvious effects (headings, tables, lists, etc.).

Page title

The page title is one of the few elements which generally isn't shown on the page, it's usually shown elsewhere on the browser window, used when bookmarking pages, and by search engines.

It's a mandatory element (that means that you MUST include one), and you should use an appropriate one. It goes into the document head, starts with an opening “title” tag, ends with a closing “title” tag, with the title between them.

e.g.

<title>Writing webpages</title>

Headings

Headings are marked-up using “h” tags around them, to form heading elements, starting with “h1” for the first (main) heading, working down to “h6” for sub-headings. Although you may notice that some browsers display those different sub-headings at different sizes, do not use them as a way to control the size of the headings, nor as a way to get bigger text on the page; they're for “headings”. Start with “h1”, working up the numbers as you create sub-headings of sub-headings, in sequence. Later on, you can play with the sizing using page styling, if you want to (although, in either case, different people's browsers may use different sizes than you expect, anyway). But the numbers refer to the status level of the heading.

For example, insert the following after the opening “body” tag, and before your first paragraph:

<h1>Teaching yourself HTML</h1>

Then, if you add other sections to your page, insert sub-headings between them. If the sub-headings are sub-headings to the main heading, then make them “h2” headings. When you make a sub-heading of a sub-heading, make it one number higher. There should only be one h1 element per page, all other headings are sub-headings.

Representation of how headings and sub-headings relate to each other:

<h1>Pets</h1>

<p>This document discusses different types of pets, that people
commonly keep.</p>

    <h2>Dogs</h2>

    <p>Dogs are annoying furry animals.</p>

    <h2>Cats</h2>

    <p>Cats are furry animals, that are annoyed with other animals.</p>

        <h3>Caring for your cat</h3>

        <p>This is a full time occupation.</p>

        <h3>Feeding your cat</h3>

        <p>This is another full time occupation, even when it's been
        feeding itself on the native wildlife.</p>

    <h2>Pet rocks</h2>

    <p>By far the easiest pet to keep.</p>

    <h2>Windows</h2>

    <p>Demands lots of attention, is suicidal, causes great anxiety in
    all those around it, spreads disease (no known protective agent).</p>

You should be able to see how some of the above sub-sections were sub-sections to the main part, and others were sub-sections of sub-sections. I've indented sub-sections, to emphasise where they fit in, just for this example.

Not only does properly using headings, rather than playing tricks with fonts, etc., give meaning to the page, which browsers and other agents can assess (e.g. search engines), it also means that the author can use the information for things like generating tables of contents, for a website (e.g. by having the computer assess each page, making links to any heading with an “id” using the heading as the words in the link). Not only does this make it easy to generate the table of contents page, but it also makes it easy to update it if the site's contents are modified.
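As a rough sketch of that idea (the “id” values and the filenames here are made up for illustration, and the contents list is simply typed by hand, rather than produced by any particular tool), headings that carry an “id” can be linked to from a contents page like this:

```html
<!-- In pets.html: each heading carries an "id" that links can point at -->
<h1 id="pets">Pets</h1>
<h2 id="dogs">Dogs</h2>
<h2 id="cats">Cats</h2>

<!-- In contents.html: the link text repeats the words of each heading -->
<ul>
   <li><a href="pets.html#pets">Pets</a>
      <ul>
         <li><a href="pets.html#dogs">Dogs</a></li>
         <li><a href="pets.html#cats">Cats</a></li>
      </ul>
   </li>
</ul>
```

Lists and links are covered in their own sections of this guide; the point here is only that the nesting of the contents list follows the numbering of the headings.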

Paragraphs

Most documents have text that's separated into paragraphs, and there must be some way of specifying where they begin and end. HTML doesn't care where you type blank spaces and carriage returns in a document, they're all considered as being a single blank space, even when you type several of them in a row. This is a design feature, the HTML is supposed to structure the page to suit the displaying situation. You can see this in action by resizing your browser window, the text will reflow to fit into the available horizontal space, rather than disappear off the margin.

Paragraphs are “marked-up” with an opening “p” tag, and a closing one, around the paragraph.

e.g.

<p>This is an example
    paragraph.</p>

Seeing as multiple blank spaces are regarded as only being one blank space you strike a problem when trying to type a document in the traditional manner of having two blank spaces after a full stop (to aid reading). If you want to do that, you'll need to put a non-breaking space character after the full stop (this will be regarded as a significant character, rather than unimportant white space), then a blank space between it and the next word. Don't type two non-breaking spaces in a row (the browser needs to be able to break lines between sentences, and the first non-breaking space is enough to get two spaces between sentences; you'll get a non-breaking space, followed by a normal space). And don't type a blank space after a full stop then a non-breaking space (the line would break at the blank space, and the next line would start indented by the non-breaking space).

e.g.

<p>This is a sentence.&nbsp; This is
    another sentence.</p>

If you want to use traditional indented paragraphs, rather than blocked ones (as most HTML documents are rendered), then you'll want to play with styling, not more non-breaking spaces (apart from being the wrong approach, you'd still be stuck with blank space between each paragraph).
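A minimal sketch of the styling approach, assuming a class that I've called “indented” (the name, and the size of the indent, are only examples):

```html
<!-- This "style" element goes in the document head -->
<style type="text/css">
   /* Indent the first line, and remove the blank space between paragraphs */
   p.indented {text-indent: 1.5em; margin-top: 0; margin-bottom: 0;}
</style>

<!-- These paragraphs go in the document body -->
<p class="indented">This is the first paragraph, with an indented first line.</p>
<p class="indented">This is the second paragraph, starting directly below it.</p>
```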

The non-breaking space is a space between characters that will not get broken by the end of a line, if the line happened to end where that character was (the line break will occur in a different place). This is useful for situations where things are easier to read without getting broken across a line, make no sense if they do get broken apart, or would look bad (e.g. like if this bracketed information ended a line just after the “e.g.” characters, with the rest on the following line).

You can enter a non-breaking space into a document directly, if you know how to do that (how you manage it would depend on whatever program that you were using to type your document with, but it might be CTRL and space together, or ALT and space together). Alternatively, you can use the &nbsp; character entity, or the &#160; numerical reference (they're the same thing, just stated in different ways).
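For instance, these two paragraphs should behave identically (the sentence itself is just something I've made up for the example), with the browser treating “Channel 2” as something that can't be split across two lines:

```html
<!-- Using the character entity -->
<p>Tune the set to Channel&nbsp;2.</p>

<!-- Using the numerical reference for the same character -->
<p>Tune the set to Channel&#160;2.</p>
```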

Line breaks

There are times when you need to break apart lines of text (or other things), but the break has nothing to do with paragraphs. For these situations, there's the “br” line break element:

e.g.

This is something.<br>This is the next
    thing.

The “br” element is one of the “empty elements”, it doesn't have any content. You don't put opening and closing “br” tags around content, you just type a single “br” tag where you want a line break.

Conversely, there are times when you wish to avoid line breaks. Unfortunately, there isn't any really good way to do this.

Using the non-breaking space (previously mentioned in the paragraphs section), is only partially effective, as some browsers will still break lines at punctuation, and sometimes in very stupid places, too.

There's an unofficial “nobr” (no break) element that most browsers will understand, but it isn't a part of any formal HTML specification (you'd be relying on browsers supporting something that's unofficial, though most browsers would support it; and those that don't, should ignore it *).

e.g.

A <nobr>double-barrelled</nobr> word
    that you don't want broken apart

The browser mightn't place line breaks in the page content between the opening and closing "nobr" tags.

* The HTML specifications say that browsers should ignore tags for any element that they don't understand, and render the contents as if the tags weren't there. For instance, if a browser didn't understand the “nobr” element used in the prior example, it should behave as if the opening and closing “nobr” tags were never written in the middle of that sentence (i.e. be treated simply as “A double-barrelled word that you don't want broken apart.”).

There is a “proper” way to do this, using the “white-space” CSS property, however browser support for CSS is still in its infancy.

CSS:	`.keeptogether {white-space: nowrap;}`
HTML:	`A <span class="keeptogether">double-barrelled</span> word that you don't want broken apart`

A CSS rule is set to define a class for keeping words together (I've named it “keeptogether”, but the class could be “named” differently, I just decided to use something that's sensibly obvious), and that class is used in any HTML element where you'd like to avoid any line breaks (the “span” element is a “generic” element for marking up content in the middle of another element, it has no special meaning of its own).

Be careful when playing with unbroken lines that you don't make terribly long lines of text (just use it around small items). Else you can make pages that are very hard to read, where things just keep going past the right-hand margin.

Divisions

Not all text on a page is a paragraph, nor is all content text, so there's a division element to segregate content, without lying that they're paragraphs:

e.g. <div>This is an example.</div>

Normally there is no blank space inserted between divisions, a new division starts immediately below the last one.

It's important to realise the semantic difference between paragraph breaks, line breaks, and divisions. They all have different meanings, and it's the meaning of the elements that you use that's important. The visual effect that they each have is a side effect. Use the right ones, for the right purposes; and if you need to change the look that any of them have, then use styling to customise it.
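To illustrate the difference, here's a small sketch using all three (the wording, names and address are invented for the example):

```html
<!-- A paragraph: a block of prose -->
<p>Write to us if you have any questions about the guide.</p>

<!-- Line breaks: the lines of an address belong together,
     but each one has to start on a new line -->
<p>Example Company<br>
1 Example Street<br>
Exampletown</p>

<!-- A division: content that's grouped together, but isn't a paragraph -->
<div>Last updated in January.</div>
```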

Abbreviations

HTML allows abbreviations to be specially marked up as abbreviations, this can be beneficial in several ways:

Machine assessment of content, so it can (more easily) index special terms.
Aural browsers have more information to use for better pronunciation.
You can look for abbreviations, and maybe find definitions.
Abbreviated terms can provide an explanation for themselves, without it having to be displayed all the time.

e.g.

<abbr title="World Wide Web">WWW</abbr>

WWW is indicated as being an abbreviation, and the unabbreviated form has been supplied in the title attribute. The browser may have some way of indicating the word is an abbreviation, and showing you the extra information (such as hovering the mouse over the word).

Browser support for this isn't too bad, although Microsoft's Internet Explorer is very poor about it (amongst its many other deficiencies). But there's a long standing problem with the HTML specification's definition of abbreviations and acronyms (regarding which should be spoken as a word, or spelt out), which has never been properly resolved. CSS can be used, in addition to HTML, to suggest that something should be spelt out, but there's no converse hint to suggest that something should be read out.

Emphasising words

From time to time, you'll want to emphasise a word (or more) in a paragraph, much the same way as how you'd naturally speak a sentence. There's two HTML elements for doing this, the “em” (emphasise) and “strong” (strong emphasis) elements. They bracket the words, with opening and closing tags indicating where to start and stop emphasising the content, giving extra meaning to the data, which can affect how they're displayed or read out loud (for aural browsing), and any machine assessment of the data.

e.g.

I <strong>strongly</strong> recommend
    that you read <em>this</em>!

Visual browsers commonly italicise “emphasised” text, and bolden “strongly emphasised” text; though that's not mandatory behaviour (don't rely on that, nor misuse it to style a word that doesn't need emphasising). Aural browsers may speak such emphasised words in a louder voice, or placing some other form of stress on the word.

Styling words

Although it's recommended practice to use CSS rather than HTML to add style to a page, there's a few HTML styling elements that are still useful, and are just as well done using HTML instead of CSS. It also means that such styles will remain with a page, even if an associated CSS is lost (like when someone saves a simple copy of a webpage). There are elements to make text italicised (the “i” element), boldened (the “b” element), underlined (the “u” element), or struck-out (the “s” or “strike” elements). These also bracket the words to be styled, in the same way that any element tags are typed, but they don't give any particular meaning to the words. They're useful for when it's customary to type some things in a certain way (e.g. italicising the scientific name of something, or a foreign word, in the middle of a paragraph).

Styling text examples:
HTML source:	`<i>Italicised</i>, <b>boldened</b>, and <u>underlined</u>, <s>struck</s>, or <strike>striked</strike>.`
HTML output:	Italicised, boldened, underlined, struck, or striked.

Understand that:

Italicising a word is not the same as emphasising it; regardless of whether it looks the same in your browser. Likewise, with boldening a word.
Underlining words can confuse people into thinking that it's a link, and frustrate them when they can't follow it (links are usually underlined on most browsers).
Striked text, apart from being hard to read, merely means that it looks like it's been struck out (such as marking off things that you've bought on your shopping list), struck-out text has no particular defined meaning (it's only a “visual effect”).

There's another HTML element to deal with text that's been “deleted”, it's the “del” element; and it has a counterpart for “inserted” text, the “ins” element. They “define” the marked-up information as being deleted or inserted (they're useful for things like corrections to documents, where you still need to see the original information, and know what's been added to replace it).

Again, don't misuse “del” to strike out text that isn't deleted, use the right elements for the right purposes.
Doing something that looks like something, rather than doing it properly so that it actually means something, is cheating and misleading. It shows that you can't be bothered to put the proper effort in, and means that people will not be able to use the data fully, and is just plain wrong.
Some of these elements (for underlining and striking) have been removed from newer and stricter versions of HTML, to be done using CSS “text-decoration” rules instead, as they're styling effects not structural mark-up. The others (for italicising and boldening text) should also have been removed from HTML, but for some reason they haven't yet, although “font-style” and “font-weight” rules for them already exist in CSS.
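A small sketch of the “del” and “ins” elements mentioned above, marking a correction (the sentence is invented for the example):

```html
<!-- "Tuesday" was the original wording, "Wednesday" is its replacement -->
<p>The meeting will be held on <del>Tuesday</del> <ins>Wednesday</ins>.</p>
```

Browsers commonly strike out the deleted text and underline the inserted text, but, as with emphasis, that's only the usual rendering and not something to rely on.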

Lists

Sometimes you want to present a list of items. This can be easily done using list elements, with either numbers (ordered lists) or symbols (unordered lists), before each list item. Which type you use depends on the type of information you're presenting. Attempting to create what looks like a list without using list elements is prone to failure, particularly in regards to how the text wraps across the page.

Each item, in the list, is bracketed with opening and closing “li” (List Item) tags, and the entire list is bracketed with opening and closing “ol” tags for Ordered Lists, or “ul” tags for Unordered Lists (the browser inserts the numbers, itself, for ordered lists).

e.g.

<ol>
   <li>Item number one.</li>
   <li>Item number two.</li>
</ol>

<ul>
   <li>The first item on the list.</li>
   <li>The second item on the list.</li>
</ul>

<p>And some example text, just for the sake of it.</p>

Notes:

Lists are “block” objects, they go into the body of the document, rather like a paragraph does. They can only go into other elements which can accept block objects (i.e. not into the middle of a paragraph).
Only the “li” element can go directly inside “ol” or “ul” elements, everything else in a list must be placed inside the “li” elements.
Certain other block elements may be placed inside “li” elements. But before you go making complex pages, with things nested one inside another, you should study the HTML specifications thoroughly, understand what you're doing, and understand that some browsers have problems with pages that have things nested inside things several times over.
It's common for authors to want extra spacing between each list item, as per the spacing between paragraphs, because they're putting a paragraph into each list item. If that is what you're doing, then put that paragraph inside paragraph tags, inside the list item tags; don't mess around with putting bogus line breaks at the end of the list item.
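As a sketch of that last note (the wording of the items is made up), a list with a paragraph in each item looks like this:

```html
<ul>
   <li><p>The first item on the list, which is long enough to be a
   paragraph in its own right.</p></li>
   <li><p>The second item on the list, which gets the same spacing as
   the first, with no line breaks needed.</p></li>
</ul>
```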

Links

Many people want to add links to other pages on theirs, either to more of their own pages, or to other websites, so here's how to go about it: You use an “a” element, with the address written into a “href” attribute inside the opening “a” tag, with the message that shows up as the link (what you would click on with a mouse) in between the opening and closing “a” tags.

e.g.

<a href="http://www.example.com/">visit the
    example website</a>

In that example, the website “referenced” in the “href attribute”, is where the link will take you, and the text “visit the example website”, is what will be rendered on the page as the link.

The message used as the link text (the prompt), can be pretty much anything that you can put into an HTML page (it doesn't have to be text, it could be another element, such as an image). Many of the HTML elements can be used between the opening and closing “a” tags; though not all elements are suitable for inclusion there (usually the ones that can't go inside a paragraph, either). You can use an image, instead of text, as the prompt, simply by putting an “img” element between the “a” tags, instead. Though, always remember that whatever you put in there, must be totally inside the “a” tags; it cannot overlap (proper “nesting” must be maintained).
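For example (the image filename is hypothetical, and the “alt” attribute supplies text for anyone who doesn't get the image), here's an image used as the prompt, followed by a link with another element properly nested inside it:

```html
<!-- An image as the link prompt: the "img" element sits entirely inside the "a" element -->
<a href="http://www.example.com/"><img src="example-logo.png" alt="Visit the example website"></a>

<!-- Proper nesting: the "em" element opens and closes inside the "a" element -->
<a href="http://www.example.com/">visit the <em>example</em> website</a>
```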

If you're hoping to have your pages indexed by search engines, then ensure that the prompt between the “a” tags is a suitable description for where the link goes. This prompt will be used as part of the information used to index the site. “Click here” links are useless, in that regard, and look really stupid. By way of example, when you use non-web technology, like the controls on a TV set (for instance), they aren't labelled “press here for channel 2”, or “turn right for more volume”, they're labelled “Channel 2” and “Volume”. The user already knows what to do with the gadgets, they just need to know what each one's for.
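To compare the two approaches (both links go to the same made-up address):

```html
<!-- The link text says nothing about where the link goes -->
<p>For help with the example website,
<a href="http://www.example.com/help-page.html">click here</a>.</p>

<!-- The link text describes where the link goes -->
<p>Read the <a href="http://www.example.com/help-page.html">help page
for the example website</a>.</p>
```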

The address the link refers to, has to be written in a manner suitable for where the link is. If you're linking to another file, in the same directory as the page, then you can simply write the filename in there.

e.g.

<a href="page-two.html">next
    page</a>

Note that it may not be necessary to include the filename suffix in URIs. Some servers can provide different types of files just by using the common part of the name ("page-two"), depending on what's available at the server, and what best suits the browser. Also, this means that you can change document formats, without having to rewrite link addresses, and let the server find the right file for you (e.g. you might write HTML documents today, but next year start writing XML documents, and any other page that simply refers to "page-two" will automatically get page two, whether it's "page-two.html" or "page-two.xml", you won't have to rewrite all the link addresses to change them from ".html" to ".xml" suffixes).

If you're linking to a resource on another server, you have to write the full address to it (including the http:// protocol prefix).

e.g.

<a
    href="http://www.example.com/help-page.html">help page</a>

If you wanted to link to http://www.example.com/ and get whatever page they serve you (their default page) when you don't ask for a specific resource (like the help page, in the above example, is a specific resource), then you'd just write the base address, without any particular page reference after it (omitting the help-page.html portion). Just the same as how you type website addresses into your browser's address gadget.

URIs (WWW addresses) cannot have blank spaces in them, so any files or directories that you create shouldn't use blank spaces in their names. It is possible to encode a space into a URI (as %20), but the whole situation is messy. Don't do it.

Links can also point to places within a page, so long as that “place” has an “anchor” written in it (something acting as a “marker”). Anchors can be made by giving an “id” attribute to an element (some browsers don't support this too well), or putting a “name” attribute to an anchor (an “a” element) that's placed around something. You link to those anchors, by using a link with a “fragment identifier” written after a hash symbol (the “fragment identifier” is the “name” or “id” that was used). If the link points to a place on the same page, then you only need to specify the fragment identifier (see example 1, below); if the link is to another page, then you place the fragment identifier after the address to that page (see example 2, below); if the link is to a page on another website, then write the fragment identifier after the entire address to the page (see example 3, below).

Example 1 (anchors on the same page)
Anchors:	`<p id="important-info">This paragraph is very <a name="very-important">important</a>, you should read it!</p>`
Links:	`<p>Read the important <a href="#important-info">paragraph</a>, or jump straight to the <a href="#very-important">specific thing</a> that's important in it.</p>`

In the above example, the entire paragraph has been “id'd” (identified) as “important-info”, and the word “important” has been “named” as “very-important”, allowing the entire paragraph, and a particular word, to be located within the page, and directly linked to.

Example 2 (anchors on another page)

<p>Read the important information, on the

      <a href="help.html#important-info">help
      page</a>.</p>

Example 3 (anchors on another website)

<p>Read about the important information, on the

      <a href="http://www.example.com/help.html#important-info">other
      website</a>.</p>

Links can also be written into the document head element that have certain relationships to the current page (such as the next page, the previous page, the starting page, etc.), that can be used by some user-agents to put an order to a collection of pages. Some browsers provide an additional set of navigational buttons to use these links (some browsers will just ignore this extra information). These links don't form a part of a page, they provide extra information for the page (that's why they're in the head, rather than the body).

Relational links examples

<head>

      <title>An example page</title>

      <link rel="previous" href="./page-styling.html">

      <link rel="next" href="./webpage-tricks.html">

      </head>

When linking to other pages (or files, or “resources”), you have to take into account where the other resource is located. If it's in the same directory as the current page, then your links only need specify the name of that resource. But if the other resource is in another directory on the same server, then you need specify the path to that directory (see below). And if the other resource is on another website, then you need to specify the entire address (as per the above example number 3).

Specifying paths between directories

Paths are best thought of as a “directory” (a listing of files), that is laid out in the manner of a family tree, where things branch out from one location to another.

Children directories

These refer to directories that are sub-directories of the current location (ones inside it). Simply write the directory name, followed by a slash, followed by the file name:

e.g. The path to a resource called “dest.html” within a directory called “red” which is inside the current directory is "red/dest.html"

Parent directories

These refer to the directory that a directory resides inside. Simply use a dot-dot-slash sequence before the name of the resource you're linking to:

e.g. The path back out of the “red” directory to its parent directory, to a file called “was.html” is "../was.html"

You can go back up through several parent directories by repeating the dot-dot-slash sequence once for each one.

e.g. "../../../example.html"
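
As a rough sketch of how these paths look inside actual links, assume a hypothetical page stored at /guide/red/index.html. The directory and file names below (other than “dest.html” and “was.html”) are made up for illustration, and the comments show where each link would end up pointing:

```html
<!-- Written in a hypothetical page located at /guide/red/index.html -->

<!-- Same directory: just the file name
     (resolves to /guide/red/dest.html) -->
<p><a href="dest.html">A page beside this one</a></p>

<!-- Child directory: directory name, slash, file name
     (resolves to /guide/red/photos/list.html) -->
<p><a href="photos/list.html">A page one level down</a></p>

<!-- Parent directory: dot-dot-slash, then the file name
     (resolves to /guide/was.html) -->
<p><a href="../was.html">A page one level up</a></p>

<!-- Two levels up: repeat dot-dot-slash once per level
     (resolves to /example.html) -->
<p><a href="../../example.html">A page two levels up</a></p>
```

Because none of these links mention the server name, the whole set of directories can be moved to another server, or browsed from a local disk, without rewriting them.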

Note: Previously, I've mentioned that you cannot use the ampersand character without due care to how you use it. If you write a link that includes one in the address, you must write it as &amp; instead of just as & by itself, else you'll be creating an HTML error (web addresses can have ampersands in them, it's just that you can't always directly write an ampersand into HTML documents); and although some browsers may correct that for you, you should not rely on that behaviour. This is purely an HTML issue: the browser will still request the right resource (it'll make the request as if the address were written with just an ampersand there).

e.g. Like this: http://www.example.com/find.cgi?cats&amp;dogs

However, before escaping an ampersand, check that it's not already written as a character entity code in the address, so you don't end up escaping it twice over (breaking it).

e.g. Whoops: http://www.example.com/find.cgi?cats&amp;amp;dogs
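
In the HTML source itself, the escaped address sits inside the href attribute. Here's a minimal sketch reusing the example address from above (the link text is made up for illustration):

```html
<!-- The ampersand in the address is written as a character reference.
     The browser still requests find.cgi?cats&dogs from the server. -->
<p><a href="http://www.example.com/find.cgi?cats&amp;dogs">Search
      for cats and dogs</a></p>
```

If you copy an address out of a browser's address bar, it will have a plain ampersand in it, so it needs escaping once when you paste it into your HTML.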

Images

Including an image on a page means getting another file from a server, and the browser fitting it into the page (images are “in-line” objects, being included between other things, even in the middle of a paragraph between words). This means three things:

The page and image are separate things, used together.
The browser will fit surrounding text around the image, as fits best in the current window size.
Some people may not “get” the image (for various reasons), and you need to provide a suitable alternative, for that situation.

Taking that step by step:

You have an HTML page and an image file on the webserver, both of which have to be put there, and you have to write your HTML so that links to the image work correctly.
You must consider how the image will fit into the page, where the best place to include it is, and methods of containing text or images if you need to fit things together in a specific way (noting that you do not have rigid control over the matter).
Image elements have an “alt” (alternate) text attribute, which is displayed in place of the picture (e.g. perhaps while the picture is loading, or when the picture is not loaded, or when the picture cannot be loaded, etc.).

The bare minimum of attributes in the “img” (image) element are the “src” (source) address for the image, and “alt” (alternate) text.

e.g.

<img src="ball.jpeg" alt="beach ball">

The alternate text is now a mandatory requirement, enabling people to understand a page even when images aren't showing. As such, you should write your alternate text appropriately. If the image is inserted into the middle of something, where the alternate text might make for nonsensical reading, be sure to write the alternate text in a manner that fits in with the surrounding text. The easiest way to get it right is to read the page out loud in the order the content is presented, both with and without the alternate text.

<p>While at the beach, we played with a <img
      src="ball.jpeg" alt="beach ball">.</p>

<p>Here's a photo of us playing with it: <img
      src="beach.jpeg" alt="(Beach photograph.)">.</p>

However, if your image is something which the reader does not need to know about if it's not showing (e.g. it forms part of some optional decoration on the page, and images are not being loaded in the browser), then you'd set the alternate text to not contain anything; this avoids making the page confusing to read when images aren't present.

e.g. <img src="page-border.jpeg" alt="">

If you simply omit the alternate text attribute (which is an “error”), some browsers will display a prompt to show that an image was part of the page, to allow the viewer to do something about viewing it (download it manually, for instance). This makes a page awkward to read, so don't do that.

Since the source attribute is an “address” to where the image is located, you have to specify it in a proper manner. If the image is in the same directory as the page, then you can simply write the filename for the image, as per the above examples. If the image is located on another server, then you'll need to write the entire address to the image, in the source attribute; including the http:// protocol prefix.

e.g.

<img src="http://www.example.com/beach.jpeg"
    alt="A picture at the beach.">

Before you go nuts cluttering a page with images, consider a few things:

It makes a page take longer to load.
Only so-many connections can be made to a server at once. If you go over the limit, connections may be queued to take their turn, or simply aborted.
Some browsers will not display a page until all the images have loaded. If one, or more, of your images fails to load, the page may not display, at all.
Images take memory to display, some people's computers can't handle a large number of images (or large images).
Your webserver may only allow you to serve so-many bytes per month, adding images will use up your allowance quicker.

Note: Images are a “link” to the image file to be included with the page. As such, they also must follow the rule, previously mentioned, about escaping ampersand characters as &amp; instead of writing them as just & (by itself). Likewise, the rules for other special characters have to be followed (avoiding some, encoding others).
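
A small sketch of both cases follows. The script name “chart.cgi”, its parameters, and the file name with a space in it are all hypothetical, chosen only to show the escaping:

```html
<!-- An ampersand in the image address is written as a character
     reference. The browser requests chart.cgi?type=bar&size=small -->
<img src="chart.cgi?type=bar&amp;size=small"
      alt="Bar chart of monthly visitors.">

<!-- A space in a file name is encoded as %20 in the address
     (better still, avoid spaces in file names altogether) -->
<img src="beach%20ball.jpeg" alt="beach ball">
```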

Making more complex pages

Once you've grasped the basics, understanding the structure of the page, how to insert elements, and how to use those elements, you should be able to refer to the official HTML specifications to find out about the other elements that you can use, working out for yourself how to do anything further than you've seen on this page. The same goes for applying styling.

You may want to view the source code of these documents to see how they were written; likewise, with other pages on the WWW. However, realise that many webpages are badly authored, and you could be copying bad examples. You're far better off using the proper specifications, rather than someone's second-hand version of how to do it. Anything fancier than outlined here really requires you to understand more technical information, so you're going to have to start reading, and learning, the harder stuff. I've only intended this guide to be a primer, to make it easier to follow the specifications.

Using proper typography

One of the nice aspects of HTML is that it allows us to use more than just the very basic characters in the ASCII character set. This means things like proper punctuation and foreign characters (or your own characters, for non-English authors). Much of what we read on the internet isn't written properly, and I don't just mean the poor spelling and grammar. It's only applications with limited typesetting abilities that need to use the poor substitutes for proper quotation marks, dashes, and other symbols. The proper ones can be used in HTML; and, by now, most browsers will support displaying them.

I've already written a fairly lengthy page on proper typography in my desktop publishing guide, so I'll refer you to that page, for all the gory details about how and where to use them, rather than duplicate all that information here. But I'll briefly outline the HTML aspects of using them, here.

With suitable software, perhaps a fancier keyboard, and properly authored and served pages, you can directly type the symbols into the page. But many of us aren't in that position, so we can use character references to insert them into the page (some common ones are listed below). I will point out that if you do directly insert the characters into a page, either by typing them, or converting character references directly into those characters, it's imperative that your pages properly identify the character set that they're using, as some applications insert them in different, incompatible, ways (e.g. the usual Windows character set uses a different encoding, for some of them, than others do).
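
One common way of identifying the character set from within the page is a meta element in the head. This is only a sketch, and it assumes that your editor really does save the file as UTF-8; if it saves in some other encoding, that encoding's name has to go here instead:

```html
<head>
      <!-- States the encoding the file was saved in (assumed UTF-8) -->
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <title>An example page</title>
</head>
```

Bear in mind that if the webserver sends its own character set information in the HTTP headers, browsers will go by that rather than by the meta element, so the two need to agree.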

Some character references
Symbol	Name	Entity
“	left double-quote	&ldquo;
”	right double-quote	&rdquo;
‘	left single-quote	&lsquo;
’	right single-quote	&rsquo;
—	EM dash	&mdash;
–	EN dash	&ndash;
…	horizontal ellipsis	&hellip;
©	copyright symbol	&copy;
™	trademark logo	&trade;
®	registered trademark logo	&reg;
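
To show how such references are written into a page, here's a short sketch using the standard HTML 4 entity names for the symbols in the table (the sentences and product names are made up for illustration):

```html
<!-- Curly quotes, an apostrophe, an EM dash, an ellipsis, and an EN dash -->
<p>&ldquo;It&rsquo;s nearly ready,&rdquo; she said &mdash; eventually
      &hellip; but pages 10&ndash;12 still need checking.</p>

<!-- Copyright, trademark, and registered trademark symbols -->
<p>Copyright &copy; Example Company. Widget&trade; and Gadget&reg; are
      made-up product names.</p>
```

Each reference starts with an ampersand and ends with a semicolon, and the names are case-sensitive.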

Which version of HTML to use

Over the years, various versions of HTML have evolved, and it confuses some people as to which they should use. As general advice, I'd say to use “strict” HTML 4.01. It's understood by just about all browsers in existence, isn't incompatible with older browsers, is quite well defined, and fairly well described. Older versions have some serious flaws, transitional versions aren't really needed anymore, newer versions aren't well supported yet (MSIE can't even browse properly served XHTML), and don't really offer anything that HTML 4.01 can't already do.
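
You tell browsers and validators which version a page is written to with a document type declaration on the first line. As a sketch, a bare-bones “strict” HTML 4.01 page looks something like this (the title and paragraph are placeholders):

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
      "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
      <title>An example page</title>
</head>
<body>
      <p>A paragraph of text.</p>
</body>
</html>
```

Running a page like this through an HTML validator is a quick way of checking that you've stuck to the version you've declared.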

Just so you know, terms like DHTML don't refer to any “standard”. It's a meaningless buzz-word for dynamic HTML (pages that change their content, or rendering, using some form of scripting). It's generally a grotty mixture of badly written HTML and JavaScript that doesn't work very well on different browsers (including the same brand of browser as the author used), because there is no proper, nor even widely-compatible, way of doing such things.
