DOM tree

Domains: Javascript

The backbone of an HTML document is its tags.

According to the Document Object Model (DOM), every HTML tag is an object. Nested tags are called "children" of the enclosing one.

The text inside a tag is an object as well.

All these objects are accessible using JavaScript.

An example of the DOM

For instance, let's explore the DOM for this document:

<!DOCTYPE HTML>
<html>
<head>
  <title>About elks</title>
</head>
<body>
  The truth about elks.
</body>
</html>

The DOM represents HTML as a tree structure of tags. Here's how it looks:

HTML
  HEAD
    #text "\n "
    TITLE
      #text "About elks"
    #text "\n "
  #text "\n "
  BODY
    #text "\n The truth about elks."

Tags are called element nodes (or just elements). Nested tags become children of the enclosing ones. As a result we have a tree of elements: <html> is at the root, then <head> and <body> are its children, etc.

The text inside elements forms text nodes, labelled as #text. A text node contains only a string. It may not have children and is always a leaf of the tree.

For instance, the <title> tag has the text "About elks".
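The same tree can be written down as nested plain objects and walked with a few lines of JavaScript. This is only a sketch: the object shape (name, nodeType, children, content) and the helper describeTree are assumptions made for illustration, not the browser's API. Real DOM nodes expose similar information through properties such as nodeName, nodeType and childNodes.

```javascript
// Illustrative plain-object model of the "About elks" page (not real DOM nodes).
const ELEMENT_NODE = 1; // same numeric value as Node.ELEMENT_NODE in browsers
const TEXT_NODE = 3;    // same numeric value as Node.TEXT_NODE in browsers

const elksPage = {
  name: "HTML", nodeType: ELEMENT_NODE, children: [
    { name: "HEAD", nodeType: ELEMENT_NODE, children: [
      { name: "#text", nodeType: TEXT_NODE, content: "\n " },
      { name: "TITLE", nodeType: ELEMENT_NODE, children: [
        { name: "#text", nodeType: TEXT_NODE, content: "About elks" },
      ] },
      { name: "#text", nodeType: TEXT_NODE, content: "\n " },
    ] },
    { name: "#text", nodeType: TEXT_NODE, content: "\n " },
    { name: "BODY", nodeType: ELEMENT_NODE, children: [
      { name: "#text", nodeType: TEXT_NODE, content: "\n The truth about elks." },
    ] },
  ],
};

// Hypothetical helper: returns one line per node, indented by depth.
function describeTree(node, depth = 0) {
  const indent = "  ".repeat(depth);
  if (node.nodeType === TEXT_NODE) {
    // Text nodes are leaves. JSON.stringify makes newlines visible as \n.
    return [indent + node.name + " " + JSON.stringify(node.content)];
  }
  const lines = [indent + node.name];
  for (const child of node.children) {
    lines.push(...describeTree(child, depth + 1));
  }
  return lines;
}

console.log(describeTree(elksPage).join("\n"));
```

Element nodes recurse into their children, while text nodes end the recursion, which mirrors the rule that a text node is always a leaf.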

Please note the special characters in text nodes:

  • a newline (written as \n in JavaScript)
  • a space

Spaces and newlines are perfectly valid characters: they form text nodes and become part of the DOM. So, for instance, in the example above the <head> tag contains some whitespace before <title>, and that text becomes a #text node (it contains only a newline and some spaces).

There are only two top-level exceptions:

  1. Spaces and newlines before <head> are ignored for historical reasons.
  2. If we put something after </body>, it is automatically moved inside the body, at the end, because the HTML spec requires all content to be inside <body>. So there can't be any spaces after </body>.

In all other cases everything is straightforward: if there are spaces (just like any other character) in the document, they become text nodes in the DOM, and if we remove them, there won't be any.

Here is the same document with no space-only text nodes:

<!DOCTYPE HTML>
<html><head><title>About elks</title></head><body>The truth about elks.</body></html>

HTML
  HEAD
    TITLE
      #text "About elks"
  BODY
    #text "The truth about elks."

Browser tools (to be covered soon) that work with the DOM usually do not show spaces at the start or end of text, or empty text nodes (line breaks) between tags.

That's because such whitespace is mainly used to lay out the HTML source and, in most cases, does not affect how the page is displayed.

In later DOM pictures we'll sometimes omit these nodes where they are irrelevant, to keep things short.
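The simplification that such tools apply can be sketched on a plain-object model of the tree. The object shape and the helper withoutBlankText below are assumptions made for illustration; this is not how any particular browser tool is implemented.

```javascript
const ELEMENT_NODE = 1; // numeric value of Node.ELEMENT_NODE
const TEXT_NODE = 3;    // numeric value of Node.TEXT_NODE

// Illustrative plain-object model of the <head> of the "About elks" page.
const head = {
  name: "HEAD", nodeType: ELEMENT_NODE, children: [
    { name: "#text", nodeType: TEXT_NODE, content: "\n  " },
    { name: "TITLE", nodeType: ELEMENT_NODE, children: [
      { name: "#text", nodeType: TEXT_NODE, content: "About elks" },
    ] },
    { name: "#text", nodeType: TEXT_NODE, content: "\n" },
  ],
};

// Hypothetical helper: returns a copy of the tree in which whitespace-only
// text nodes are dropped and the remaining text is trimmed.
// Note: String.prototype.trim() uses JavaScript's notion of whitespace,
// which is slightly broader than HTML's; good enough for a sketch.
function withoutBlankText(node) {
  if (node.nodeType === TEXT_NODE) {
    return { ...node, content: node.content.trim() };
  }
  if (!Array.isArray(node.children)) {
    return { ...node }; // e.g. a comment node: nothing to filter
  }
  const children = node.children
    .filter((child) => !(child.nodeType === TEXT_NODE && child.content.trim() === ""))
    .map(withoutBlankText);
  return { ...node, children };
}

const simplified = withoutBlankText(head);
console.log(simplified.children.map((child) => child.name)); // only TITLE remains
console.log(head.children.length); // the original tree still has all 3 children
```

The helper returns a new tree instead of modifying its argument, so the full structure, whitespace included, stays available, just as the real DOM keeps its whitespace nodes even when a tool hides them.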

Autocorrection

If the browser encounters malformed HTML, it automatically corrects it when building the DOM.

For instance, the top tag is always <html>. Even if it doesn't exist in the document, it will exist in the DOM, because the browser will create it. The same goes for <body>.

As an example, if the HTML file is the single word "Hello", the browser will wrap it in <html> and <body>, add the required <head>, and the DOM will be:

HTML
  HEAD
  BODY
    #text "Hello"

While generating the DOM, browsers automatically handle errors in the document, close unclosed tags and so on.

An "invalid" document like this:

<p>Hello
<li>Mom
<li>and
<li>Dad

...will become a normal DOM, as the browser reads the tags and restores the missing parts:

HTML
  HEAD
  BODY
    P
      #text "Hello"
    LI
      #text "Mom"
    LI
      #text "and"
    LI
      #text "Dad"

Tables always have <tbody>

An interesting special case is tables. By the DOM specification they must have <tbody>, but the HTML text may (officially) omit it. The browser then creates <tbody> in the DOM automatically.

For the HTML:

<table id="table"><tr><td>1</td></tr></table>

the DOM structure will be:

TABLE
  TBODY
    TR
      TD
        #text "1"

You see? The <tbody> appeared out of nowhere. You should keep this in mind while working with tables to avoid surprises.
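One practical consequence is that rows are grandchildren of the table, not its children. The sketch below shows this on a plain-object model of the resulting DOM. The object shape and the helpers childElementNames and rowsOf are assumptions made for illustration, not browser APIs.

```javascript
const ELEMENT_NODE = 1; // numeric value of Node.ELEMENT_NODE
const TEXT_NODE = 3;    // numeric value of Node.TEXT_NODE

// Illustrative model of the DOM built for
// <table id="table"><tr><td>1</td></tr></table>
const table = {
  name: "TABLE", nodeType: ELEMENT_NODE, children: [
    { name: "TBODY", nodeType: ELEMENT_NODE, children: [
      { name: "TR", nodeType: ELEMENT_NODE, children: [
        { name: "TD", nodeType: ELEMENT_NODE, children: [
          { name: "#text", nodeType: TEXT_NODE, content: "1" },
        ] },
      ] },
    ] },
  ],
};

// Hypothetical helper: names of the direct element children of a node.
function childElementNames(node) {
  return node.children
    .filter((child) => child.nodeType === ELEMENT_NODE)
    .map((child) => child.name);
}

// Hypothetical helper: collects the rows by looking inside the
// table sections (THEAD, TBODY, TFOOT) rather than at the table itself.
function rowsOf(tableNode) {
  const sections = ["THEAD", "TBODY", "TFOOT"];
  const rows = [];
  for (const child of tableNode.children) {
    if (sections.includes(child.name)) {
      rows.push(...child.children.filter((row) => row.name === "TR"));
    }
  }
  return rows;
}

console.log(childElementNames(table)); // the only child is TBODY, not TR
console.log(rowsOf(table).length);     // one row, found inside TBODY
```

Code that expects the first child of a table to be a row would find the section element instead, which is the kind of surprise worth keeping in mind.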

Other node types

Let's add more tags and a comment to the page:

<!DOCTYPE HTML>
<html>
<body>
  The truth about elks.
  <ol>
    <li>An elk is a smart</li>
    <!-- comment -->
    <li>...and cunning animal!</li>
  </ol>
</body>
</html>

HTML
  HEAD
  BODY
    #text "\n  The truth about elks.\n    "
    OL
      #text "\n      "
      LI
        #text "An elk is a smart"
      #text "\n      "
      #comment "comment"
      #text "\n      "
      LI
        #text "...and cunning animal!"
      #text "\n    "
    #text "\n  \n"

Here we see a new tree node type, the comment node, labeled as #comment.

We may wonder why a comment is added to the DOM, since it doesn't affect the visual representation in any way. But there's a rule: if something is in the HTML, it must also be in the DOM tree.

Everything in HTML, even comments, becomes a part of the DOM.

Even the <!DOCTYPE...> directive at the very beginning of the HTML is a DOM node. It's in the DOM tree right before <html>. We are not going to touch that node, and for that reason we don't even draw it on diagrams, but it's there.

The document object that represents the whole document is, formally, a DOM node as well.

There are 12 node types. In practice we usually work with 4 of them:

1. document - the "entry point" into DOM.
2. element nodes - HTML-tags, the tree building blocks.
3. text nodes - contain text.
4. comments - we can sometimes put information there; it won't be shown, but JavaScript can read it from the DOM.
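Each node carries a numeric nodeType that tells these kinds apart. The values used below (1 for elements, 3 for text, 8 for comments, 9 for the document, 10 for the doctype) are the ones defined by the DOM standard. Everything else in this sketch, including the plain-object tree and the helper countByType, is an assumption made for illustration. Whitespace-only text nodes are left out of the tree to keep it short.

```javascript
// nodeType values from the DOM standard for the node kinds discussed here.
const NODE_TYPE_NAMES = {
  1: "element",
  3: "text",
  8: "comment",
  9: "document",
  10: "doctype",
};

// Illustrative plain-object model of the page with the list and the comment.
const page = {
  name: "#document", nodeType: 9, children: [
    { name: "html", nodeType: 10 }, // the <!DOCTYPE HTML> node
    { name: "HTML", nodeType: 1, children: [
      { name: "HEAD", nodeType: 1, children: [] },
      { name: "BODY", nodeType: 1, children: [
        { name: "#text", nodeType: 3, content: "\n  The truth about elks.\n  " },
        { name: "OL", nodeType: 1, children: [
          { name: "LI", nodeType: 1, children: [
            { name: "#text", nodeType: 3, content: "An elk is a smart" },
          ] },
          { name: "#comment", nodeType: 8, content: " comment " },
          { name: "LI", nodeType: 1, children: [
            { name: "#text", nodeType: 3, content: "...and cunning animal!" },
          ] },
        ] },
      ] },
    ] },
  ],
};

// Hypothetical helper: counts how many nodes of each kind the tree contains.
function countByType(node, counts = {}) {
  const kind = NODE_TYPE_NAMES[node.nodeType] || "other";
  counts[kind] = (counts[kind] || 0) + 1;
  for (const child of node.children || []) {
    countByType(child, counts);
  }
  return counts;
}

const counts = countByType(page);
console.log(counts); // 1 document, 1 doctype, 6 elements, 3 text nodes, 1 comment
```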

See it for yourself

To see the DOM structure in real time, try [Live DOM Viewer](http://software.hixie.ch/utilities/js/live-dom-viewer/). Just type in the document, and it will show the DOM instantly.

In the browser inspector

Another way to explore the DOM is to use the browser developer tools. Actually, that's what we use when developing.

To do so, open the web page [elks.html](elks.html), turn on the browser developer tools and switch to the Elements tab.

There you can see the DOM, click on elements, see their details and so on.

Please note that the DOM structure in developer tools is simplified. Text nodes are shown just as text. And there are no "blank" (space only) text nodes at all. That's fine, because most of the time we are interested in element nodes.

Clicking the element-selection button in the upper-left corner of the developer tools lets us choose a node from the webpage using a mouse (or another pointer device) and "inspect" it (scroll to it in the Elements tab). This works great when we have a huge HTML page (and a correspondingly huge DOM) and would like to see the place of a particular element in it.

Another way to do it would be just right-clicking on a webpage and selecting "Inspect" in the context menu.

On the right side of the tools there are the following subtabs:

  • Styles - we can see CSS applied to the current element rule by rule, including built-in rules (gray). Almost everything can be edited in-place, including the dimensions/margins/paddings of the box below.
  • Computed - to see CSS applied to the element by property: for each property we can see a rule that gives it (including CSS inheritance and such).
  • Event Listeners - to see event listeners attached to DOM elements (we'll cover them in the next part of the tutorial).
  •  ...and so on.

The best way to study them is to click around. Most values are editable in-place.

Interaction with console

As we explore the DOM, we may also want to apply JavaScript to it: for example, get a node and run some code to modify it, to see how it looks. Here are a few tips for moving between the Elements tab and the console.

  • Select the first <li> in the Elements tab.
  • Press Esc - it will open the console right below the Elements tab.

Now the last selected element is available as $0, the previously selected is $1 etc.

We can run commands on them. For instance, $0.style.background = 'red' makes the selected list item red.

Conversely, if we're in the console and have a variable referencing a DOM node, we can use the command inspect(node) to see it in the Elements pane.

Or we can just output the node in the console and explore it in place, for example by typing document.body.

That's for debugging purposes of course. From the next chapter on we'll access and modify the DOM using JavaScript.

The browser developer tools are a great help in development: we can explore the DOM, try things and see what goes wrong.

Summary

An HTML/XML document is represented inside the browser as the DOM tree.

  • Tags become element nodes and form the structure.
  • Text becomes text nodes.
  • ...etc, everything in HTML has its place in the DOM, even comments.

We can use developer tools to inspect the DOM and modify it manually.

Here we covered the basics, the most used and important actions to start with. There's extensive documentation about Chrome Developer Tools at https://developers.google.com/web/tools/chrome-devtools. The best way to learn the tools is to click here and there and read the menus: most options are obvious. Later, when you know them in general, read the docs and pick up the rest.

DOM nodes have properties and methods that allow us to travel between them, modify them, move them around the page and more. We'll get down to them in the next chapters.
