XML explained [June 11, 2012, 9:14 a.m.]

Before we start explaining XSLT we need to know what XML is.

What is XML?

XML stands for Extensible Markup Language.

Here is an example of a very simple XML document:

<?xml version="1.0" encoding="UTF-8"?>
<hallo>world</hallo>

XML files usually start with an XML declaration (often just called the header), which is the first line of this example. The minimal declaration only states the XML version, but I also included the encoding of the XML file. For the encoding I recommend UTF-8, which is a Unicode encoding, because it is the best choice if you also need to support languages other than English. But the more important part is the main XML content:

<hallo>world</hallo>

An XML document is essentially a structured tree of nodes. In this case we have two nodes: a tag node and a text node. The tag (the official XML term is "element", but I will just call it a tag from now on) is named “hallo” and the text node is “world”. The text node is inside the tag! Keep that in mind: text nodes can only exist inside tags.
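If you want to see the two nodes for yourself, here is a minimal sketch in Python using the standard library module xml.etree.ElementTree. Python is my choice for illustration only, and the variable names are made up for this example; any XML parser will show you the same thing.

```python
import xml.etree.ElementTree as ET

# The example document as bytes, so the parser reads the encoding
# from the XML declaration itself.
document = b'<?xml version="1.0" encoding="UTF-8"?>\n<hallo>world</hallo>'

root = ET.fromstring(document)

tag_name = root.tag        # the name of the tag
text_content = root.text   # the text node inside the tag

print(tag_name, text_content)
```

The parser hands back the tag, and the text is reachable only through that tag, which matches the rule that text nodes live inside tags.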

You may wonder why we see “hallo” twice. This is because XML files need to be well-formed. The first <hallo> is the opening tag and </hallo> is the closing tag. A tag is always opened by <....> and closed by </....>. The name of a tag is defined by you or by a specification: if you write your very own XML you can name the tags as you like, as long as you follow the XML naming rules (no spaces, for example). If you write XML that conforms to XHTML you are only allowed to use XHTML tags. In XML you are not allowed to write an opening tag without a closing tag, or a closing tag without an opening tag! Such a document is not well-formed and a parser will reject it. (Strictly speaking, "valid" means something more: a well-formed document that also follows a DTD or schema, such as the one for XHTML.) This is the main difference from HTML, where browsers tolerate unclosed tags. HTML is not XML! The only exception is XHTML.

Text nodes do not need an opening and a closing part because they are the content of tags.
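To try out the rule about opening and closing tags, here is a small sketch, again in Python with xml.etree.ElementTree. The helper name is_well_formed is my own invention for this example and not part of any library. It only checks well-formedness, not validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Hypothetical helper: True if the parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # The parser refuses documents that break the tag rules.
        return False

good = is_well_formed("<hallo>world</hallo>")
missing_close = is_well_formed("<hallo>world")
missing_open = is_well_formed("world</hallo>")

print(good, missing_close, missing_open)
```

Only the first document is accepted. The other two are rejected outright, which is quite different from how a browser treats sloppy HTML.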

So let's look at the tree structure of XML. I will make a simple XML tree:

<?xml version="1.0" encoding="UTF-8"?>
<root><child1/><child2><child1of2/></child2></root>

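Looks complicated? ;) The following XML describes exactly the same tree. The whitespace and line breaks between the tags are only there for readability. Strictly speaking a parser still reports them as whitespace-only text, but most applications simply ignore it: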

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <child1/>
  <child2>
    <child1of2/>
  </child2>
</root>

Inside this XML tree there is no text node but we have something new: <..../>

These tags are empty tags and have no content. To make it clear, this XML is again exactly (!) the same as above:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <child1></child1>
  <child2>
    <child1of2></child1of2>
  </child2>
</root>

I hope you see the tree structure.
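If you would rather let the computer confirm that the three ways of writing this tree are the same, here is a rough sketch in Python with xml.etree.ElementTree. The function tag_tree is a made-up helper for this example. It looks only at tags and deliberately skips any text, so the indentation between tags does not count.

```python
import xml.etree.ElementTree as ET

compact = "<root><child1/><child2><child1of2/></child2></root>"

indented = """<root>
  <child1/>
  <child2>
    <child1of2/>
  </child2>
</root>"""

explicit = """<root>
  <child1></child1>
  <child2>
    <child1of2></child1of2>
  </child2>
</root>"""

def tag_tree(element):
    """Hypothetical helper: nested (tag, [children]) tuples, text ignored."""
    return (element.tag, [tag_tree(child) for child in element])

trees = [tag_tree(ET.fromstring(text)) for text in (compact, indented, explicit)]
all_equal = trees[0] == trees[1] == trees[2]

print(trees[0])
# ('root', [('child1', []), ('child2', [('child1of2', [])])])
```

All three documents produce the same nested structure: root has two children, and the second child has one child of its own.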

As an exercise, you can create an XML file, name it for example test.xml, and open it in your browser.

If you write big XML files you may also want comments. Comments are usually ignored by the programs that process the XML:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <child1/> <!-- I'm between child1 and child 2 -->
  <child2>
    <!-- It seems this comment is inside child 2 and before child1of2 -->
    <child1of2/>
  </child2>
</root>

Comments always look this way:

<!-- your comment text -->
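Here is a short sketch of what "ignored" means in practice, assuming Python's xml.etree.ElementTree with its default parser settings. Other parsers may behave differently and some keep comments as nodes, so take this as one example only.

```python
import xml.etree.ElementTree as ET

document = """<root>
  <child1/> <!-- I'm between child1 and child 2 -->
  <child2>
    <!-- It seems this comment is inside child 2 and before child1of2 -->
    <child1of2/>
  </child2>
</root>"""

root = ET.fromstring(document)

# With the default parser the comments do not show up as children.
child_tags = [child.tag for child in root]
grandchild_tags = [child.tag for child in root.find("child2")]

print(child_tags, grandchild_tags)
```

Only the tags come back. The comments are still in the file for human readers, but this parser drops them while building the tree.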

Now let's introduce one more main building block of XML: attributes.

Attributes are values stored in a tag, but they are not separate child nodes. They are often used for optional data that should not be displayed, whereas text nodes are often used for content data.

Remember: text, tags and comments are nodes in the tree. Attributes are not child nodes; they belong to the tag that carries them.

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <child1 name="Adam"/>
  <child2 name="Eva">
    <child1of2/>
  </child2>
  <!-- just a little comment -->
</root>

Attributes are only allowed in the opening tag (or in an empty tag like <child1/>) and can be named as you want. I gave them the name “name”. The value must always be enclosed in straight quotes.
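To show how a program gets at attribute values, here is a minimal sketch in Python with xml.etree.ElementTree. The variable names are my own, and the document is the example from above written with straight quotes.

```python
import xml.etree.ElementTree as ET

document = """<root>
  <child1 name="Adam"/>
  <child2 name="Eva">
    <child1of2/>
  </child2>
  <!-- just a little comment -->
</root>"""

root = ET.fromstring(document)

# Attributes are read from the tag that carries them.
names = {child.tag: child.get("name") for child in root}

# A tag without the attribute simply has no value for it.
missing = root.find("child2/child1of2").get("name")

print(names, missing)
```

The attribute is looked up on the tag itself rather than found among its children, which is the difference between attributes and text nodes described above.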