Task 1: XML explained: Create some XML


Before we start explaining XSLT we need to know what XML is.

What is XML?

 

XML stands for eXtensible Markup Language.

Let's show an example of a very simple XML document:

<?xml version="1.0" encoding="UTF-8"?>
<hello>world</hello>

Headers

XML files usually start with a header (officially called the XML declaration), which is the first line of this example. The minimal header only states the XML version, but I also included the encoding of the XML file. For the encoding I always recommend UTF-8, which is a Unicode encoding, because it can represent text in practically any language and not only English. But the more important part is the main XML content:

<hello>world</hello>

XML is mainly a structured tree of nodes. In this case we have two nodes: a tag node and a text node. The tag node (I will just call it a tag from now on) is named “hello” and the text node is “world”. The text node is inside the tag! Keep that in mind: text nodes can only be inside tags.
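To see the two nodes for yourself, here is a small sketch that reads the example with Python's standard library module xml.etree.ElementTree. Python is only my choice for illustration, and the variable names are made up for this example.

```python
import xml.etree.ElementTree as ET

# The example document from above, as bytes so the encoding in the header applies.
DOCUMENT = b'<?xml version="1.0" encoding="UTF-8"?>\n<hello>world</hello>'

root = ET.fromstring(DOCUMENT)  # the outermost tag of the document

print(root.tag)   # the name of the tag
print(root.text)  # the text node inside the tag
```

The text is reached through the tag, which matches the rule that a text node always lives inside a tag.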

Opening and Closing Tags

So maybe you are wondering why we see “hello” twice. This is because XML files need to be well-formed. The first <hello> is the opening tag and </hello> is the closing tag. A tag is always opened by <....> and closed by </....>. The name of a tag is defined by you or by a specification. That means if you write your very own XML, you can name the tags the way you want (within XML's naming rules, for example no spaces). If you write XML which conforms to the XHTML specification, you are only allowed to use XHTML tags. In XML you are not allowed to write an opening tag without a closing tag, or a closing tag without an opening tag! The XML would then not be well-formed. This is the main difference between XML and HTML. HTML is not XML, because it doesn't always close tags! But there is a form of HTML that is XML: XHTML.

Text nodes do not need an opening or closing tag of their own because they are the content of tags.
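If you want to check the rule about opening and closing tags without a browser, a parser can do it for you. The following is a minimal sketch in Python; the helper name is_well_formed is my own invention and not part of any library.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<hello>world</hello>"))  # opening and closing tag match
print(is_well_formed("<hello>world"))          # closing tag is missing
print(is_well_formed("world</hello>"))         # opening tag is missing
print(is_well_formed("<hello>world</bye>"))    # closing tag has a different name
```

Only the first call prints True. The other three documents are rejected by the parser.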

Valid XML

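When speaking of valid XML we also need to make sure that the very first tag, which is called the document tag (or root tag), encloses all other tags in the document. In the XHTML specification this would be the <html> tag. Strictly speaking, this rule is still part of being well-formed; “valid” is the term for XML that additionally conforms to a DTD or schema.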

 

This, for example, is not well-formed XML, because a second tag follows the document tag:
<?xml version="1.0" encoding="UTF-8"?>
<mydocument1/>
<something_else_which_is_invalid_xml_because_it_is_after_the_first_document_tag/>
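A parser confirms this. The sketch below uses Python and a shortened tag name (something_else); the tag name wrapper is just an assumption of mine to show one possible fix, namely putting both tags inside a single document tag.

```python
import xml.etree.ElementTree as ET

TWO_TOP_LEVEL_TAGS = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<mydocument1/>\n'
    b'<something_else/>'
)

# Same content, but enclosed by one (hypothetical) document tag.
WRAPPED = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<wrapper>\n'
    b'  <mydocument1/>\n'
    b'  <something_else/>\n'
    b'</wrapper>'
)

try:
    ET.fromstring(TWO_TOP_LEVEL_TAGS)
    rejected = False
except ET.ParseError as error:
    rejected = True
    print("Parser error:", error)

root = ET.fromstring(WRAPPED)
child_tags = [child.tag for child in root]
print(rejected, child_tags)
```

The first document is rejected, while the wrapped one is accepted and has both tags as children of the document tag.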

 

Simple XML trees

 
So let's look at the tree structure of valid XML. I will make a simple XML tree:
 
<?xml version="1.0" encoding="UTF-8"?>
<mydocument><child1/><child2><child1of2/></child2></mydocument>

Looks complicated? ;) The following XML means exactly the same. The whitespace and line breaks between the tags are only there for readability and are normally ignored by programs that read this kind of XML:

<?xml version="1.0" encoding="UTF-8"?>
<mydocument>
  <child1/>
  <child2>
    <child1of2/>
  </child2>
</mydocument>

Inside this XML tree there is no text node, but we have something new: <..../>

Some of the tags are empty tags and have no content. Those tags close themselves. To make it clear, both XML examples above are exactly (!) the same as the one below:

<?xml version="1.0" encoding="UTF-8"?>
<mydocument>
  <child1></child1>
  <child2>
    <child1of2></child1of2>
  </child2>
</mydocument>
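If you do not see it yet, the following Python sketch compares the three ways of writing the document. The function shape is a made-up helper that keeps only the tag names and their nesting. I left out the header to keep the strings short.

```python
import xml.etree.ElementTree as ET

COMPACT = "<mydocument><child1/><child2><child1of2/></child2></mydocument>"

INDENTED = """<mydocument>
  <child1/>
  <child2>
    <child1of2/>
  </child2>
</mydocument>"""

EXPLICIT = """<mydocument>
  <child1></child1>
  <child2>
    <child1of2></child1of2>
  </child2>
</mydocument>"""

def shape(element):
    """Return the tag name plus the shapes of all child tags."""
    return (element.tag, [shape(child) for child in element])

shapes = [shape(ET.fromstring(text)) for text in (COMPACT, INDENTED, EXPLICIT)]

print(shapes[0])
print(shapes[0] == shapes[1] == shapes[2])  # True: the same tree three times
```

One detail for later: the parser still hands the indentation over as text. ET.fromstring(INDENTED).text contains the line break and spaces, while ET.fromstring(COMPACT).text is None. Most programs simply do not care about that difference.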

I hope you see the tree structure.

Your task:  Create an XML file, give it a name (for example test.xml) and then open it with your browser.

Comments

If you write long XML files, you may also want comments. Comments are usually ignored by XML interpreters:

<?xml version="1.0" encoding="UTF-8"?>
<mydocument>
  <child1/> <!-- I'm between child1 and child 2 -->
  <child2>
    <!-- It seems this comment is inside child 2 and before child1of2 -->
    <child1of2/>
  </child2>
</mydocument>

Comments look this way: <!-- … -->. The text of a comment must not contain two hyphens in a row (--).
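What "ignored" means depends on the program. As a rough sketch, here are two parsers from the Python standard library reading the example: ElementTree drops comments by default, while minidom keeps them as nodes in the tree. The variable names are only for illustration.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

DOCUMENT = """<mydocument>
  <child1/> <!-- I'm between child1 and child 2 -->
  <child2>
    <!-- It seems this comment is inside child 2 and before child1of2 -->
    <child1of2/>
  </child2>
</mydocument>"""

# ElementTree's default parser drops comments: only the tags remain.
root = ET.fromstring(DOCUMENT)
tags = [element.tag for element in root.iter()]
print(tags)

# minidom keeps comments as nodes of their own.
dom = minidom.parseString(DOCUMENT)
child2 = dom.getElementsByTagName("child2")[0]
comments = [
    node.data.strip()
    for node in child2.childNodes
    if node.nodeType == node.COMMENT_NODE
]
print(comments)
```

So the comment inside child2 really is part of the tree, even though many programs never look at it.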

Attributes

Now let's introduce one more main element of XML: attributes.

Attributes are values stored in a tag. They belong to the tag itself and are not separate child nodes in the tree. They are often used for optional data which should not be displayed, whereas text nodes are often used for content data that is displayed.

Remember: Text, tags and comments are child nodes in the tree. Attributes are not; they are attached to their tag.

<?xml version="1.0" encoding="UTF-8"?>
<mydocument>
  <child1 name="Adam"/>
  <child2 name="Eva">
    <child1of2/>
  </child2>
  <!-- just a little comment -->
</mydocument>

Attributes are only allowed in the opening tag (or in an empty tag such as <child1/>) and can be named as you want. I gave them the name “name”. The value of an attribute must always be written in quotes.
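Here is a short Python sketch of how a program might read these attributes. Note that the attribute values are written with plain straight quotes, because a parser rejects typographic quotes. The variable names are my own.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<mydocument>
  <child1 name="Adam"/>
  <child2 name="Eva">
    <child1of2/>
  </child2>
  <!-- just a little comment -->
</mydocument>"""

root = ET.fromstring(DOCUMENT)

for child in root:
    # .attrib is a dict of all attributes; .get() reads a single one by name.
    print(child.tag, child.attrib, child.get("name"))

# A tag without attributes has an empty dict, and .get() returns None.
child1of2 = root.find("child2/child1of2")
print(child1of2.attrib, child1of2.get("name"))
```

The attributes are reached through their tag and do not show up in the loop over the child nodes.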

Task:

Create an XML file with a header, a root tag, child tags, attributes, text nodes and comments.
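If you would like to check your file against this list, something like the following Python sketch could help. The function summarize and the sample document are made up for this example; replace SAMPLE with the content of your own file. It uses xml.dom.minidom from the standard library and does not count whitespace-only text.

```python
from xml.dom import minidom

def summarize(xml_text):
    """Count the building blocks from this task in a well-formed XML string."""
    dom = minidom.parseString(xml_text)  # raises an error if not well-formed
    counts = {"tags": 0, "attributes": 0, "text nodes": 0, "comments": 0}

    def visit(node):
        if node.nodeType == node.ELEMENT_NODE:
            counts["tags"] += 1
            counts["attributes"] += node.attributes.length
        elif node.nodeType == node.TEXT_NODE and node.data.strip():
            counts["text nodes"] += 1  # whitespace-only text is skipped
        elif node.nodeType == node.COMMENT_NODE:
            counts["comments"] += 1
        for child in node.childNodes:
            visit(child)

    visit(dom)
    counts["header"] = xml_text.startswith("<?xml")
    counts["root tag"] = dom.documentElement.tagName
    return counts

# A tiny stand-in document, not a solution to the task.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<greeting lang="en"><!-- a comment --><word>hello</word></greeting>'
)

result = summarize(SAMPLE)
print(result)
```

For your own file, every count should be at least 1 (tags at least 2, because of the root tag and a child tag) and header should be True.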
