Basic Concepts of XML

What is XML?
  • XML stands for eXtensible Markup Language. It is called extensible because it allows its users to define their own tags.
  • XML was developed to provide a universal format for describing structured documents and data.
  • XML has no fixed set of tags; users define their own. Although XML tags look similar to HTML tags, they serve a different purpose.
  • HTML tags mark up the parts of a Web page for presentation by a browser, e.g. <b>Oracle</b>. XML tags describe the parts of a document as data, e.g. <company>Oracle</company>. In this example, HTML treats <b> as a command to display the enclosed text in bold. In XML, by contrast, company could correspond to a column name in a database and Oracle to the column value.
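To illustrate the database analogy, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It treats each child element of a record as a column name and the element's text as the column value. The <customer> wrapper, the <product> field and the record_to_row helper are hypothetical, added only for illustration.

```python
import xml.etree.ElementTree as ET

# A hypothetical record: each child element plays the role of a database column.
record_xml = """<customer>
  <company>Oracle</company>
  <product>Database</product>
</customer>"""

def record_to_row(xml_text):
    """Map each child element's tag (column name) to its text (column value)."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}

row = record_to_row(record_xml)
print(row)  # {'company': 'Oracle', 'product': 'Database'}
```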
Why do we use XML?
  • XML is a W3C (World Wide Web Consortium) standard, and many software companies have adopted and implemented it.
  • It is an open, royalty-free standard.
  • XML documents are plain text, so they are independent of any platform or programming language.
  • XML works with existing Web protocols (such as HTTP and MIME) and mechanisms (such as URLs), and it imposes no additional requirements on them.
  • XML can represent almost any kind of information, including large volumes of data, and is widely used to exchange it over the Internet and the Web.
  • It is based on Unicode, so it can represent text in almost any written language; every XML parser must support the UTF-8 and UTF-16 encodings.
  • It is widely used as the interface format between applications, and it has been replacing older flat-file formats for sending and receiving data between them.
Building blocks of XML

XML documents are made up of the following building blocks:
  • Elements
  • Attributes
  • Entities
  • PCDATA
  • CDATA
What are Elements?

Elements are the main building blocks of XML documents.

In the following example, "my_body" and "message" are XML elements. An element can contain text or other elements, or it can be empty.
<my_body>some text</my_body>
<message>some other text</message>
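The two elements above are separate fragments; a well-formed document needs a single root element. The minimal Python sketch below wraps them in a hypothetical <root> element (and adds a hypothetical <empty/> element) so that xml.etree.ElementTree can show all three cases: an element holding text, an element holding other elements, and an empty element.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<root>"
    "<my_body>some text</my_body>"
    "<message>some other text</message>"
    "<empty/>"
    "</root>"
)

# The root element contains other elements; iterate over them in document order.
for child in doc:
    # child.text is None for an empty element such as <empty/>.
    print(child.tag, repr(child.text))
# my_body 'some text'
# message 'some other text'
# empty None
```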

What are Attributes?

Attributes provide extra information about elements.
Attributes are always placed inside the opening tag of an element, and they always come in name/value pairs, with the value in quotes. In the following example, the "images" element carries additional information: the location of its source file and an image name.
Example:
<images location="computer.gif" name="some image name"/>

In the example above, images is the element, and location and name are its attributes.
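A minimal Python sketch, using the standard library's xml.etree.ElementTree, shows how a parser exposes the attributes of the images element as name/value pairs. The width attribute queried at the end is hypothetical and deliberately absent.

```python
import xml.etree.ElementTree as ET

img = ET.fromstring('<images location="computer.gif" name="some image name"/>')

# Attributes are exposed as a dict of name/value pairs on the element.
print(img.tag)              # images
print(img.attrib)           # {'location': 'computer.gif', 'name': 'some image name'}
print(img.get("location"))  # computer.gif
print(img.get("width"))     # None: this attribute is not present
```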

What are Entities?

Some characters have a special meaning in XML; for example, the less-than sign (<) marks the start of a tag. To use such characters as ordinary text, XML predefines the following entities:
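  • &lt; for < (less than)
  • &gt; for > (greater than)
  • &amp; for & (ampersand)
  • &quot; for " (double quote)
  • &apos; for ' (apostrophe)

Each entity reference starts with an ampersand (&) and ends with a semicolon (;). For example, write &lt; wherever a literal < would otherwise be mistaken for the start of a tag.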
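A minimal Python sketch of the predefined entities, using only the standard library: xml.sax.saxutils.escape replaces &, < and > with their entity references (quotes are escaped only when you pass an extra mapping), and the parser expands the references back into characters when it reads the document.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = 'if a < b && b > c then say "hi"'

# escape() handles &, < and >; the extra mapping also escapes double quotes.
encoded = escape(raw, {'"': "&quot;"})
print(encoded)
# if a &lt; b &amp;&amp; b &gt; c then say &quot;hi&quot;

# When parsed, the predefined entities are expanded back into the original characters.
elem = ET.fromstring(f"<text>{encoded}</text>")
print(elem.text == raw)  # True
```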

What is PCDATA?

PCDATA means parsed character data. Think of character data as the text found between the start tag and the end tag of an XML element.

PCDATA is text that WILL be parsed by a parser. The text will be examined by the parser for entities and markup. Tags inside the text will be treated as markup and entities will be expanded.

However, parsed character data must not contain a literal & or < character; write these as &amp; and &lt; respectively. A literal > is permitted, but it is commonly written as &gt; as well, and it must be escaped when it appears in the sequence ]]>.
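The following Python sketch (standard library only) shows both behaviours of parsed character data: tags inside the text become child elements and entity references are expanded, while a bare & makes the document not well-formed, so the parser rejects it.

```python
import xml.etree.ElementTree as ET

# Markup inside PCDATA is parsed: <b> becomes a child element, &amp; becomes "&".
p = ET.fromstring("<p>Hello <b>world</b> &amp; all</p>")
print(repr(p.text))          # 'Hello '
print(p[0].tag, p[0].text)   # b world
print(repr(p[0].tail))       # ' & all'

# A literal '&' in character data is not well-formed XML, so parsing fails.
rejected = False
try:
    ET.fromstring("<p>Tom & Jerry</p>")
except ET.ParseError as err:
    rejected = True
    print("rejected:", err)
```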

What is CDATA?

CDATA means character data. Text inside a CDATA section, written as <![CDATA[ ... ]]>, is NOT parsed as markup: tags inside it are NOT treated as markup, and entities are not expanded. The section ends at the first ]]>.
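A minimal Python sketch with the standard library's ElementTree: inside a CDATA section, a tag-like string and an entity reference are kept exactly as written, and no child elements are created. The <note> wrapper text is illustrative only.

```python
import xml.etree.ElementTree as ET

# Inside CDATA, "<b>" is not a tag and "&amp;" is not expanded.
elem = ET.fromstring("<note><![CDATA[<b>not a tag</b> &amp; 1 < 2]]></note>")
print(elem.text)   # <b>not a tag</b> &amp; 1 < 2
print(len(elem))   # 0: nothing inside the CDATA section became a child element
```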

Sample XML
<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend</body>
</note>


In the example above, the part from <!DOCTYPE note [ up to ]> is the DTD (Document Type Definition). Because it is written inside the XML file itself, it is called an internal DTD.
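A minimal Python sketch: the standard library's ElementTree accepts the sample document above, internal DTD included, and lets you read its data. Note that ElementTree is a non-validating parser, so it does not check the document against the DTD rules.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend</body>
</note>"""

note = ET.fromstring(SAMPLE)
# Collect each child's tag and text; the DTD is read but its rules are not enforced.
fields = {child.tag: child.text for child in note}
print(fields["to"], "->", fields["heading"])  # Tove -> Reminder
```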


What is DTD?

A DTD (Document Type Definition) is a set of grammar rules that we write to define our own XML vocabulary. In other words, a DTD provides the rules that define the elements and structure of our new language.

This is comparable to defining table structures in Oracle for a new system: just as we define a table's columns, choose their datatypes, and decide whether each column allows NULL values, a DTD defines the structure of an XML document.

A DTD can be declared inline, inside the XML document itself (as in the sample above), or in a separate file that the document references (as in the example below).

Example of external DTD:
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Some data</body>
</note>

The note.dtd file contains the following:
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
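A minimal Python sketch using the standard library's xml.dom.minidom: the parser records the document type declaration, including the external reference to note.dtd, but it neither fetches nor applies that file, so the document parses even when note.dtd is absent.

```python
from xml.dom import minidom

DOC = """<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Some data</body>
</note>"""

dom = minidom.parseString(DOC)
# The document type declaration is recorded, including the external DTD reference.
print(dom.doctype.name)             # note
print(dom.doctype.systemId)         # note.dtd
# note.dtd is not loaded or enforced; the document content is still available.
print(dom.documentElement.tagName)  # note
```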

Why use a DTD?

With a DTD, each of your XML files can carry a description of its own format.

With a DTD, independent groups of people can agree to use a standard DTD for interchanging data.

Your application can use a standard DTD to verify that the data you receive from the outside world is valid. You can also use a DTD to verify your own data.
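Python's standard library parsers do not validate against a DTD (third-party libraries such as lxml can). To illustrate what validation means, here is a deliberately tiny, hypothetical checker, validate_note, hand-written for the note DTD only: it verifies that note contains exactly to, from, heading and body in that order, each holding text only. It is a sketch of the idea, not a general DTD validator.

```python
import xml.etree.ElementTree as ET

# Hypothetical hand-written rules mirroring the note DTD:
#   <!ELEMENT note (to,from,heading,body)>, and each child is (#PCDATA).
NOTE_SEQUENCE = ["to", "from", "heading", "body"]

def validate_note(xml_text):
    """Return a list of rule violations (an empty list means the note is valid)."""
    root = ET.fromstring(xml_text)
    errors = []
    if root.tag != "note":
        errors.append(f"root element is <{root.tag}>, expected <note>")
    children = [child.tag for child in root]
    if children != NOTE_SEQUENCE:
        errors.append(f"children {children} do not match {NOTE_SEQUENCE}")
    for child in root:
        # (#PCDATA) allows text only, so any nested element breaks the rule.
        if len(child) > 0:
            errors.append(f"<{child.tag}> must contain text only")
    return errors

good = "<note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Hi</body></note>"
bad = "<note><from>Jani</from><to>Tove</to><body><b>Hi</b></body></note>"
print(validate_note(good))  # []
print(validate_note(bad))
```

Here the bad note is reported twice: its children are out of order (and heading is missing), and body contains a nested element.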