XML Info

LINX

Rough Notes

Goals

The LINX prototypes should include the following capabilities that relate to XML:

1)      Create output from the database that can be represented in XML

2)      Create a DTD for representing LINX information – specifically be able to represent the Netscape bookmark information in a similar format (I generalized it slightly)

3)      Create several styles for displaying the LINX XML –

a)       Plain XML

b)       Simple

c)       Table Tree

d)       DHTML

e)       Pumped into script

f)         Java interface

4)      Use several different methods for converting the XML

a)       Utilize XSL

b)       Utilize CSS

c)       Create server-side conversion using XMLDOM and transformNode

5)      Fix the samples to work with different browsers

Strictness

XML is typically described as strict: a document must be well-formed (correctly structured). This strictness extends beyond the tag structure to the content itself. Several areas need special attention to provide the flexibility that common applications demand, especially when information is incorporated from existing web/HTML applications.

Handling Characters

Character Encoding

The initial declaration of an XML file states the character encoding used in the file. In the case of LINX, several pieces of information were collected directly from HTML pages, which are more lenient about character sets. Because the data was collected from HTML, many accented characters were included, most likely stored as single ISO-8859-1 bytes. A parser assumes “UTF-8” when no encoding is declared, and those bytes are not valid UTF-8, so an explicit declaration is needed. The prolog within the XML file, or in the ASP response, should be:

<?xml version="1.0" encoding="ISO-8859-1"?>

It may also help to set the same character set on the ASP response using:

Response.Charset = "iso-8859-1"

Ampersand and other funny characters

Unfortunately, a DTD offers no way to describe the type of information within an “element” more precisely than “#PCDATA” (parsed character data). Any data within such an element is still parsed, that is, examined for special characters. Data drawn from a database or from the web can therefore cause problems, especially bookmark data such as that found in LINX: web addresses, and therefore bookmarks, use “&” as a delimiter to separate values within the URL. The data could be altered (escaped) either before it is stored in the database or when it is retrieved from the database, but either approach requires a special routine for every use of the data. Instead, we can mark the data so that it is not parsed, using a CDATA section like the following:

<![CDATA["character data"]]>
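As a rough illustration of the CDATA approach, here is how a bookmark address containing “&” might look inside a LINK element. The URL and title are made up for the example; only the element names come from the LINX DTD.

```xml
<LINK>
  <TITLESEC>
    <TITLE>Example search</TITLE>
  </TITLESEC>
  <!-- Without the CDATA section, the bare & in the query string
       would make the document not well-formed -->
  <ADDRESS><![CDATA[http://www.example.com/search?q=xml&lang=en]]></ADDRESS>
</LINK>
```

Everything between the opening and closing CDATA markers is taken literally, including any quotation marks. The one sequence that cannot appear inside a CDATA section is “]]>”, since it ends the section.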

Displaying XML

Currently IE5 doesn’t recognize data passed from an ASP file as XML without an explicit definition. The content type can be set in the header information for the response. It should be set before any content is sent to the user for this ASP page. The following command sets the type:

Response.ContentType = "text/XML"

The DTD

This is the current DTD:

<!-- LINX.DTD created by Clark Brady, Strategy1st cbrady@strategy1st.com   -->

<!-- Modified June 26 2000 -->

 

<!ENTITY % item "FOLDER | LINK">

<!ELEMENT LINX (TITLESEC, (%item;)+) >

<!ENTITY % ldates "(CREATED | MODIFIED | VISITED | OTHERDT)*">

<!ELEMENT TITLESEC (TITLE, %ldates;, STATUS?) >

<!ELEMENT FOLDER (TITLESEC, (%item;)*) >

<!ELEMENT LINK (TITLESEC, ADDRESS) >

 

<!-- components of titlesec -->

<!ELEMENT TITLE (#PCDATA) >

<!ELEMENT ADDRESS (#PCDATA) >

<!ELEMENT STATUS ANY>

 

<!-- date types -->

<!ELEMENT CREATED (#PCDATA) >

<!ELEMENT MODIFIED (#PCDATA) >

<!ELEMENT VISITED (#PCDATA) >

<!ELEMENT OTHERDT (#PCDATA) >

<!ATTLIST OTHERDT

       datetype CDATA #REQUIRED>

 

Notes for the DTD:

•        This DTD uses attributes very sparingly. Only OTHERDT (other date) has a required attribute, datetype, which defines the type of date.

•        STATUS is defined as ANY to provide significant flexibility. It would typically be used for notes or a longer description.

•        Parameter entities (the ENTITY declarations) are used to group similar elements; this improves the readability of the DTD without requiring additional nesting within the XML file.
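To make the structure concrete, here is a small sample document that should validate against the DTD above. It is only a sketch: the titles, dates, date format, datetype value, and URL are invented for illustration, and it assumes the DTD is saved next to the document as LINX.DTD (the name used in the DTD's own comment). The DTD has to be referenced as an external file because parameter entity references such as %item; are not allowed inside declarations in an internal subset.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE LINX SYSTEM "LINX.DTD">
<LINX>
  <TITLESEC>
    <TITLE>Example Bookmarks</TITLE>
    <CREATED>2000-06-26</CREATED>
  </TITLESEC>
  <FOLDER>
    <TITLESEC>
      <TITLE>Search Engines</TITLE>
      <!-- datetype is required on OTHERDT; "imported" is a made-up value -->
      <OTHERDT datetype="imported">2000-06-26</OTHERDT>
      <STATUS>Hypothetical note text for this folder.</STATUS>
    </TITLESEC>
    <LINK>
      <TITLESEC>
        <TITLE>Example search</TITLE>
        <VISITED>2000-06-20</VISITED>
      </TITLESEC>
      <ADDRESS><![CDATA[http://www.example.com/search?q=xml&lang=en]]></ADDRESS>
    </LINK>
  </FOLDER>
</LINX>
```

Note that the order inside TITLESEC matters: TITLE first, then any number of date elements, then the optional STATUS.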

 

A Simple Stylesheet

A sample XSL style sheet:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">

 

<xsl:template match="/">

<html>

<head>

<title>SIMPLE - <xsl:value-of select="/LINX/TITLESEC/TITLE" /></title>

</head>

<body>

<h1><xsl:value-of select="/LINX/TITLESEC/TITLE" /></h1>

<p><xsl:value-of select="/LINX/TITLESEC/CREATED" /></p>

<xsl:apply-templates select="//LINX" />

</body>

</html>

</xsl:template>

 

<xsl:template match="LINX">

<xsl:apply-templates />

</xsl:template>

 

<xsl:template match="FOLDER">

<h1>Folder - <xsl:value-of select="TITLESEC/TITLE" /></h1>

<xsl:apply-templates />

</xsl:template>

 

<xsl:template match="LINK">

<p>

<a><xsl:attribute name="href"><xsl:value-of select="ADDRESS" /></xsl:attribute>

<xsl:value-of select="TITLESEC/TITLE" /></a>

<xsl:apply-templates />

</p>

</xsl:template>

 

 

</xsl:stylesheet>
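The stylesheet above uses the WD-xsl namespace, which to my knowledge is only understood by the IE5-era Microsoft parser. For the goal of working with different browsers, a version written against the XSLT 1.0 recommendation might look like the sketch below. This is an assumption about a possible direction, not something taken from the LINX prototypes. Two differences are worth noting: XSLT 1.0 has built-in templates that copy text, so the apply-templates calls select only FOLDER and LINK, and the href is set with an attribute value template instead of xsl:attribute.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" />

  <xsl:template match="/">
    <html>
      <head>
        <title>SIMPLE - <xsl:value-of select="/LINX/TITLESEC/TITLE" /></title>
      </head>
      <body>
        <h1><xsl:value-of select="/LINX/TITLESEC/TITLE" /></h1>
        <p><xsl:value-of select="/LINX/TITLESEC/CREATED" /></p>
        <!-- Select only FOLDER and LINK so the built-in text template
             does not echo the TITLESEC text a second time -->
        <xsl:apply-templates select="LINX/FOLDER | LINX/LINK" />
      </body>
    </html>
  </xsl:template>

  <xsl:template match="FOLDER">
    <h1>Folder - <xsl:value-of select="TITLESEC/TITLE" /></h1>
    <xsl:apply-templates select="FOLDER | LINK" />
  </xsl:template>

  <xsl:template match="LINK">
    <!-- Attribute value template: {ADDRESS} is evaluated as an XPath expression -->
    <p><a href="{ADDRESS}"><xsl:value-of select="TITLESEC/TITLE" /></a></p>
  </xsl:template>
</xsl:stylesheet>
```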

 

Variations with IE

Notes for the XSL:

•        The stylesheet element requires a specific namespace declaration for IE (xmlns:xsl)

•        Template matching and apply-templates are tricky; several different variations were attempted before this one was adopted:

<xsl:template match="/">   <!-- match the root -->

blah, blah, blah…

<xsl:apply-templates select="//LINX" />  <!-- match the document type -->

blah, blah, blah…

</xsl:template>

•        For the most part, tags are placed directly into the XSL where needed. Special consideration is required when an attribute of a tag needs to be set, as in the case of the link (“a” tag). The following sets the href as needed:

<a><xsl:attribute name="href">