For our example, we'll create a root element called <SalesData> to hold the other elements we will create:
<SalesData>
...other elements go here
</SalesData>
We may also want to add some information to our XML document that isn't part of our relational database. This information might be used to indicate transmittal, routing, or behavioral information. For example, we might want to add a source attribute, so that the consuming process can decide which custom handler needs to be run to parse the document being passed. If we choose to add information about the document like this, it makes the most sense to add it as attributes of the root element we create. As we'll see in Chapter 18, many of the emergent XML servers (such as BizTalk) provide just such a mechanism, known as the envelope.
For our example, we'll add an attribute to our root element, to govern what the consuming processor should do with the document when it is received. Specifically, we'll add a Status attribute. This attribute will let the processor know whether the information in the document is new, an update to existing data, or a courtesy copy.
So far then, we have the following structure:
<!ELEMENT SalesData EMPTY>
<!ATTLIST SalesData
   Status (NewVersion | UpdatedVersion | CourtesyCopy) #REQUIRED>
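As a quick check on the declaration so far, here is a minimal sketch of a document that would validate against it. The DTD is placed in an internal subset purely to keep the example self-contained, and the Status value shown is just one of the three permitted choices:

```xml
<?xml version="1.0"?>
<!DOCTYPE SalesData [
  <!ELEMENT SalesData EMPTY>
  <!ATTLIST SalesData
     Status (NewVersion | UpdatedVersion | CourtesyCopy) #REQUIRED>
]>
<!-- Status is #REQUIRED, so leaving it off would make the document invalid -->
<SalesData Status="NewVersion"/>
```

Because the root element is still declared EMPTY at this stage, it cannot yet contain any child elements.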
Rule 2: Create a Root Element.
Create a root element for the document. Add the root element to our DTD, and declare any attributes of that element that are required to hold additional semantic information (such as routing information). Root element names should describe their content.
Model the Tables
Having defined our root element, the next step is to model the tables that we've chosen to include in our XML document. As we saw in the last chapter, tables map directly to elements in XML.
Loosely speaking, each of these tables will be one of the following:
Content tables, which, for our purposes, simply contain a set of records (for example, all the customer addresses for a certain company).
Lookup tables, which contain a list of ID-description pairs used to further classify the information in a particular row of a content table: the content table stores the ID, and the lookup table supplies the description for it. Tables such as ShipMethod in our example are lookup tables.
There is another type of table - a relating table - whose sole purpose is to express a many-to-many relationship between two other tables. For our purposes, we shall model a table like this as a content table.
At this stage we will only be modeling content tables. Lookup tables will actually be modeled as enumerated attributes later in the process.
For each content table that we've chosen to include from our relational database, we will need to create an element in our DTD. Applying this rule to our example, we'll add the <Invoice>, <Customer>, <Part>, <MonthlyTotal>, and other elements to our DTD:
<!ELEMENT SalesData EMPTY>
<!ATTLIST SalesData
   Status (NewVersion | UpdatedVersion | CourtesyCopy) #REQUIRED>
<!ELEMENT Invoice EMPTY>
<!ELEMENT Customer EMPTY>
<!ELEMENT Part EMPTY>
<!ELEMENT MonthlyTotal EMPTY>
<!ELEMENT MonthlyCustomerTotal EMPTY>
<!ELEMENT MonthlyPartTotal EMPTY>
<!ELEMENT LineItem EMPTY>
For the moment, we will just add the element definitions to the DTD. We'll come back to ensure that they are reflected in the necessary element content models (including that of the root element) when we model the relationships between the tables.
Note that we didn't model the ShipMethod table, because it's a lookup table. We'll handle this table in Rule 6.
Rule 3: Model the Content Tables.
Create an element in the DTD for each content table we have chosen to model. Declare these elements as EMPTY for now.
Model the Nonforeign Key Columns
Using this rule, we'll create attributes on the elements we have already defined to hold the column values from our database. In a DTD, these attributes should appear in the !ATTLIST declaration of the element corresponding to the table in which the column appears.
If a column is a foreign key joining to another table, don't include it in this rule - we'll handle foreign key columns later in the process, when we model the relationships between the elements we have created.
Declare each attribute created this way as having the type CDATA. If the column is defined in your database as not allowing NULL values, then make the corresponding attribute #REQUIRED; otherwise, make the corresponding attribute #IMPLIED.
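To make the NULL mapping concrete, here is a minimal sketch. It assumes, purely for illustration, that OrderDate is a NOT NULL column and that ShipDate allows NULLs (in our actual example both are declared #REQUIRED). The date value is made up:

```xml
<?xml version="1.0"?>
<!-- Illustration only: assumes OrderDate is NOT NULL and ShipDate allows NULLs -->
<!DOCTYPE Invoice [
  <!ELEMENT Invoice EMPTY>
  <!ATTLIST Invoice
     OrderDate CDATA #REQUIRED
     ShipDate  CDATA #IMPLIED>
]>
<!-- ShipDate is omitted here, which corresponds to a NULL in the column -->
<Invoice OrderDate="2000-10-12"/>
```

Omitting OrderDate instead would make the document invalid, just as inserting a NULL into the NOT NULL column would be rejected by the database.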
We have four choices here. #FIXED means the DTD provides the value, and the document may not specify a different one. #REQUIRED means the attribute must appear in the document. #IMPLIED means that it may or may not appear in the document. Finally, a default value given without any of these keywords means that the processor must substitute that value for the attribute if it is not provided in the document. #IMPLIED is the only choice that legitimately lets an attribute go without a value.
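The four choices can be seen side by side in the following sketch. The Shipment element and its attribute names are hypothetical and are not part of our sales example:

```xml
<?xml version="1.0"?>
<!-- Hypothetical element, used only to contrast the four kinds of attribute default -->
<!DOCTYPE Shipment [
  <!ELEMENT Shipment EMPTY>
  <!ATTLIST Shipment
     Version  CDATA #FIXED "1.0"
     Carrier  CDATA #REQUIRED
     Notes    CDATA #IMPLIED
     Priority CDATA "Standard">
]>
<!-- Only Carrier has to be written out. The parser supplies Version="1.0"
     and Priority="Standard"; Notes is simply absent and has no value. -->
<Shipment Carrier="Acme Freight"/>
```

Writing Version="2.0" in the document would be a validity error, because a #FIXED attribute can only ever carry the value given in the DTD.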
If we choose to store table column values as the content of elements, rather than attributes, we can take the same approach: create an element for each data point, and add it to the content list of the element for the table in which the column appears. Use no suffix if the column does not allow nulls, or the optional suffix (?) if the column allows nulls. Be aware that if we take this approach, we'll need to be on the lookout for possible name collisions between columns in different tables with the same name. This is not an issue when using attributes.
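The element-based alternative might look something like the sketch below. It assumes, for illustration only, that PostalCode allows NULLs. It also renames the Name column to a hypothetical CustomerName element, which is one way to sidestep a collision with the Name column of the Part table, since a DTD allows only one declaration per element name. The data values are made up:

```xml
<?xml version="1.0"?>
<!-- Illustration only: assumes PostalCode allows NULLs; CustomerName is a hypothetical rename -->
<!DOCTYPE Customer [
  <!ELEMENT Customer (CustomerName, Address, City, State, PostalCode?)>
  <!ELEMENT CustomerName (#PCDATA)>
  <!ELEMENT Address (#PCDATA)>
  <!ELEMENT City (#PCDATA)>
  <!ELEMENT State (#PCDATA)>
  <!ELEMENT PostalCode (#PCDATA)>
]>
<!-- PostalCode carries the ? suffix, so it can be left out entirely -->
<Customer>
  <CustomerName>Example Widgets Inc.</CustomerName>
  <Address>123 Main Street</Address>
  <City>Springfield</City>
  <State>VA</State>
</Customer>
```

Unlike attributes, child elements must appear in the order given in the content model, so the sequence in the declaration matters here.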
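To summarise:
Column does not allow NULLs: declare the attribute as #REQUIRED, or give the element no suffix in the content model.
Column allows NULLs: declare the attribute as #IMPLIED, or give the element the optional suffix (?) in the content model.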
For our example, remember that we want to keep all the nonforeign key columns, with the exception of the system-generated primary keys:
<!ELEMENT SalesData EMPTY>
<!ATTLIST SalesData
   Status (NewVersion | UpdatedVersion | CourtesyCopy) #REQUIRED>
<!ELEMENT Invoice EMPTY>
<!ATTLIST Invoice
   InvoiceNumber CDATA #REQUIRED
   TrackingNumber CDATA #REQUIRED
   OrderDate CDATA #REQUIRED
   ShipDate CDATA #REQUIRED>
<!ELEMENT Customer EMPTY>
<!ATTLIST Customer
   Name CDATA #REQUIRED
   Address CDATA #REQUIRED
   City CDATA #REQUIRED
   State CDATA #REQUIRED
   PostalCode CDATA #REQUIRED>
<!ELEMENT Part EMPTY>
<!ATTLIST Part
   PartNumber CDATA #REQUIRED
   Name CDATA #REQUIRED
   Color CDATA #REQUIRED
   Size CDATA #REQUIRED>
<!ELEMENT MonthlyTotal EMPTY>
<!ATTLIST MonthlyTotal
   Month CDATA #REQUIRED
   Year CDATA #REQUIRED
   VolumeShipped CDATA #REQUIRED
   PriceShipped CDATA #REQUIRED>
<!ELEMENT MonthlyCustomerTotal EMPTY>
<!ATTLIST MonthlyCustomerTotal
   VolumeShipped CDATA #REQUIRED
   PriceShipped CDATA #REQUIRED>
<!ELEMENT MonthlyPartTotal EMPTY>
<!ATTLIST MonthlyPartTotal
   VolumeShipped CDATA #REQUIRED
   PriceShipped CDATA #REQUIRED>
<!ELEMENT LineItem EMPTY>
<!ATTLIST LineItem
   Quantity CDATA #REQUIRED
   Price CDATA #REQUIRED>
Note that we left off Month and Year on the <MonthlyCustomerTotal> and <MonthlyPartTotal> structures, since these will be dictated by the <MonthlyTotal> element associated with these elements.
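To see how one row of a table ends up looking under these declarations, here is a minimal sketch using the Part element on its own. Part is treated as the document element only so that the example is self-contained (in the finished structure it will sit beneath the root), and all of the attribute values are made up:

```xml
<?xml version="1.0"?>
<!-- Stand-alone illustration: Part is the document element here only for brevity -->
<!DOCTYPE Part [
  <!ELEMENT Part EMPTY>
  <!ATTLIST Part
     PartNumber CDATA #REQUIRED
     Name CDATA #REQUIRED
     Color CDATA #REQUIRED
     Size CDATA #REQUIRED>
]>
<!-- One row of the Part table: each nonforeign key column becomes an attribute -->
<Part PartNumber="A-100" Name="Widget" Color="Blue" Size="2 inch"/>
```

Note that the system-generated primary key of the row does not appear anywhere in the element.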
Rule 4: Model the Nonforeign Key Columns.
Create an attribute for each column we have chosen to include in our XML document (except foreign key columns). These attributes should appear in the !ATTLIST declaration of the element corresponding to the table in which they appear. Declare each of these attributes as CDATA, and declare it as #IMPLIED or #REQUIRED depending on whether the original column allowed nulls or not.