How many ways are there to describe a computer network? Maybe one...
To define an XML-based language for describing routing protocol configurations, all the corresponding Entity-Relationship diagrams must be translated into DTDs or Schemas. This page describes a possible method for the Automatic Generation of XML DTDs from Conceptual Database Schemas. Much of the content of this page is based on the work by Carsten Kleiner and Udo W. Lipeck.
In this method the authors have attempted to limit the amount of information loss when translating to and from an XML DTD starting from an ER schema. At the time the algorithm was written XML Schema was not yet widely used, so the authors chose to translate to an XML DTD even though they knew that they would lose more information than if they used XML Schema.
The algorithm starts by choosing a root element (normally the schema name) and placing into it, at the highest level, those entities which do not depend on other entities, as well as the n:m and n-ary relationships. Once these elements have been identified, they are placed in a queue so that they can be visited in the following phase.
The second phase consists of visiting each entity and relationship in the queue. Attributes and relationships that are transformed into subelements (in the DTD) are appended to the end of the queue. The other attributes (simply typed attributes and those from 1:1 relationships) are transformed directly into XML attributes.
A detailed description is given later on.
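The two phases above can be sketched as a plain queue traversal. This is only an illustration, not the authors' implementation: the `SCHEMA` dictionary, its keys (`top_level`, `attributes`, `subelements`), and the entity names are all hypothetical and chosen for the example.

```python
from collections import deque

# Hypothetical toy schema (not from the original work). Each item lists the
# simple attributes that become XML attributes and the items that become
# subelements in the DTD.
SCHEMA = {
    "Router":        {"top_level": True,  "attributes": ["id"],            "subelements": ["has_interface"]},
    "Peering":       {"top_level": True,  "attributes": ["left", "right"], "subelements": []},
    "has_interface": {"top_level": False, "attributes": [],                "subelements": ["Interface"]},
    "Interface":     {"top_level": False, "attributes": ["name"],          "subelements": []},
}


def visit_schema(schema):
    """Return (name, xml_attributes, subelements) tuples in visiting order."""
    # Phase 1: queue the items that are placed directly under the root element.
    queue = deque(name for name, item in schema.items() if item["top_level"])
    queued = set(queue)
    visited = []
    # Phase 2: visit each queued item. Anything that becomes a subelement is
    # appended to the end of the queue; simple attributes are emitted directly.
    while queue:
        name = queue.popleft()
        item = schema[name]
        visited.append((name, item["attributes"], item["subelements"]))
        for child in item["subelements"]:
            if child not in queued:
                queued.add(child)
                queue.append(child)
    return visited


order = [name for name, _, _ in visit_schema(SCHEMA)]
print(order)  # ['Router', 'Peering', 'has_interface', 'Interface']
```

The top-level items are visited first, in the order they were queued, and the subelements they introduce follow behind them.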
The algorithm performs the transformations listed in the table below on the ER schema elements. Although functional, the algorithm has some restrictions, which are discussed after the table.
ER schema element | Translation into the DTD |
---|---|
Strong Entity | Each strong entity type E of the ER schema is translated into an XML element in the DTD with the same name E. |
Weak Entity | Each weak entity type W of the ER schema with owner entity type E is modeled as a subelement W of the identifying relationship between E and W. This relationship, in turn, is modeled as a subelement of E with cardinality * (or + if each instance of E is guaranteed to have at least one instance of W attached; this can be derived from the participation constraint of E in the identifying relationship). The identifying relationship itself is modeled as a regular 1:N relationship. All attributes of W are modeled in the same way as the attributes of strong entities. It is not necessary to create a composite key of E and W for W, since this is done implicitly by using W as a subelement of E. The partial key of W (if it exists) is not sufficient as an ID attribute, since it is only locally unique and not globally unique as required for IDs. It may therefore be modeled as a required CDATA attribute without enforcing the partial key, or the key of E may be combined with the partial key to obtain a globally unique identifier. |
1:1-relationship | A 1:1 relationship between entity types S and T where both participations are total should be translated by merging the corresponding XML elements S and T into one element. Attributes of the relationship are added to the merged element type. If the relationship is partial on one side S and total on the other side T, it is modeled by inserting an IDREF attribute into element T with the name of the role of S. All relationship attributes are also inserted into element T in the same way as if they were attributes of T. If the relationship is partial on both sides, the subelement approach may be used on either element, but the subelement must be included with the optional cardinality specification, since it is not guaranteed that a relationship between the entities holds. |
1:N-relationship | Each regular 1:N relationship R with entity S on the 1 side and entity T on the N side of R can be represented by a subelement R of S with cardinality * or +, depending on the participation of T in R. If the relationship R carries additional attributes, all attributes of R should be attached to the subelement R. Moreover, the element R consists of a single subelement T representing the instance of T participating in R. The case where S and T are the same entity type has to be treated differently. Since all instances of S will already be included, the N side should not be a subelement of R but rather an attribute of type IDREF, which points to the instance of T participating in R. In this case the element R itself is defined as an empty element. |
N:M-relationship | Binary N:M relationships R are mapped to top-level elements R in the DTD. The element R itself consists of two attributes which are defined as IDREF. These references later point to the pair of elements related by R in the XML document. By using IDREF, XML documents with minimal redundancy are obtained, since each tuple occurs only once and is only referenced at other places. There is one problem: in DTDs the IDREFs are untyped, so there is no way to guarantee that in a given XML document the pointers really point to an element of the desired element type. If they point to an element of a different element type, the XML parser will accept that as well. Consequently, by using IDREFs we lose some constraint information. XML Schema overcomes these problems because the construct keyref is available, which can be used to model foreign key constraints. Alternatively (without XML Schema), one could use two subelements of R which are themselves XPointers. In a concrete XML document they would point to an element in the current document of the matching XML element type which has the ID attribute as defined by the current pair to be modeled from relationship R. There are still integrity problems: a valid XPointer could point to a completely different element, and the document becomes more complex. Attributes of relationship R are attached to the element R either as attributes or as subelements, depending on their complexity. |
N-ary-relationship | N-ary relationships of any cardinality are modeled similarly to binary N:M relationships: a top-level element is introduced for each relationship. It consists of N IDREF attributes or XPointer subelements, one for each of the participating entity types. Attributes and remarks are the same as above. |
Disjoint Specialization with Total Participation | We use XML elements for both superclass and subclass entities, where the elements of the subclasses contain a subelement representing the superclass. This mapping is possible since every object belongs to exactly one subclass and the superclass is abstract. Only subclass elements are directly included in the DTD. This enforces the abstract superclass specification (which could not be done in the modeling in [4]) and enables reconstruction of the specialization from the DTD, since only in this case does the same structured subelement occur in different elements. |
Disjoint, Partial Specialization | The superclass element must also be included directly in the DTD, since the superclass may be instantiated. Thus, we need elements for all entities and include the superclass element directly in the DTD with an optional subelement chosen from all possible subclasses. That is, for a superclass G with possible subclasses S1, ..., Sn in the current specialization, we include the term `<!ELEMENT G (S1\|...\|Sn)?>` as the description of G. This mapping is similar to [4] and is obviously reversible. |
Overlapping Specializations | They can be treated in the same way whether they are total or partial. We introduce elements for the superclass G as well as for all subclasses S1, ..., Sn in the specialization. The element of the superclass is directly included in the DTD with optional subelements for every possible subclass, i.e., the term `<!ELEMENT G (S1?, ..., Sn?)>` is used. Since multiple subclasses are possible, the content model has to allow multiple subelements, which is why a sequence of optional subelements is used rather than a choice. A possible total restriction in the ER schema is not enforced in this mapping, i.e., even though the specialization was defined to be total, an XML document could contain an element of the superclass only. This could be solved by introducing a complicated expression that guarantees that at least one of the possible subelements is present. This addition may be useful in certain applications but is not included in our general mapping for the sake of simplicity. In [4] this approach was also used, but only in the case of two subclasses, where it is feasible. Because this kind of specialization is specified in a unique way, the mapping is reversible. |
Category with Total Participation | It is modeled similarly to a disjoint, total specialization: we have superclasses C1, ..., Cn which are categorized by a subclass S. Therefore, this time we need to introduce elements for all superclasses directly into the DTD. They consist of a mandatory subelement representing the category S. If, on the other hand, the category is partial, we can use the same modeling approach but with an optional subelement S, since not every superclass instance has to belong to this subclass. The mapping of categories is reversible, but in the result they cannot be distinguished from disjoint specializations with the same participation. This is because this approach includes no semantic information about superclasses or subclasses. We think that it is nevertheless acceptable because categories are rarely used; if retransformation is required, the semantic information about superclasses and subclasses would have to be added. |
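A few of the mappings in the table can be illustrated by generating the corresponding DTD declarations as strings. This is a minimal sketch under simplifying assumptions, not the authors' generator: the function names and the example element names are hypothetical, the 1:N case assumes S has no other subelements, and the N:M case assumes the relationship has no attributes of its own, so the element is declared `EMPTY`.

```python
def one_to_many(s, r, t, at_least_one=False):
    """1:N relationship R: a subelement R of S that holds a single T.

    Assumption for this sketch: S has no other subelements.
    """
    # '+' is used instead of '*' when the participation constraint
    # requires at least one occurrence.
    cardinality = "+" if at_least_one else "*"
    return [f"<!ELEMENT {s} ({r}{cardinality})>", f"<!ELEMENT {r} ({t})>"]


def many_to_many(r, role_a, role_b):
    """Binary N:M relationship R: a top-level element with two IDREF attributes.

    Assumption for this sketch: R has no attributes of its own.
    """
    return [
        f"<!ELEMENT {r} EMPTY>",
        f"<!ATTLIST {r} {role_a} IDREF #REQUIRED {role_b} IDREF #REQUIRED>",
    ]


def disjoint_partial(g, subclasses):
    """Disjoint, partial specialization: G with one optional subclass subelement."""
    return [f"<!ELEMENT {g} ({'|'.join(subclasses)})?>"]


# Hypothetical element names, chosen only for illustration.
for line in (one_to_many("Router", "has_interface", "Interface")
             + many_to_many("Peering", "left", "right")
             + disjoint_partial("G", ["S1", "S2"])):
    print(line)
```

The untyped `IDREF` attributes produced by `many_to_many` are exactly where the constraint information mentioned in the N:M row is lost: the DTD cannot state which element type `left` and `right` must reference.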
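The algorithm itself works in the two phases outlined above: it first selects the top-level elements and places them in a queue, then visits each queued item and applies the transformations from the table.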
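This algorithm has various restrictions, mainly because the transformation targets XML DTDs rather than XML Schema, which would probably resolve some of them. Several are noted in the table above: IDREFs are untyped, the partial key of a weak entity and the total participation of an overlapping specialization are not enforced, and categories cannot be distinguished from disjoint specializations with the same participation.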
All these problems can be overcome by using special annotations in the XML document if reconstruction of the ER schema is desired, or by using the advanced modeling features of XML Schema.
Altogether, we can say that, even without additional annotations, an important part of the ER schema can be reconstructed from the DTD. If complete retransformation is desired, utilization of additional annotations or vocabulary is required. Because of the flexibility of XML, this is not a major problem even though constraint checking has to be done explicitly with these annotations and cannot be performed implicitly by a validating parser.
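As an illustration of such explicit checking, the sketch below verifies that IDREF-style attributes point to elements of the expected type, which a DTD-validating parser cannot do. Everything here is assumed for the example: the sample document, the element names, and the `EXPECTED` mapping, which stands in for whatever annotation mechanism is actually chosen.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: both Peering elements reference existing IDs, so a
# DTD-validating parser would accept them, but the second one points to an
# Area instead of a Router.
DOC = """
<network>
  <Router id="r1"/>
  <Router id="r2"/>
  <Area id="a1"/>
  <Peering left="r1" right="r2"/>
  <Peering left="r1" right="a1"/>
</network>
"""

# Hypothetical annotation: for each relationship element, the element type
# that each reference attribute is expected to point to.
EXPECTED = {"Peering": {"left": "Router", "right": "Router"}}


def check_idrefs(xml_text, expected, id_attr="id"):
    """Return (element, attribute, reference, actual target type) per violation."""
    root = ET.fromstring(xml_text)
    # Map every ID value in the document to the tag of the element carrying it.
    tag_by_id = {el.get(id_attr): el.tag
                 for el in root.iter() if el.get(id_attr) is not None}
    errors = []
    for el in root.iter():
        for attr, wanted in expected.get(el.tag, {}).items():
            target = tag_by_id.get(el.get(attr))
            if target != wanted:
                errors.append((el.tag, attr, el.get(attr), target))
    return errors


violations = check_idrefs(DOC, EXPECTED)
print(violations)  # [('Peering', 'right', 'a1', 'Area')]
```

With XML Schema, the same constraint could instead be declared with `key` and `keyref` and checked by the validator itself.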