<NetML/>

How many ways are there to describe a computer network? Maybe one...


Disambiguation: NetML is also the name of a network design system developed by prof. Ronald G. Addie et al. If that is the system you are looking for, you may want to consider having a look at the pertaining Google Code page and at a reference paper.
Otherwise, enjoy your stay on this web site.


Detailed Information - ER to XML Transformation

In order to define an XML-based language to describe routing protocol configurations, all the corresponding Entity-Relationship diagrams must be translated into DTDs or Schemas. This page describes a possible method for the Automatic Generation of XML DTDs from Conceptual Database Schemas. Much of the content of this page is based on the work by Carsten Kleiner and Udo W. Lipeck.

The authors' method tries to limit the information lost when an ER schema is translated into an XML DTD and back. At the time the algorithm was written, XML Schema was not yet widely used, so the authors chose to target XML DTDs even though they knew that more information would be lost than with XML Schema.

The algorithm starts by choosing a root element (normally the schema name) and placing into it, at the highest level, those entities which do not depend on other entities, as well as the n:m and n-ary relationships. Once these elements have been identified, they are placed in a queue so that they can be visited in the following phase.
The second phase consists of visiting each entity and relationship in the queue. Those attributes and relationships which are transformed into subelements (in the DTD) are appended to the end of the queue.
The other attributes (simple typed ones and those coming from 1:1 relationships) are transformed directly into XML attributes. A detailed description is given later on.

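The algorithm applies the transformations described below to each kind of ER schema element: entities, attributes, relationships, specializations and categories.

The algorithm itself is described after the transformations. Although functional, it has some restrictions, which are listed at the end of this page.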



Transformations

Strong Entity
Each strong entity type E of the ER schema is translated into an XML element in the DTD with the same name E.

Simple Attributes: Each of the simple attributes A(i) of E, which are usually assumed to be of a simple data type in ER schemas, is modeled as an XML attribute A(i) belonging to element E.
Key Attributes: Key attributes in the schema should be translated into ID type attributes with the REQUIRED specification, whereas other attributes should be declared as CDATA with either IMPLIED or REQUIRED specification (depending on whether they may be null or not).
The problem that keys are locally (entity-wide) unique whereas IDs are globally (document-wide) unique can be overcome by using the entity name as prefix to the key value in the document instance. Similarly, we can use composite strings for composite keys.
Composite Attributes: Each composite attribute A in the ER schema will be translated into a subelement A of E. The subelement A itself can be defined much in the same way as E was defined: simple attributes are modeled as XML attributes whereas composite attributes within A are translated into nested subelements of A. This process terminates when all attributes are simple.
Multivalued Attributes: Multivalued attributes M in the schema are translated as subelements with cardinality *. The contents of the subelement itself can be constructed like the entity E.
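As a rough illustration of the strong-entity rules above, the following Python sketch emits the DTD declarations for an entity that has only simple attributes. The function names `entity_to_dtd` and `document_id`, the Department example and the "_" separator are assumptions made for this sketch; they are not part of the original method.

```python
def entity_to_dtd(name, key, attributes):
    """Emit DTD declarations for a strong entity with simple attributes only.

    name:       entity name, reused as the XML element name
    key:        name of the key attribute (becomes an ID attribute)
    attributes: dict mapping the other attribute names to True if nullable
    """
    lines = [f"<!ELEMENT {name} EMPTY>", f"<!ATTLIST {name}"]
    # Key attributes become required ID attributes.
    lines.append(f"  {key} ID #REQUIRED")
    for attr, nullable in attributes.items():
        # Nullable attributes are #IMPLIED, all others #REQUIRED.
        default = "#IMPLIED" if nullable else "#REQUIRED"
        lines.append(f"  {attr} CDATA {default}")
    lines.append(">")
    return "\n".join(lines)


def document_id(entity, *key_values):
    """Build a document-wide unique ID by prefixing the entity name.

    Composite keys are joined into a single string. The "_" separator is
    an assumption of this sketch.
    """
    return "_".join([entity, *map(str, key_values)])


print(entity_to_dtd("Department", "id", {"name": False, "location": True}))
print(document_id("Department", "Research", 5))
```

Composite and multivalued attributes are left out here; they would turn the EMPTY content model into a list of subelements.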
Weak Entity
Each weak entity W of the ER schema with owner entity type E is modeled by a subelement W of the identifying relationship between E and W. This relationship, in turn, is modeled as a subelement of E with cardinality * (or + if each instance of E is guaranteed to have at least one instance of W attached; this can be derived from the participation constraint of E in the identifying relationship). The identifying relationship itself is modeled as a regular 1:N relationship. All attributes of W are modeled in the same way as the attributes of strong entities. Note that it is not necessary to create a composite key of E and W for W, as this is implicitly done by using W as a subelement of E. The partial key of W (if present) is not sufficient as an ID attribute, since it is only locally unique and not globally unique as is required for IDs. Therefore, it may be modeled either as a required CDATA attribute, without enforcing the partial key, or by combining the key of E with the partial key to obtain a globally unique identifier.
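The nesting described above can be sketched in Python as follows. The function name `weak_entity_dtd` and the Employee/DependentsOf/Dependent names are hypothetical, and the owner's content model is reduced to the identifying relationship alone to keep the example short.

```python
def weak_entity_dtd(owner, relationship, weak, partial_key, at_least_one=False):
    """Return DTD declarations nesting a weak entity under its owner.

    at_least_one: True if every owner instance must have at least one
    weak-entity instance, which selects "+" instead of "*".
    """
    occurrence = "+" if at_least_one else "*"
    return [
        # The identifying relationship is a repeated subelement of the owner.
        f"<!ELEMENT {owner} ({relationship}{occurrence})>",
        # The weak entity is the single subelement of that relationship.
        f"<!ELEMENT {relationship} ({weak})>",
        f"<!ELEMENT {weak} EMPTY>",
        # The partial key is only locally unique, so it is CDATA, not ID.
        f"<!ATTLIST {weak} {partial_key} CDATA #REQUIRED>",
    ]


for declaration in weak_entity_dtd("Employee", "DependentsOf", "Dependent", "name"):
    print(declaration)
```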
1:1-relationship
A 1:1 relationship between entity types S and T where both participations are total should be translated by merging the corresponding XML elements S and T into one element. Attributes of the relationship are added to the merged XML element type. If the relationship is partial on one side S and total on the other side T, the relationship is modeled by inserting an IDREF attribute into element T with the name of the role of S. All relationship attributes are also inserted into element T in the same way as if they were attributes of T. If the relationship is partial on both sides, we may use the subelement approach on either element, but we have to include the subelement with the optional cardinality specification, since it is not guaranteed that a relationship between the entities holds.
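For the partial/total case, a minimal Python sketch of the resulting attribute list could look like this. The function name `one_to_one_attlist` and the role name `manager` are hypothetical, and declaring the relationship attributes as #IMPLIED is an assumption, since their nullability depends on the schema at hand.

```python
def one_to_one_attlist(total_side, partial_role, relationship_attrs=()):
    """Emit the ATTLIST added to the element on the total side of a 1:1
    relationship whose other side participates only partially."""
    lines = [
        f"<!ATTLIST {total_side}",
        # Total participation means the reference is always present.
        f"  {partial_role} IDREF #REQUIRED",
    ]
    for attr in relationship_attrs:
        # Relationship attributes are stored as if they belonged to the entity.
        lines.append(f"  {attr} CDATA #IMPLIED")
    lines.append(">")
    return "\n".join(lines)


print(one_to_one_attlist("Department", "manager", ["StartDate"]))
```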
1:N-relationship
Each regular 1:N relationship R with entity S on the 1 side and entity T on the N side of R can be represented by a subelement R of S with cardinality * or + depending on the participation of T in R. If the relationship R carries additional attributes, they should all be attached to the subelement R. Moreover, the element R consists of a single subelement T representing the instance of T participating in R. The case where S and T are the same entity type has to be treated differently. Since all instances of S will already be included, we should not use a subelement of R for the N side, but rather an attribute of type IDREF. This will then point to the instance of T participating in R. The element R itself is defined as an empty element.
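A possible Python sketch of this mapping, covering both the regular and the recursive case, is shown below. The function name `one_to_many_dtd`, the `at_least_one` flag, the default IDREF attribute name `ref` and the example names are assumptions of the sketch; the content model of S is again reduced to the relationship alone.

```python
def one_to_many_dtd(one_side, relationship, many_side,
                    at_least_one=False, rel_attrs=(), ref_attr="ref"):
    """Return DTD declarations for a 1:N relationship nested under the 1 side."""
    occurrence = "+" if at_least_one else "*"
    decls = [f"<!ELEMENT {one_side} ({relationship}{occurrence})>"]
    attrs = []
    if one_side == many_side:
        # Recursive relationship: reference the other instance by IDREF.
        decls.append(f"<!ELEMENT {relationship} EMPTY>")
        attrs.append(f"{ref_attr} IDREF #REQUIRED")
    else:
        # Regular case: the N-side instance is nested inside the relationship.
        decls.append(f"<!ELEMENT {relationship} ({many_side})>")
    attrs += [f"{attr} CDATA #IMPLIED" for attr in rel_attrs]
    if attrs:
        decls.append(f"<!ATTLIST {relationship} {' '.join(attrs)}>")
    return decls


print(one_to_many_dtd("Department", "WorksFor", "Employee", at_least_one=True))
print(one_to_many_dtd("Employee", "Supervises", "Employee"))
```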
N:M-relationship
Binary N:M relationships R are mapped to top level elements R in the DTD. The element R itself carries two attributes which are defined as IDREF. In an XML document, these references point to the pair of entity elements related by R. By using IDREF, XML documents with minimal redundancy are obtained, since each tuple occurs only once and is only referenced at other places. There is one problem: in DTDs the IDREFs are untyped, so there is no way to guarantee that in a given XML document the pointers will really point to an element of the desired element type. If a reference points to an element of a different element type, the XML parser will accept that as well. Consequently, by using IDREFs we lose some constraint information. When using XML Schema these problems are overcome, since the construct keyref is available, which can be used to model foreign key constraints. Alternatively (without the use of XML Schema) one could use two subelements of R which are XPointers themselves. In a concrete XML document they would point to an element in the current document of the matching XML element type which has the ID attribute as defined by the current pair to be modeled from relationship R. Still there are integrity problems: a valid XPointer could even point to a completely different element, and the document becomes more complex. Attributes of relationship R are attached to the element R either as attributes or as subelements depending on their complexity.
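Because a validating parser does not check the type of an IDREF target, such a check has to be done by the application. The Python sketch below shows one way to do it with the standard `xml.etree.ElementTree` module; the sample document, the attribute names `id`, `employee` and `project`, and the function name `mistyped_references` are all made up for this illustration.

```python
import xml.etree.ElementTree as ET

DOC = """
<Company>
  <Employee id="Employee_1"/>
  <Project id="Project_7"/>
  <Works_On employee="Employee_1" project="Project_7"/>
  <Works_On employee="Project_7" project="Project_7"/>
</Company>
"""

# Expected target element type for each IDREF attribute of Works_On.
EXPECTED = {"employee": "Employee", "project": "Project"}


def mistyped_references(root, relationship, expected):
    """List (attribute, value, actual target type) for every reference
    that does not point to an element of the expected type."""
    by_id = {el.get("id"): el.tag for el in root.iter() if el.get("id")}
    errors = []
    for rel in root.iter(relationship):
        for attr, wanted in expected.items():
            target = by_id.get(rel.get(attr))
            if target != wanted:
                errors.append((attr, rel.get(attr), target))
    return errors


root = ET.fromstring(DOC)
print(mistyped_references(root, "Works_On", EXPECTED))
# [('employee', 'Project_7', 'Project')]
```

The second Works_On element is acceptable to a DTD validator, since Project_7 is an existing ID, but it is flagged here because the employee reference points to a Project.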
N-ary-relationship
N-ary relationships of any cardinality are modeled similarly to binary N:M relationships: a top level element is introduced for each relationship. It consists of N IDREF attributes or XPointer subelements for each of the participating entity types. Attributes and remarks are the same as above.
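The IDREF variant of this mapping can be sketched in Python as below. The function name `relationship_element_dtd` and the Supply example with its role names are hypothetical, and marking the relationship attributes as #IMPLIED is an assumption.

```python
def relationship_element_dtd(name, roles, rel_attrs=()):
    """Emit a top level, empty relationship element with one IDREF
    attribute per participating entity."""
    lines = [f"<!ELEMENT {name} EMPTY>", f"<!ATTLIST {name}"]
    for role in roles:
        # One untyped reference per participant.
        lines.append(f"  {role} IDREF #REQUIRED")
    for attr in rel_attrs:
        lines.append(f"  {attr} CDATA #IMPLIED")
    lines.append(">")
    return "\n".join(lines)


print(relationship_element_dtd("Supply", ["supplier", "part", "project"], ["quantity"]))
```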
Disjoint Specialization with Total Participation
We use XML elements for both superclass and subclass entities, where the elements of the subclasses contain a subelement representing the superclass. This mapping is possible since every object belongs to exactly one subclass and the superclass is abstract. Only subclass elements are directly included in the DTD. This enforces the abstract superclass specification (which could not be done in the modeling in [4]) and enables reconstruction of the specialization from the DTD, since only in this case the same structured subelement occurs in different elements.
Disjoint, Partial Specialization
We need to include superclass elements in the DTD directly as well, since the superclass may be instantiated. Thus, we need elements for all entities and include the superclass element directly in the DTD with an optional subelement to be chosen from all possible subclasses. That is, if we have a superclass G with possible subclasses S1, ..., Sn in the current specialization, we include the term <!ELEMENT G (S1|...|Sn)?> as description of G. This mapping is similar to [4] and is obviously reversible.
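The two disjoint cases can be contrasted with a short Python sketch. The function name `disjoint_specialization_dtd` is hypothetical, and only the declarations that differ between the two cases are produced; the remaining element and attribute declarations are assumed to exist elsewhere.

```python
def disjoint_specialization_dtd(superclass, subclasses, total):
    """Return the content model declarations for a disjoint specialization."""
    if total:
        # Abstract superclass: each subclass element wraps a superclass
        # subelement, and only the subclass elements appear directly.
        return [f"<!ELEMENT {sub} ({superclass})>" for sub in subclasses]
    # Partial: the superclass appears directly, with at most one subclass inside.
    choice = "|".join(subclasses)
    return [f"<!ELEMENT {superclass} ({choice})?>"]


print(disjoint_specialization_dtd("G", ["S1", "S2", "S3"], total=True))
print(disjoint_specialization_dtd("G", ["S1", "S2", "S3"], total=False))
```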
Overlapping Specializations
They can be treated in the same way, independent of them being total or partial. We introduce elements for both the superclass G as well as all subclasses S1, ..., Sn in the specialization. The element of the superclass is directly included in the DTD with optional subelements for any possible subclass, i.e., the term <!ELEMENT G (S1?, ..., Sn?)> is used. Since multiple subclasses are possible, we need to include the option for multiple subelements, which is why a sequence of optional subelements is used rather than a choice. A possible total restriction in the ER schema is not enforced in this mapping, i.e., even though the specialization was defined to be total, a given XML element could include an element of the superclass only. This could be solved by introducing a complicated expression that guarantees that at least one of the possible subelements has to be present. This addition may be useful in certain applications but is not included in our general mapping for simplicity. In [4] this approach was also used, but only in the case of two subclasses where it is feasible. Because of the unique way to specify this kind of specialization the mapping is reversible.
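To give an idea of what the "complicated expression" for a total overlapping specialization might look like, the Python sketch below builds one candidate content model. It is only one possible construction, not taken from the original work: it assumes the subclass elements appear in a fixed order, and the function name `at_least_one_model` is hypothetical.

```python
def at_least_one_model(subclasses):
    """Build a content model that requires at least one subclass element.

    Each alternative fixes the first subclass that is present and leaves
    all later ones optional, so the alternatives start with different
    element names.
    """
    alternatives = []
    for i, first in enumerate(subclasses):
        rest = [f"{sub}?" for sub in subclasses[i + 1:]]
        alternatives.append("(" + ", ".join([first, *rest]) + ")")
    return "(" + " | ".join(alternatives) + ")"


print(f"<!ELEMENT G {at_least_one_model(['S1', 'S2', 'S3'])}>")
# <!ELEMENT G ((S1, S2?, S3?) | (S2, S3?) | (S3))>
```

The expression grows quadratically with the number of subclasses, which illustrates why it was left out of the general mapping.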
Category with Total Participation
It is modeled similarly to a disjoint, total specialization: we have superclasses C1, ..., Cn which are categorized by a subclass S. Therefore, this time we need to introduce elements for all superclasses directly into the DTD. They consist of a mandatory subelement representing the category S. If on the other hand the category is partial we can use the same modeling approach but with an optional subelement S representing the subclass S since not every superclass has to belong to this subclass. The mapping of categories is reversible but in the result they cannot be distinguished from disjoint specializations with the same participation. This is due to the fact that in this approach no semantic information about super or subclasses is included. We think that it is nevertheless acceptable because of the rare use of categories; in case retransformation is required, the semantic information about super and subclasses would have to be added.


The Algorithm

The algorithm works as follows:

  1. Define <SchemaName> to be the root element
  2. Subelements with cardinality * of root element <SchemaName> are:
    • elements for all entities not occurring on the N-side of any 1:N-relationship;
    • elements for all binary N:M-relationships and all K-ary relationships with K > 2;
    • elements for all entities occurring only on N sides and only partially in relationships.
  3. While not all the entity elements used are defined in the DTD, define the next element by (favor entities having identifying relationships to weak entities):
    1. introducing a subelement for each composite attribute;
    2. introducing a subelement with cardinality * or + for each multivalued attribute;
    3. defining a subelement for each relationship with weak entity types (cardinality as in the ER schema);
    4. introducing a subelement for each 1:N-relationship where the current element is on the 1-side with cardinality * or +;
    5. defining XML attributes for all simple-type attributes of the current element (key attributes as ID, others as CDATA);
    6. defining XML attributes for all attributes of 1:1-relationships where the current element participates totally and the other side is partial, and defining an IDREF attribute for the opposite side of such a relationship; the same is done, even if participation is total on both sides, when the opposite side entity is a previously seen entity;
    7. merging the opposite side entity and all relationship attributes into this element for all 1:1-relationships with total participation on both sides, if the opposite side entity is a new entity;
    8. for each relationship element obtained from substeps 3 or 4:
      1. defining a subelement for the opposite side of the relationship, if it is a new entity;
      2. defining an IDREF attribute for the opposite side of the relationship, if it is a previously seen entity;
      3. defining attributes for all relationship attributes as above with entity attributes.
  4. for all relationship elements from N:M- or K-ary relationships with K > 2:
    1. introducing an IDREF attribute for all entities participating in this relationship;
    2. defining attributes for all relationship attributes as above with entity attributes.
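
Steps 1 and 2 of the algorithm can be sketched in Python as follows. This is a simplified reading, not the authors' implementation: only 1:N relationships are considered when classifying entities, "only on N sides and only partially" is interpreted as "never on a 1 side and never with total participation", and the `OneToMany` class, the function names and the Company example are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OneToMany:
    name: str
    one_side: str
    many_side: str
    many_total: bool  # True if the N-side entity participates totally


def root_subelements(entities, one_to_many, top_level_relationships):
    """Select the names that become direct children of the root element."""
    top = []
    for entity in entities:
        on_many_side = [r for r in one_to_many if r.many_side == entity]
        on_one_side = any(r.one_side == entity for r in one_to_many)
        if not on_many_side:
            # Never on the N side of a 1:N relationship.
            top.append(entity)
        elif not on_one_side and not any(r.many_total for r in on_many_side):
            # Only on N sides and only partially: it may have no parent.
            top.append(entity)
    # N:M and K-ary relationships are always top level elements.
    return top + list(top_level_relationships)


def root_declaration(schema_name, subelements):
    """Declare the root element, each subelement with cardinality *."""
    model = ", ".join(f"{name}*" for name in subelements)
    return f"<!ELEMENT {schema_name} ({model})>"


relationships = [
    OneToMany("WorksFor", "Department", "Employee", many_total=True),
    OneToMany("Controls", "Department", "Project", many_total=True),
    OneToMany("Uses", "Department", "Vehicle", many_total=False),
]
entities = ["Department", "Employee", "Project", "Vehicle"]
top = root_subelements(entities, relationships, ["Works_On"])
print(root_declaration("Company", top))
# <!ELEMENT Company (Department*, Vehicle*, Works_On*)>
```

Employee and Project are not listed under the root because they will be reached through WorksFor and Controls when Department is defined in step 3.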

This algorithm has various restrictions, mainly caused by the fact that the transformation targets XML DTDs and not XML Schema (using XML Schema would probably resolve some of the problems). These restrictions are listed below.



Restrictions

  1. A major restriction is that the participation of entities in relationships modeled by IDREF attributes cannot be reconstructed, so several relationships could not be set up properly in a reconstruction process. For example, the relationship Works_On can be reconstructed as a binary N:M-relationship, but in general it cannot be determined between which entities it holds. To overcome this for a particular XML document, one may either obtain the entity information from the key attribute of the XML element pointed to by the reference, or use special annotations maintaining the name of the entity participating in a relationship. A special XML element OppositeEntity with a single attribute EntityName could be used for this purpose. These problems are due to the fact that IDREFs are untyped in XML, and therefore consistency cannot be fully enforced even if using annotations. Thus this problem occurs wherever IDREF is used; in XML Schema, as described above, no such references are necessary and therefore the problem disappears.

  2. Also relationship attributes of 1:1-relationships that were included in the entity element of the total participation side cannot be distinguished from regular attributes of that entity without additional annotations (e.g. attribute StartDate of Manages).

  3. Weak entities (e.g. dependent) cannot be reconstructed as weak entities, only as regular entities, since both were modeled in the same way.

  4. Role names from the ER schema were not included in the DTD and can therefore not be obtained from DTD or document.

  5. Composite keys (e.g. combination of attributes name and number as key of entity department) cannot be broken up into their components because they were transformed into a single attribute and there is no way to distinguish them from other attributes.

  6. The modeling support of cardinality constraints in DTDs is very poor: a minimum cardinality (e.g., 4 for WorksFor) may be enforced by including that many subelements in the DTD, the last one with an additional + if no maximum is required; maximum cardinalities can only be expressed by including additional optional subelements up to the desired maximum cardinality. This is very complicated and would not be necessary if using XML Schema, where the minOccurs and maxOccurs attributes can be used to specify cardinality constraints.

  7. All specializations and categories are assumed to be user-defined. That means that no special support for attribute-defined or predicate-defined specializations is included. We think nevertheless that these additions would be possible: the discriminating attributes have to be modeled as XML elements, with one element for each possible discriminating attribute value. The number of distinct values will probably not be too large, since for each one a different subclass is included in the ER schema. The discriminated entities are included as subelements of the elements corresponding to the discriminating attribute. We do not include this mapping in our algorithm, to keep it simple and because the benefits gained by this complication are questionable.
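
To make restriction 6 more concrete, the Python sketch below builds the DTD content model for a given minimum and optional maximum cardinality. The function names are hypothetical. The optional part is nested, as in (E, E?)?, rather than written as a flat run of optional elements; this is a choice of the sketch, made so that the resulting content model stays deterministic.

```python
def optional_tail(child, count):
    """Content particle allowing between 0 and count further occurrences."""
    if count == 0:
        return None
    if count == 1:
        return f"{child}?"
    return f"({child}, {optional_tail(child, count - 1)})?"


def cardinality_model(child, minimum, maximum=None):
    """Build a DTD content model for minimum..maximum occurrences of child."""
    if maximum is None:
        if minimum == 0:
            return f"({child}*)"
        # Repeat the child and mark the last occurrence with "+".
        parts = [child] * (minimum - 1) + [f"{child}+"]
    else:
        if maximum < minimum or maximum == 0:
            raise ValueError("maximum must be positive and at least minimum")
        parts = [child] * minimum
        tail = optional_tail(child, maximum - minimum)
        if tail is not None:
            parts.append(tail)
    return "(" + ", ".join(parts) + ")"


print(cardinality_model("WorksFor", 4))
# (WorksFor, WorksFor, WorksFor, WorksFor+)
print(cardinality_model("WorksFor", 2, 4))
# (WorksFor, WorksFor, (WorksFor, WorksFor?)?)
```

In XML Schema the same constraints would be a single element declaration with minOccurs and maxOccurs.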

All these problems can be overcome by using special annotations in the XML document if reconstruction of the ER schema may be desired, or by using the advanced modeling features of XML Schema.

Altogether, we can say that, even without additional annotations, an important part of the ER schema can be reconstructed from the DTD. If complete retransformation is desired, utilization of additional annotations or vocabulary is required. Because of the flexibility of XML, this is not a major problem even though constraint checking has to be done explicitly with these annotations and cannot be performed implicitly by a validating parser.