Archive for September 13, 2008

generating schemas from sample xml

Writing (Java) code to read XML by hand is long dead thanks to JAXB, which generates all the binding code from an XSD. The next piece to remove is the need to create and maintain XSDs by hand.

Sometimes known as “schema by example”, at least an initial XSD can be created from exemplar XML files.

xml to dtd via dtdgen

DTDGenerator or “dtdgen” was written by Michael Kay and is very easy to use:

% wget http://prdownloads.sourceforge.net/saxon/dtdgen7-0.zip
% unzip dtdgen7-0.zip
% java -cp dtdgen.jar DTDGenerator
Usage: java DTDSAXGen input-file >output-file
% java -cp dtdgen.jar DTDGenerator ${JAR_CZAR_DATA}/pub/000/d31/bfbcfe8d6a2dce7f6eb0113d07969ac965/000d31bfbcfe8d6a2dce7f6eb0113d07969ac965-jar-czar.xml > jar_czar.dtdgen.dtd

xml to dtd via DtdJenny

I wrote DtdJenny a while back; it tries to do a similar job in JavaScript.

Here is its DTD for the same input file.

The only real difference, other than spacing, is that DTDGenerator says:

<!ATTLIST sum type NMTOKEN #REQUIRED >

and DtdJenny says:

<!ATTLIST sum type CDATA #REQUIRED >

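which is probably a point to DTDGenerator: NMTOKEN only allows name characters (no whitespace), while CDATA accepts any string, so NMTOKEN is the tighter declaration.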

Need to fix that…
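The fix DtdJenny needs is roughly: declare an attribute NMTOKEN when every observed value is a valid name token, and fall back to CDATA otherwise. Here is a minimal Java sketch of that check. The class and method names are hypothetical (this is not DtdJenny's or DTDGenerator's actual code), and it only approximates the XML 1.0 Nmtoken production with ASCII name characters; the real production also admits many non-ASCII characters.

```java
import java.util.List;

/**
 * Hypothetical helper (not part of DtdJenny or DTDGenerator): decides whether the
 * observed values of one attribute could be declared NMTOKEN rather than CDATA.
 */
final class NmtokenGuess {
    private NmtokenGuess() {}

    /**
     * ASCII-only approximation of the XML 1.0 Nmtoken production: one or more
     * NameChars (letters, digits, '.', '-', '_', ':'). Non-ASCII name characters,
     * which the real production allows, are rejected by this sketch.
     */
    static boolean isNmtoken(String value) {
        if (value.isEmpty()) {
            return false;
        }
        for (int i = 0; i < value.length(); i++) {
            char ch = value.charAt(i);
            boolean nameChar = (ch >= 'a' && ch <= 'z')
                    || (ch >= 'A' && ch <= 'Z')
                    || (ch >= '0' && ch <= '9')
                    || ch == '.' || ch == '-' || ch == '_' || ch == ':';
            if (!nameChar) {
                return false;
            }
        }
        return true;
    }

    /** Returns "NMTOKEN" if every sample is a valid Nmtoken, otherwise "CDATA". */
    static String guessDtdType(List<String> samples) {
        if (samples.isEmpty()) {
            return "CDATA"; // nothing observed: use the most permissive type
        }
        for (String s : samples) {
            if (!isNmtoken(s)) {
                return "CDATA";
            }
        }
        return "NMTOKEN";
    }
}
```

With this, the values of the `type` attribute on `sum` would come out as NMTOKEN as long as none of them contains whitespace or other non-name characters, while a single value like `two words` is enough to fall back to CDATA.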

dtd to xsd

The only tool I have really found for translating between XML schema languages is Trang by James Clark.

It is also really easy to use:

% wget http://www.thaiopensource.com/download/trang-20030619.zip
% unzip trang-20030619.zip
% java -jar trang-20030619/trang.jar
fatal: at least two arguments are required
Trang version 20030619
usage: java com.thaiopensource.relaxng.translate.Driver [-I rng|rnc|dtd|xml] [-O rng|rnc|dtd|xsd] [-i input-param] [-o output-param] inputFileOrUri ... outputFile
% java -jar trang-20030619/trang.jar jar_czar.dtdgen.dtd  jar_czar.dtdgen.xsd

How easy is that?

correcting the xsd

There is a reason DTDs are a technology of yesteryear… they are not nearly descriptive enough to be used seriously.

If we had XML like this:

<example>
    <a b="1.0" c="1" d="fat"/>
    <a b="2.0" c="2" d="cat"/>
    <a b="3.0" c="3" d="hat"/>
</example>

The attributes “b”, “c” and “d” would all be of type “NMTOKEN”!

Which is just dumb!

“Obviously,” “b” is a “float”, “c” is an “integer” and “d” is a “string”.

I realized this lameness near the end of developing DtdJenny, while I was working on the type inference!

Ack! I basically had to drop this info on the floor cuz DTD doesn’t support it!

Someday I will try to salvage what I can to create XsdJenny…

For now… you have to fix it by hand! Blecho!
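To give an idea of what the hand-fixing (or a future XsdJenny) has to do, here is a minimal Java sketch of the type inference described above: for each attribute, pick the narrowest XSD simple type that every sample value fits. Everything here is an assumption for illustration: the class `XsdTypeGuess` and its methods are hypothetical, the lexical patterns are simplified (INF, -INF and NaN are not recognised), `xs:float` is chosen to match the post although `xs:decimal` or `xs:double` would be equally defensible, and `use="required"` assumes every attribute appeared on every sample element.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of "XsdJenny"-style type inference: pick the narrowest
 * XSD simple type that every sample value of an attribute fits.
 */
final class XsdTypeGuess {
    private XsdTypeGuess() {}

    // Simplified lexical forms; INF, -INF and NaN are deliberately not recognised.
    private static final Pattern INTEGER = Pattern.compile("[+-]?[0-9]+");
    private static final Pattern FLOAT =
            Pattern.compile("[+-]?([0-9]+(\\.[0-9]*)?|\\.[0-9]+)([eE][+-]?[0-9]+)?");

    /** Tries xs:integer, then xs:float, falling back to xs:string. */
    static String inferType(List<String> samples) {
        if (samples.isEmpty()) {
            return "xs:string";
        }
        boolean allIntegers = true;
        boolean allFloats = true;
        for (String raw : samples) {
            String v = raw.trim(); // numeric XSD types collapse surrounding whitespace
            allIntegers &= INTEGER.matcher(v).matches();
            allFloats &= FLOAT.matcher(v).matches();
        }
        if (allIntegers) {
            return "xs:integer"; // checked first: every integer also matches FLOAT
        }
        return allFloats ? "xs:float" : "xs:string";
    }

    /** Emits one required xs:attribute declaration per attribute name, in map order. */
    static List<String> declareAttributes(Map<String, List<String>> samplesByName) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : samplesByName.entrySet()) {
            out.add("<xs:attribute name=\"" + e.getKey() + "\" type=\""
                    + inferType(e.getValue()) + "\" use=\"required\"/>");
        }
        return out;
    }
}
```

Feeding it the `b`, `c` and `d` values from the `<example>` document (in a `LinkedHashMap` to keep the order) yields declarations typed `xs:float`, `xs:integer` and `xs:string` respectively, which is exactly the information the DTD route throws away.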

Next trick… the Maven 2 JAXB plugin
