generating schemas from sample xml

Writing (Java) code by hand to read XML is long dead thanks to JAXB, which generates all the binding code from an XSD. The next piece to remove is the need to create and maintain XSDs.

Sometimes known as “schema by example”, the idea is that at least an initial XSD can be created from exemplar XML files.

xml to dtd via dtdgen

DTDGenerator or “dtdgen” was written by Michael Kay and is very easy to use:

% wget http://prdownloads.sourceforge.net/saxon/dtdgen7-0.zip
% unzip dtdgen7-0.zip
% java -cp dtdgen.jar DTDGenerator
Usage: java DTDSAXGen input-file >output-file
% java -cp dtdgen.jar DTDGenerator ${JAR_CZAR_DATA}/pub/000/d31/bfbcfe8d6a2dce7f6eb0113d07969ac965/000d31bfbcfe8d6a2dce7f6eb0113d07969ac965-jar-czar.xml > jar_czar.dtdgen.dtd
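To give a feel for what a schema-by-example tool does under the hood, here is a minimal sketch in Java. It is not DTDGenerator's actual code: the class name DtdSketch and its methods are made up for illustration, and it is far cruder than the real thing (every child is optional and repeatable, every attribute is CDATA #IMPLIED).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical illustration only; not how DTDGenerator is implemented. */
class DtdSketch extends DefaultHandler {
    // element name -> child element names seen, in first-seen order
    final Map<String, Set<String>> children = new LinkedHashMap<>();
    // element name -> attribute names seen
    final Map<String, Set<String>> attributes = new LinkedHashMap<>();
    // elements that contained non-whitespace text
    final Set<String> hasText = new LinkedHashSet<>();
    private final Deque<String> stack = new ArrayDeque<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        children.computeIfAbsent(qName, k -> new LinkedHashSet<>());
        Set<String> names = attributes.computeIfAbsent(qName, k -> new LinkedHashSet<>());
        for (int i = 0; i < atts.getLength(); i++) {
            names.add(atts.getQName(i));
        }
        if (!stack.isEmpty()) {
            children.get(stack.peek()).add(qName); // record parent -> child
        }
        stack.push(qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!stack.isEmpty() && !new String(ch, start, length).trim().isEmpty()) {
            hasText.add(stack.peek());
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        stack.pop();
    }

    /** Renders a deliberately loose DTD from what was observed. */
    String toDtd() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Set<String>> e : children.entrySet()) {
            String name = e.getKey();
            Set<String> kids = e.getValue();
            String model;
            if (hasText.contains(name)) {
                model = kids.isEmpty() ? "(#PCDATA)" : "(#PCDATA | " + String.join(" | ", kids) + ")*";
            } else {
                model = kids.isEmpty() ? "EMPTY" : "(" + String.join(" | ", kids) + ")*";
            }
            out.append("<!ELEMENT ").append(name).append(' ').append(model).append(" >\n");
            for (String att : attributes.get(name)) {
                out.append("<!ATTLIST ").append(name).append(' ').append(att)
                   .append(" CDATA #IMPLIED >\n");
            }
        }
        return out.toString();
    }

    /** Parses the sample XML and returns the guessed DTD text. */
    static String infer(String xml) {
        try {
            DtdSketch sketch = new DtdSketch();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), sketch);
            return sketch.toDtd();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }
}
```

Calling DtdSketch.infer(...) on a small sample returns the declarations as a string. A real tool also works out ordering, cardinality, and #REQUIRED versus #IMPLIED, which is where most of the effort goes.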

xml to dtd via DtdJenny

I wrote DtdJenny a while back; it tries to do a similar job in JavaScript.

Here is its DTD for the same input file.

The only real difference, other than spacing, is that DTDGenerator says:

<!ATTLIST sum type NMTOKEN #REQUIRED >

and DtdJenny says:

<!ATTLIST sum type CDATA #REQUIRED >

which is probably a point to DTDGenerator, since NMTOKEN is the tighter of the two types.

Need to fix that…
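A fix could be as simple as checking every observed value of an attribute against the name-token rules. The Java sketch below is a guess at the general approach, not the actual logic of either tool; AttTypeSketch and attributeType are hypothetical names, and the pattern is an ASCII-only approximation of the XML Nmtoken production (the real one admits many more Unicode characters).

```java
import java.util.Collection;
import java.util.regex.Pattern;

/** Hypothetical helper; neither DTDGenerator nor DtdJenny is known to work this way. */
class AttTypeSketch {
    // ASCII-only approximation of Nmtoken: one or more name characters.
    private static final Pattern ASCII_NMTOKEN = Pattern.compile("[A-Za-z0-9._:-]+");

    /** Returns "NMTOKEN" only if every observed value looks like a name token. */
    static String attributeType(Collection<String> observedValues) {
        if (observedValues.isEmpty()) {
            return "CDATA"; // nothing observed, so make no claim
        }
        for (String value : observedValues) {
            if (!ASCII_NMTOKEN.matcher(value).matches()) {
                return "CDATA"; // spaces, slashes, etc. rule out NMTOKEN
            }
        }
        return "NMTOKEN";
    }
}
```

The catch with any schema-by-example approach is that it only sees the samples: an attribute that happens to hold single words in the exemplar file may hold free text elsewhere.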

dtd to xsd

The only tool I ever really found that translates from one XML schema language to another is Trang by James Clark.

It is also really easy to use:

% wget http://www.thaiopensource.com/download/trang-20030619.zip
% unzip trang-20030619.zip
% java -jar trang-20030619/trang.jar 
fatal: at least two arguments are required
Trang version 20030619
usage: java com.thaiopensource.relaxng.translate.Driver [-I rng|rnc|dtd|xml] [-O rng|rnc|dtd|xsd] [-i input-param] [-o output-param] inputFileOrUri ... outputFile
% java -jar trang-20030619/trang.jar jar_czar.dtdgen.dtd  jar_czar.dtdgen.xsd

How easy is that?

correcting the xsd

There is a reason DTDs are a technology of yesteryear… They are not nearly descriptive enough to be used seriously.

If we had xml like this:

<example>
    <a b="1.0" c="1" d="fat"/>
    <a b="2.0" c="2" d="cat"/>
    <a b="3.0" c="3" d="hat"/>
</example>

The attributes “b”, “c” and “d” would all be of type “NMTOKEN”!

Which is just dumb!

“Obviously”, “b” is a “float”, “c” is an “integer” and “d” is a “string”.
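That kind of guess is easy enough to sketch in Java. This is only an illustration of the idea, not code from DtdJenny: XsdTypeSketch and inferXsdType are hypothetical names, and the patterns are simplified (for instance, they ignore the INF and NaN values that xs:float also allows).

```java
import java.util.Collection;
import java.util.regex.Pattern;

/** Hypothetical type guesser for attribute values seen in sample XML. */
class XsdTypeSketch {
    private static final Pattern INTEGER = Pattern.compile("[+-]?\\d+");
    // Simplified float: digits with optional fraction and exponent.
    private static final Pattern FLOAT =
        Pattern.compile("[+-]?(\\d+(\\.\\d*)?|\\.\\d+)([eE][+-]?\\d+)?");

    /** Picks the narrowest of xs:integer, xs:float, xs:string that fits every value. */
    static String inferXsdType(Collection<String> observedValues) {
        if (observedValues.isEmpty()) {
            return "xs:string"; // no evidence, so fall back to the widest type
        }
        boolean allIntegers = true;
        boolean allFloats = true;
        for (String value : observedValues) {
            String v = value.trim();
            allIntegers &= INTEGER.matcher(v).matches();
            allFloats &= FLOAT.matcher(v).matches();
        }
        if (allIntegers) {
            return "xs:integer";
        }
        return allFloats ? "xs:float" : "xs:string";
    }
}
```

Feeding it the three columns from the example gives xs:float for “b”, xs:integer for “c” and xs:string for “d”. Integers are checked first because every integer also matches the float pattern.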

I realized this lameness near the end of developing DtdJenny as I was working on the type inferencing!

Ack! I had to basically drop this info on the floor cuz DTD doesn’t support it!

Someday I will try to salvage what I can to create XsdJenny…

For now… you have to fix it by hand! Blecho!
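As a rough idea of what the hand fix might look like for the example above, here is a hand-written XSD with typed attributes, checked with the standard javax.xml.validation API. The schema is my own guess at a reasonable result, not Trang's output, and XsdCheckSketch is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

/** Hypothetical check of a hand-corrected schema against sample XML. */
class XsdCheckSketch {
    // Hand-written schema for the <example> document; attribute types fixed by hand.
    static final String XSD = String.join("\n",
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>",
        "  <xs:element name='example'>",
        "    <xs:complexType>",
        "      <xs:sequence>",
        "        <xs:element name='a' maxOccurs='unbounded'>",
        "          <xs:complexType>",
        "            <xs:attribute name='b' type='xs:float' use='required'/>",
        "            <xs:attribute name='c' type='xs:integer' use='required'/>",
        "            <xs:attribute name='d' type='xs:string' use='required'/>",
        "          </xs:complexType>",
        "        </xs:element>",
        "      </xs:sequence>",
        "    </xs:complexType>",
        "  </xs:element>",
        "</xs:schema>");

    /** Returns true if the XML validates against the schema above. */
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the validator rejected the document (or the schema)
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the types in place, a value like c="one" is rejected, which the NMTOKEN version would have happily accepted.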

Next trick… the Maven 2 JAXB plugin

2 Comments

  1. Stuart Cox said

    Thanks for the example of using DTDGenerator. Following your example I was able to duplicate what you’d written and produce a DTD from my .xml file. Much appreciated. Everybody else seems to think that you use java and have its toolchain all set up ready to roll. Thanks for the simple how-to that worked.

  2. brianin3d said

    I’m glad you were able to get some use out of it… Sometimes a lot of these things have sort of rotted to the point where it can take some effort to make them work again.

    For me, being able to create your own custom tool chain is like being able to make your own light saber: it’s a real way to take your skills to another level.
