Writing (Java) code by hand to read XML is long dead thanks to JAXB, which generates all the binding code from an XSD. The next piece to remove is the need to create and maintain XSDs by hand.
Using an approach sometimes known as “schema by example”, at least initial XSDs can be generated from exemplar XML files.
xml to dtd via dtdgen
DTDGenerator, or “dtdgen”, was written by Michael Kay and is very easy to use:
% wget http://prdownloads.sourceforge.net/saxon/dtdgen7-0.zip
% unzip dtdgen7-0.zip
% java -cp dtdgen.jar DTDGenerator
Usage: java DTDSAXGen input-file >output-file
% java -cp dtdgen.jar DTDGenerator ${JAR_CZAR_DATA}/pub/000/d31/bfbcfe8d6a2dce7f6eb0113d07969ac965/000d31bfbcfe8d6a2dce7f6eb0113d07969ac965-jar-czar.xml > jar_czar.dtdgen.dtd
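To make the “schema by example” idea concrete, here is a minimal Java sketch of the kind of inference a tool like dtdgen performs. It is a deliberately simplified assumption, not Michael Kay’s algorithm: it records which child elements and attributes each element has, emits a repeatable choice such as (a|b)* as the content model, types every attribute as CDATA, and marks an attribute #REQUIRED only if it appeared on every instance of its element. The class name DtdSketch and its infer method are hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical "schema by example" sketch; NOT dtdgen's actual algorithm.
class DtdSketch {
    // element name -> child element names, in first-seen order
    private final Map<String, Set<String>> children = new LinkedHashMap<>();
    // element name -> (attribute name -> number of instances carrying it)
    private final Map<String, Map<String, Integer>> attributes = new LinkedHashMap<>();
    // element name -> number of instances seen
    private final Map<String, Integer> occurrences = new LinkedHashMap<>();
    // element names that contained non-whitespace text
    private final Set<String> hasText = new LinkedHashSet<>();

    private void visit(Element e) {
        String name = e.getTagName();
        occurrences.merge(name, 1, Integer::sum);
        Set<String> kids = children.computeIfAbsent(name, k -> new LinkedHashSet<>());
        Map<String, Integer> atts = attributes.computeIfAbsent(name, k -> new LinkedHashMap<>());
        NamedNodeMap map = e.getAttributes();
        for (int i = 0; i < map.getLength(); i++) {
            atts.merge(map.item(i).getNodeName(), 1, Integer::sum);
        }
        NodeList nodes = e.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node n = nodes.item(i);
            short type = n.getNodeType();
            if (type == Node.ELEMENT_NODE) {
                kids.add(n.getNodeName());
                visit((Element) n);
            } else if ((type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE)
                    && !n.getNodeValue().isBlank()) {
                hasText.add(name);
            }
        }
    }

    /** Infers a rough DTD from a single example document. */
    static String infer(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalArgumentException("not well-formed XML", ex);
        }
        DtdSketch s = new DtdSketch();
        s.visit(root);
        StringBuilder out = new StringBuilder();
        for (String name : s.occurrences.keySet()) {
            Set<String> kids = s.children.get(name);
            boolean text = s.hasText.contains(name);
            String model = kids.isEmpty()
                    ? (text ? "(#PCDATA)" : "EMPTY")
                    : "(" + (text ? "#PCDATA|" : "") + String.join("|", kids) + ")*";
            out.append("<!ELEMENT ").append(name).append(' ').append(model).append(">\n");
            for (Map.Entry<String, Integer> a : s.attributes.get(name).entrySet()) {
                // seen on every instance -> #REQUIRED, otherwise #IMPLIED
                String use = a.getValue().equals(s.occurrences.get(name)) ? "#REQUIRED" : "#IMPLIED";
                out.append("<!ATTLIST ").append(name).append(' ').append(a.getKey())
                        .append(" CDATA ").append(use).append(">\n");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(infer("<example><a b=\"1.0\" c=\"1\" d=\"fat\"/>"
                + "<a b=\"2.0\" c=\"2\" d=\"cat\"/><a b=\"3.0\" c=\"3\" d=\"hat\"/></example>"));
    }
}
```

A real generator does considerably more (sequences, occurrence counts, enumerated and NMTOKEN attribute types), but the core trick is the same: walk the sample, remember what you saw, and write it back out as declarations.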
xml to dtd via DtdJenny
I wrote DtdJenny a while back; it does a similar job in JavaScript.
Here is its DTD for the same input file.
The only real difference, other than spacing, is that DTDGenerator says:
<!ATTLIST sum type NMTOKEN #REQUIRED >
and DtdJenny says:
<!ATTLIST sum type CDATA #REQUIRED >
which is probably a point in DTDGenerator’s favor…
Need to fix that in DtdJenny…
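The difference comes down to the attribute types XML 1.0 gives a DTD: CDATA accepts any string, while an NMTOKEN must be a single run of name characters (letters, digits, “.”, “-”, “_”, “:” and various non-ASCII characters) with no whitespace. Below is a minimal Java sketch of one way an inferrer could choose between them, assuming the rule “NMTOKEN only if every observed value qualifies”; this is not necessarily the rule DTDGenerator uses. The class and method names are hypothetical, and the character class is a simplified ASCII-only stand-in for XML’s full NameChar production.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of choosing a DTD attribute type from sample values;
// not necessarily the rule DTDGenerator or DtdJenny actually uses.
class AttributeTypeSketch {
    // Simplified NMTOKEN: one or more ASCII name characters. XML 1.0's real
    // NameChar production also admits many non-ASCII characters.
    private static final Pattern NMTOKEN = Pattern.compile("[A-Za-z0-9._:-]+");

    /** "NMTOKEN" if every observed value is a (simplified) name token, else "CDATA". */
    static String attributeType(List<String> observedValues) {
        if (observedValues.isEmpty()) {
            return "CDATA"; // no evidence, so fall back to the most permissive type
        }
        for (String v : observedValues) {
            if (!NMTOKEN.matcher(v).matches()) {
                return "CDATA"; // whitespace or other non-name characters seen
            }
        }
        return "NMTOKEN";
    }

    public static void main(String[] args) {
        System.out.println(attributeType(List.of("1.0", "2.0", "3.0"))); // NMTOKEN
        System.out.println(attributeType(List.of("fat cat", "hat")));    // CDATA
    }
}
```

Because digits and “.” are name characters, numeric-looking values such as “1.0” also come out as NMTOKEN under this rule.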
dtd to xsd
The only tool I have ever really found for translating from one XML schema language to another is trang, by James Clark.
It is also really easy to use:
% wget http://www.thaiopensource.com/download/trang-20030619.zip
% unzip trang-20030619.zip
% java -jar trang-20030619/trang.jar
fatal: at least two arguments are required
Trang version 20030619
usage: java com.thaiopensource.relaxng.translate.Driver [-I rng|rnc|dtd|xml] [-O rng|rnc|dtd|xsd] [-i input-param] [-o output-param] inputFileOrUri ... outputFile
% java -jar trang-20030619/trang.jar jar_czar.dtdgen.dtd jar_czar.dtdgen.xsd
How easy is that?
correcting the xsd
There is a reason DTDs are a technology of yesteryear… They have no data types such as integer or float, so they are not nearly descriptive enough to be used seriously.
If we had XML like this:
<example>
<a b="1.0" c="1" d="fat"/>
<a b="2.0" c="2" d="cat"/>
<a b="3.0" c="3" d="hat"/>
</example>
The attributes “b”, “c” and “d” would all be of type “NMTOKEN”!
Which is just dumb!
“Obviously,” “b” is a “float,” “c” is an “integer,” and “d” is a “string.”
I realized this lameness near the end of developing DtdJenny, while I was working on the type inferencing!
Ack! I basically had to drop this info on the floor cuz DTD doesn’t support it!
Someday I will try to salvage what I can to create XsdJenny…
For now… you have to fix it by hand! Blecho!
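Until something like XsdJenny exists, the type inferencing itself is easy to sketch: try the narrowest type first and fall back to a wider one as soon as any sample value doesn’t fit. This is a hypothetical Java illustration (XsdTypeSketch and inferType are made-up names, not DtdJenny code) that only knows three XSD types; its regular expressions approximate the xs:integer and xs:float lexical forms and skip INF and NaN.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of XSD type inference from sample values (not DtdJenny code).
class XsdTypeSketch {
    /** Narrowest of xs:integer, xs:float, xs:string that fits every sample value. */
    static String inferType(List<String> values) {
        if (values.isEmpty()) {
            return "xs:string";
        }
        if (values.stream().allMatch(XsdTypeSketch::isInteger)) {
            return "xs:integer";
        }
        if (values.stream().allMatch(XsdTypeSketch::isFloat)) {
            return "xs:float";
        }
        return "xs:string";
    }

    // Optional sign followed by digits (the xs:integer lexical form).
    private static boolean isInteger(String s) {
        return s.matches("[+-]?[0-9]+");
    }

    // Decimal or exponent notation; xs:float also allows INF, -INF and NaN,
    // which this sketch does not recognize.
    private static boolean isFloat(String s) {
        return s.matches("[+-]?([0-9]+(\\.[0-9]*)?|\\.[0-9]+)([eE][+-]?[0-9]+)?");
    }

    public static void main(String[] args) {
        // Attribute values taken from the <example> document above.
        Map<String, List<String>> samples = new LinkedHashMap<>();
        samples.put("b", List.of("1.0", "2.0", "3.0"));
        samples.put("c", List.of("1", "2", "3"));
        samples.put("d", List.of("fat", "cat", "hat"));
        samples.forEach((name, values) -> System.out.println(
                "<xs:attribute name=\"" + name + "\" type=\"" + inferType(values)
                        + "\" use=\"required\"/>"));
    }
}
```

Running main prints xs:attribute declarations typing “b” as xs:float, “c” as xs:integer and “d” as xs:string, which is exactly the hand fix. Checking xs:integer before xs:float matters because every integer literal is also a valid float; xs:decimal would be an equally defensible pick for “b”.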
Next trick… the Maven 2 JAXB plugin