  XML::Edifact - an approach towards XML/EDI as a prototype in Perl
  release 0.32 - normalisation, namespaces, xml2edi
  Michael Koehne, ( kraehe@bakunin.north.de )
  v0.32 release

  XML::Edifact is a set of Perl scripts, hopefully becoming a module,
  for translating EDIFACT into XML. This 0.32 version contains a
  document type definition for the produced XML. It's intended as a
  workhorse, and I hope that some diesel or expat will be able to
  translate my EdiCooked into XML/EDI and vice versa, once we have a
  standard.
  ______________________________________________________________________

  Table of Contents:

  1.      Introduction

  2.      Release Notes:

  2.1.    About the beauty of plain text

  2.2.    It's hard work to cook a second version.

  2.3.    About normalisation, namespaces and xml2edi

  3.      Installation

  4.      Known Bugs

  4.1.    Double namespace declarations

  4.2.    Stating level in Syntax identifier.

  4.3.    XML::Edifact is slow!

  5.      Roadmap

  6.      Legal stuff

  7.      Download
  ______________________________________________________________________

  11..  IInnttrroodduuccttiioonn

  EEDDIIFFAACCTT often called " nightmare of paper less office " once you show
  a programmer the standard draft. Those 2700 pages of horror full
  advisory board English has cursed many programmers with headaches.

  EDIFACT is trying the impossible: a single form for the real world.

  Orders, invoices, freight papers, ..., always look different if they
  come from different companies. EDIFACT tries to fulfill all needs of
  commercial messages regardless of branch and origin. Of course,
  covering 99% of the real world is neither simple nor complete.
  Nevertheless it is important for the top companies and their
  suppliers (you know, those who can pay for a mainframe and a pack of
  gurus), and it has been in use since 1995.

  XXMMLL//EEDDII is trying to provide a simpler (KISS) format that can be
  translated from and into EDI, to allow smaller companies to avoid
  slaughtering forests and retyping stupid lines into a computer
  keyboard printed by other computers.

  This is NNOOTT XML/EDI, its certainly not KISS. The eeddiiffaacctt0033..ddttdd
  reflects the original words of the EDIFACT standard as close as
  possible on a segment, composite and element level.

  This DTD simplifies EDI insofar as it doesn't distinguish between,
  e.g., INVOIC and PRICAT, but only defines a generic message type
  called edifact:message. The benefit, of course, is that any EDI
  message can be converted into edifact:message. The drawback is that
  the DTD is really relaxed: EDIFACT message design therefore cannot be
  validated by a validating XML parser. Message designers will still
  need knowledge of EDIFACT message design and EDIFACT tools.

  But once the message is designed, it is simpler to read it as XML.
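  Since the DTD has no INVOIC or PRICAT element, the message type has
  to be read from the data itself, namely from the UNH message header.
  A minimal Perl sketch, assuming the default separators and no release
  characters; the function name message_type is hypothetical and not
  part of XML::Edifact:

```perl
#!/usr/bin/perl
# Sketch: read the message type from the UNH segment.
# Assumes default separators (+ : ') and no release characters.
use strict;
use warnings;

sub message_type {
    my ($edi) = @_;
    for my $segment (split /'/, $edi) {
        $segment =~ s/^\s+//;                  # skip line breaks between segments
        next unless $segment =~ /^UNH\+/;
        # UNH+<reference number>+<type>:<version>:<release>:<agency>
        my (undef, $reference, $identifier) = split /\+/, $segment;
        my ($type, $version, $release, $agency) = split /:/, $identifier;
        return { reference => $reference, type => $type,
                 version => $version, release => $release,
                 agency => $agency };
    }
    return;                                    # no UNH found
}

my $id = message_type("UNH+1+INVOIC:D:96B:UN'BGM+380+12345'UNT+3+1'");
print "$id->{type} $id->{version}$id->{release}\n";   # INVOIC D96B
```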

  22..  RReelleeaassee NNootteess::

  22..11..  AAbboouutt tthhee bbeeaauuttyy ooff ppllaaiinn tteexxtt

  Standards should be based on standards. EDIFACT is based on ASCII,
  and its documentation is available from WWW.Premenos.Com as plain
  text. Well, the original contains some PC-DOS characters; I took the
  liberty of replacing them with ASCII in this distribution to improve
  readability. I'm not talking about human readability here. A friend
  at SAP joked that plain paper is the only platform-independent format
  in that case. But I didn't want to retype them, and, as a programmer,
  I find plain text more flexible.

  Unlike the 0.1 distribution, later distributions will only contain
  the documents that the scripts need to parse. Download 0.1 for a
  complete set, or browse the Premenos site.

  22..22..  IIttss aa hhaarrdd wwoorrkk ttoo ccooookk aa sseeccoonndd vveerrssiioonn..

  As usual, second versions claim to be better documented and tested,
  but the truth is that they contain more features. So let's talk about
  features:

  First of all: it's starting to look like a module. "use strict" and
  the package concept are useful things, but it'll take a lot of RTFM
  for me to understand the Perl way of doing it. XML/Edifact.pm doesn't
  export anything, and it's not even necessary to run "perl
  Makefile.PL; make install".

  A 0.2 version is not intended to be installed; it's a test case.

  So, about the test case: run ./bin/make_test.sh from here, and
  everything should be fine. I still need some RTFM to understand the
  Perl way of regression testing, but ./bin/make_test.sh is what this
  version offers ;-)

  I'm now using a tied hash to speed up startup. I've decided to use
  SDBM, as this DBM comes with every Perl, and a small DBM is better in
  this case.
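  A minimal sketch of the tied-hash idea with the SDBM_File module that
  ships with Perl. The file name and the sample entry (segment tag NAD
  mapped to its directory name) are assumptions for illustration; the
  real layout of the ./data files is not shown here:

```perl
#!/usr/bin/perl
# Sketch: build a small lookup table once, then reopen it as a tied hash.
# The file name and entry are hypothetical, not XML::Edifact's ./data layout.
use strict;
use warnings;
use Fcntl;                 # O_RDONLY, O_RDWR, O_CREAT
use File::Spec;
use SDBM_File;             # bundled with every perl

my $file = File::Spec->catfile(File::Spec->tmpdir, 'edifact_demo');

# Bootstrap step (similar in spirit to ./bin/make_data.sh): write once.
tie(my %build, 'SDBM_File', $file, O_RDWR | O_CREAT, 0666)
    or die "Cannot create $file: $!";
$build{NAD} = 'NAME AND ADDRESS';
untie %build;

# Later runs: no directory parsing, just tie and look up.
tie(my %segment_name, 'SDBM_File', $file, O_RDONLY, 0)
    or die "Cannot open $file: $!";
print "NAD = $segment_name{NAD}\n";    # NAD = NAME AND ADDRESS
untie %segment_name;
```

  Tying pays off at startup because lookups read the DBM pages on
  demand instead of re-parsing the directory files on every run. Keep
  in mind that SDBM limits a key plus its value to 1008 bytes.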

  I've provided a document type definition, so it's now possible to use
  a validating parser like SP from James Clark. You may also notice the
  renaming from Edi2SGML to XML::Edifact. This name change reflects
  that my script now produces XML rather than SGML, and the name should
  indicate where in the CPAN hierarchy this package belongs.

  22..33..  AAbboouutt nnoorrmmaalliissaattiioonn,, nnaammeessppaacceess aanndd xxmmll22eeddii

  You may notice the major change in the DBM design. While the old DBM
  files were modeled closely on the batch directory, this version has
  been partly normalised to simplify the code. It is also denormalised
  in places, for some Perlish reasons. Unloading this DBM into a
  relational database would be possible with varchars, but the
  semantics of the second element in segments and composites could only
  be expressed in some weird object-relational databases like Postgres.

  The DTD has also changed, for namespace reasons. Version 0.2 needed
  to add the word "literal" where element names clashed with segment
  names of the standard, and it dropped the composite information. Now
  trsd:party.name means the segment, while tred:party.name points to
  the element.
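  A small illustration of why the distinct prefixes help: the qualified
  name alone tells a converter whether it is looking at a segment or a
  data element. The helper classify and its table are hypothetical, a
  sketch rather than the code used by xml2edi:

```perl
#!/usr/bin/perl
# Sketch: the prefix of a qualified name tells segment from data element.
# classify() and %kind_of are hypothetical illustrations.
use strict;
use warnings;

my %kind_of = (
    trsd => 'segment',     # e.g. trsd:party.name
    tred => 'element',     # e.g. tred:party.name
);

sub classify {
    my ($qname) = @_;
    my ($prefix, $local) = $qname =~ /^([^:]+):(.+)$/
        or return ('unqualified', $qname);
    # Unknown prefixes are reported, not fatal.
    return ($kind_of{$prefix} // 'unknown', $local);
}

for my $qname (qw(trsd:party.name tred:party.name xyz:party.name)) {
    my ($kind, $local) = classify($qname);
    print "$qname: $kind $local\n";
}
```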

  The distinct prefixes make it possible to parse the XML message and
  produce an EDI message without a backtracking parser. The event-based
  parser used for xml2edi is quite new and certainly contains some
  bugs. Please dig out your real-life messages, translate them with
  edi2xml and back with xml2edi, and compare the original with the
  double translation. I've tried to make the solution robust, so that
  it doesn't croak on codes from an unknown namespace, I hope.
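  One way to do that comparison is segment by segment, so that cosmetic
  line breaks don't count as differences. A minimal sketch, assuming
  the default separators (' as terminator, ? as release character);
  first_difference is a hypothetical helper, not a script shipped with
  XML::Edifact:

```perl
#!/usr/bin/perl
# Sketch: compare an original EDI message with its double translation,
# segment by segment. Assumes default separators (' terminator, ? release).
use strict;
use warnings;

sub segments {
    my ($edi) = @_;
    $edi =~ s/\r?\n//g;                   # line breaks are cosmetic here
    my @segments;
    # A segment: released chars (?x) or anything but ? and ', then '.
    while ($edi =~ /((?:\?.|[^?'])*)'/gs) {
        push @segments, $1;
    }
    return @segments;
}

# Returns (position, original, copy) of the first mismatch, or () if equal.
sub first_difference {
    my @orig = segments(shift);
    my @copy = segments(shift);
    my $n = @orig > @copy ? scalar @orig : scalar @copy;
    for my $i (0 .. $n - 1) {
        my $x = $orig[$i] // '<missing>';
        my $y = $copy[$i] // '<missing>';
        return ($i + 1, $x, $y) if $x ne $y;
    }
    return;
}

my $original = "UNH+1+ORDERS:D:96A:UN'\nNAD+BY+++O?'Reilly'\nUNT+3+1'\n";
my $twice    = "UNH+1+ORDERS:D:96A:UN'NAD+BY+++O?'Reily'UNT+3+1'";
if (my ($pos, $x, $y) = first_difference($original, $twice)) {
    print "segment $pos differs:\n  $x\n  $y\n";
} else {
    print "identical\n";
}
```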

  Versions 0.30 and 0.31 used edicooked:message as the namespace;
  versions 0.32 and up use edifact:message for the main namespace. The
  technical reason is quite simple: the namespace prefix of a message
  does not mean anything. It's only a shorthand for the URI provided in
  the xmlns attribute, so any XML message can use the edifact: prefix
  as long as its URI is distinct. If other projects get implemented,
  they can claim the edifact: prefix with the same right.
  Unfortunately, other projects seem to be pure vaporware as of July
  1999.
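  The point can be shown with a toy resolver that expands prefix:local
  into {URI}local (the so-called Clark notation) using the xmlns
  declarations of a single start tag. The URIs are made up, and a real
  application should let a namespace-aware parser such as XML::Parser
  do this work:

```perl
#!/usr/bin/perl
# Toy sketch: a prefix is only shorthand for a URI. Expand the element
# name of one start tag to {URI}local using its xmlns:prefix attributes.
# Not a real XML parser (no scoping, no default namespace); URIs are made up.
use strict;
use warnings;

sub expanded_name {
    my ($start_tag) = @_;
    my %uri_of = $start_tag =~ /\bxmlns:([\w.-]+)\s*=\s*"([^"]*)"/g;
    my ($qname) = $start_tag =~ /^<([\w.:-]+)/
        or die "not a start tag: $start_tag\n";
    my ($prefix, $local) = split /:/, $qname, 2;
    return $qname unless defined $local;         # no prefix: leave as-is
    my $uri = $uri_of{$prefix};
    die "undeclared prefix '$prefix'\n" unless defined $uri;
    return "{$uri}$local";
}

my $first  = expanded_name('<edifact:message xmlns:edifact="http://example.org/xml-edifact">');
my $second = expanded_name('<edicooked:message xmlns:edicooked="http://example.org/xml-edifact">');
my $third  = expanded_name('<edifact:message xmlns:edifact="http://example.org/other-project">');

print "$first\n$third\n";
print $first eq $second ? "same name\n" : "different names\n";   # same name
print $first eq $third  ? "same name\n" : "different names\n";   # different names
```

  Different prefixes with the same URI name the same element, while the
  same prefix with a different URI does not.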

  A last note about the change from 0.2 to 0.30: treat this number as
  0.3.0 written in Perl's decimal version style. 0.3 is not finished;
  coming versions numbered 0.3x will be stepping stones towards what I
  think the 30% XML::Edifact solution should contain.

  33..  IInnssttaallllaattiioonn

  I've included my modified documents, so others can rebuild the DBM
  files. You may need a Unix-like system because of newline
  conventions. This 0.3x version is not intended to be "installed";
  just run everything from this directory.

               $ ./bin/make_data.sh

  This will take a while (48 seconds on my Sun 3/60 :-), and afterwards
  you hopefully have a working database. Any "foo.ext changed" messages
  are a bad thing, and are probably caused by a failure in packing or
  unpacking this distribution.

  You can now test XML::Edifact with:

               $ perl bin/edi2xml.pl examples/nad_buyer.edi

  You can try other example files, and if you have your own EDI files,
  try them: I really want to know what your EDI messages look like,
  whether they break anything, what your code list extensions are, ...

  Testing different real examples should reveal some bugs I haven't
  thought about. Think about the O'Reilly invoice or the Dubbel:Test
  and you should get the clue. I've tried to implement UNA correctly,
  but this may need some additional debugging. Take a look at the
  difference between the edi.tst files from Frankfurt and the Springer
  message. The latter uses newline as the 9th character of UNA (the
  segment terminator), so it's nearly human readable.
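  A minimal sketch of UNA handling, assuming that UNA, if present, is
  the very first thing in the data. The six characters after UNA are
  the component separator, data element separator, decimal mark,
  release character, a reserved position and the segment terminator,
  with :+.? ' as defaults. The function names are hypothetical. The
  example also shows why the Springer style works: with newline as
  terminator, the apostrophe in O'Reilly needs no release character.

```perl
#!/usr/bin/perl
# Sketch: honour the UNA service string advice when splitting EDIFACT.
# UNA + six characters: component separator, data element separator,
# decimal mark, release character, reserved, segment terminator.
# Defaults without UNA: :+.? '   Function names are hypothetical.
use strict;
use warnings;

sub service_chars {
    my ($edi) = @_;
    my $una = $edi =~ /^UNA(.{6})/s ? $1 : ":+.? '";
    my %c;
    @c{qw(component element decimal release reserved terminator)}
        = split //, $una;
    return \%c;
}

# Split the interchange into segments; released characters are kept.
sub split_segments {
    my ($edi, $c) = @_;
    $edi =~ s/^UNA.{6}//s;                      # UNA is not a data segment
    my ($r, $t) = map { quotemeta } @{$c}{qw(release terminator)};
    my @segments;
    while ($edi =~ /\G\s*((?:$r.|(?!$r|$t).)*)$t/gs) {
        push @segments, $1;
    }
    return @segments;
}

# Split on unreleased separators only. Composites would be split on
# $c->{component} the same way, before unescaping.
sub split_on {
    my ($text, $sep, $rel) = @_;
    my ($s, $r) = map { quotemeta } $sep, $rel;
    my @parts = ('');
    while ($text =~ /\G($r.|$s|.)/gs) {
        if ($1 eq $sep) { push @parts, '' } else { $parts[-1] .= $1 }
    }
    return @parts;
}

# Remove release characters once no further splitting is needed.
sub unescape {
    my ($text, $rel) = @_;
    my $r = quotemeta $rel;
    $text =~ s/$r(.)/$1/gs;
    return $text;
}

my %examples = (
    default  => "UNA:+.? 'UNH+1+ORDERS:D:96A:UN'NAD+BY+++O?'Reilly'UNT+3+1'",
    springer => "UNA:+.? \nUNH+1+ORDERS:D:96A:UN\nNAD+BY+++O'Reilly\nUNT+3+1\n",
);
for my $name (sort keys %examples) {
    my $c = service_chars($examples{$name});
    print "$name:\n";
    for my $seg (split_segments($examples{$name}, $c)) {
        my @elements = map { unescape($_, $c->{release}) }
                       split_on($seg, $c->{element}, $c->{release});
        print '  ', join('|', @elements), "\n";
    }
}
```

  Both variants print the same three segments, UNH|1|ORDERS:D:96A:UN,
  NAD|BY|||O'Reilly and UNT|3|1.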

  To run a complete test, type

               $ ./bin/make_spool.sh

  This will transform my EDI examples into XML and place them together
  with a DTD in the ./spool directory. You already have those files;
  the new ones are compared with their counterparts in the ./examples
  directory. If you don't see any "foo.bar changed" message, everything
  went fine.

  Let's talk about the Perl way of installation and regression testing
  in version 0.4.

  44..  KKnnoowwnn BBuuggss

  44..11..  DDoouubbllee nnaammeessppaaccee ddeeccllaarraattiioonnss

  Namespace declarations were redefined in January 1999. XML::Edifact
  0.30 produced both the old and the new declarations; XML::Edifact
  0.31 dropped the deprecated declarations! If you have an old browser,
  you may have to download XML::Edifact 0.30 and edit the current
  XML::Edifact. Search for HERE_ and adapt the headers to your
  browser's preferences.

  44..22..  SSttaattiinngg lleevveell iinn SSyynnttaaxx iiddeennttiiffiieerr..

  This has to be parsed. The stating level (the syntax level, e.g. the
  A in UNOA) is what XML calls the character encoding, and it is of
  course important if you think about non-US/UK products. See
  un_edifact/unsl.
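  Parsing it could look like the sketch below, which maps the syntax
  identifier of the UNB segment to an XML encoding declaration. It
  assumes the default + separator in UNB and covers only levels A to F;
  the table is an assumption to be checked against un_edifact/unsl:

```perl
#!/usr/bin/perl
# Sketch: derive an XML encoding declaration from the syntax identifier
# (e.g. UNOC) in the UNB segment. Assumes + as data element separator.
use strict;
use warnings;

my %encoding_for = (
    UNOA => 'US-ASCII',      # level A: upper case ISO 646 subset
    UNOB => 'US-ASCII',      # level B: adds lower case
    UNOC => 'ISO-8859-1',    # level C: Latin-1, Western Europe
    UNOD => 'ISO-8859-2',    # level D: Latin-2, Central Europe
    UNOE => 'ISO-8859-5',    # level E: Cyrillic
    UNOF => 'ISO-8859-7',    # level F: Greek
);

sub xml_declaration {
    my ($edi) = @_;
    my ($syntax_id) = $edi =~ /UNB\+([A-Z]{4})/
        or die "no syntax identifier in UNB\n";
    my $encoding = $encoding_for{$syntax_id}
        or die "no encoding known for $syntax_id\n";
    return qq{<?xml version="1.0" encoding="$encoding"?>};
}

print xml_declaration("UNB+UNOC:3+SENDER+RECEIVER+990701:1200+1'"), "\n";
```

  Declaring the encoding this way lets the data bytes pass through to
  the XML output unchanged instead of being transcoded.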

  44..33..  XXMMLL::::EEddiiffaacctt iiss ssllooww!!

  The real-life example message teleord.edi needs about 7 minutes on a
  Sun 3/60 running NetBSD. Even though newer computers are faster,
  XML::Edifact would not be able to handle the daily batches of large
  UN/EDIFACT routers like TeleOrdering UK. Solving this problem will be
  delayed until version 1.2, when parts of the module will be recoded
  in C.

  55..  RRooaaddmmaapp

  I'm using even and odd numbers to distinguish stable from
  experimental versions. Well, this 0.2 was not as stable as an even
  number suggests. And I hope this 0.3x is stable enough, as a third
  version is often the first useful one.

  Be warned: anything here is pure vaporware. I'm writing XML::Edifact
  in my spare time, and I hope to complete one version per month.

     00..33xx
        This version is under development: It should integrate better
        into the XML::Parser environment, and use some XML::Parser to
        translate XML::Edifact-0.3x messages back into UN/EDIFACT.  Only
        even numbers 0.302468 can be cound on CPAN. Odd versions are
        published by eMail only. As a warning different 0.31 exist.
        Some eMail's I got, caused imediate code changes and a reply to
        test them. If you receive a 0.3-913579 file by eMail: Do not
        distribute it widely, those versions are internal only.

     00..44xx
        This version will focus on portability. While Perl ensures
        portability across the unix'es, MacOS and Win32 will cause some
        problems. The 0.4 version will also be the first one intended to
        become installed. As installation also means configuration of
        non Perlish paths e.g. for webserver, mime.types, mailcap, dtds
        and databases, XML::Config.pm will be discussed in the perlxml
        list.

     00..55xx
        The next important step will be a reverse engineering of the
        document type definition of the original EDI standard draft.
        This version will provide segment groups for defined document
        types like orders and invoices. Most important will be the
        introduction of a XML format for defining code list extensions.
        This format will probately some RDF.

     00..66xx
        Stabilisation by disscussion and consens about the XML DTDs
        introduced with 0.5.

     00..77xx
        EdiCooked is far from being KISS. This release will try on a
        smarter DTD called EdiLean. EdiLean will focus on PRICAT,
        ORDERS, ORDRSP, ORDCHG and INVOICE. If a consens about a KISS
        XML/EDI already exist, EdiLean will try to implement it.

     00..88xx
        Stabilisation by disscussion and consens about the XML DTDs
        introduced with 0.7.

     00..99xx
        Its important for me that authentication and authorisation will
        be provided bbeeffoorree I call it final 1.0. Some Edifact messages
        contain medical informations (MED*), other contain personal
        informations (JOB*). Most messages contain viable information
        for running a bussiness. Only cryptography on a document level
        would preserve authentication and authorisation once a message
        stored on a disk.

        Alf O. Watt ( alfwatt@pacbell.net ) proposed a simple solution
        using namespaces and processing instructions at the perlxml
        mailing list in December 1998. The beauty of this aproach is,
        that the secure document is still wellformed and valid of the
        same document type.

     11..00
        I hope that any consens have been found on that road, so the
        DTDs wont change in further releases. Those versions may focus
        on uussiinngg XML::Edifact in real life applications. I can think
        about an SQL interface, a Cobol interface, a message designer, a
        DOM/CORBA wrapper, and much more.

        Once I think to have XML::Edifact complete, I have to think
        about speed. Perl is a perfect language for prototyping, but
        profiling and using a low level language like C for hot spots,
        will be necessary to handle large batches.

  66..  LLeeggaall ssttuuffff

  The programs provided in this copy, called XML-Edifact-0.32.tgz, can
  be used, distributed and modified under the terms of the GNU General
  Public License.

  Files in the ./examples directory are from various sources and free
  of claims as far as I know.

  Files inside the ./un_edifact_d96b directory are based on the EDI
  batch directory and are therefore copyrighted by the United Nations.
  See un_edifact_d96b/LICENAGR.TXT.

  Files that are produced during the bootstrap process and placed in
  ./data are based on the original UN/EDIFACT standard and are
  therefore not covered by the GPL, but are likely covered by the UN's
  copyright.

  Besides the GPLed edition, a custom edition exists if you dislike the
  GPL. Drop me an eMail and ask for price and conditions. You can also
  hire me as a consultant within Europe, if you think that the author
  of a tool is probably the best one to teach your programmers.

  77..  DDoowwnnllooaadd

  I just got a message from PAUSE saying that I can upload it to:

               $CPAN/authors/id/K/KR/KRAEHE

  So you can find the current version with:

               $ perl -MCPAN -e shell
               cpan> m XML::Edifact

  or directly at:

  ftp.cpan.org:/pub/perl/CPAN/modules/by-module/XML/XML-Parser.*.tar.gz
  ftp.cpan.org:/pub/perl/CPAN/modules/by-module/XML/XML-Edifact.*.tgz

  You may also get it from my homepage. Try something like:

               http://human.isb.net/~kraehe/pub/XML-Edifact-?.??.tgz

  Be warned: it's about 300 kilobytes, as it also includes some of the
  Premenos files. The main script is only about 400 lines, so first:
  don't panic.

