docbook2X is the collective name for a bunch of tools for converting DocBook documents into the traditional Unix documentation formats: roff -man pages and Texinfo. Currently the formatters are implemented as Perl SGMLSpm spec files.
The latest version of docbook2X is available as a tarball. It also includes example SGML documents for testing. For other examples, please see the documentation at the GGI Project.
Stephane Bortzmeyer has also created Debian packages of docbook2X.
Integrated patch from Thomas Lockhart:
I've added some command line arguments to set a couple of kinds of default section numbers for the man page output.
There are some new global variables to carry the command line defaults and to keep track of section info read from the input file.
Changes by steve: Reduced globals to a minimum for aesthetic reasons.
The output file name generated from the refentry title is lowercased, and has underscores to replace blanks for multi-word titles. Previously, the file name was uppercased if the input string was mixed case, and uppercase titles and blanks were passed through unchanged. Things are still uppercased internally as you intended.
Changes by steve: Preserve case is still the default.
The section "number" for the output file name defaults to one (".1") if the refentrytitle contains an application tag. This default can be changed from the command line.
Changes by steve: I consider this to be a hack. Cleaned up the code for handling that and made it optional.
The default section "number" for the output file name defaults to ell (".l") for no good reason other than that is what I needed. You might want to change it, but it should probably default to *something*.
The date is snarfed from date info in the file, if available. This actually does not do anything useful for me at the moment because the date isn't available early enough in the source file.
Changes by steve: Also added option to pass in specific date to use.
In a couple of places, the script required attributes to be defined which were not in my input files. I put some checks around these places in the script so that it proceeds gracefully, as it already did in other places. Previously, it choked.
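The file naming rules described in the patch notes above can be sketched roughly as follows. This is only an illustration in Python, not the Perl code from the spec file: the function name man_filename and its parameters are hypothetical, the defaults of "1" and "l" are taken from the patch description and may differ in the released script, and replacing blanks with underscores in preserve-case mode is an assumption.

```python
import re


def man_filename(title, manvolnum=None, has_application=False,
                 appsection="1", defsection="l", lowercase=False):
    """Derive an output file name for a man page from its RefEntryTitle.

    Hypothetical helper: the parameter names and default values are
    assumptions made for illustration only.
    """
    # An explicit ManVolNum wins; otherwise fall back to the application
    # default (if the title contains an Application tag) or the general one.
    if manvolnum:
        section = manvolnum
    elif has_application and appsection:
        section = appsection
    else:
        section = defsection

    # Blanks in multi-word titles become underscores.
    name = re.sub(r"\s+", "_", title.strip())

    # Preserving case is the default; lowercasing is opt-in.
    if lowercase:
        name = name.lower()

    return f"{name}.{section}"
```

For example, man_filename("CREATE TABLE", lowercase=True) gives "create_table.l" under these assumptions.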
More separator kludges. FORMAT=linespecific is NOTATION not CDATA; fixed. A start on implementing Procedures.
Fixed output->man_output bugs, added a few more SDATA.
Put in huge kludges to handle breaking after an 'embedded' block element correctly.
Fixed REFERENCE/BOOK TITLE handling. Fixed oversight where TERMs are always bold. After using our own output, sgml functions it becomes a lot slower. I don't know if that can ever be fixed. Deleted unused code.
Moved out duplicate code to functions, making it cleaner. Ignore content of DocInfo, RefSect1Info and similar elements. No major breakages in my limited practical usage now.
It also now supports XRefs to RefEntries (using SGMLS::Refs), so many cases of CiteRefEntry to other man pages in the same document can be eliminated. Note that if an XRef is a forward reference, then docbook2man needs to be run twice to output the correct text.
Mostly fixed the title quoting problem. Cleaner output w.r.t font changes. Better CmdSynopsis (though still not finished.)
Never mind the big jump. Cleaned up the code, and identified more areas to fix and clean up, esp. w.r.t. push_output('string') handling. It also works now on weird.sgml.
I have added partial support for a few more elements (namely CmdSynopsis), and cleaned up some of the code. However, the code is kind of messy right now, especially w.r.t. 'block' element handling. A bit of rewriting of those parts is in the works.
Below is a sample man page.
sgmlspl {sgmlspl-specs/docbook2man-spec.pl}
nsgmls [sgml document] | sgmlspl {sgmlspl-specs/docbook2man-spec.pl} [--section label] [--appsection label] [--defsection label] [--date date] [--lowercase | --preserve-case]
docbook2man is an sgmlspl spec file that produces man pages (using the -man macros) from DocBook RefEntry markup.
The program reads ESIS produced by nsgmls (or other SGML parsers) from standard input. Markup not found in RefEntry is discarded.
Its output, the converted man pages, is written to the current directory. If RefMeta information is not specified in a RefEntry, then the man page will be written to standard output.
The file manpage.links will also be created, which contains any aliases of the manpages generated. This file is in the format:
man page alias manpage
The manpage.refs file keeps track of XRef references. Note that if the input document has any forward references, then docbook2man may have to be invoked twice (the first time updating manpage.refs) to resolve them.
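One way to automate the "run it again if needed" step is to compare manpage.refs before and after each run. The sketch below is an assumption-laden illustration, not part of docbook2man: run_until_refs_settle and convert are hypothetical names, doc.sgml is a placeholder input, and treating an unchanged manpage.refs as "all references resolved" is a guess about how the file behaves.

```python
import subprocess
from pathlib import Path


def run_until_refs_settle(run_once, refs_path="manpage.refs", max_runs=2):
    """Call run_once() until refs_path stops changing; return the run count."""
    refs = Path(refs_path)
    for attempt in range(1, max_runs + 1):
        before = refs.read_bytes() if refs.exists() else None
        run_once()
        after = refs.read_bytes() if refs.exists() else None
        # Assumption: if the refs file did not change, nothing new was
        # learned on this run and a further run would not help.
        if before == after:
            return attempt
    return max_runs


def convert():
    """Hypothetical single conversion run over a placeholder document."""
    subprocess.run(
        "nsgmls doc.sgml | sgmlspl sgmlspl-specs/docbook2man-spec.pl",
        shell=True, check=True,
    )
```

Calling run_until_refs_settle(convert) would then run the conversion once or twice. The price of this simple check is an occasional unnecessary second run, for instance on the very first conversion of a document whose references are all backward.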
--appsection label: If Application is found in a RefEntryTitle, then the section defaults to this value.
--defsection label: The default section for the generated man page, used if the source RefMeta does not contain a ManVolNum.
--section label: Sets both --appsection and --defsection.
--date date: Sets date as the date to use for man pages without Date markup (in DocInfo). If this option is not specified, the default date is the current date. Note that the date must be given as one argument; i.e., shell quoting may have to be used for dates containing spaces.
--lowercase: Man page filenames will be lowercased.
--preserve-case: Do not lowercase man page filenames. This is the default.
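The point about passing the date as a single argument is easy to get wrong when the command line is assembled by a script. As a small sketch (the function docbook2man_command is hypothetical, and doc.sgml in the usage note is a placeholder), Python's shlex.join can build the pipeline string with the quoting applied automatically:

```python
import shlex


def docbook2man_command(document, spec="sgmlspl-specs/docbook2man-spec.pl",
                        date=None, lowercase=False):
    """Build a shell command line for one docbook2man run (hypothetical helper)."""
    parser = ["nsgmls", document]
    converter = ["sgmlspl", spec]
    if date is not None:
        # The date stays one list element, so shlex.join quotes it as a
        # single shell word even when it contains spaces.
        converter += ["--date", date]
    if lowercase:
        converter.append("--lowercase")
    return f"{shlex.join(parser)} | {shlex.join(converter)}"
```

With date="12 March 1999" the date is emitted as '12 March 1999', so the shell hands it to sgmlspl as one argument.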
The SGMLSpm package from CPAN. This package includes the sgmlspl script that is also needed.
Trying docbook2man on non-DocBook or non-conformant SGML results in undefined behavior. :-)
This program is a slow, dodgy Perl script.
This program does not come close to supporting all the possible markup in DocBook, and may produce wrong output in some cases with supported markup.
Obvious stuff:
Fix docbook2man breakages found in the test documents, especially weird.sgml.
Add new element handling and fix existing handling. Be robust.
Produce the cleanest, most readable man output possible (unlike some other converters). Follow Linux man(7) convention. As conversion to man pages is usually not done very often, it is better to be slower/more complicated than to produce wrong output. Also, if someone wants to give up using DocBook for whatever reason, the last-converted man pages can then be maintained manually.
Make it faster. I think most of the speed problems so far are in parsing ESIS. Rewrite SGMLS.pm in C and/or get input directly from SP.
Support other (human) languages. But what to do with non-ASCII charsets? SGMLSpm doesn't report them and roff does not grok them.
Copyright (C) 1998-1999 Steve Cheng <steve@ggi-project.org>
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version.
You should have received a copy of the GNU General Public License along with this program; see the file COPYING. If not, please write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
Added a few more elements such as BLOCKQUOTE. Ported stuff from docbook2man. First bits of index generation. Misc bugfixes, but Articles still don't work yet.
Put in huge kludges to handle breaking after an 'embedded' block element correctly. Added a few more elements, admonitions. Added some more SDATA.
Fixed TERM handling.
Initial release. Although some parts of the output may be spartan and it does not support a lot of elements, it works in both Texinfo and info with proper nodes and cross referencing. I have only tested with GGI documentation, so don't blame me if it doesn't work with any other document; this is only a first release.
Below is a sample man page.
sgmlspl {sgmlspl-specs/docbook2texi-spec.pl}
nsgmls [sgml document] | sgmlspl {sgmlspl-specs/docbook2texi-spec.pl} [basename]
docbook2texi is an sgmlspl spec file that produces GNU Texinfo documents from DocBook documents.
The program reads ESIS produced by nsgmls (or other SGML parsers) from standard input. Currently the document element must be Book; otherwise the results are undefined.
Its output, the converted Texinfo document, is written to standard output.
The file basename.refs will also be created, which contains all the nodes in the document and their immediate 'child' nodes. As node processing always requires forward references, docbook2texi must be run twice for each document: the first time to build the references, and the second to actually generate a valid document.
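The two mandatory passes can be wrapped in a small driver. The following is a sketch under stated assumptions rather than a supported interface: run_pipeline and docbook2texi_two_pass are hypothetical names, the exit status of the parser is not checked, and discarding the first pass's output is simply one reasonable way to handle a document that is not yet valid.

```python
import subprocess


def run_pipeline(parser_argv, converter_argv):
    """Run `parser | converter` and return the converter's standard output."""
    # The Popen context manager closes the pipe and waits for the parser
    # on exit; the parser's exit status is deliberately not checked here.
    with subprocess.Popen(parser_argv, stdout=subprocess.PIPE) as parser:
        converter = subprocess.run(converter_argv, stdin=parser.stdout,
                                   stdout=subprocess.PIPE, check=True)
    return converter.stdout


def docbook2texi_two_pass(document, basename,
                          spec="sgmlspl-specs/docbook2texi-spec.pl",
                          runner=run_pipeline):
    """Run docbook2texi twice and return the Texinfo from the second pass."""
    parser_argv = ["nsgmls", document]
    converter_argv = ["sgmlspl", spec, basename]
    # Pass 1 only serves to build basename.refs; its output is discarded.
    runner(parser_argv, converter_argv)
    # Pass 2 can resolve the forward references recorded by pass 1.
    return runner(parser_argv, converter_argv)
```

The runner parameter exists only so the driver can be exercised without nsgmls installed; in normal use the default is enough, and the returned bytes can be written to a .texi file.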
The SGMLSpm package from CPAN. This package includes the sgmlspl script that is also needed.
Trying docbook2texi on non-DocBook or non-conformant SGML results in undefined behavior. :-)
This program is a slow, dodgy Perl script.
This program does not come close to supporting all the possible markup in DocBook, and may produce wrong output in some cases with supported markup.
Come up with a good block element model, i.e. fix nested block handling once and for all. This also applies to docbook2man. This is partially done.
How the hell do you represent a backslash (\) in Texinfo!!@? I've tried \\ but TeX complains about it.
Fix breakages found in the test documents.
Add new element handling and fix existing handling. Be robust.
Make it faster. I think most of the speed problems so far are in parsing ESIS. Rewrite SGMLS.pm in C and/or get input directly from SP.
There are some dependencies on elements occurring when they are actually optional (according to the DTD). We need to fix that (preferably) or prominently state the requirements.
Do something with the *Info (and ArtHeader) elements.
Allow other more common document elements.
Separate out node referencing to a separate script. Not only would it make it faster/easier to maintain because it's separate from the main code, but also I would like it to evolve into an automatic DocBook ToC generator.
Make Texinfo output look nicer. Or is Texinfo just plain ugly?
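On the backslash question in the list above: in Texinfo source only @, { and } are special in ordinary text, and they are written @@, @{ and @}. A backslash is therefore not an escape character and would be passed through as-is rather than doubled. Whether a given texinfo.tex then typesets it without complaint has not been verified here. A minimal escaping helper along those lines (texinfo_escape is a hypothetical name, not a function in docbook2texi) might look like this:

```python
def texinfo_escape(text):
    """Escape the characters Texinfo treats as special in ordinary text."""
    # Only @, { and } get an @ prefix; backslashes are left untouched.
    return "".join("@" + ch if ch in "@{}" else ch for ch in text)
```

This covers running text only; contexts such as macro arguments or math have additional rules that the sketch does not attempt to handle.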
Copyright (C) 1998-1999 Steve Cheng <steve@ggi-project.org>
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version.
You should have received a copy of the GNU General Public License along with this program; see the file COPYING. If not, please write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
Primary reason: I know much more Perl than I do about the other alternatives (and that isn't much).
When I first looked, Python didn't have any SGML modules, so I did not use it at first. Not that Perl's SGML modules are optimal, but they do work. Both languages have comprehensive XML support, but that would require people to use XML or convert from SGML to XML.
DSSSL is geared towards transformation to SGML, which the target formats are not.
For docbook2man, DSSSL and Python would still need the same contorted code to cope with roff idiosyncrasies: cleaner in terms of syntax, but harder to write for lazy people like me.
I am aware that the current Perl SGMLSpm-based solution is not scalable and is relatively hard to hack/customize. In the future there may be other solutions.
Simple: it doesn't work. It fails horribly with added whitespace and is another one of those extremely-limited-functionality scripting languages. It has a tree model and is relatively fast, but isn't hackable.
I agree, particularly docbook2texi. I'm currently working on a tree-based conversion (using XML and DOM). Basically stylesheets in Perl. It will be robust and allow as much customization as possible, and will be modularized.
Using XML means documents in SGML have to be converted, but cursory tests show that on-the-fly SGML->XML+expat+DOM is about the same speed as SGMLSpm, or even faster. However, XML and XML tools are easier to use and have more functionality.
I have tried Python this time, and it is a cleaner language, but unfortunately it is way too slow with XML+DOM parsing. So I stick with Perl.