About
This is the home of boo2pdf, an IBM BookManager to PDF conversion app & web service. I'm currently experimenting with the HTML to PDF backends and would like feedback on book files I haven't tried. Once the code is cleaned up, I will dump it on my site. You can find the web service at http://ps-2.kev009.com:8081/boo2pdf/
Motivation
I have a large collection of old IBM machines and documentation. I want this documentation indexed by my own search facilities and by Google for easy retrieval. PDF is widely readable, while BookManager requires proprietary software, and no search engine I know of parses it.
This will probably be useful to Mainframers as well.
Known Limitations
- Currently, internal hyperlinks and headings are not parsed, indexed, or otherwise handled.
- The Linux SoftCopy Reader does not convert some of the older embedded image formats. The possible formats are GIF, PNG, JPG, MET, GDF, and WMF; I'm guessing one of the latter does not have a Linux filter. You will know an image did not convert because red text in your PDF says so. I've seen this in a few .boo files from the early to mid '90s.
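One way to test that guess, assuming you can get at the raw bytes of an embedded image (I haven't confirmed the Reader API exposes them), is to check the leading magic bytes. This Python sketch only knows the well-established signatures for GIF, PNG, JPEG, and placeable WMF; it does not attempt MET or GDF, so anything it can't identify is a candidate for one of those. The function name `sniff_image_format` is made up for illustration.

```python
# Well-known leading signatures. MET and GDF are deliberately not listed,
# so an "unknown" result points toward one of those (or a non-placeable WMF).
SIGNATURES = [
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPG"),
    (b"\xd7\xcd\xc6\x9a", "WMF (placeable)"),
]


def sniff_image_format(blob: bytes) -> str:
    """Guess an image format from its first bytes (hypothetical helper)."""
    for magic, name in SIGNATURES:
        if blob.startswith(magic):
            return name
    return "unknown (possibly MET, GDF, or non-placeable WMF)"
```

Running it over every image blob from a book that shows red text would narrow down which format lacks a Linux filter.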
Technical Details
I am using the JAR files from IBM SoftCopy Reader for Linux. I've decompiled these and written my own main class and a wrapper script that takes care of setting the LD_LIBRARY_PATH, the Java classpath, and other such glue. I use SoftCopy Reader's API to output HTML and images from the BookManager files, then pass the result to htmldoc for PDF conversion.
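A rough sketch of that glue in Python, for illustration only. The install layout (`/opt/softcopy-reader` with `lib/` and `jars/` subdirectories), the main class name `Boo2Html`, and its argument order are all hypothetical, since the actual wrapper isn't published yet. `htmldoc --webpage -f out.pdf in.html` is a standard htmldoc invocation that renders HTML as a continuous document.

```python
import os
import subprocess
from pathlib import Path

# Hypothetical install layout; adjust to wherever SoftCopy Reader is unpacked.
READER_HOME = Path("/opt/softcopy-reader")
MAIN_CLASS = "Boo2Html"  # hypothetical name for the custom main class


def build_env(reader_home: Path, base_env=None) -> dict:
    """Copy the environment and prepend the reader's native libs to LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    lib_dir = str(reader_home / "lib")
    existing = env.get("LD_LIBRARY_PATH")
    env["LD_LIBRARY_PATH"] = lib_dir if not existing else lib_dir + os.pathsep + existing
    return env


def build_classpath(reader_home: Path) -> str:
    """Join every JAR in the reader's jars/ directory into a Java classpath."""
    jars = sorted((reader_home / "jars").glob("*.jar"))
    return os.pathsep.join(str(j) for j in jars)


def build_commands(boo_file: Path, out_dir: Path, reader_home: Path = READER_HOME):
    """Return the two steps: .boo -> HTML via Java, then HTML -> PDF via htmldoc."""
    html_file = out_dir / (boo_file.stem + ".html")
    pdf_file = out_dir / (boo_file.stem + ".pdf")
    # Argument order for the custom main class is an assumption.
    java_cmd = ["java", "-cp", build_classpath(reader_home), MAIN_CLASS,
                str(boo_file), str(html_file)]
    htmldoc_cmd = ["htmldoc", "--webpage", "-f", str(pdf_file), str(html_file)]
    return java_cmd, htmldoc_cmd


def convert(boo_file: Path, out_dir: Path, reader_home: Path = READER_HOME) -> None:
    """Run both steps, stopping with an exception if either command fails."""
    env = build_env(reader_home)
    java_cmd, htmldoc_cmd = build_commands(boo_file, out_dir, reader_home)
    subprocess.run(java_cmd, env=env, check=True)
    subprocess.run(htmldoc_cmd, check=True)
```

Keeping command construction separate from `convert` makes it easy to print or log the exact commands before anything runs.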

Hi,
do you have any information about the file format of .boo files?
@Martin,
I don't. I've asked IBM for it in the past so I could write a proper filter, but they didn't seem interested in sharing.
Let me know if you find anything.
Hi again,
one thing is certain: the text of the book isn't stored as plain ASCII.
Maybe compression is used. I looked into it and, …, it doesn't seem to be Deflate.
There seems to be at least a 6-byte header. Bytes 3-6 (inclusive) in most files are zero.
I mean a 7-byte header, sorry.
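To check these observations against other files, a small Python sketch could test the zero bytes and whether the data after the header inflates with zlib. It assumes the 7-byte header reported above and reads "bytes 3-6" as 0-based offsets (the last four bytes of the header); pass `first=2, last=5` for a 1-based reading. Where compressed data would start, if there is any, is unknown, so `looks_like_deflate` simply tries the bytes right after the header. All function names here are hypothetical.

```python
import zlib

HEADER_LEN = 7  # observed in this thread, not taken from any IBM documentation


def header_zero_bytes(data: bytes, first: int = 3, last: int = 6) -> bool:
    """Return True if data[first..last] (0-based, inclusive) are all zero."""
    if len(data) <= last:
        return False  # too short to contain those bytes
    return all(b == 0 for b in data[first:last + 1])


def looks_like_deflate(payload: bytes) -> bool:
    """Return True if payload fully inflates as raw Deflate, zlib, or gzip."""
    for wbits in (-15, 15, 31):  # raw Deflate, zlib wrapper, gzip wrapper
        try:
            zlib.decompress(payload, wbits)  # raises on bad or truncated streams
            return True
        except zlib.error:
            continue
    return False


def inspect_boo(data: bytes) -> dict:
    """Summarize the header observations for one file's bytes."""
    return {
        "header": data[:HEADER_LEN].hex(),
        "zero_bytes_3_to_6": header_zero_bytes(data),
        "deflate_after_header": looks_like_deflate(data[HEADER_LEN:]),
    }
```

Feeding it the bytes of several .boo files (e.g. `inspect_boo(open("book.boo", "rb").read())`) would show how consistent the header really is.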
@Martin
These come from the IBM mainframe world, so the encoding seems to be EBCDIC. I poked around with a hex editor and saw at least some readable strings.
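A hex editor only shows ASCII, so a quick way to pull those strings out is an EBCDIC-aware version of `strings`. This Python sketch assumes code page 037 (EBCDIC US/Canada), which Python ships as the `cp037` codec; BookManager files may well use a different code page such as `cp500`, so treat the choice as a guess. `ebcdic_strings` is a hypothetical helper name.

```python
import re

# Assumption: EBCDIC code page 037. Try "cp500" or others if output looks wrong.
CODEC = "cp037"


def ebcdic_strings(data: bytes, min_len: int = 4, codec: str = CODEC) -> list:
    """Decode bytes as EBCDIC and return runs of printable characters."""
    # errors="replace" guards against any byte the code page leaves unmapped.
    text = data.decode(codec, errors="replace")
    return re.findall(r"[\x20-\x7e]{%d,}" % min_len, text)
```

For example, `ebcdic_strings(open("book.boo", "rb").read())[:20]` lists the first readable runs in a book; comparing results across code pages can help pin down which one IBM used.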
Yes, I tried the same, and you're completely right!
I tried debugging the libhlcwam.so dynamic library using IDA Pro 5.5 Advanced.
Wow, this thing is really big!
Even with the (buggy) decompiler it's hard to crack.
Because of its size, I'm only making slow progress.
I'm currently working on a text file describing the file format; only the header is partially done.
Do you have any other ideas on how to make faster progress?
Here’s a link on how to build online BOOks: http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/Shelves/EZ2ZO10I