boo2pdf

About

This is the home of boo2pdf, an IBM BookManager to PDF conversion app & web service. I’m currently experimenting with the HTML to PDF backends and would like feedback on book files I haven’t tried. Once the code is cleaned up, I will dump it on my site. You can find the web service at http://ps-2.kev009.com:8081/boo2pdf/

Motivation

I have a large collection of old IBM machines and documentation. I want this documentation indexed by my own search facilities and by Google for easy retrieval. PDF is widely readable, while BookManager requires proprietary software and no search engine I know of parses it.

This will probably be useful to Mainframers as well.

Known Limitations

  • Currently, internal hyperlinks and headings are not parsed, indexed, or otherwise handled.
  • The Linux SoftCopy Reader does not convert some of the older embedded image formats. Possible formats are: GIF, PNG, JPG, MET, GDF, WMF. I’m guessing it is one of the latter formats that does not have a Linux filter. You will know an image did not convert by the red text saying so in your PDF. I’ve seen this in a few .boo files from the early to mid ’90s.

Technical Details

I am using the JAR files from IBM SoftCopy Reader for Linux. I’ve decompiled these and written my own main class and a wrapper script that takes care of setting LD_LIBRARY_PATH, the Java classpath, and other such glue. I use SoftCopy Reader’s API to output HTML and images from the BookManager files, then pass the result to htmldoc for PDF conversion.
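
The glue described above can be sketched roughly as follows. This is not the actual boo2pdf code: the install directory `/opt/softcopy`, the main class name `Boo2Html`, its two arguments, and the `index.html` output name are all hypothetical placeholders. Only the general shape (put the SoftCopy Reader JARs on the classpath, point LD_LIBRARY_PATH at its native libraries, then hand the HTML to htmldoc) comes from the description.

```python
import os
import subprocess
from pathlib import Path

SCR_HOME = Path("/opt/softcopy")  # hypothetical SoftCopy Reader install directory
MAIN_CLASS = "Boo2Html"           # hypothetical name for the custom main class


def java_command(boo_file, out_dir, scr_home=SCR_HOME):
    """Build the java invocation with every JAR in scr_home on the classpath."""
    jars = sorted(str(p) for p in Path(scr_home).glob("*.jar"))
    classpath = os.pathsep.join(jars + ["."])
    return ["java", "-cp", classpath, MAIN_CLASS, str(boo_file), str(out_dir)]


def java_env(scr_home=SCR_HOME, base_env=None):
    """Copy the environment and prepend scr_home to LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    old = env.get("LD_LIBRARY_PATH")
    env["LD_LIBRARY_PATH"] = f"{scr_home}{os.pathsep}{old}" if old else str(scr_home)
    return env


def htmldoc_command(html_file, pdf_file):
    """Build the htmldoc invocation: --webpage mode, PDF written via -f."""
    return ["htmldoc", "--webpage", "-f", str(pdf_file), str(html_file)]


def convert(boo_file, work_dir, pdf_file):
    """Run both stages; the index.html file name is an assumption."""
    subprocess.run(java_command(boo_file, work_dir), env=java_env(), check=True)
    subprocess.run(htmldoc_command(Path(work_dir) / "index.html", pdf_file), check=True)
```

Keeping the command builders separate from `convert` makes it easy to print and inspect the exact commands before running anything against a real book file.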

Code

boo2pdf Gitweb

9 thoughts on “boo2pdf”

  1. @Martin,

    I don’t. I’ve asked IBM for it in the past so I could do a proper filter but they didn’t seem interested in sharing.

    Let me know if you find anything.

  2. Hi again,

    One thing is a fact: the text of the book isn’t stored as pure ASCII.
    Maybe compression is used. I looked at it and, …, it doesn’t seem to be Deflate.

  3. There seems to be at least a 6-byte header. Bytes 3-6 (inclusive) in most files
    are zero.

  4. @Martin

    These come from the IBM mainframe world so the encoding seems to be EBCDIC. I poked around with a hex editor and saw some strings at least.

  5. I tried debugging the libhlcwam.so dynamic library using IDA Pro 5.5 Advanced.
    Wow, this thing is really big!
    Even with the (buggy) Decompiler it’s hard to crack/hack.

    I am only making slow progress because of its size.
    I am currently working on a text file describing the file format; only the header is
    partially done.

    Do you have other ideas on how to make faster progress?
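
The observations in the comments above (a short header whose bytes 3-6 are usually zero, strings that look like EBCDIC, and data that does not appear to be Deflate) can be checked against a .boo file with a rough sketch like the one below. It assumes the byte positions are counted from 1, and it picks code page 037 as the EBCDIC variant; neither is confirmed by anything known about the format. The Deflate check is only a heuristic.

```python
import re
import zlib


def header_bytes_3_to_6_are_zero(data):
    """Check bytes 3-6 (1-based, inclusive), i.e. data[2:6], for all zeros."""
    return len(data) >= 6 and data[2:6] == b"\x00\x00\x00\x00"


def looks_like_deflate(chunk):
    """Heuristic: try a zlib-wrapped stream, then a raw Deflate stream."""
    for wbits in (zlib.MAX_WBITS, -zlib.MAX_WBITS):
        try:
            zlib.decompressobj(wbits).decompress(chunk)
            return True
        except zlib.error:
            continue
    return False


def ebcdic_strings(data, min_len=4, codepage="cp037"):
    """Decode as EBCDIC (cp037 is an assumption) and return readable runs."""
    text = data.decode(codepage, errors="replace")
    return re.findall(r"[A-Za-z0-9 .,:;()/_-]{%d,}" % min_len, text)
```

Reading a file with `open(path, "rb").read()` and passing the bytes to these helpers is enough to repeat the hex-editor poking described above on many files at once.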
