
April 1, 2000

A Tale of Two Conversion Houses

(Originally written in 2000 for the now-defunct eBookWeb.)

Once upon a time in eBookspace, there was a conversion house. “Ah! Open eBook Publication Structure!” said the people of this conversion house. “This is merely HTML in disguise.”

And they said, “Let us hire a great many robots at the lowest wages we can manage, build highly sophisticated production tools that even they cannot misuse (since robots, as we all know, are stupid creatures, prone to make mistakes), and turn them loose. We will not train our robots, since training is expensive and they need not understand what they are doing; they need only use the highly sophisticated tools in predictable fashion.

“We will not create high-quality OEB markup, moreover, since we will give our clients only the finished eBooks, not the markup that went into them—and in any case our clients are clueless about quality markup and we will do our best to see that they remain so. And yea, we will make a great deal of money and our days will be long on the earth.”

And they did as they had planned. And they failed miserably, and ended their days in great penury crying woefully unto the heavens about the injustice of their failure.

A long way away from this ill-fated conversion house, there was another conversion house, older and wiser. “Ah! Open eBook Publication Structure!” said the people of this conversion house. “This is more complex than it appears at first glance.”

And they said, “Let us read the Publication Structure carefully and understand it well. Let us place an individual on the Open eBook Forum’s Publication Structure Working Group, both to understand the Publication Structure better and to influence its future development. Let us disseminate our understanding among our clients, that they appreciate the intricacies of what we do for them and our expertise in doing it.”

And at last, they said, “Let us also develop among our rank-and-file employees great expertise in the Publication Structure and its underlying technologies, particularly XML, since without widespread understanding we cannot efficiently produce quality work, no matter how sophisticated our tools. This will mean significant training and hiring expense and, for a while, decreased employee productivity, but in the long run we will do well.”

And they did as they had planned. And the heavens smiled upon them, and they prospered and grew fat.


Out of respect for libel and slander laws, I wrote the above cautionary tale most carefully so as not to implicate any particular organization. I do have a few names in mind that fit my first profile, however, and those who have watched the rise and fall of factory-house eBook empires in the last couple of years will not find it difficult to divine them. (Don’t email me candidates, please. I won’t confirm a thing. I’m lawyer-phobic.)

Moreover, there are empires soon to fall. I know of a few that still depend on untrained labor, or at best on low-paid, barely competent HTML jockeys. I know of a few that boast of their highly sophisticated tools without letting on that the markup those tools output is utter trash. I know of one house that as a matter of policy refuses to allow its most knowledgeable employees to say anything substantive to customers about markup and conversion issues, so that the customers will remain ignorantly dependent on that house.

They, too, will reap what they have sown.

Why am I so sure of this? Well, recent history bears me out, of course, but my friends and coworkers in eBookspace can tell you that I (somewhat less than discreetly) made predictions on this score well before the failure wave. I’ll share my reasoning, because I think it is as useful for those shopping for conversion houses as (I hope) it is for errant conversion houses themselves.

Problem 1: Rigid workflows

Factory-house conversion is dependent on what a former employer of mine used to call “in-dog-out-sausage” methods. Mechanical, rigid, tool-based, divided into steps performed without awareness of the larger process or the end result.

Such workflows work. After a fashion. They do not adapt. They do not handle variation well—and books (e- or p-) are nothing if not variable. Their output lacks complexity, intelligence, appropriateness, reusability, elegance, and sometimes even mere sanity. (I have stories… oh, do I have stories!) None of this augurs well for the long-term viability of the house’s processes.

The weirdest variation on rigid workflows is what I call the “One True DTD” theory of SGML/XML systems. Now, a DTD (“Document Type Definition”) allows you to spell out how to mark up text. It specifies the tags to be employed and how they fit together. Given that, you know about and control what’s in your marked-up documents and can use that knowledge to restructure and reuse the documents as you please.
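As a concrete sketch, here is what a tiny DTD for a simple prose book might look like. (The element names are hypothetical, invented purely for illustration; they come from no real specification.)

```dtd
<!-- Hypothetical DTD sketch: a book is a title plus one or more
     chapters; each chapter is an optional title plus paragraphs. -->
<!ELEMENT book    (title, chapter+)>
<!ELEMENT chapter (title?, para+)>
<!ELEMENT title   (#PCDATA)>
<!ELEMENT para    (#PCDATA)>
<!ATTLIST chapter id ID #IMPLIED>
```

A validating parser can then reject any document that strays from this structure, which is precisely the knowledge and control I just described.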

Gee, some houses have said, wouldn’t it be nice if we could create One True DTD that will suffice for every book in existence? Build a workflow based on it, and we’ll never be hungry again!

Sorry. Such a DTD is a chimera, a Holy Grail, an impossible dream. (Can you imagine the One True Visual Design Spec for all books in existence? Neither can I. A One True DTD is no different.) If such a thing could exist, it would have been invented already, and nobody would bother with SGML or XML.
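To see why in miniature, consider how little structure a prose novel shares with a book of verse. (Again, all element names here are hypothetical, for illustration only.)

```dtd
<!-- A prose novel: chapters made of paragraphs. -->
<!ELEMENT novel   (title, chapter+)>
<!ELEMENT chapter (title?, para+)>
<!ELEMENT para    (#PCDATA)>

<!-- A book of verse: poems made of stanzas made of lines. -->
<!ELEMENT versebook (title, poem+)>
<!ELEMENT poem      (title?, stanza+)>
<!ELEMENT stanza    (line+)>
<!ELEMENT line      (#PCDATA)>

<!ELEMENT title (#PCDATA)>
```

A One True DTD would have to subsume both of these, plus cookbooks, dictionaries, legal codes, and everything else ever published. Either it becomes so loose that it guarantees nothing about structure, or so enormous that nobody can apply it correctly.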

Those who try this route also tend to assume that such a DTD can be developed once and never touched again. They even hire external consultants to develop the DTD rather than creating or hiring in-house expertise, a practice whose stupidity and short-sightedness simply leave me agog.

In practice, they quickly find that they have books that don’t fit their DTD, and insufficient knowledge to update the DTD to match. And even if they can update the DTD, the tools based on it have been rigidly tied to the old DTD and can’t easily be updated themselves. Such a system cannot function for long, not if it is thought of as an eternal, unchangeable, inviolate “solution.”

Trained employees who understand DTDs, however, can and do learn to work with unfamiliar DTDs and markup quickly. They can easily adapt their markup and production techniques to novel situations. They’ll never shoehorn a book into an unsuitable markup system; they’ll never need to.

Problem 2: Tools versus training

Factory houses assume that it makes sense to create complicated and expensive tool-monoliths for use by untrained robots who cost next to nothing. Smart conversion houses, on the other hand, hire and train smart employees who understand what they are doing and can use a wide variety of simpler tools and techniques to accomplish their work.

My experience suggests several problems with the tool-monolith approach. First, the developers who create the tools often have flawed (or, worst case, no) understanding of the results the tool is supposed to produce. Nor is there a feedback loop between those who use the tool (assumed to be clueless robots) and those who create and update it. (How many such tools foolishly assume no update cycle at all?) This means that the tool turns out at best flawed, and at worst utterly inadequate. Moreover, once such a monolith is built and the factory house’s workflow depends on it, changing it (even significantly for the better) is next to impossible.

Second, tools become obsolete much faster than knowledge, so the apparent savings do not last forever. A factory house is in trouble once the operating system or binary format on which its tool runs is superseded. It may be in worse trouble if its tools are developed by a third-party vendor that goes out of business, or by an in-house guru who gets a better job offer. On the other hand, trained, flexible, and curious conversion employees, properly tended, build on their experience and their knowledge and, collectively, trump obsolescence entirely.

Third, factory houses design their tools to be all-purpose, cover-everything “solutions.” They contain everything but the kitchen sink, wrapped in a funky GUI. (Seen Beyond Press’s configuration screens lately? Terrifying.) This makes them complex, fragile, and amazingly difficult to configure and use correctly—which utterly defeats the purpose of using tools in the first place! Often, this problem is the result of underestimating the complexity involved in the task at hand while determining how the tool will work; this leads inevitably to massive bloat and feature creep in the tool.

Murphy’s Law also dictates that the tool, once created, will lack one small feature or make one small error that turns out to be obnoxiously crucial. New tools will have to be created to cover the deficiency, since robots cannot be trusted to fix the problem.

Finally, the more sophisticated and all-encompassing the tool, the more it threatens to delay or even halt entirely the accumulation of expertise by employees. Why learn what’s being done or why if the tool does it for you? Should the tool fail in some way, of course, employees dependent on it are left utterly helpless, and workflows grind to a halt. (I have stories…)

Problem 3: Counting on client cluelessness

Not a few publishers, overeager to climb on the eBook bandwagon, signed big contracts with factory houses. Not a few of them are kicking themselves now, of course, as the starry promises failed to materialize and the unrealistic costs initially quoted inevitably rose. These publishers poured a lot of revenue into factory houses’ pockets, often for very little return.

They’ve learned. They learn more every day. The more they learn, the less impressed they are with factory houses’ inflated claims and shoddy work. Counting on publishers to remain ignorant is short-sighted, suicidal business policy.

Disgust with lousy work and uncommunicative vendors is not the only fallout, however. As publishers get wise, they start wanting to influence the workflow and its result. This is natural and healthy. For factory houses with rigidly defined workflows, however, it is murder; every single customer represents a new workflow deviation, and the carefully developed tools prove increasingly incapable of adapting to the deviations. Trained and knowledgeable employees would be able to adapt, of course, but these houses do not train employees.

Problem 4: A vagrant, shifting workforce

Hire a robot. (Foreign-made robots are cheaper.) Don’t train it, except in the bare minimum of rigid, regimented manipulation of software tools. Pay it poorly. Cram it into a cubicle farm, the uglier and more cavernous the better. Divide the workflow into segments so small as to be meaningless, and do not explain to the robot how its individual segment relates to any other, much less how the whole product looks and functions. (I suspect—all right, I know—that not a few trainers did not and do not understand the “whole product,” so could not explain it if they wanted to.)

What will happen? For heaven’s sake, what do you expect? Morale will be terrible, and pep rallies will not improve it. (I do notice that houses pursuing this strategy seem to throw a lot of pep rallies—holiday parties, once-a-month foodfests, and the like. Must get expensive.) Employee productivity will never rise above a certain (fairly low) level. Those few employees with initiative and curiosity will learn all they can and then leave as quickly as possible for more pleasant and better-paid situations, meaning that such expertise as the house manages to develop will fly right out the door. Of those who remain, not a few will be deadwood, untrainable, slow to adapt.

Sure, it’s cheaper short-term. Long-term, it’s not sustainable.

Problem 5: Increased OEBPS sophistication

“But some of them have gotten away with it!” They have, for now. Unfortunately, they’ve got a surprise coming sometime next year. The Open eBook Publication Structure isn’t sitting still. It can’t and shouldn’t. It’s much too limited; there are too many types of content it can’t handle.

I know what’s coming; I’m proud of my role in designing it. HTML looms much less large in the new Publication Structure. There are new features in it that will make eBooks sing—but such features have, without exception, nothing whatever to do with HTML. Not a few of the new features will prove resistant to mechanized application, no matter how whiz-bang the tools. Not a few of them will require fairly sophisticated human judgment to use.

Guess what factory-style conversion houses have completely left out of the equation? That’s right. Human judgment. In the mistaken belief that HTML was all they needed to produce, they reduced eBook production to a rigid series of discrete steps. When the more flexible, more extensible, livelier OEBPS 2 comes out, their painstakingly developed systems will topple, and their robots will not have sufficient knowledge to step up to the new challenges.

Is there hope?

I will close by mentioning one organization that I believe to be a brilliant example of the smart conversion house in my fairy story. Surf over to Data Conversion Labs’ website; I promise you will be well-rewarded for your pains. Knowledge and information simply overflow from this site, free for the taking; a far cry from the hype-laden, contentless websites of not a few of DCL’s competitors. While you’re there, peek at the employment information on “editorial” positions at DCL—you will find their editorial ladder, and the training system and plant-wide knowledge base that must underlie it, worlds away from untrained HTML jockeys.

(Disclaimer: I don’t work for DCL, never have; I’ve never even worked with them except insofar as they participate on the Open eBook Forum. I simply respect them from afar, as I have for a long time.)

Data Conversion Labs proves that doing conversion right is not only possible, but profitable. They will outlive the factory houses. They will prosper, and grow fat.