Visual editing of semistructured documents on the web

A talk I gave at the International Conference for Educators in Visual Communication at the California Technical University in San Luis Obispo. It was great fun to be able to give a talk on a topic that has been following me for several years.

In this presentation I will discuss some of the possibilities but also the problems of editing structured content in a visual fashion. The goal of it all is to communicate the structure we want/need to the author of the content. We hope to do this by well established visual means like typography, color and the likes. Let’s begin with some personal background history which will clarify the problem we are trying to solve.

In one of my first jobs, I have worked for a series of law publications. Those magazines discussed recent decision of the Swiss Federal Courts to show their implications for future cases. The magazines had existed for some time already and the workflow was print first. Trying to change the workflow so we could publish the articles on the web too proved difficult. Not so much from a technical perspective but rather because the authors didn’t like the tools we asked them to use which were mostly forms.

For my bachelor’s thesis I experienced the same problem all over again. We had these beautiful ideas of re-using content in multiple publications. This was especially intriguing because the client was an internationally active watchmaker. All their content was translated into 30 different languages. So reusing it would have resulted in tremendous savings. But equally, we didn’t have a tool which would allow untrained authors to produce well-structured and therefore reusable content.
For example, we have the step-by-step instructions of the manual. Many of those instructions are being reused and recombined for multiple models. But currently each of those copies is handled like a separate text.

Some editorial content was also reused in several places. The descriptions of the different materials employed in the watches for example. The content was very similar with slight variations from place to place. This led to many unnecessary retranslations of the same material.

As a last example for our case, there are the course description at the University of Applied Sciences where I work. Those descriptions very highly structured and reused in several places. Still, the professors write them in Word because all the alternatives are worse. But are they?

First, let us define more clearly who we actually **target** with our solution. We decided to call them *novice users*. They are indeed novices when it comes to use our editing solution. And they should never need to become experts, that's the whole point. We want to avoid the need for training. At the same time, our targeted users are expert in their domain. Be it lawyers, watch sellers or professors.

The next step is to define the **type of content** we are working with. We decided to call them *documents*. Their particularity is that they can stand on their own and are usually created to be distributed as a whole. At the same time, they contain some structure which allows to compare multiple documents based on certain attributes. We could, for example, compare multiple articles based on their summaries. Or course descriptions based on the number or periods each course lasts. We call this kind of content *semistructured*.

In order to understand how we should edit *semistructured content*, we'll try to understand the nature of semistructured content. It can be located between highly structured content and completely unstructured content. The main property of structured content is, that its structure can be defined beforehand. For example, when creating the schema of a database to store address book entries. On the other end of the spectrum, content and presentation are perfectly blended and create a unique ensemble. Semistructured content fills the spectrum between the extremes but always integrates both ideas in parts.

A document is therefore a mix of three things:

The content itself

A model which defines how the content should be understood

A presentation which defines how everything looks.

It is important to notice that with the same content and the same model, numerous presentations are possible. The length of aa boat can be presented in different ways as seen on the slide.

The role of the presentation layer is to convey the meaning of the content to the reader. Ideally it helps him understand the underlying model and the structure of the content. With this knowledge, the reader is well equipped to unpack the message and understand it in its proper context.

One example for this can be seen here. Even without understanding what the text says, the reader immediately understands that this represents at ticket. This is deduced because of formal conventions. He can even find out how high the entry fee is or at which date the event is going to take place. And that is just because the numbers are laid out in the right way. Be careful, this tool is pretty powerful.