Office 2003 XML
Review from Bill Coan
Alternative review from Cindy Meister
- by Evan Lenz, Mary McRae, Simon St. Laurent
- Published by O’Reilly http://www.oreilly.com/
- ISBN: 0596005385 567 pp.
- Published June 2004 Price: $US39.95
Office 2003 XML is an ambitious book with much to recommend it, but readers who are new to XML or to Office will find it hard to follow. Indeed, the authors themselves state in their introduction, “If you have never used Microsoft Office or XML before, you may want to consider exploring those technologies in greater depth before reading this book.”
The book includes 60 pages of introductory material in a series of appendices, freeing the main text to deal with deeper subjects right from the beginning. The appendices are well conceived and carefully composed. Most readers will want at least to skim through them prior to wading into the main text.
In addition to basic XML concepts, the appendices discuss Extensible Stylesheet Language Transformations (XSLT, for sorting, selecting, and formatting XML data) and XML Schema Definitions (for limiting the types of content considered valid in a given XML document). Document Type Definitions (DTDs) are also discussed but will be of interest mostly to readers concerned with legacy XML documents, since Office 2003 doesn’t support DTDs.
The book devotes four chapters to Word while devoting only two chapters to Excel and one chapter each to Access and InfoPath. A separate chapter discusses using web services in Excel, Access, and Word. This allocation seems appropriate given the level and complexity of XML support found in each application.
The first two chapters on Word don’t discuss the Word application program. Rather, they discuss WordprocessingML, the special XML vocabulary that Word uses when saving XML documents. The emphasis in these chapters is on creating WordprocessingML files and extracting data from them using external applications.
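As a rough illustration of what extracting data "using external applications" can look like, here is a minimal Python sketch. It is not taken from the book. The sample document is hand-written and far smaller than anything Word saves, the helper name `paragraph_texts` is hypothetical, and the namespace URI is assumed to be the one Word 2003 uses for WordprocessingML.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for Word 2003 WordprocessingML.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"

# A tiny hand-written sample; files saved by Word carry much more markup
# (styles, fonts, document properties, and so on).
SAMPLE = f"""<?xml version="1.0" encoding="UTF-8"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument xmlns:w="{W_NS}">
  <w:body>
    <w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>world.</w:t></w:r></w:p>
    <w:p><w:r><w:t>Second paragraph.</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>"""


def paragraph_texts(xml_text):
    """Return the plain text of each w:p paragraph, without starting Word."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    ns = {"w": W_NS}
    # A paragraph (w:p) holds runs (w:r), and each run holds text (w:t).
    # Joining the w:t pieces gives the paragraph's visible text.
    return [
        "".join(t.text or "" for t in p.iterfind(".//w:t", ns))
        for p in root.iterfind(".//w:p", ns)
    ]


if __name__ == "__main__":
    for line in paragraph_texts(SAMPLE):
        print(line)
```

The sketch uses only the Python standard library, which is the point the chapters make: once the document is XML, any XML-aware tool can read it.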
In order to get maximum value from the chapters on WordprocessingML, it helps to know how Word documents are structured. After all, it doesn’t do much good to learn how a document section is represented in WordprocessingML if you don’t know what a document section is in the first place. On the other hand, readers already familiar with Word’s document object model will find in the book tantalizing clues to Word’s inner workings when the authors discuss the detailed WordprocessingML representation of certain document structures, such as lists and list styles.
The final two chapters on Word show how the Word application program can be used to work with XML data files. One chapter discusses creation of a basic XML-enabled document template. The other discusses creation of a SmartDocument, which is to say, an XML-enabled document with a custom taskpane for interacting with the document.
For readers who want to know how to harness the power of XML in Word, these two chapters contain a wealth of information.
One drawback is that each sample project is presented as an integrated whole and the reader is therefore required to absorb a very large range of technical concepts all at once. This makes for hard reading, although patient and persistent readers will be able to glean the fundamental concepts involved. I would have preferred to see a series of projects starting with a very, very simple one and proceeding by degrees to a very complex one.
Another drawback is that some technical terms (for example, leaf node, selection, and range) are initially used without definition and only later defined. I imagine this reflects the tremendous pressure involved in getting the book out as quickly as possible. Once again, this is a case where the book requires patience and persistence on the part of the reader.
The chapters on Excel take an approach opposite to that of the chapters on Word. That is, the first chapter discusses the Excel application program and the second one discusses SpreadsheetML, the XML vocabulary that Excel uses when saving XML workbooks.
The authors can be commended for taking a big-picture view and for demonstrating that Office 2003 allows users to share data in ways never before possible. In addition, they can be commended for providing well-written appendices on introductory concepts. Their book may turn out to be especially valued by readers who want to read and write Office documents using third-party programs.
Review from Cindy Meister
Office 2003 XML is aimed at readers with XML background who want to get started with the XML features in Office 2003. A basic introduction to XML topics is included in the appendix, but if XML is completely new for you, you'll have difficulty grasping much of the content.
The first chapter is an overview of the importance of XML, in general, and in Office, in particular. It presents the strengths of XML in each application and a number of sample scenarios. For my taste, this chapter was somewhat long-winded and meandering, but it's only 16 pages out of some 550 (including the appendices).
In chapter two, the authors then take the reader into the world of Microsoft Word and XML. In total, four chapters are devoted to Word, while two are devoted to Excel, and one each to Access, InfoPath and using web services in Office applications. At first glance, this may seem out of proportion, but considering the complexity and richness of Word's object model, one quickly realizes that four chapters only manage to scratch the surface of this subject.
Unlike Access and Excel, which are primarily concerned with data, the XML that describes a Word document (WordprocessingML) reflects every aspect of a document, including the internal structures. If you save a Word document in its native XML, then re-open it, the document is there in its entirety. Contrast this with Access, which only imports and exports table data and indexes, or Excel, where only table information and formatting are exported to XML (no charts or other objects).
Chapter two, "The WordprocessingML Vocabulary," goes into quite a bit of detail on the internal structures of a Word document: where it stores information, such as formatting, and how that information is applied to the text. It is therefore also an excellent reference for developers who have never before worked intensively with Word and find Word's internal logic a mystery.
Chapter three describes how WordprocessingML can be used to extract information from Word documents, as well as to create them. Of necessity, it continues to delve into the depths of Word's internal document structures. A couple of scenarios presenting the strengths of being able to process documents directly, without opening them in Word, round off the chapter.
The fourth chapter discusses using the XML features in the Word user interface. The importance of schemas in the Office XML concept is introduced. This chapter builds the basis for the following one on SmartDocument technology, taking you through all the steps necessary to prepare a document for use in a business solution. By this point, one can see that XML in Word is not really meant for the end-user, but for the developer.
Up until now, no developer background has been required. But in order to use the information in the chapter on SmartDocuments you do need background in a programming language such as VB6, C++ or a .NET framework language. The author describes all the components that go into a SmartDocument solution, then presents the coding aspect in a VB.NET example. Note that, even if you're not familiar with VB.NET, the discussion is quite clear. If you download the SmartDocument SDK and compare the sample code for your language of choice, you'll have no trouble following along. Remembering my experience with beta testing Word 2003 and SmartDocuments, I can only wish I'd had this information a couple of years ago.
SmartDocument technology works in both Word and Excel, but unfortunately, the author does not present an example for Excel.
In chapter six you learn how Excel processes the XML it imports and exports, and how you can affect the result using a schema and XML maps. It discusses how to deal with certain kinds of complexly built XML documents that don't immediately lend themselves to Excel's flat-table concept. You're also shown how Excel can be used as a front end to work with data in XML source files.
SpreadsheetML, Excel's native XML language, is the topic of the next chapter. The basic structures and limitations are described, including how to extract information from and how to build an XML file that Excel will recognize as a spreadsheet.
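To make the idea concrete, the following Python sketch builds a very small file shaped like SpreadsheetML and reads the cell values back. It is an illustration under stated assumptions, not the book's code: the function names `workbook` and `read_cells` are hypothetical, the namespace URI and element names are assumed to match what Excel 2003 writes, and the output has not been checked against Excel here.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Namespace assumed for Excel 2003 SpreadsheetML.
SS_NS = "urn:schemas-microsoft-com:office:spreadsheet"


def cell(value):
    """Render one Cell element; numbers and strings get different ss:Type values."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    kind = "Number" if is_number else "String"
    return f'<Cell><Data ss:Type="{kind}">{escape(str(value))}</Data></Cell>'


def workbook(rows, sheet_name="Sheet1"):
    """Build a one-sheet workbook from a list of rows (each row a list of values)."""
    body = "\n".join(
        "   <Row>" + "".join(cell(v) for v in row) + "</Row>" for row in rows
    )
    return (
        '<?xml version="1.0"?>\n'
        '<?mso-application progid="Excel.Sheet"?>\n'
        f'<Workbook xmlns="{SS_NS}" xmlns:ss="{SS_NS}">\n'
        f" <Worksheet ss:Name={quoteattr(sheet_name)}>\n"
        "  <Table>\n"
        f"{body}\n"
        "  </Table>\n"
        " </Worksheet>\n"
        "</Workbook>\n"
    )


def read_cells(xml_text):
    """Extract the cell text from every row, as a list of lists of strings."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    ns = {"ss": SS_NS}
    return [
        [d.text or "" for d in row.iterfind("ss:Cell/ss:Data", ns)]
        for row in root.iterfind(".//ss:Row", ns)
    ]


if __name__ == "__main__":
    xml_text = workbook([["Item", "Qty"], ["Pens & pencils", 12]])
    print(xml_text)
    print(read_cells(xml_text))
```

The markup is assembled as text so the result stays close to the unprefixed element style of a saved workbook; values are escaped so that characters such as `&` do not break the file.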
Chapter eight, on XML in Access, is reasonably straightforward, as XML is really only another database format for import/export, as far as Access is concerned. The author explains why this is so, and how to use the various options available to get the best result.
"Using Web Services in Excel, Access and Word" gets into territory unfamiliar to most Office (VBA) developers. But using the tool Microsoft supplies as a download makes this easy. Here's a good way to test the waters. The examples are clearly explained and easy to follow.
The last chapter presents InfoPath in some depth. It opens with a discussion about what problems InfoPath is meant to address, and compares it to other XML forms packages and technologies. The author then introduces the various components that go into an InfoPath solution, with a detailed analysis of the XML involved (InfoPath is 100% XML). Only after all the internal details have been presented do we see the InfoPath application, and how to construct the same files quickly and painlessly in the user interface. As this is a book on XML, this can be considered a legitimate approach, but I found the last few pages to be something of a let-down and would have preferred to see them earlier in the chapter.
One of the most important aspects of XML, information exchange between Office applications, is not highlighted as much as it might have been. The book could have profited from an example that pulls together the various Office applications.
On the whole, this is a good book, packed full of information on a technology that's new in Office. Since XML and its related aspects (schemas and transforms) are such a broad subject, the many links to supplementary material, both in book form and on the web, are most welcome. It's a book you'll need to read more than once in order to get the most from it.