Document Corruption

Article contributed by John McGhie & Beth Rosengard

If your problem manifests with just one document (or a specific subset of documents), but not with all documents, it is probable that you’re suffering from document corruption. Symptoms may include weird page numbering (drag the thumb down the right vertical margin and watch the page number counter – it will go crazy when you pass a corruption), infinite repagination, incorrect document layout and formatting, unreadable characters on the screen, and hangs or crashes when you load or view a particular file. Such corruption is generally carried in the very last paragraph mark in a document, which is the marker for a hidden container in which Word stores all document properties, including formatting information. For more on why documents corrupt, click here.


“Uncorrupt” Your Document

With all of the procedures below, you need to check the document carefully afterwards to ensure that you have not lost anything important: Any procedure may cause Word to discard corrupt information that it cannot fix. There may be valuable text contained within that information. Pay particular attention to bullets, numbering, headers and footers, and tables – these are the things that usually corrupt – but you must also check for sections of missing text. For this reason, you should make a copy of your original document before beginning these procedures. For more on “best practices” for backing up your work, see here.

How severe the loss is depends on the complexity of your document, how damaged it is, your version of Word, and which of the procedures below you choose. Because Word 2004's HTML capability (actually, its XML capability...) is well ahead of Word X’s, you are likely to lose less formatting by using the Save As Web Page method with Word 2004. Furthermore, when you use this method, Word 2004 produces a compatibility report that tells you beforehand what, if anything, you will lose.

Procedure #1: Save As Web Page

First line of defense for Word 2004 (but if it doesn't work, definitely try Procedure #2). Second line of defense for Word X (try Procedure #2 first).

  1. Save as Web Page. CAUTION: make sure you choose "Save entire file into HTML" and NOT "Save only display information into HTML". If you fail to “Save entire file into HTML”, fundamental things like headers, footers, section breaks and page numbers that have no equivalents in HTML will be stripped out. NOTE: While Word writes screen tips to Web pages, it does NOT reimport them. Any screen tips in your document will be lost.
  2. Close the Word Document. If you do not, the Word Document version, with its corruptions, remains the file in use. Word simply continues to use the bad file instead of rebuilding a new file from the stored information in the web page version.
  3. Open the Web Page version.
  4. Save as a new Word Document with a different file name.
How it works: This technique does not save the Word document itself; instead, it saves the instructions for making a Word document. When you re-open the file, Word builds a new document using the saved instructions. Because the saved instructions cannot cause Word to make a corruption, the new document is not corrupt when it is re-created.

Procedure #2: Copy All But Last Paragraph Mark

First line of defense for Word X. Second line of defense for Word 2004 (try Procedure #1 first).

  1. Turn on Show/Hide formatting (click on the pilcrow – the ¶ symbol on your toolbar).
  2. Carefully copy all of your document except the last paragraph mark.
  3. Paste into a blank new document.
  4. Save as a Word Document with a different file name.

Procedure #3: Open in Another Text Editor

  1. Open your Word document in AppleWorks (if you’re running Mac OS X 10.3, TextEdit will also work).
  2. Look for obviously weird data and delete it.
  3. Alternatively, you can copy the “clean” paragraphs and paste them into a new Word document.
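
If the document is long, hunting for "obviously weird data" by eye can be tedious. The sketch below is one possible way to flag candidate lines in text you have already exported from the other editor. The function name suspicious_lines, the 0.3 threshold, and the rule itself (a high share of control, unassigned, or private-use characters) are all assumptions made for illustration, not anything Word or AppleWorks provides. Treat its output as a list of places to look, not as a verdict.

```python
import unicodedata

# Unicode categories treated as "weird" here (an assumption for this sketch):
# Cc = control, Cn = unassigned, Co = private use, Cs = surrogate.
ODD_CATEGORIES = {"Cc", "Cn", "Co", "Cs"}


def suspicious_lines(text, threshold=0.3):
    """Return (line_number, line) pairs whose share of odd characters
    exceeds the threshold. Both the name and the threshold are hypothetical."""
    # Older Mac files often end paragraphs with "\r", so normalize first.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    flagged = []
    for number, line in enumerate(normalized.split("\n"), start=1):
        if not line:
            continue  # an empty line cannot be suspicious
        odd = sum(
            1
            for ch in line
            if ch != "\t" and unicodedata.category(ch) in ODD_CATEGORIES
        )
        if odd / len(line) > threshold:
            flagged.append((number, line))
    return flagged


sample = "Dear reader,\r\x00\x01\x02\x03ab\rRegards"
for number, line in suspicious_lines(sample):
    print(number, repr(line))
```

Anything the sketch flags still needs a human decision: delete it, or copy the clean paragraphs around it into a new Word document as in step 3.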

Procedure #4: Binary Search

If the above procedures don’t help and you believe the problem lies in a corrupt table or a paragraph somewhere, a binary search can help narrow it down. And don’t forget that there can be more than one corrupt element in a single document!

It is also possible, but unlikely, for an intra-document section break to contain corruption. It is unlikely because such section breaks inherit their formatting from the File Header, which is eliminated along with other document formatting when you perform Procedure #2. However, it is not impossible, so keep that in mind as you proceed.

  1. Divide the document in half to determine which half (or halves) contains the corruption.
  2. Divide that half again, etc., until you have isolated the offending element(s).
  3. Repair the corrupt element if possible. In the case of a corrupt paragraph or section, copy all but its last paragraph mark into a new Word document. If it’s a table, your best bet is to Convert Table To Text and then immediately Convert Text To Table again. A corrupt graphic may have to be replaced.
  4. Delete and remake the corrupt element if repair is impossible.
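
The halving in steps 1 and 2 can be sketched in code to show why both halves must be checked when there may be more than one corrupt element. This is only an illustration of the search logic. The names find_corrupt_elements and shows_symptoms are hypothetical, and in real life the "test" is you pasting that half into a new document and watching for the symptoms by hand.

```python
from typing import Callable, List, Sequence


def find_corrupt_elements(
    elements: Sequence[str],
    shows_symptoms: Callable[[Sequence[str]], bool],
) -> List[int]:
    """Return the positions of elements that show symptoms on their own.

    shows_symptoms stands in for the manual check: it is given a run of
    elements (paragraphs, tables, graphics) and reports whether that run
    misbehaves.
    """

    def search(lo: int, hi: int) -> List[int]:
        chunk = elements[lo:hi]
        if not chunk or not shows_symptoms(chunk):
            return []  # this part is clean, so stop dividing it
        if hi - lo == 1:
            return [lo]  # narrowed down to a single offending element
        mid = (lo + hi) // 2
        # Check BOTH halves: a document can hold more than one corruption.
        return search(lo, mid) + search(mid, hi)

    return search(0, len(elements))


# Made-up document in which "BAD" marks the corrupt elements.
doc = ["intro", "table-BAD", "body", "list", "figure-BAD", "summary"]
bad = find_corrupt_elements(doc, lambda chunk: any("BAD" in e for e in chunk))
print(bad)  # [1, 4]
```

One limitation of the sketch: it assumes each corrupt element misbehaves on its own. If the symptoms appear only when two elements are in the same document, both halves can test clean and the search will report nothing.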
