How to recover corrupted docx files

Today I was astonished to discover that a Word document I had been working on for the past week had somehow become corrupted. The file wouldn’t open in Pages, Microsoft Office, Google Docs, or anywhere else.

[Image: docx Pages error]

I tried literally every service and application out there, free and paid, that claims to recover corrupted Word documents, all without success. In the end I got it working with two commands on my Mac, so I had to write a blog post about it.

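  1. A docx file is a zip archive, so the first step is to repair it as one. The -FF option tells zip to scan the damaged file for entries and write whatever it can salvage to a new archive: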

    zip -FF document.docx --out recovered.docx
    
  2. Next, install a handy Python script called docx2txt (it is on PyPI, so pip install docx2txt should do it), which pulls the text out of the myriad XML files inside the docx. Then run it on the recovered file:

    docx2txt recovered.docx > recovered.txt
    

This way, I managed to recover the contents of the corrupted docx as plain text. There might be a way to recover the styling as well, but having just the contents made my day.
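
If you would rather not install anything, the same idea can be sketched with Python's standard library alone. This is not how docx2txt itself is implemented, just a minimal illustration under a few assumptions: the repaired archive still contains word/document.xml (the main body part in a normal docx), that XML is still well-formed, and only body paragraphs matter (headers, footers and footnotes live in other parts). The function names list_parts and extract_text are made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the WordprocessingML elements in word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def list_parts(docx_path):
    """Return the names of the files stored inside the docx (zip) archive."""
    with zipfile.ZipFile(docx_path) as archive:
        return archive.namelist()


def extract_text(docx_path):
    """Return the text of word/document.xml, one line per paragraph."""
    with zipfile.ZipFile(docx_path) as archive:
        # Raises KeyError if the main document part did not survive the repair
        xml_bytes = archive.read("word/document.xml")

    # Raises xml.etree.ElementTree.ParseError if the XML itself is damaged
    root = ET.fromstring(xml_bytes)

    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):  # every <w:p> paragraph
        # Text lives in <w:t> elements inside the paragraph's runs
        pieces = [node.text or "" for node in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(pieces))
    return "\n".join(paragraphs)
```

Calling list_parts("recovered.docx") shows what survived the repair, and print(extract_text("recovered.docx")) dumps the body text. If the XML was cut off halfway, the parse will fail, and a more forgiving tool is the better choice.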