I noticed today that Joel Spolsky had written an entry about how complicated the Word binary formats are now that Microsoft has finally released the full binary file formats. He notes that the spec is a whopping 349 pages, and you have to digest another 9 if you’re interested in the internal storage layout too.
358 pages? You should try digesting the new XML formats!
First, we get to WordprocessingML, the reference for which weighs in at 5219 pages! Yep, that’s a whole 34MB of PDF. Add in the specs for the ‘extras’ that annex the TC45 spec, like DrawingML, CustomXML, Biblio, VML, and EquationML, then add in the OpenXML / OpenPackagingConvention documentation, and it’s approaching 9,000 pages of specs.
If you’re into self-abuse, you can find the whole TC45 document and associated specs for the Office formats on this page of the ECMA site.
Alternatively, if you’re interested in the OpenXML formats and don’t fancy blowing the next 2 years of your life reading the specs, Wouter Van Vugt has an excellent blog on the subject and is the author of the rather more useful (and free!) OpenXML Explained ebook!
Enjoy.