Convert MS Word DOCX Files to Markdown with Images
I don’t know when I last received a Microsoft Word .docx file. However long the streak may have been, it ended today.
The document contained links and embedded images. I was instantly daunted by the prospect of extracting every image by hand and saving each one to a file, especially since I have no idea how MS Word behaves nowadays.
But I have pandoc installed.
And it’s great. It even extracts images and saves them into a subfolder. I love it. Didn’t know about this feature until I scrolled through the output of pandoc -h.
This is the command I used:
$ pandoc -o output.md --extract-media=./ inputfile.docx
Pandoc puts all images into a media/ subfolder of the extraction directory anyway, so I set --extract-media to the current directory. Just lovely.
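If a whole pile of .docx files shows up instead of just one, the same command can run in a loop. Here is a minimal sketch, assuming a POSIX shell and pandoc on the PATH; the function name convert_all_docx is made up for illustration. Instead of extracting into the current directory, it gives each document its own folder named after the file, so that images from different documents, which may well share names like image1.png, can't overwrite each other in one shared media/ folder.

```shell
#!/bin/sh
# Hypothetical helper: convert every .docx in the current directory to Markdown.
convert_all_docx() {
  for f in ./*.docx; do
    # If no file matches, the glob stays literal; skip it.
    [ -e "$f" ] || continue
    base=${f%.docx}                 # ./report.docx -> ./report
    # Images should end up under ./report/media/, next to ./report.md.
    pandoc -o "$base.md" --extract-media="$base" "$f" || return 1
  done
}

convert_all_docx
```

Run it from the folder that holds the documents; report.docx should then become report.md, with its images somewhere under report/media/.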