
“Windmills Do Not Work That Way!”*

The life and hobbies of a firmware/software developer.

Converting LaTeX to plain text
My résumé is typeset using LaTeX. It generates nice PDFs, but many online job applications require plain-text versions to be submitted.

Here is a sequence that converts my LaTeX file to plain text:

$ latex zdpurvis_resume.tex
$ catdvi -e 1 -U zdpurvis_resume.dvi | sed -re "s/\[U\+2022\]/*/g" | sed -re "s/([^^[:space:]])\s+/\1 /g" > zdpurvis_resume.txt

The -e 1 option to catdvi tells it to output ASCII. If you use 0 instead of 1, it will output Unicode, which includes all the special characters like bullets, em dashes, and Greek letters. It also includes ligatures for some letter combinations like "fi" and "fl". You may not like that, so use -e 1 instead. The -U option tells catdvi to print the Unicode value of any character it can't map (for example, [U+2022]) so that you can easily find and replace them.

The second part of the command finds the string [U+2022], which designates a bullet character (•), and replaces each occurrence with an asterisk (*).

The third part eats up all the extra whitespace that catdvi threw in to make the text full-justified, while leaving the spaces at the start of each line (the indentation) untouched.
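To see what the two sed stages do on their own, here is a small sketch that runs them on a made-up line. The sample text is hypothetical (it only imitates the kind of padded output catdvi produces), and GNU sed is assumed, since -r and \s are GNU extensions.

```shell
# Hypothetical sample line imitating catdvi's justified output:
# two spaces of indentation, a bullet placeholder, and padded word gaps.
sample='  [U+2022]   Wrote    firmware   for  embedded     devices'

# The same two sed stages as in the pipeline (GNU sed assumed).
# Stage 1 swaps the bullet placeholder for an asterisk.
# Stage 2 collapses each run of whitespace that follows a non-space
# character into one space, so leading indentation is left alone.
cleaned=$(printf '%s\n' "$sample" \
  | sed -re 's/\[U\+2022\]/*/g' \
  | sed -re 's/([^^[:space:]])\s+/\1 /g')

printf '%s\n' "$cleaned"
```

This prints the line with its two-space indent intact, the bullet turned into an asterisk, and single spaces between words: "  * Wrote firmware for embedded devices".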

After running these commands, you would be wise to search the .txt file for the string [U+ to make sure no characters that can't be mapped to ASCII were left behind, and to fix any that were.
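That last check can be scripted. The function below is only a sketch: check_leftovers is a name made up for this example, and it simply wraps grep's fixed-string search for the [U+ prefix.

```shell
# check_leftovers FILE (hypothetical helper, not part of catdvi):
# prints every line, with its line number, that still contains a
# [U+XXXX] placeholder.
# Returns 0 if the file is clean, 1 if placeholders remain,
# and 2 if grep itself failed (for example, the file is missing).
check_leftovers() {
  leftover_status=0
  # -F treats '[U+' as a literal string, not a regular expression.
  grep -n -F '[U+' "$1" || leftover_status=$?
  case $leftover_status in
    0) echo "Unmapped characters remain in $1" >&2; return 1 ;;
    1) echo "No [U+ placeholders found in $1" ;;
    *) return 2 ;;
  esac
}
```

For example, run check_leftovers zdpurvis_resume.txt right after the conversion; any lines it lists are the ones that still need a manual fix.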

Thank goodness...


2009-07-16 12:31 am (UTC)

...for smart people with time on their hands.

That's really cool, Zane. I remember it being a real pain to convert to plain text for those application websites.

It sure is! Even more so when I change parts of my resume depending on the job I'm applying for and have to do the conversion more than once. This sequence of commands takes most of the busy-work out of it. I still need to adjust some vertical spacing between sections, but that's nothing compared to dealing with the horizontal spacing and all the special characters that get inserted.

detex, unicode


2009-08-02 09:18 am (UTC)

well, i use detex for this purpose. it's part of texlive extra utilities. also, you recommend ditching the unicode encoding in favor of ascii, while even the word 'résumé' includes non-ascii characters:). other than that, i'm always happy to see people blogging about things like this, really.

I'd never heard of detex. Thanks for the info. I should dig up that texlive manual!

I tried out detex, and it leaves a little to be desired for me. It just strips out the TeX commands that it recognizes, and leaves the ones it doesn't. I wind up with some spacing commands being ignored, some measurement units still being displayed, and \kill lines still showing up. For example: UniversityRaleigh, NCAugust 2008. Custom commands aren't parsed properly, either:
#1, #2 #3

And, I appreciate the irony of résumé having non-ASCII characters when I'm suggesting people convert to ASCII. Some online forms choke on the Unicode chars, though. :(
