Replacing those pesky smart quotes in VIM
Recently I’ve been running into a lot of silliness called ‘smart quotes’ in files exported from MS applications (Word, Excel, etc.).
Basically, MS uses characters outside 7-bit ASCII, from the Windows-1252 code page, to represent curly quotes that distinguish opening from closing (whatever for?!). In vi this weirdness shows up as <93>, <92>, etc., which are the hex values of those bytes. I had to hunt around on Google to find out how to fix this, although the fix is very easy.
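To see which characters those hex values stand for, here is a small Python sketch. It assumes the exported file is encoded as Windows-1252 (Python's "cp1252" codec), which is typical for older Windows exports but not guaranteed. It decodes each byte from 0x91 through 0x97 and prints its Unicode code point and name.

```python
# Decode the "smart" bytes 0x91-0x97 with the Windows-1252 code page.
# Assumption: the exporting application wrote cp1252.
import unicodedata


def describe_cp1252(byte_values):
    """Return (hex, character, Unicode name) for each single byte value."""
    rows = []
    for b in byte_values:
        ch = bytes([b]).decode("cp1252")
        rows.append((f"{b:02X}", ch, unicodedata.name(ch)))
    return rows


for hex_value, ch, name in describe_cp1252(range(0x91, 0x98)):
    # Print the code point rather than the glyph so any terminal can show it.
    print(f"{hex_value} U+{ord(ch):04X} {name}")
```

Bytes 91 to 94 turn out to be the four curly quotes. 95 is a bullet, and 96 and 97 are the en and em dashes.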
For each value that you see in your file, just do a string substitution, like so:
:%s/<93>/"/g
Of course, you can’t type that <93> literally. To get the character in there, press (via: http://www.vim.org/htmldoc/usr_45.html)
CTRL-V x93
which inserts the character with hex value 93 at the cursor. This works on the : command line just as it does in insert mode.
In CSVs recently exported from Excel, I’ve seen hex 91 through 97.
Quite annoying, frankly.
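If you want to clean up the whole 91 through 97 range in one pass outside Vim, the idea can be sketched in Python. The ASCII replacements below are my own choices and entirely hypothetical: bullet becomes "*", en dash "-", and em dash "--". A regex substitution is used instead of bytes.translate because translate can only map one byte to one byte, and the em dash needs two.

```python
import re

# Hypothetical mapping of cp1252 "smart" bytes to plain-ASCII stand-ins.
SMART_TO_ASCII = {
    0x91: b"'",   # left single quote
    0x92: b"'",   # right single quote / apostrophe
    0x93: b'"',   # left double quote
    0x94: b'"',   # right double quote
    0x95: b"*",   # bullet
    0x96: b"-",   # en dash
    0x97: b"--",  # em dash (two characters, hence re.sub instead of translate)
}

_SMART_BYTES = re.compile(rb"[\x91-\x97]")


def dumb_down(data: bytes) -> bytes:
    """Replace every byte 0x91-0x97 with its ASCII stand-in."""
    return _SMART_BYTES.sub(lambda m: SMART_TO_ASCII[m.group()[0]], data)


print(dumb_down(b"\x93Hello,\x94 she said \x96 it\x92s done."))
```

The function works on raw bytes, so it never has to guess the file's encoding. That only holds for single-byte cp1252 files, not for files saved as UTF-8.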
–
You could also use Perl, like so:
perl -pi -e "s/\x92/'/g" myfile.html
This also works on multiple files at once: just list them (or use a glob such as *.html) in place of myfile.html.
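For those without Perl, a rough Python counterpart to that in-place, multi-file edit might look like the sketch below. The fix_files helper is hypothetical. Like the Perl one-liner, it replaces byte 0x92 with a straight apostrophe in every file named on the command line.

```python
import sys
from pathlib import Path


def fix_files(paths, old=b"\x92", new=b"'"):
    """Hypothetical helper: rewrite each file in place, replacing old with new.

    Returns the names of the files that were actually changed.
    """
    changed = []
    for name in paths:
        path = Path(name)
        data = path.read_bytes()      # raw bytes, so no encoding is assumed
        fixed = data.replace(old, new)
        if fixed != data:             # only rewrite files that contain the byte
            path.write_bytes(fixed)
            changed.append(name)
    return changed


if __name__ == "__main__":
    for name in fix_files(sys.argv[1:]):
        print("fixed", name)
```

Saved as, say, fix_quotes.py (a made-up name), it would be run as python fix_quotes.py *.html. Unlike perl -pi, it leaves files without the byte untouched. Like perl -pi without a backup suffix, it keeps no backup, so try it on copies first.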
Comments
Thank you. You’re probably the only site in cyberspace that addresses this directly.
Perusing through the vim help files can be hell sometimes.
Thanks for this. Although I don’t use vim, it’s nice to see your rendition in Perl.
Thanks a lot!! Found this after a lot of googling!
Exactly what I needed.
Thanks!
Oh, you’re my hero! Thanks for writing this up!
Thanks a lot for writing a short notice on this finding and for explaining the background (btw: the smart quotes are not MS-specific – in my case they came from OpenOffice!).
Luckily, your blog gets indexed by Google.
And to find out what a character is when Vim isn’t showing you its hex value (i.e. when it displays the curly quotes themselves instead of <93> and friends), you can type g8 in normal mode. It prints the UTF-8 bytes of the character under the cursor.
Or use :set statusline=%b\ 0x%B to show the value of the character under the cursor in decimal and hex in the status line,
and possibly :set laststatus=2 to keep the status line always visible.
(from http://vim.wikia.com/wiki/Showing_the_ASCII_value_of_the_current_character)
And if you need to insert a full Unicode code point instead of a two-digit hex value, it’s CTRL-V u followed by four hex digits, so CTRL-V u201c, for instance.
Thanks a lot.
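The g8, %b/%B, and CTRL-V u tips in the comments above can be mirrored in Python, for example to check a character outside Vim. This is a minimal sketch. describe_char is a hypothetical helper that prints the decimal and hex code point (like %b and %B) and the UTF-8 bytes (like g8).

```python
def describe_char(ch: str) -> str:
    """Code point in decimal and hex, plus UTF-8 bytes, for one character."""
    code = ord(ch)
    utf8 = " ".join(f"{b:02x}" for b in ch.encode("utf-8"))
    return f"{code} 0x{code:X} utf-8: {utf8}"


# U+201C is the left double quote that CTRL-V u201c inserts in Vim.
print(describe_char("\u201c"))  # prints: 8220 0x201C utf-8: e2 80 9c
```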