Tuxedo
2013-03-25 23:18:10 UTC
I use wget to fetch html files which always include some German characters,
but after fetching, they display incorrectly in various applications that
normally display German characters fine. The broken characters represent:
ä a umlaut, small
Ä A umlaut, capital
ö o umlaut, small
Ö O umlaut, capital
ü u umlaut, small
Ü U umlaut, capital
ß sharp s
I'm not sure what happens, but depending on the application they're opened
in afterwards, they may show as ö, Â, ü, äà etc. These broken characters have
surely been created via a web browser, possibly in a UTF-8 mode.
Normally when I've encountered similar problems I've worked around them with
a process like:
perl -pi -e "s/%C3%BC/ü/g;" *.html
However, this doesn't match in this case and I'm not sure what to convert
the characters from. The fetched files are fully readable in a UTF-8
editor, such as Yuedit, but not in other editors that normally read German
characters but do not work with a full UTF-8 charset. Any ideas how to
replace the ö, Â, ü, whatever they are, with the more widely adopted
extended German ASCII charset?
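To make the question concrete, here is a rough sketch of the kind of conversion I have in mind. It assumes the fetched files really are valid UTF-8 and that ISO-8859-1 is the target charset; both are guesses on my part, and utf8_to_latin1 is just a made-up helper name wrapping the standard iconv tool.

```shell
# Hypothetical helper: re-encode a UTF-8 stream on stdin as ISO-8859-1 on stdout.
# Assumes the input is valid UTF-8; iconv stops with an error on any character
# that has no ISO-8859-1 equivalent.
utf8_to_latin1() {
    iconv -f UTF-8 -t ISO-8859-1
}

# Convert each fetched file into a new file, leaving the original untouched.
for f in *.html; do
    [ -e "$f" ] || continue                      # glob matched nothing
    case "$f" in *.latin1.html) continue ;; esac # skip earlier output files
    utf8_to_latin1 < "$f" > "${f%.html}.latin1.html"
done
```

If the files also declare charset=utf-8 in a meta tag, that declaration would presumably have to be changed as well for the result to display correctly.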
Many thanks for any tips!
Tuxedo