In an ideal world, the only character encodings (or, loosely, "character sets") that you'd ever see would be UTF-8 (utf-8), plus Latin-1 (iso-8859-1) for legacy documents. In practice, all of the encodings listed below can be found on the Web. They appear in order of their English names, with the left-hand side being the value you'd get returned from $response->content_charset. The complete list of registered character sets can be found at http://www.iana.org/assignments/character-sets.
Value | Encoding |
---|---|
us-ascii | ASCII plain (just characters 0x00-0x7F) |
asmo-708 | Arabic ASMO-708 |
iso-8859-6 | Arabic ISO |
dos-720 | Arabic MSDOS |
windows-1256 | Arabic MSWindows |
iso-8859-4 | Baltic ISO |
windows-1257 | Baltic MSWindows |
iso-8859-2 | Central European ISO |
ibm852 | Central European MSDOS |
windows-1250 | Central European MSWindows |
hz-gb-2312 | Chinese Simplified (HZ) |
gb2312 | Chinese Simplified (GB2312) |
euc-cn | Chinese Simplified EUC |
big5 | Chinese Traditional (Big5) |
cp866 | Cyrillic DOS |
iso-8859-5 | Cyrillic ISO |
koi8-r | Cyrillic KOI8-R |
koi8-u | Cyrillic KOI8-U |
windows-1251 | Cyrillic MSWindows |
iso-8859-7 | Greek ISO |
windows-1253 | Greek MSWindows |
iso-8859-8-i | Hebrew ISO Logical |
iso-8859-8 | Hebrew ISO Visual |
dos-862 | Hebrew MSDOS |
windows-1255 | Hebrew MSWindows |
euc-jp | Japanese EUC-JP |
iso-2022-jp | Japanese JIS |
shift_jis | Japanese Shift-JIS |
iso-2022-kr | Korean ISO |
euc-kr | Korean Standard |
windows-874 | Thai MSWindows |
iso-8859-9 | Turkish ISO |
windows-1254 | Turkish MSWindows |
utf-8 | Unicode expressed as UTF-8 |
utf-16 | Unicode expressed as UTF-16 |
windows-1258 | Vietnamese MSWindows |
viscii | Vietnamese VISCII |
iso-8859-1 | Western European (Latin-1) |
windows-1252 | Western European (Latin-1) with extra characters in 0x80-0x9F |
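
As a rough illustration of how one of these values might be used, the following sketch decodes raw bytes into a Perl character string with the core Encode module. The helper name decode_with_charset and the Latin-1 fallback are assumptions made for this example, not part of LWP. In a real program the label would typically come from $response->content_charset and the bytes from $response->content. The label is lowercased before lookup because its case may differ between library versions, and not every name in the table is necessarily known to Encode, which is why the sketch falls back rather than dying.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical helper (not part of LWP): decode raw bytes using a charset
# label such as those listed above.
sub decode_with_charset {
    my ($bytes, $charset) = @_;

    # find_encoding() returns undef for names that Encode does not know.
    my $enc = defined $charset ? find_encoding(lc $charset) : undef;

    # Assumption for this sketch: treat unknown or missing labels as Latin-1.
    $enc ||= find_encoding('iso-8859-1');

    return $enc->decode($bytes);
}

# Six bytes of KOI8-R text, as they might arrive in a response body.
my $text = decode_with_charset("\xF0\xD2\xC9\xD7\xC5\xD4", 'koi8-r');

binmode STDOUT, ':encoding(UTF-8)';
print "$text\n";    # Привет
```

Falling back to Latin-1 never fails, since every byte maps to some character, but it can silently produce the wrong text; a stricter program might report the unknown label instead.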