Common Mojibake Reference

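To understand how mojibake (garbled text) arises, I wrote a small demo program for reference.
It shows what can happen when text is encoded with one encoding and the resulting bytes are decoded with another: either a decode error or garbled output.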

In summary:

| Encoded with | Decoded with | Result |
| --- | --- | --- |
| UTF-8 | ASCII | 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128) |
| UTF-8 | GBK | 闇嶆牸娌冨吂娴嬭瘯瀛﹂櫌 ceshiren.com |
| UTF-8 | UNICODE_ESCAPE | éœæ ¼æ²ƒå…¹æµ‹è¯•å­¦é™¢ ceshiren.com |
| GBK | ASCII | 'ascii' codec can't decode byte 0xbb in position 0: ordinal not in range(128) |
| GBK | UTF-8 | 'utf-8' codec can't decode byte 0xbb in position 0: invalid start byte |
| GBK | UNICODE_ESCAPE | »ô¸ñÎÖ×ȲâÊÔѧԺ ceshiren.com |
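
The same table can be reproduced with the standard library alone. The sketch below is a stand-alone variant, not the demo itself: the helper name `roundtrip` is made up for illustration, and it returns the error message as a string instead of logging it.

```python
def roundtrip(text, from_encoding, to_encoding):
    """Encode text with one codec, then decode the bytes with another.

    Returns the decoded (possibly garbled) string, or the error message
    prefixed with 'error: ' if the bytes are invalid in the second codec.
    """
    raw = text.encode(from_encoding)
    try:
        return raw.decode(to_encoding)
    except UnicodeDecodeError as e:
        return f'error: {e}'


content = '霍格沃兹测试学院 ceshiren.com'

# Same encoding pairs as in the table above
pairs = [
    ('UTF-8', 'ASCII'),
    ('UTF-8', 'GBK'),
    ('UTF-8', 'UNICODE_ESCAPE'),
    ('GBK', 'ASCII'),
    ('GBK', 'UTF-8'),
    ('GBK', 'UNICODE_ESCAPE'),
]

for src, dst in pairs:
    print(f'{src} -> {dst}: {roundtrip(content, src, dst)}')
```

Strict codecs such as ASCII and UTF-8 reject bytes they cannot interpret, so those rows fail with an error. GBK and UNICODE_ESCAPE happen to accept these particular bytes, so they return wrong characters without complaining, which is the harder case to notice.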

Sample code

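import pytest

# log and log_to_file come from the MTF project's own logging module
from mtf.core.logger import log, log_to_file

# Send log output to a timestamped debug log file
log_to_file()


# Each pair is (encoding used to encode, encoding used to decode)
@pytest.mark.parametrize('from_encoding, to_encoding', [
    ['UTF-8', 'ASCII'],
    ['UTF-8', 'GBK'],
    ['UTF-8', 'UNICODE_ESCAPE'],
    ['GBK', 'ASCII'],
    ['GBK', 'UTF-8'],
    ['GBK', 'UNICODE_ESCAPE'],
])
def test_encoding(from_encoding, to_encoding):
    try:
        content = '霍格沃兹测试学院 ceshiren.com'
        log.debug(f'from = {from_encoding} to = {to_encoding}')
        # Raw bytes produced by the first encoding
        log.debug(content.encode(from_encoding))
        # The same bytes read back with the second encoding
        log.debug(content.encode(from_encoding).decode(to_encoding))
    except Exception as e:
        # Decode failures are logged rather than failing the test
        log.exception(e)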

Run log

[2021-01-15 18:57:06] 74940:D logger.py:61:log_to_file: log init ./debug.2021.01.15-18.57.06.log
[2021-01-15 18:57:06] 74940:D test_encoding.py:19:test_encoding: from = UTF-8 to = ASCII
[2021-01-15 18:57:06] 74940:D test_encoding.py:20:test_encoding: b'\xe9\x9c\x8d\xe6\xa0\xbc\xe6\xb2\x83\xe5\x85\xb9\xe6\xb5\x8b\xe8\xaf\x95\xe5\xad\xa6\xe9\x99\xa2 ceshiren.com'
[2021-01-15 18:57:06] 74940:E test_encoding.py:23:test_encoding: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)
Traceback (most recent call last):
  File "/Users/seveniruby/PycharmProjects/MTF/mtf/tests/test_python/test_encoding.py", line 21, in test_encoding
    log.debug(content.encode(from_encoding).decode(to_encoding))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)
[2021-01-15 18:57:06] 74940:D test_encoding.py:19:test_encoding: from = UTF-8 to = GBK
[2021-01-15 18:57:06] 74940:D test_encoding.py:20:test_encoding: b'\xe9\x9c\x8d\xe6\xa0\xbc\xe6\xb2\x83\xe5\x85\xb9\xe6\xb5\x8b\xe8\xaf\x95\xe5\xad\xa6\xe9\x99\xa2 ceshiren.com'
[2021-01-15 18:57:06] 74940:D test_encoding.py:21:test_encoding: 闇嶆牸娌冨吂娴嬭瘯瀛﹂櫌 ceshiren.com
[2021-01-15 18:57:06] 74940:D test_encoding.py:19:test_encoding: from = UTF-8 to = UNICODE_ESCAPE
[2021-01-15 18:57:06] 74940:D test_encoding.py:20:test_encoding: b'\xe9\x9c\x8d\xe6\xa0\xbc\xe6\xb2\x83\xe5\x85\xb9\xe6\xb5\x8b\xe8\xaf\x95\xe5\xad\xa6\xe9\x99\xa2 ceshiren.com'
[2021-01-15 18:57:06] 74940:D test_encoding.py:21:test_encoding: éœæ ¼æ²ƒå…¹æµ‹è¯•å­¦é™¢ ceshiren.com
[2021-01-15 18:57:06] 74940:D test_encoding.py:19:test_encoding: from = GBK to = ASCII
[2021-01-15 18:57:06] 74940:D test_encoding.py:20:test_encoding: b'\xbb\xf4\xb8\xf1\xce\xd6\xd7\xc8\xb2\xe2\xca\xd4\xd1\xa7\xd4\xba ceshiren.com'
[2021-01-15 18:57:06] 74940:E test_encoding.py:23:test_encoding: 'ascii' codec can't decode byte 0xbb in position 0: ordinal not in range(128)
Traceback (most recent call last):
  File "/Users/seveniruby/PycharmProjects/MTF/mtf/tests/test_python/test_encoding.py", line 21, in test_encoding
    log.debug(content.encode(from_encoding).decode(to_encoding))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xbb in position 0: ordinal not in range(128)
[2021-01-15 18:57:06] 74940:D test_encoding.py:19:test_encoding: from = GBK to = UTF-8
[2021-01-15 18:57:06] 74940:D test_encoding.py:20:test_encoding: b'\xbb\xf4\xb8\xf1\xce\xd6\xd7\xc8\xb2\xe2\xca\xd4\xd1\xa7\xd4\xba ceshiren.com'
[2021-01-15 18:57:06] 74940:E test_encoding.py:23:test_encoding: 'utf-8' codec can't decode byte 0xbb in position 0: invalid start byte
Traceback (most recent call last):
  File "/Users/seveniruby/PycharmProjects/MTF/mtf/tests/test_python/test_encoding.py", line 21, in test_encoding
    log.debug(content.encode(from_encoding).decode(to_encoding))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbb in position 0: invalid start byte
[2021-01-15 18:57:06] 74940:D test_encoding.py:19:test_encoding: from = GBK to = UNICODE_ESCAPE
[2021-01-15 18:57:06] 74940:D test_encoding.py:20:test_encoding: b'\xbb\xf4\xb8\xf1\xce\xd6\xd7\xc8\xb2\xe2\xca\xd4\xd1\xa7\xd4\xba ceshiren.com'
[2021-01-15 18:57:06] 74940:D test_encoding.py:21:test_encoding: »ô¸ñÎÖ×ȲâÊÔѧԺ ceshiren.com
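
One thing the log makes visible is that in the cases that decode without error, no bytes are lost; they are only misread. As a rough sketch (an addition for illustration, not part of the demo above), re-encoding the garbled string with the wrong codec and then decoding with the right one gets the original text back. This assumes every byte survived the wrong decode, which holds for this sample string but not for arbitrary input. Latin-1 stands in for UNICODE_ESCAPE here, because Python's unicode_escape codec decodes bytes that are not part of an escape sequence as Latin-1.

```python
content = '霍格沃兹测试学院 ceshiren.com'

# Case 1: UTF-8 bytes misread as GBK, then the misreading undone
garbled_gbk = content.encode('utf-8').decode('gbk')
recovered_gbk = garbled_gbk.encode('gbk').decode('utf-8')

# Case 2: GBK bytes misread as Latin-1, then the misreading undone
garbled_latin1 = content.encode('gbk').decode('latin-1')
recovered_latin1 = garbled_latin1.encode('latin-1').decode('gbk')

print(recovered_gbk == content)
print(recovered_latin1 == content)
```

Once a decode has failed or replaced bytes (for example with errors='replace'), the original bytes are gone and this kind of recovery no longer works.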