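To understand how mojibake (garbled text) comes about, I wrote a simple demo program for reference.
It shows what can happen when text is encoded with one encoding and the resulting bytes are then decoded with a different one.
In summary: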
Source encoding (encode) | Target encoding (decode) | Result |
---|---|---|
UTF-8 | ASCII | 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128) |
UTF-8 | GBK | 闇嶆牸娌冨吂娴嬭瘯瀛﹂櫌 ceshiren.com |
UTF-8 | UNICODE_ESCAPE | éæ ¼æ²å ¹æµè¯å¦é¢ ceshiren.com |
GBK | ASCII | 'ascii' codec can't decode byte 0xbb in position 0: ordinal not in range(128) |
GBK | UTF-8 | 'utf-8' codec can't decode byte 0xbb in position 0: invalid start byte |
GBK | UNICODE_ESCAPE | »ô¸ñÎÖ×ȲâÊÔѧԺ ceshiren.com |
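
The two kinds of error message in the table can be read off the first byte of each encoded string. Here is a minimal sketch using only the standard library; the helper name `classify_first_byte` and its three labels are made up for illustration and are not part of the demo.

```python
def classify_first_byte(data: bytes) -> str:
    """Describe how ASCII and UTF-8 decoders see the first byte of data."""
    first = data[0]
    if first < 0x80:
        # 0xxxxxxx: a valid character in both ASCII and UTF-8
        return 'ascii'
    if first & 0b1100_0000 == 0b1000_0000:
        # 10xxxxxx: a UTF-8 continuation byte, it cannot start a character
        return 'continuation'
    # 11xxxxxx: the UTF-8 lead-byte range (a few values such as 0xff are still invalid)
    return 'lead'


text = '霍格沃兹测试学院 ceshiren.com'
utf8_bytes = text.encode('utf-8')
gbk_bytes = text.encode('gbk')

print(hex(utf8_bytes[0]), classify_first_byte(utf8_bytes))  # 0xe9 lead
print(hex(gbk_bytes[0]), classify_first_byte(gbk_bytes))    # 0xbb continuation
```

Both 0xe9 and 0xbb are above 0x7f, so the ASCII decoder rejects them with "ordinal not in range(128)". The GBK string starts with 0xbb, which has the continuation-byte bit pattern, so the UTF-8 decoder reports "invalid start byte" at position 0.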
Sample code
import pytest
from mtf.core.logger import log, log_to_file

log_to_file()


@pytest.mark.parametrize('from_encoding, to_encoding', [
    ['UTF-8', 'ASCII'],
    ['UTF-8', 'GBK'],
    ['UTF-8', 'UNICODE_ESCAPE'],
    ['GBK', 'ASCII'],
    ['GBK', 'UTF-8'],
    ['GBK', 'UNICODE_ESCAPE'],
])
def test_encoding(from_encoding, to_encoding):
    try:
        content = '霍格沃兹测试学院 ceshiren.com'
        log.debug(f'from = {from_encoding} to = {to_encoding}')
        # Bytes produced by the source encoding
        log.debug(content.encode(from_encoding))
        # Decode those same bytes with the mismatched target encoding
        log.debug(content.encode(from_encoding).decode(to_encoding))
    except Exception as e:
        # A failed decode is logged rather than failing the test
        log.exception(e)
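
For readers who do not have the `mtf` logger installed, the same experiment can be approximated with the standard library alone. This is a rough sketch rather than the demo itself: it replaces pytest and the logger with a plain loop and `print`, and the helper name `decode_as` is hypothetical.

```python
def decode_as(text: str, from_encoding: str, to_encoding: str) -> str:
    """Encode text with from_encoding, then decode the bytes with to_encoding.

    Returns the decoded (possibly garbled) string, or the error message
    if the bytes are not valid in the target encoding.
    """
    raw = text.encode(from_encoding)
    try:
        return raw.decode(to_encoding)
    except UnicodeDecodeError as e:
        return str(e)


content = '霍格沃兹测试学院 ceshiren.com'
pairs = [
    ('UTF-8', 'ASCII'),
    ('UTF-8', 'GBK'),
    ('UTF-8', 'UNICODE_ESCAPE'),
    ('GBK', 'ASCII'),
    ('GBK', 'UTF-8'),
    ('GBK', 'UNICODE_ESCAPE'),
]

# One entry per row of the summary table
results = {pair: decode_as(content, *pair) for pair in pairs}
for (src, dst), result in results.items():
    # repr() keeps invisible control characters visible as escapes
    print(f'{src} -> {dst}: {result!r}')
```

Printing with `repr` is a deliberate choice here: some of the garbled results contain control characters that a terminal would otherwise swallow or act on.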
Run log
[2021-01-15 18:57:06] 74940:D logger.py:61:log_to_file: log init ./debug.2021.01.15-18.57.06.log
[2021-01-15 18:57:06] 74940:D test_encoding.py:19:test_encoding: from = UTF-8 to = ASCII
[2021-01-15 18:57:06] 74940:D test_encoding.py:20:test_encoding: b'\xe9\x9c\x8d\xe6\xa0\xbc\xe6\xb2\x83\xe5\x85\xb9\xe6\xb5\x8b\xe8\xaf\x95\xe5\xad\xa6\xe9\x99\xa2 ceshiren.com'
[2021-01-15 18:57:06] 74940:E test_encoding.py:23:test_encoding: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)
Traceback (most recent call last):
File "/Users/seveniruby/PycharmProjects/MTF/mtf/tests/test_python/test_encoding.py", line 21, in test_encoding
log.debug(content.encode(from_encoding).decode(to_encoding))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)
[2021-01-15 18:57:06] 74940:D test_encoding.py:19:test_encoding: from = UTF-8 to = GBK
[2021-01-15 18:57:06] 74940:D test_encoding.py:20:test_encoding: b'\xe9\x9c\x8d\xe6\xa0\xbc\xe6\xb2\x83\xe5\x85\xb9\xe6\xb5\x8b\xe8\xaf\x95\xe5\xad\xa6\xe9\x99\xa2 ceshiren.com'
[2021-01-15 18:57:06] 74940:D test_encoding.py:21:test_encoding: 闇嶆牸娌冨吂娴嬭瘯瀛﹂櫌 ceshiren.com
[2021-01-15 18:57:06] 74940:D test_encoding.py:19:test_encoding: from = UTF-8 to = UNICODE_ESCAPE
[2021-01-15 18:57:06] 74940:D test_encoding.py:20:test_encoding: b'\xe9\x9c\x8d\xe6\xa0\xbc\xe6\xb2\x83\xe5\x85\xb9\xe6\xb5\x8b\xe8\xaf\x95\xe5\xad\xa6\xe9\x99\xa2 ceshiren.com'
[2021-01-15 18:57:06] 74940:D test_encoding.py:21:test_encoding: éæ ¼æ²å
¹æµè¯å¦é¢ ceshiren.com
[2021-01-15 18:57:06] 74940:D test_encoding.py:19:test_encoding: from = GBK to = ASCII
[2021-01-15 18:57:06] 74940:D test_encoding.py:20:test_encoding: b'\xbb\xf4\xb8\xf1\xce\xd6\xd7\xc8\xb2\xe2\xca\xd4\xd1\xa7\xd4\xba ceshiren.com'
[2021-01-15 18:57:06] 74940:E test_encoding.py:23:test_encoding: 'ascii' codec can't decode byte 0xbb in position 0: ordinal not in range(128)
Traceback (most recent call last):
File "/Users/seveniruby/PycharmProjects/MTF/mtf/tests/test_python/test_encoding.py", line 21, in test_encoding
log.debug(content.encode(from_encoding).decode(to_encoding))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xbb in position 0: ordinal not in range(128)
[2021-01-15 18:57:06] 74940:D test_encoding.py:19:test_encoding: from = GBK to = UTF-8
[2021-01-15 18:57:06] 74940:D test_encoding.py:20:test_encoding: b'\xbb\xf4\xb8\xf1\xce\xd6\xd7\xc8\xb2\xe2\xca\xd4\xd1\xa7\xd4\xba ceshiren.com'
[2021-01-15 18:57:06] 74940:E test_encoding.py:23:test_encoding: 'utf-8' codec can't decode byte 0xbb in position 0: invalid start byte
Traceback (most recent call last):
File "/Users/seveniruby/PycharmProjects/MTF/mtf/tests/test_python/test_encoding.py", line 21, in test_encoding
log.debug(content.encode(from_encoding).decode(to_encoding))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbb in position 0: invalid start byte
[2021-01-15 18:57:06] 74940:D test_encoding.py:19:test_encoding: from = GBK to = UNICODE_ESCAPE
[2021-01-15 18:57:06] 74940:D test_encoding.py:20:test_encoding: b'\xbb\xf4\xb8\xf1\xce\xd6\xd7\xc8\xb2\xe2\xca\xd4\xd1\xa7\xd4\xba ceshiren.com'
[2021-01-15 18:57:06] 74940:D test_encoding.py:21:test_encoding: »ô¸ñÎÖ×ȲâÊÔѧԺ ceshiren.com
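
The two UNICODE_ESCAPE results in the log look like Western European text because Python's `unicode_escape` codec decodes from Latin-1: any byte that is not part of a backslash escape becomes the code point with the same value. The sketch below checks this for the demo string, on the assumption that neither byte string contains a backslash (true for this input).

```python
content = '霍格沃兹测试学院 ceshiren.com'

for source in ('utf-8', 'gbk'):
    raw = content.encode(source)
    via_escape = raw.decode('unicode_escape')
    via_latin1 = raw.decode('latin-1')
    # With no backslash in the input there is nothing to unescape,
    # so both decoders map each byte to the code point of the same value.
    print(source, via_escape == via_latin1, repr(via_escape))

utf8_garbled = content.encode('utf-8').decode('unicode_escape')
# U+0085 (NEXT LINE) comes from the byte 0x85 in the UTF-8 form of '兹'
print('\x85' in utf8_garbled, len(utf8_garbled.splitlines()))
```

The U+0085 character is treated as a line boundary by `str.splitlines()` and by many terminals, which would explain why the UTF-8 to UNICODE_ESCAPE entry in the log wraps onto a second line.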