A Comprehensive Guide to Fixing Garbled Chinese Text in Python

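Garbled Chinese text (mojibake) is a common problem in Python development, especially when reading and writing files, transferring data over a network, or working with databases. The sections below cover the usual causes and fixes.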

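1. Understanding Encoding Basics

In Python 3, str holds Unicode text, and str.encode() and bytes.decode() default to UTF-8. Garbled text still appears whenever bytes are decoded with an encoding other than the one used to write them. A frequent cause is calling open() without an encoding argument, which uses the locale's preferred encoding (often GBK on Chinese Windows) unless UTF-8 mode is enabled.

import sys
print(sys.getdefaultencoding())  # default for str.encode()/bytes.decode(); 'utf-8' in Python 3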
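
To make the cause concrete, here is a minimal sketch of how a mismatch produces garbled text. It assumes the bytes were written as UTF-8 and then read as GBK; the variable names are illustrative only.

```python
# "中文" written as UTF-8 bytes
raw = "中文".encode("utf-8")

# Reading those bytes as GBK does not raise here; it silently yields wrong characters
garbled = raw.decode("gbk", errors="replace")
print(garbled)

# Decoding with the encoding that was actually used recovers the text
restored = raw.decode("utf-8")
print(restored)
```

The point of the sketch is that a wrong decode often succeeds without any error, which is why garbled text can go unnoticed until it is displayed.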

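2. Source File Encoding Declaration

Python 3 reads source files as UTF-8 by default, so a declaration is optional for UTF-8 files. It is required for Python 2, or when the file is saved in another encoding, and it must appear on the first or second line of the file:

# -*- coding: utf-8 -*-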

3. Encoding and Decoding Strings

Encoding (str → bytes)

text = "中文"
encoded = text.encode('utf-8')  # str to UTF-8 bytes

Decoding (bytes → str)

byte_data = b'\xe4\xb8\xad\xe6\x96\x87'
decoded = byte_data.decode('utf-8')  # UTF-8 bytes back to str: '中文'
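
Both encode() and decode() accept an errors argument that controls what happens when a character or byte sequence does not fit the chosen encoding. The following is a small sketch of the standard 'strict' (default), 'replace', and 'backslashreplace' handlers; the sample data is made up for illustration.

```python
data = "中文".encode("gbk")  # GBK bytes, which are not valid UTF-8

# Default (strict) decoding raises on the first invalid byte
try:
    data.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError as exc:
    strict_failed = True
    print(f"strict decoding failed: {exc.reason}")

# 'replace' substitutes U+FFFD for undecodable bytes instead of raising
lenient = data.decode("utf-8", errors="replace")

# When encoding, 'backslashreplace' keeps unencodable characters as escapes
escaped = "中文".encode("ascii", errors="backslashreplace")
print(escaped)
```

Lenient handlers are useful for logging and inspection, but they hide the underlying mismatch, so strict decoding is usually the better default for real data.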

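4. Handling Encoding in File I/O

Always pass encoding explicitly so the result does not depend on the platform default.

Reading a file

with open('file.txt', 'r', encoding='utf-8') as f:
    content = f.read()

Writing a file

with open('file.txt', 'w', encoding='utf-8') as f:
    f.write("中文内容")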
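
A related pitfall, not covered above, is the byte order mark (BOM) that some Windows editors put at the start of UTF-8 files. The sketch below assumes such a file and uses a temporary path invented for the demo; it shows that the 'utf-8-sig' codec strips the BOM while plain 'utf-8' leaves it in the text.

```python
import os
import tempfile

# Hypothetical demo file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "bom_demo.txt")

# 'utf-8-sig' writes a BOM before the content
with open(path, "w", encoding="utf-8-sig") as f:
    f.write("中文内容")

# Plain 'utf-8' keeps the BOM as an invisible first character
with open(path, "r", encoding="utf-8") as f:
    with_bom = f.read()

# 'utf-8-sig' removes the BOM if present and also reads BOM-less files
with open(path, "r", encoding="utf-8-sig") as f:
    without_bom = f.read()

print(repr(with_bom[0]), repr(without_bom))
```

Reading with 'utf-8-sig' is a reasonable choice when the origin of a UTF-8 file is unknown, since it handles both variants.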

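5. Solutions for Common Scenarios

Processing CSV files

import csv
# CSV files produced on Chinese Windows (for example by Excel) are often GBK
with open('data.csv', 'r', encoding='gbk', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)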
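
When a GBK file has to be shared with tools that expect UTF-8, one option is to re-encode it once. This is a minimal sketch under the assumption that the source really is GBK; the file names and sample rows are made up, and a sample input is created first so the example is self-contained.

```python
import csv
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "data_gbk.csv")    # hypothetical GBK input
dst = os.path.join(tmp_dir, "data_utf8.csv")   # hypothetical UTF-8 output

# Create a sample GBK-encoded CSV file for the demo
with open(src, "w", encoding="gbk", newline="") as f:
    csv.writer(f).writerows([["姓名", "城市"], ["张三", "北京"]])

# Read as GBK and write the same rows back out as UTF-8
with open(src, "r", encoding="gbk", newline="") as fin, \
        open(dst, "w", encoding="utf-8", newline="") as fout:
    csv.writer(fout).writerows(csv.reader(fin))

# Verify by reading the converted file as UTF-8
with open(dst, "r", encoding="utf-8", newline="") as f:
    rows = list(csv.reader(f))
print(rows)
```

Passing newline="" follows the csv module's documented recommendation and avoids extra blank lines on Windows.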

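Processing JSON data

import json
json_str = '{"name": "中文"}'  # sample input
data = json.loads(json_str)  # no encoding argument: it was removed in Python 3.9
# or, reading from a file:
with open('data.json', 'r', encoding='utf-8') as f:
    data = json.load(f)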
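
On the output side, json.dumps escapes non-ASCII characters by default, which is sometimes mistaken for garbled text. The short sketch below, with a made-up sample dictionary, shows the effect of the ensure_ascii parameter.

```python
import json

data = {"name": "中文"}

# Default: non-ASCII characters are written as \uXXXX escapes
escaped = json.dumps(data)
print(escaped)

# ensure_ascii=False keeps the characters readable in the output
readable = json.dumps(data, ensure_ascii=False)
print(readable)

# Both forms parse back to the same data
assert json.loads(escaped) == json.loads(readable) == data
```

When writing the readable form to a file, open the file with encoding='utf-8' so the characters are stored consistently.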

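Handling network requests

import requests
url = 'https://example.com'  # placeholder URL
response = requests.get(url)
# requests guesses the encoding from the HTTP headers; override it when the guess is wrong
response.encoding = 'utf-8'  # or 'gbk', whichever the server actually uses
text = response.text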
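
One way to decide which encoding to set is to look at the charset declared in the Content-Type header. The following is a standard-library sketch of that idea, not how requests itself is implemented; the helper names charset_from_content_type and decode_body are hypothetical, and the UTF-8 fallback is an assumption.

```python
from email.message import Message


def charset_from_content_type(header_value, default="utf-8"):
    """Return the charset named in a Content-Type header value, or default."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset(default)


def decode_body(body, header_value):
    """Decode raw response bytes using the charset from the header."""
    charset = charset_from_content_type(header_value)
    return body.decode(charset, errors="replace")


# Simulated response: GBK bytes with a matching header
page = decode_body("中文".encode("gbk"), "text/html; charset=GBK")
print(page)
```

Servers sometimes declare the wrong charset or none at all, so the header is a hint rather than a guarantee; an unknown charset name would raise LookupError in this sketch.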

Database operations

# MySQL example
import pymysql
conn = pymysql.connect(host='localhost', user='root', password='',
                       db='test', charset='utf8mb4')  # use utf8mb4, not utf8, for full UTF-8 support

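6. System Locale and Encoding Settings

The locale name must be installed on the system, otherwise setlocale raises locale.Error. Locale names also differ between platforms, so 'zh_CN.UTF-8' may not be accepted on Windows.

import locale
locale.setlocale(locale.LC_ALL, 'zh_CN.UTF-8')  # set the locale for the current process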
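
Before changing any setting, it can help to see which encodings the current process is actually using. The sketch below collects them with standard-library calls; the function name encoding_report and the dictionary keys are made up for this example.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings that affect text handling in this process."""
    return {
        "default_str": sys.getdefaultencoding(),          # str.encode()/bytes.decode()
        "filesystem": sys.getfilesystemencoding(),        # file names
        "preferred": locale.getpreferredencoding(False),  # open() without encoding
        "stdout": getattr(sys.stdout, "encoding", None),  # print() output
    }


report = encoding_report()
for name, value in report.items():
    print(f"{name}: {value}")
```

If "preferred" or "stdout" is not UTF-8, that is a likely source of garbled output on that machine.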

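7. Common Encodings

UTF-8: universal encoding, recommended for new files and data exchange
GBK/GB2312: legacy Simplified Chinese encodings, common on Chinese Windows (GBK is a superset of GB2312)
UTF-16: uncommon for files and network data, though used internally by Windows and some applications
ISO-8859-1 (Latin-1): Western European encoding; cannot represent Chinese characters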
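
As a quick illustration of how these encodings differ, the sketch below encodes the same two-character string with several of them and compares the byte counts. The variable names are illustrative.

```python
text = "中文"

# Number of bytes each encoding needs for the same two characters
sizes = {enc: len(text.encode(enc)) for enc in ("utf-8", "gbk", "utf-16", "utf-16-le")}
for enc, size in sizes.items():
    print(f"{enc}: {size} bytes")

# Latin-1 has no Chinese characters, so encoding fails
try:
    text.encode("latin1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
print("latin1 can encode:", latin1_ok)
```

The 'utf-16' codec adds a 2-byte BOM, which is why it is larger than 'utf-16-le' for the same text.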

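8. Debugging Tips

When you run into garbled text, try the following:

# Inspect the byte representation
print(repr(text.encode('utf-8')))

# Try decoding with several candidate encodings
# (latin1 never fails, so a successful decode alone does not prove the guess is right)
encodings = ['utf-8', 'gbk', 'gb2312', 'big5', 'latin1']
for enc in encodings:
    try:
        print(f"{enc}: {byte_data.decode(enc)}")
    except UnicodeDecodeError:
        print(f"{enc}: failed")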
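
If text has already been decoded with the wrong encoding, it can sometimes be recovered by reversing the mistake: encode back with the wrong encoding, then decode with the right one. The sketch below assumes UTF-8 bytes were misread as Latin-1, which is a lossless case; the helper name repair_mojibake is hypothetical.

```python
def repair_mojibake(garbled, wrong_encoding, right_encoding):
    """Undo a wrong decode by re-encoding and decoding with the right codec."""
    return garbled.encode(wrong_encoding).decode(right_encoding)


# Simulate the mistake: UTF-8 bytes decoded as Latin-1
garbled = "中文".encode("utf-8").decode("latin1")
print(repr(garbled))

# Reverse it
repaired = repair_mojibake(garbled, "latin1", "utf-8")
print(repaired)
```

This only works when the wrong decode preserved every byte. If the text went through errors='replace' or a codec that dropped bytes, the original cannot be fully recovered this way.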