A Comprehensive Guide to Reading and Writing Files in Python

无界猴

Updated 6 months ago


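Python offers several efficient ways to read and write files. Below are a set of approaches that together cover the most common file-handling problems.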

1. Basic file reading and writing

Reading an entire file

with open('filename.txt', 'r', encoding='utf-8') as file:
    content = file.read()  # read the whole file into a single string

Reading line by line

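# Option 1: load every line into a list (one element per line)
with open('filename.txt', 'r', encoding='utf-8') as file:
    lines = file.readlines()

# Option 2: iterate over the file object, one line at a time
with open('filename.txt', 'r', encoding='utf-8') as file:
    for line in file:
        print(line.rstrip('\n'))  # replace with your own per-line processing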
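`readlines()` and `for line in file` both advance the same file position, so once one of them has reached the end of the file the other yields nothing. The sketch below demonstrates this on a throwaway temporary file (the file contents are arbitrary) and shows that `seek(0)` rewinds the file if you really need a second pass.

```python
import os
import tempfile

# Create a small temporary file to experiment with
with tempfile.NamedTemporaryFile('w', encoding='utf-8', suffix='.txt', delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    path = tmp.name

with open(path, 'r', encoding='utf-8') as file:
    lines = file.readlines()                      # consumes the whole file
    leftover = [line for line in file]            # position is already at EOF
    file.seek(0)                                  # rewind to the beginning
    again = [line.rstrip('\n') for line in file]  # second pass now works

os.remove(path)

print(lines)     # ['first\n', 'second\n', 'third\n']
print(leftover)  # []
print(again)     # ['first', 'second', 'third']
```

In practice it is simpler to pick one style per `with` block, as in the two options above.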

Writing to a file

# Overwrite (mode 'w' truncates the file if it already exists)
with open('filename.txt', 'w', encoding='utf-8') as file:
    file.write("Hello, World!")

# Append to the end of the file (mode 'a')
with open('filename.txt', 'a', encoding='utf-8') as file:
    file.write("\nNew line")
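Two details are easy to miss here: `write()` never adds a newline on its own, and mode `'w'` wipes any previous content before writing. The following sketch, using a temporary directory and an arbitrary file name, writes, appends, overwrites, and reads the result back after each step.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'demo.txt')

    with open(path, 'w', encoding='utf-8') as file:
        file.write("Hello, World!")   # no trailing newline is added
    with open(path, 'a', encoding='utf-8') as file:
        file.write("\nNew line")      # appended after the existing text
    with open(path, 'r', encoding='utf-8') as file:
        after_append = file.read()

    with open(path, 'w', encoding='utf-8') as file:
        file.write("Replaced")        # 'w' truncates the file first
    with open(path, 'r', encoding='utf-8') as file:
        after_overwrite = file.read()

print(repr(after_append))     # 'Hello, World!\nNew line'
print(repr(after_overwrite))  # 'Replaced'
```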

2. Advanced file-handling techniques

Working with paths via `pathlib` (Python 3.4+)

from pathlib import Path

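# Create a Path object
file_path = Path('data/file.txt')

# Make sure the parent directory exists, then write the file
file_path.parent.mkdir(parents=True, exist_ok=True)
file_path.write_text("New content", encoding='utf-8')

# Read the file back
content = file_path.read_text(encoding='utf-8')

# Check whether the file exists
if file_path.exists():
    print("File exists")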
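Beyond reading and writing, `Path` objects make path construction and inspection readable: the `/` operator joins segments, and attributes such as `name`, `stem`, and `suffix` split a file name apart. A minimal sketch, with directory and file names chosen purely for illustration:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    base = Path(tmpdir)
    report = base / 'reports' / '2024' / 'summary.txt'  # '/' joins path segments
    report.parent.mkdir(parents=True, exist_ok=True)    # create missing directories
    report.write_text("ok", encoding='utf-8')

    parts = (report.name, report.stem, report.suffix)
    found = sorted(p.name for p in base.rglob('*.txt'))  # recursive search
    exists = report.is_file()

print(parts)   # ('summary.txt', 'summary', '.txt')
print(found)   # ['summary.txt']
print(exists)  # True
```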

Processing large files (memory-efficient)

# Iterate over a large file line by line
with open('large_file.csv', 'r', encoding='utf-8') as file:
    for line in file:
        process_line(line)  # process_line is your own function; only one line is held in memory at a time
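Line-by-line iteration does not help when a file has extremely long lines or no newlines at all (a minified JSON dump, for example). In that case you can read fixed-size chunks instead. The sketch below uses a hypothetical helper, `read_in_chunks`, with an arbitrary default chunk size; in text mode `read(n)` returns at most `n` characters and an empty string at end of file.

```python
import os
import tempfile

def read_in_chunks(path, chunk_size=64 * 1024):
    """Yield successive chunks of at most chunk_size characters from a text file."""
    with open(path, 'r', encoding='utf-8') as file:
        while True:
            chunk = file.read(chunk_size)  # '' signals end of file
            if not chunk:
                break
            yield chunk

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'big.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('x' * 10_000)  # 10,000 characters with no newline

    sizes = [len(c) for c in read_in_chunks(path, chunk_size=4096)]
    total = sum(sizes)

print(sizes)  # [4096, 4096, 1808]
print(total)  # 10000
```

The same loop works in binary mode (`'rb'`), where the sentinel becomes `b''`.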

Using the `io` module to treat in-memory data as a file

import io

# Create a file-like object from a string
string_data = "Line 1\nLine 2\nLine 3"
file_like = io.StringIO(string_data)

# Work with the string exactly as you would with a file
for line in file_like:
    print(line.strip())
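`StringIO` also works in the other direction, as a write target whose accumulated text you retrieve with `getvalue()`, and `io.BytesIO` is its binary counterpart for code that expects a binary file object. A short sketch with arbitrary sample data:

```python
import io

# Write into an in-memory text buffer, then fetch everything written so far
buffer = io.StringIO()
buffer.write("Line 1\n")
print("Line 2", file=buffer)  # print() accepts any text file object
text = buffer.getvalue()

# BytesIO behaves like a file opened in binary mode
binary = io.BytesIO(b"binary data")
head = binary.read(6)
binary.seek(0, io.SEEK_END)   # jump to the end, as with a real file
size = binary.tell()

print(repr(text))  # 'Line 1\nLine 2\n'
print(head)        # b'binary'
print(size)        # 11
```

This is handy in tests, where you can pass a buffer to a function that normally writes to disk.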

3. Working with CSV files

Using the built-in `csv` module

import csv

# Read a CSV file (the csv module expects files opened with newline='')
with open('data.csv', 'r', encoding='utf-8', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)

# Write a CSV file
data = [['Name', 'Age'], ['Alice', 25], ['Bob', 30]]
with open('output.csv', 'w', encoding='utf-8', newline='') as file:
    writer = csv.writer(file)
    writer.writerows(data)
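When rows have named columns, `csv.DictWriter` and `csv.DictReader` let you work with dictionaries instead of positional lists. The sketch below reuses the sample data from above in dictionary form, with an arbitrary temporary file name. Note that every value read back from a CSV file is a string, so numeric columns need explicit conversion.

```python
import csv
import os
import tempfile

rows = [{'Name': 'Alice', 'Age': 25}, {'Name': 'Bob', 'Age': 30}]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'people.csv')

    # DictWriter maps dict keys to columns; writeheader() emits the header row
    with open(path, 'w', encoding='utf-8', newline='') as file:
        writer = csv.DictWriter(file, fieldnames=['Name', 'Age'])
        writer.writeheader()
        writer.writerows(rows)

    # DictReader uses the header row as keys for every following row
    with open(path, 'r', encoding='utf-8', newline='') as file:
        loaded = list(csv.DictReader(file))

ages = [int(row['Age']) for row in loaded]  # values come back as str

print(loaded)  # [{'Name': 'Alice', 'Age': '25'}, {'Name': 'Bob', 'Age': '30'}]
print(ages)    # [25, 30]
```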

Using `pandas` for complex CSV files (recommended for data analysis)

import pandas as pd

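# Read a CSV file into a DataFrame
df = pd.read_csv('data.csv')

# Process the data
df['Age'] = df['Age'] + 1

# Write the result back to CSV (index=False omits the row-index column)
df.to_csv('processed_data.csv', index=False)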

4. Working with JSON files

Using the built-in `json` module

import json

# Read JSON
with open('data.json', 'r', encoding='utf-8') as file:
    data = json.load(file)

# Write JSON (indent for readability; ensure_ascii=False keeps non-ASCII text unescaped)
with open('output.json', 'w', encoding='utf-8') as file:
    json.dump(data, file, indent=4, ensure_ascii=False)
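The effect of `ensure_ascii` is easiest to see with the string functions `json.dumps()` and `json.loads()`, which mirror `dump()` and `load()`. In this sketch the sample dictionary is arbitrary; by default non-ASCII characters are written as `\uXXXX` escapes, and both forms decode back to the same object.

```python
import json

data = {'name': 'café', 'tags': ['a', 'b'], 'active': True}

escaped = json.dumps(data)                       # ensure_ascii=True by default
readable = json.dumps(data, ensure_ascii=False)  # non-ASCII kept as-is

print(escaped)   # {"name": "caf\u00e9", "tags": ["a", "b"], "active": true}
print(readable)  # {"name": "café", "tags": ["a", "b"], "active": true}

# Both encodings round-trip to the same Python object
same = json.loads(escaped) == json.loads(readable) == data
print(same)      # True
```

When you pass `ensure_ascii=False` to `json.dump()`, open the output file with an explicit encoding such as `'utf-8'`, as the example above does.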

5. Working with binary files

# Read a binary file (no encoding argument in binary mode)
with open('image.jpg', 'rb') as file:
    binary_data = file.read()

# Write a binary file
with open('copy.jpg', 'wb') as file:
    file.write(binary_data)
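Reading the whole file with `read()` is fine for small images, but for large binaries a chunked copy keeps memory use flat. One standard-library option is `shutil.copyfileobj()`, whose `length` argument sets the buffer size. The sketch generates its own sample payload in a temporary directory; the 64 KiB buffer is an arbitrary choice.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    src = os.path.join(tmpdir, 'source.bin')
    dst = os.path.join(tmpdir, 'copy.bin')

    payload = bytes(range(256)) * 1000  # 256,000 bytes of sample data
    with open(src, 'wb') as f:
        f.write(payload)

    # Copy in 64 KiB blocks instead of loading the whole file into memory
    with open(src, 'rb') as fin, open(dst, 'wb') as fout:
        shutil.copyfileobj(fin, fout, length=64 * 1024)

    with open(dst, 'rb') as f:
        copied = f.read()

print(len(copied))        # 256000
print(copied == payload)  # True
```

If you only need a plain file-to-file copy, `shutil.copyfile(src, dst)` does the same job in one call.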

6. An all-in-one solution

For more complex file-processing tasks, you can wrap the logic in a reusable function:

from pathlib import Path

def process_file(input_path, output_path=None, process_func=None):
    """
    General-purpose file-processing function

    Parameters:
        input_path: path of the input file
        output_path: path of the output file (optional)
        process_func: function that receives the file content and returns the processed content
    """
    # Convert to a Path object
    input_path = Path(input_path)

    # If no output path is given, append _processed to the original file name
    if output_path is None:
        output_path = input_path.with_name(f"{input_path.stem}_processed{input_path.suffix}")
    output_path = Path(output_path)  # also accept a plain string path

    # Read the file
    content = input_path.read_text(encoding='utf-8')

    # Process the content
    if process_func:
        processed_content = process_func(content)
    else:
        processed_content = content.upper()  # default: convert to upper case

    # Write the result
    output_path.write_text(processed_content, encoding='utf-8')
    print(f"File processed and saved to: {output_path}")

# Usage example
def custom_process(content):
    return content.replace("old", "new")

process_file('input.txt', process_func=custom_process)
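`process_file` loads the entire input with `read_text()`, which does not scale to very large files. A possible streaming variant is sketched below; the name `process_file_by_line` and its `line_func` parameter are hypothetical, and it assumes the transformation can be applied to each line independently (so a pattern spanning two lines would not be matched).

```python
import tempfile
from pathlib import Path

def process_file_by_line(input_path, output_path=None, line_func=str.upper):
    """Hypothetical streaming variant of process_file: transform one line at a time."""
    input_path = Path(input_path)
    if output_path is None:
        output_path = input_path.with_name(f"{input_path.stem}_processed{input_path.suffix}")
    output_path = Path(output_path)

    # Both files stay open; only the current line is held in memory
    with input_path.open('r', encoding='utf-8') as fin, \
         output_path.open('w', encoding='utf-8') as fout:
        for line in fin:
            fout.write(line_func(line))
    return output_path

with tempfile.TemporaryDirectory() as tmpdir:
    src = Path(tmpdir) / 'input.txt'
    src.write_text("old value\nkeep old\n", encoding='utf-8')
    out = process_file_by_line(src, line_func=lambda s: s.replace("old", "new"))
    result = out.read_text(encoding='utf-8')
    out_name = out.name

print(out_name)      # input_processed.txt
print(repr(result))  # 'new value\nkeep new\n'
```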

7. Exception handling

try:
    with open('file.txt', 'r', encoding='utf-8') as file:
        content = file.read()
except FileNotFoundError:
    print("File not found")
except PermissionError:
    print("No permission to access the file")
except UnicodeDecodeError:
    print("The file is not valid UTF-8 (encoding mismatch)")
except Exception as e:
    print(f"An unexpected error occurred: {e}")
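Rather than just reporting a `UnicodeDecodeError`, you can retry with other likely encodings. The helper below, `read_text_with_fallback`, is a hypothetical sketch: the candidate list (`'utf-8'`, then `'gbk'`, common for older Chinese-language files) is an assumption you should adapt to your data, and the last resort uses the standard `errors='replace'` option of `open()` to substitute U+FFFD for undecodable bytes.

```python
import os
import tempfile

def read_text_with_fallback(path, encodings=('utf-8', 'gbk')):
    """Hypothetical helper: try each encoding in turn, return (text, encoding used)."""
    for enc in encodings:
        try:
            with open(path, 'r', encoding=enc) as file:
                return file.read(), enc
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
    # Last resort: decode as UTF-8 and replace bad bytes with U+FFFD
    with open(path, 'r', encoding='utf-8', errors='replace') as file:
        return file.read(), 'utf-8 (errors=replace)'

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'legacy.txt')
    with open(path, 'wb') as f:
        f.write('中文'.encode('gbk'))  # simulate a file saved in a legacy encoding
    text, used = read_text_with_fallback(path)

print(text, used)  # 中文 gbk
```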

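With the methods above you can handle most everyday file reading and writing tasks. For very large files, process them line by line or read them in fixed-size chunks; for tabular data analysis, pandas is usually the most convenient choice; and for path manipulation, pathlib is more intuitive than the traditional os / os.path functions.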
