Combine Excel

Target

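Merge the Excel files that sit at the same level in each directory under the current directory, and name the merged file after that directory.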

Before:
·
|__file
    |__file1
        |__1.xls
        |__2.xls
    |__file2
        |__3.xls
        |__4.xls
        |__5.xls
    |__6.xls
    |__7.xls

After:
·
|__file
    |__file1
        |__1.xls
        |__2.xls
        |__file1.xls (*) 
    |__file2
        |__3.xls
        |__4.xls
        |__5.xls
        |__file2.xls (*)
    |__6.xls
    |__7.xls
    |__file.xls (*)

Requirement

1. xlrd (read excel)   
2. xlwt (write excel) (optional)   
3. pandas   
4. numpy
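
Before running the script, it can help to check which of these packages are importable. The sketch below is only an illustration: `missing_packages`, `REQUIRED`, and `OPTIONAL` are hypothetical names, and `openpyxl` is not in the list above; it is included as an assumption because `pandas.read_excel` uses it for `.xlsx` files, while `xlrd` handles `.xls`.

```python
import importlib.util

# Package names taken from the list above; openpyxl is an added assumption
# (only needed if the inputs are .xlsx rather than .xls).
REQUIRED = ('xlrd', 'pandas', 'numpy')
OPTIONAL = ('xlwt', 'openpyxl')


def missing_packages(names):
    """Return the names from `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == '__main__':
    print('missing required:', missing_packages(REQUIRED))
    print('missing optional:', missing_packages(OPTIONAL))
```

Anything reported as missing can then be installed with pip before the merge is attempted.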

Code

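# -*- coding: utf-8 -*-
import os
import pandas as pd
import numpy as np

dd = os.getcwd()
# Current working directory (where the script is launched from);
# use os.path.join() to append a subdirectory

lists = os.listdir(dd)  # names of all files and folders directly under dd

for ll in lists:
    path = os.path.join(dd, ll)  # join to get the absolute path of the file or folder
    if not os.path.isdir(path):
        continue  # plain files such as 6.xls cannot be walked, so skip them
    print(path)  # show which folder is being processed

    # File names, kept only to make the process easier to follow
    filename_excel = []

    # One DataFrame per Excel file that is read
    frames = []

    for root, dirs, files in os.walk(path):
        # os.walk(path) recursively visits every file under the folder
        for file in files:
            if not file.lower().endswith(('.xls', '.xlsx')):
                continue  # ignore anything that is not an Excel file
            filename_excel.append(os.path.join(root, file))
            # Use os.path.join instead of hand-written separators:
            # Windows, Linux and macOS write paths differently
            df = pd.read_excel(os.path.join(root, file))  # Excel -> DataFrame
            frames.append(df)

    # Print the file names
    print(filename_excel)
    if not frames:
        continue  # pd.concat raises ValueError on an empty list
    # Merge all the data
    result = pd.concat(frames)

    # Inspect the merged data
    print(result.head())
    print(result.shape)

    result.to_csv(os.path.join(dd, ll + '.csv'), sep=',', index=False)
    # Save the merged data in the starting directory dd as <folder name>.csv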
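
The Target tree shows one merged file per directory level, built only from the files directly inside that directory, whereas the script above merges each top-level folder recursively. A minimal sketch of the per-level variant follows. The names `plan_merges`, `merge_each_level`, and `EXCEL_EXTENSIONS` are hypothetical, and the output is written as `.csv` (as in the script above) rather than the `.xls` shown in the Target, so that a rerun does not pick up its own output.

```python
import os

EXCEL_EXTENSIONS = ('.xls', '.xlsx')


def plan_merges(top):
    """Map each output path to the Excel files directly inside that directory."""
    plan = {}
    for root, dirs, files in os.walk(top):
        # Only the files listed for this root, i.e. this directory level
        sources = sorted(
            os.path.join(root, f)
            for f in files
            if f.lower().endswith(EXCEL_EXTENSIONS)
        )
        if sources:
            # Name the output after the directory it lives in
            name = os.path.basename(os.path.abspath(root))
            plan[os.path.join(root, name + '.csv')] = sources
    return plan


def merge_each_level(top):
    """Write one merged CSV per directory, following plan_merges()."""
    import pandas as pd  # imported here so plan_merges() works without pandas

    for out, sources in plan_merges(top).items():
        frames = [pd.read_excel(path) for path in sources]
        pd.concat(frames, ignore_index=True).to_csv(out, index=False)
```

Calling `plan_merges('file')` first is a cheap way to review which files would end up in which output before `merge_each_level('file')` writes anything.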

Details

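  1. os.listdir(dir) vs os.walk(dir)
     os.listdir(dir) lists the files and folders directly inside dir (depth 0). It returns their names only, not absolute paths.
     os.walk(dir) recursively traverses everything under dir. For each folder it yields the folder's path, a list of the subfolder names in it, and a list of the file names in it. The folder path is absolute only when dir itself is absolute.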

     e.g. 
     . (/home/q2l/test)
     ├── 1.txt
     ├── 2.txt
     ├── file1
     │   ├── 3.txt
     │   └── 4.txt
     └── file2
         └── 5.txt
    
     >>> import os
     >>> dd = os.getcwd()
     >>> for file in os.listdir(dd):
     ...     print(file)
     ...
     1.txt
     2.txt
     file1
     file2
    
     >>> for root, dirs, files in os.walk(dd):
     ...     print((root, dirs, files))
     ...
     ('/home/q2l/test', ['file1', 'file2'], ['1.txt', '2.txt'])
     ('/home/q2l/test/file1', [], ['3.txt', '4.txt'])
     ('/home/q2l/test/file2', [], ['5.txt'])
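
The same comparison can be reproduced without touching real files. The sketch below is only an illustration: it rebuilds the example layout in a temporary directory, and `build_demo_tree`, `listdir_names`, and `walk_files` are hypothetical helper names. Results are sorted because neither os.listdir nor os.walk promises a particular order.

```python
import os
import tempfile


def build_demo_tree(top):
    """Create the example layout: two files plus the folders file1/ and file2/."""
    for rel in ('1.txt', '2.txt', 'file1/3.txt', 'file1/4.txt', 'file2/5.txt'):
        path = os.path.join(top, *rel.split('/'))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, 'w').close()


def listdir_names(top):
    """Names directly inside top (depth 0): files and folders alike."""
    return sorted(os.listdir(top))


def walk_files(top):
    """Every file at any depth below top, as paths relative to top."""
    found = []
    for root, dirs, files in os.walk(top):
        for name in files:
            found.append(os.path.relpath(os.path.join(root, name), top))
    return sorted(found)


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as top:
        build_demo_tree(top)
        print(listdir_names(top))  # top-level names only
        print(walk_files(top))     # all five files, including nested ones
```

os.listdir stops at the first level and mixes files with folders, while os.walk reaches the nested files but needs os.path.join(root, name) to turn each bare file name into a usable path.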

Reference

csdn | excel表格合并 (merging Excel spreadsheets)
cnblogs | python使用os.listdir和os.walk获得文件的路径 (getting file paths with os.listdir and os.walk in Python)

Last updated 5 years ago