# Combine Excel

## Target

将当前文件目录下的每一层同级excel合并，并以该级目录命名

```
Before:
·
|__file
    |__file1
        |__1.xls
        |__2.xls
    |__file2
        |__3.xls
        |__4.xls
        |__5.xls
    |__6.xls
    |__7.xls

After:
·
|__file
    |__file1
        |__1.xls
        |__2.xls
        |__file1.xls (*) 
    |__file2
        |__3.xls
        |__4.xls
        |__5.xls
        |__file2.xls (*)
    |__6.xls
    |__7.xls
    |__file.xls (*)
```

## Requirement

```
1. xlrd (read excel)   
2. xlwt (write excel) (optional)   
3. pandas   
4. numpy
```

## Code

```python
# -*- coding: utf-8 -*-
import os
import pandas as pd
import numpy as np

dd = os.getcwd() 
# 获取python文件所在的当前目录，可以用 os.path.join() 追加子目录

lists = os.listdir(dd) # 获取 dd 目录下的所有文件和文件夹

for ll in lists:
    dir=os.path.join(dd, ll) # 通过拼接获得文件或文件夹的绝对路径
    os.chdir(dir) # 转移工作路径
    print(dir) # 输出检查是否是进入了新的文件夹

    #新建列表，存放文件名（可以忽略，但是为了做的过程能心里有数，先放上）
    filename_excel = []

    #新建 dataframe 类型列表，存放每个文件数据框（每一个excel读取后存放在数据框）
    frames = []

    for root, dirs, files in os.walk(dir): 
    # os.walk(dir) 可以递归遍历当前文件夹下的所有文件
        for file in files:
            # print(os.path.join(root,file))
            filename_excel.append(os.path.join(root,file))
            # 注意用 os.path.join 而不是自己写路径，win linux macos 下的路径表示不同
            df = pd.read_excel(os.path.join(root,file)) #excel转换成DataFrame
            frames.append(df)

    #打印文件名
    print(filename_excel)   
    #合并所有数据
    result = pd.concat(frames)    

    #查看合并后的数据
    result.head()
    result.shape

    result.to_csv(dd+'\\'+ll+'.csv' ,sep=',',index = False)
    #保存合并的数据到电脑D盘的merge文件夹中，并把合并后的文件命名为a12.csv
```

## Details

1. os.listdir(dir) vs os.walk(dir) os.listdir(dir) 是列出dir下的所有文件和文件夹，深度为0，返回绝对路径\
   os.walk(dir) 是递归遍历dir下的所有文件和文件夹，对于每个文件夹返回 当前**文件夹**的绝对路径， 当前文件夹下的所有**文件夹**名称的list，当前**文件夹**下的所有文件的名称的list

   ```python
    e.g. 
    . (/home/q2l/tewst)
    ├── 1.txt
    ├── 2.txt
    ├── file1
    │   ├── 3.txt
    │   └── 4.txt
    └── file2
        └── 5.txt

    >>> dd=os.getcwd()
    >>> for file in os.listdir(dd):
    >>>     print(file)

    1.txt
    2.txt
    file1
    file2

    >>> for root, dirs, files in os.walk(dir):
    >>>     print(root, dirs, files)

    ('/home/q2l/test', ['file1', 'file2'], ['1.txt', '2.txt'])
    ('/home/q2l/test/file1', [], ['3.txt', '4.txt'])
    ('/home/q2l/test/file2', [], ['5.txt'])
   ```

## Reference

1. [csdn | excel表格合并](https://blog.csdn.net/weixin_44606898/article/details/104168762)  &#x20;
2. [cnblogs | python使用os.listdir和os.walk获得文件的路径](https://www.cnblogs.com/jiaxin359/p/7324077.html)


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://legacy.cookielau.com/archives/3-python/1-pythontools/2-combineexcel.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
