-2

I have CSVs with matching names in two folders. They are unique i.e. for each record in Folder1, there is likely to be a matching record with same name in Folder2.

I would like to combine CSVs with matching names into one workbook as Worksheets within the Workbook. For example, if there exists 220001.csv in both Folder1 and Folder2, the resultant workbook would be 220001.xlsx in Folder3. If a similar named file is missing in Folder2, then the resultant .xlsx will just be with one worksheet i.e. the record in Folder1. Folder1 is the master folder sort of.

How do I solve this in Python?

Hummer
  • 429
  • 1
  • 3
  • 16
  • `resultant .xlsx will just be with one worksheet i.e. the record in Folder1` - you think in `Folder3` will be excel sheet with one csv from `Folder1` ? Similar for `Folder2` ? – jezrael Apr 23 '21 at 05:10
  • Yes, @jezrael . resultant.xlsx will just be file in Folder1 saved in Folder3 if, there is no similar file in Folder2. Otherwise, resultant.xlsx will be a workbook with the values in the file in Folder1 in one Worksheet1 and ones in Folder2 in the worksheet2 with the resultant.xlsx workbook saved in Folder3. – Hummer Apr 23 '21 at 05:22
  • Seems duplicate of this question : https://stackoverflow.com/q/42092263/10290248 – ajayg2808 Apr 23 '21 at 05:22
  • 1
    Not exactly @ajayg2808 when it comes to finding identical names of CSVs in the two folders. The example works to just combine the csvs in the folder regardless of the names. My example is to combine only csvs with identical names. – Hummer Apr 23 '21 at 05:28
  • @Hummer You should do some effort with `os` and `os.path` library for finding same names, OR please show your code what you tried till now? – ajayg2808 Apr 23 '21 at 05:30

1 Answers1

1

Use:

import os, glob

#get filenames from both folders
files1 = glob.glob('Folder1/*.csv')
files2 = glob.glob('Folder2/*.csv')

#create DataFrames
df1 = pd.DataFrame([(*os.path.split(x), x) for x in files1])
df2 = pd.DataFrame([(*os.path.split(x), x) for x in files2])

#join together
df = df1.merge(df2, on=1, how='outer')

#for each row generate DataFrames and write to excel - if both match to 2 sheets
for x in df.itertuples():
    file = os.path.basename(x._2)
       
    m1 = pd.notna(x._3)
    m2 = pd.notna(x._5)
    if m1 and m2:
        writer = pd.ExcelWriter(os.path.join('Folder33', x._2.replace('csv','xlsx')))
        df1 = pd.read_csv(x._3)
        df2 = pd.read_csv(x._5)
        df1.to_excel(writer,sheet_name=x._1)
        df2.to_excel(writer,sheet_name=x._4)
        writer.save()
    elif m1:
        writer = pd.ExcelWriter(os.path.join('Folder33', x._2.replace('csv','xlsx')))
        df = pd.read_csv(x._3)
        df.to_excel(writer,sheet_name=x._1)
        writer.save()
        
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thank you @jezrael. Your solution works great and very well thought out from my point. Only issue is regards records that are present in Folder2 but missing in Folder1(master 1). They are still included in Folder3. Only formatting issue is with the creation of an un named column in the final files in Folder3. All in all, its a clever solution. You're a legend. – Hummer Apr 23 '21 at 06:23
  • @Hummer - So one thing - ouput folder of excels files are different? It means if match both to `Folder3`, if match `Folder1` to `Folder1`? And values if exist in `Folder2` to `Folder2` ? – jezrael Apr 23 '21 at 06:26
  • output folder is Folder3. Folder1 is the "master". If filename in Folder1 matches a file name in Folder2, then the two are combined as in your solution and the result put in Folder3. If there is a file in Folder2 that doesn't match a file in Folder1, then its just left in Folder2 as it is. If a file in Folder1, doesn't match any in Folder2, then the folder1 file is put in Folder3. Folder1 files are the main files. Hope its a bit clear. – Hummer Apr 23 '21 at 07:41
  • @Hummer - so need remove `elif` and `else` part of my solution? – jezrael Apr 23 '21 at 07:42
  • @Hummer - For me not clear if file from `Folder1` not match with file `Folder2`, whats happen? Nothing? Or create excel file in `Folder1` ? – jezrael Apr 23 '21 at 07:47
  • If file from Folder1 does not match any file in Folder2, then the file in Folder1 is the only Worksheet in the resultant.xlsx. Its the opposite for files in Folder2. If file in Folder2, does not match a file in Folder1, then its just ignored. No resultant.xlsx for such a case. A bit clear? – Hummer Apr 23 '21 at 07:52
  • @Hummer - Yes, but excel firl are written only in `Folder3` ? Or `If file from Folder1 does not match any file in Folder2, then the file in Folder1 is the only Worksheet in the resultant.xlsx.` - where is write excel? to `Folder1` or `Folder3` ? – jezrael Apr 23 '21 at 07:54
  • Yes, the files are written only in Folder3. – Hummer Apr 23 '21 at 08:15
  • 1
    Yes, its writing to Folder3. Only small issue is in Folder2. If file is in Folder2 and not in Folder1, then its also writing to Folder3. It's not supposed to write to that to Folder3. Should just ignore. – Hummer Apr 23 '21 at 08:19
  • @Hummer - Ok, `else` part was removed. – jezrael Apr 23 '21 at 08:21
  • I just checked and @jezrael , it still writes the file in Folder2 to Folder3 when its same file is not in Folder1. – Hummer Apr 23 '21 at 08:33
  • @Hummer - You are right, answer was changed. – jezrael Apr 23 '21 at 08:43
  • 1
    Thank you @jezrael. The updated answer works great. I really appreciate your input. You are a legend. – Hummer Apr 23 '21 at 09:01
  • How can I get in touch with you @jezrael? Have some different query. – Hummer Apr 26 '21 at 08:42
  • @Hummer - I think best post new question, because a bit busy. – jezrael Apr 26 '21 at 08:43
  • I came across a question that is close to what I was about to put up here https://stackoverflow.com/questions/67277481/how-can-i-loop-through-and-increment-the-rows-in-an-excel-workbook-formula-in-py. Any good way around it? – Hummer Apr 27 '21 at 08:35
  • @Hummer - I see, unfortunately no experience with `openpyxl` – jezrael Apr 27 '21 at 08:42
  • Cool and I appreciate your response. – Hummer Apr 27 '21 at 08:45