2

I have 25 csv files with four columns each one and I have to compare 1 column named City in all 25 CSVs, then write the output in csv as which city is present in how many files with file name. e.g.

In excel I want New york in first 0 column first row and then in next column onward I want to get the file names where the city newyork is present. Like csv_output.csv:

New york File 1, File 2, File 3, File 5, File 25
Jakarta  File 1, File 7, File 19.

How to do it using Python, Pandas or CSV or any other option.

Currently I checked initially for two csv but that is not working too. My requirement is for 25 CSV files.

import pandas as pd
Data1 = pd.read_csv('C:/Users/File1.csv')
Data2  = pd.read_csv('C:/Users/File2.csv', usecols=['City'])
for df in [Data1, Data2]:
    df['City'] = df['City'].str.rstrip()
result = pd.merge(netscan, computer, on='City', how='outer')

result.to_csv('Report.csv', index=False)
print(result)
santiagoNublado
  • 322
  • 3
  • 11
Amaze_Rock
  • 163
  • 3
  • 16

1 Answers1

0

Here is one approach. First, collect all 25 CSV files into one data frame:

import pandas as pd
dfs = list()
for file in files:
    df = pd.read_csv(file)
    df['file'] = file
    dfs.append(df)
dfs = pd.concat(dfs)

Next, compute summary stats:

result = dfs.groupby('file')['city'].count()

Here is info on posting examples: How to make good reproducible pandas examples

jsmart
  • 2,921
  • 1
  • 6
  • 13
  • import pandas as pd import glob path = r'C:/Users/' # use your path all_files = glob.glob(path + "/*.csv") dfs = list() for file in all_files: df = pd.read_csv(file) df['file'] = file dfs.append(df) dfs = pd.concat(dfs) #print (dfs) result = dfs.groupby('City')['City'].count() print (result) I made above changes and I got result with Name: City, Length: 947002, dtype: int64 With that city coloum and in front of it I am getting count like Newyork : 1 Jakarta : 1 But my requirement is to get Newyork : File 1, File 2 Jakarta : File 19, In new CSV. – Amaze_Rock Aug 01 '20 at 08:02