1

I have multiple CSV files in the same directory that contain different survey response data (different questions, different order of questions). What I'm looking to achieve is a loop through all of the CSVs to find specific column headings and store the results in a pandas dataframe.

What I have so far:

import pandas as pd
import csv
import os
import glob

path = "file/path/"
all_files = glob.glob(os.path.join(path, "*.csv")) #make list of paths

for file in all_files:
    file_name = os.path.splitext(os.path.basename(file))
    dfn = pd.read_csv(file, encoding='latin1')
    dfn.index.name = file_name

So the code currently reads in all of the CSVs from the directory, now I think I need another loop to go through them to find the data within a column. The column in question I'm looking for contains the text "would recommend" (there's a possibility not all column names will be worded the same so would need to be contains). I am quite new to Python and really struggling, any help is greatly appreciated.

Example of CSV1:

Programme,"Overall, I am satisfied with the quality of the programme",I would recommend the company to a friend or colleague,Please comment on any positive aspects of your experience of this programme
Nursing,4,4,[IMAGE]
Nursing,1,3,very good
Nursing,4,5,I enjoyed studying tis programme

Example of CSV2:

Programme,I would recommend the company to a friend,The programme was well organised and running smoothly,It is clear how students' feedback on the programme has been acted on
IT,4,2,4
IT,5,5,5
IT,5,4,5
willd9
  • 91
  • 1
  • 11

2 Answers2

1

I'd change the name of the columns to a common value, then concat them together, uing the join parameter to specify that you only want the common columns.

import pandas as pd
from io import StringIO

csv1 = StringIO("""Programme,"Overall, I am satisfied with the quality of the programme",I would recommend the company to a friend or colleague,Please comment on any positive aspects of your experience of this programme
Nursing,4,4,IMAGE
Nursing,1,3,very good
Nursing,4,5,I enjoyed studying tis programme""")

csv2 = StringIO("""Programme,I would recommend the company to a friend,The programme was well organised and running smoothly,It is clear how students' feedback on the programme has been acted on
IT,4,2,4
IT,5,5,5
IT,5,4,5""")

dfout = pd.DataFrame(columns=['Programme', 'Recommends'])

for file in [csv1, csv2]:
    dfn = pd.read_csv(file)
    matching = [s for s in dfn.columns if "would recommend" in s]
    if matching:
        dfn = dfn.rename(columns={matching[0]:'Recommends'})
        dfout = pd.concat([dfout, dfn], join="inner")

print(dfout)
Programme Recommends
0   Nursing          4
1   Nursing          3
2   Nursing          5
0        IT          4
1        IT          5
2        IT          5
rcriii
  • 687
  • 6
  • 9
  • This works for the example you've given, but when I attempt to read in a CSV and run the loop through it, I get an error message: ValueError(msg.format(_type=type(filepath_or_buffer))) ValueError: Invalid file path or buffer object type: – willd9 Oct 01 '20 at 08:29
  • Here is the code I attempted: df = pd.read_csv('data.csv') dfout = pd.DataFrame(columns=['Subunit', 'Recommends']) for file in [df]: dfn = pd.read_csv(file) matching = [s for s in dfn.columns if "would recommend" in s] if matching: dfn = dfn.rename(columns={matching[0]:'Recommends'}) dfout = pd.concat([dfout, dfn], join="inner") print(dfout) – willd9 Oct 01 '20 at 08:30
0

You do not need to loop:

matching = [s for s in dfn.columns if "would recommend" in s]

in 'matching' you will find the names of the columns which match your criteria

gtomer
  • 5,643
  • 1
  • 10
  • 21