I'm looking for a way to add a column called q17 and/or q20 to a CSV if it does not already exist.

I have around 40 CSVs, and I want a script that checks each one for the columns q17 and q20. If either is missing, the script should add the missing column(s) and leave all the row values blank.

import pandas as pd
from os import listdir
from os.path import isfile, join


onlyfiles = [f for f in listdir('.') if isfile(join('.', f))]
print(onlyfiles)

#add q17 and q20 if missing from csv sheet
df = onlyfiles
if ['q17'] not in df:
        df['q17'] = ''

The script above returns the error "'list' object has no attribute columns", and I'm not sure why.

abby

1 Answer

Loop through the files, reading each file into a dataframe. Then check if each column exists (see How to check if a column exists in Pandas) and add it (see Add column to dataframe with constant value). If either column needed to be added, rewrite the file.

for file in onlyfiles:
    df = pd.read_csv(file)
    updated = False
    if 'q17' not in df:
        df['q17'] = ''
        updated = True
    if 'q20' not in df:
        df['q20'] = ''
        updated = True
    if updated:
        df.to_csv(file, index=False)  # index=False avoids writing the row index as an extra column
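
Since `onlyfiles` in the question lists every file in the folder, not only CSVs, the same loop can be wrapped so it touches `*.csv` files only. The following is a minimal sketch of that idea, not a definitive implementation; the function names `add_missing_columns` and `process_folder` are hypothetical, and it assumes each file has a header row that `pd.read_csv` can parse with default settings.

```python
from pathlib import Path

import pandas as pd


def add_missing_columns(csv_path, columns=("q17", "q20")):
    """Add any of `columns` missing from the CSV at `csv_path`, left blank.

    Returns True if the file was rewritten, False if nothing was missing.
    (Hypothetical helper name, used for illustration only.)
    """
    df = pd.read_csv(csv_path)
    missing = [col for col in columns if col not in df.columns]
    if not missing:
        return False
    for col in missing:
        df[col] = ""  # blank value in every row
    # index=False keeps pandas from writing the row index as an extra column
    df.to_csv(csv_path, index=False)
    return True


def process_folder(folder="."):
    """Run add_missing_columns on every *.csv file directly inside `folder`.

    Returns a dict mapping file name -> whether the file was rewritten.
    """
    return {
        path.name: add_missing_columns(path)
        for path in sorted(Path(folder).glob("*.csv"))
    }
```

Calling `process_folder(".")` from the folder that holds the CSVs would rewrite only the files that lacked one of the columns. It is worth trying this on a copy of the folder first, because the files are overwritten in place.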
Barmar
  • Thanks for helping! I tried the script above and got the error "UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 10-11: invalid continuation byte". I think it refers to df = pd.read_csv(file), but I'm not sure what's wrong with that line, since I've used it before without issue. I'm using Python pandas, if that changes anything. – abby Oct 04 '22 at 01:47
  • See the `encoding` and `encoding_errors` arguments to `pandas.read_csv`. – Barmar Oct 04 '22 at 02:40
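
A `UnicodeDecodeError` from `pd.read_csv` usually means the file was not saved as UTF-8, which is common for CSVs exported from Excel on Windows. One way to act on the `encoding` hint above is to try a short list of encodings in order. This is only a sketch: the helper name `read_csv_with_fallback` and the candidate list are assumptions, not something from the thread, and the right encoding depends on how the files were produced.

```python
import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8", "cp1252", "latin-1")):
    """Return the DataFrame from the first encoding that decodes `path`.

    The candidate encodings are a guess; adjust them to match the tool
    that produced the files. (Hypothetical helper, for illustration.)
    """
    if not encodings:
        raise ValueError("at least one encoding is required")
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc)
        except UnicodeDecodeError as err:
            last_error = err  # remember the failure and try the next one
    raise last_error
```

In the loop from the answer, `df = pd.read_csv(file)` would become `df = read_csv_with_fallback(file)`. Note that `latin-1` accepts any byte sequence, so it never fails but may produce wrong characters if the file actually uses another encoding. Alternatively, pandas 1.3 and later accept `encoding_errors="replace"` in `read_csv`, which substitutes a placeholder for undecodable bytes instead of raising.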