Pandas - Adding dummy header column in csv

Question

I am trying to concatenate several CSV files by customer group using the code below:

import glob
import pandas as pd
files = glob.glob(file_from + "/*.csv")  # paths of the CSV files in the file_from directory
df_v0 = pd.concat([pd.read_csv(f) for f in files])  # concatenate every CSV listed in files

The problem is that the number of columns in the CSV files varies by customer, and the files do not have a header row.

I would like to add a dummy header row with labels such as col_1, col_2, ... depending on the number of columns in each CSV.

Could anyone guide me on how to get this done? Thanks.

Update on trying to search for a specific string in the Dataframe:

Sample Dataframe

col_1,col_2,col_3
fruit,grape,green
fruit,watermelon,red
fruit,orange,orange
fruit,apple,red

I am trying to keep only the rows containing the word red, and expect rows 2 and 4 to be returned.

I tried the code below:

df[~df.apply(lambda x: x.astype(str).str.contains('red')).any(axis=1)]

jezrael · Accepted Answer · 2018-11-02T09:36:50.130


Use the parameter header=None to get default integer column names (0, 1, 2, ...), and add skiprows=1 if you need to remove an existing row of column names:

df_v0 = pd.concat([pd.read_csv(f, header=None, skiprows=1) for f in files])

If you also want to change the column names, add rename:

dfs = [pd.read_csv(f, header=None, skiprows=1).rename(columns = lambda x: f'col_{x + 1}') 
        for f in files]
df_v0 = pd.concat(dfs)
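
For files that really have no header row (as described in the question), skiprows=1 would drop the first data row, so it can be left out. A minimal sketch of that case follows, assuming two made-up in-memory files with different column counts; `raw_files` and `read_headerless` are hypothetical names used only for illustration:

```python
import io

import pandas as pd

# Hypothetical stand-ins for two headerless customer files
# with different numbers of columns.
raw_files = {
    "customer_a.csv": "fruit,grape,green\nfruit,apple,red\n",
    "customer_b.csv": "veg,carrot\nveg,leek\n",
}


def read_headerless(source):
    """Read a CSV without a header row and label its columns col_1..col_n."""
    df = pd.read_csv(source, header=None)
    df.columns = [f"col_{i + 1}" for i in range(df.shape[1])]
    return df


frames = [read_headerless(io.StringIO(text)) for text in raw_files.values()]

# Columns are aligned by name; files with fewer columns get NaN
# in the columns they do not have.
df_v0 = pd.concat(frames, ignore_index=True)
print(df_v0)
```

With real files, `read_headerless(f)` can be called on each path from `files` instead of the in-memory strings.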

edited Nov 02 '18 at 09:36

answered Nov 02 '18 at 09:07


One more request for help: files is a list of filenames. Some filenames are written in upper case (e.g. FILE1.CSV) and some in lower case (e.g. file2.csv). How could we make them all lower case? Could you please assist with that? Thanks. – dark horse Nov 02 '18 at 09:47

@darkhorse - I am not sure I understand. `files` returns the list of filenames with upper- and lower-case names, then the loop goes over them and the DataFrames are created. If you change the filenames to lowercase, errors will be raised because the file does not exist. But if you really need it, use `pd.read_csv(f.lower(), ...` – jezrael Nov 02 '18 at 09:50

If I got that right, are filenames case-sensitive when read in pandas? For example, if the file name is FILE1.CSV and I pass in file1.csv, will it fail because they are case-sensitive? – dark horse Nov 02 '18 at 09:53
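
On the case question: pandas hands the path to the operating system, so whether file1.csv opens FILE1.CSV depends on the file system (typically it fails on Linux and works on Windows). A minimal sketch, assuming a throwaway temporary folder with made-up file names, that matches the extension case-insensitively but always reads the files under their real names:

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # Hypothetical files with mixed-case names, created only for this demo.
    (folder / "FILE1.CSV").write_text("a,1\n")
    (folder / "file2.csv").write_text("b,2\n")

    # Compare the extension in lower case, but keep the path as it is on disk.
    files = sorted(p for p in folder.iterdir() if p.suffix.lower() == ".csv")
    frames = [pd.read_csv(p, header=None) for p in files]

    # Lower-cased names are safe for display or grouping, not for opening files.
    lowered = [p.name.lower() for p in files]

print(lowered)
```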
I've got another question. I am trying to search for a specific text in the entire DataFrame (df_v0), so I need to scan through all rows and columns. I am able to filter by a specific column but am not sure how to extend this to the entire DataFrame. – dark horse Nov 02 '18 at 11:26
@darkhorse - do you mean a substring match or not? – jezrael Nov 02 '18 at 11:30
Basically I am trying to search for a string / integer, and if any rows have that value, filter them into a new DataFrame. Hope this makes sense. – dark horse Nov 02 '18 at 11:31
@darkhorse - I think you need [this solution](https://stackoverflow.com/q/51168105/2901002) – jezrael Nov 02 '18 at 11:33
I tried the reference you shared but it didn't work. I have updated the initial message with a sample DataFrame and the code I tried. Could you please assist? – dark horse Nov 02 '18 at 11:45

@darkhorse - You are really close, only remove `~` like `df[df.apply(lambda x: x.astype(str).str.contains('red')).any(axis=1)]` – jezrael Nov 02 '18 at 11:45
Thank you for that. – dark horse Nov 02 '18 at 11:48
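
For reference, a small sketch of the search discussed in the comments, run on the sample DataFrame from the question. The names `mask`, `matches` and `non_matches` are just illustrative, and `regex=False` is an added assumption so the search term is treated as plain text:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "col_1": ["fruit", "fruit", "fruit", "fruit"],
    "col_2": ["grape", "watermelon", "orange", "apple"],
    "col_3": ["green", "red", "orange", "red"],
})

# True for every row in which at least one cell contains the substring "red".
mask = df.apply(
    lambda col: col.astype(str).str.contains("red", regex=False)
).any(axis=1)

matches = df[mask]       # rows that contain "red"
non_matches = df[~mask]  # what the original attempt with ~ returned

print(matches)
```

This is a substring match, so a cell such as "credit" would match as well; comparing with `df.astype(str).eq("red")` instead would only match whole cell values.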
