I am trying to concatenate several CSV files by customer group using the code below:

import glob
import pandas as pd

files = glob.glob(file_from + "/*.csv")  # file_from: path to the folder where the CSV files reside
df_v0 = pd.concat([pd.read_csv(f) for f in files])  # DataFrame that concatenates all the CSV files found above

The problem is that the number of columns in the CSV files varies by customer, and the files do not have a header row.

I am trying to see if I could add a dummy header row with labels such as col_1, col_2, ... depending on the number of columns in each CSV.

Could anyone guide me on how to get this done? Thanks.

Update on trying to search for a specific string in the Dataframe:

Sample Dataframe

col_1,col_2,col_3
fruit,grape,green
fruit,watermelon,red
fruit,orange,orange
fruit,apple,red

I am trying to select the rows that contain the word red, and I expect it to return rows 2 and 4.

I tried the code below:

df[~df.apply(lambda x: x.astype(str).str.contains('red')).any(axis=1)]
dark horse

1 Answer

Use the parameter header=None to get the default integer column labels 0, 1, 2, ..., and add skiprows=1 only if you need to remove an existing row of column names:

df_v0 = pd.concat([pd.read_csv(f, header=None, skiprows=1) for f in files])

If you also want to change the column names, add rename:

dfs = [pd.read_csv(f, header=None, skiprows=1).rename(columns = lambda x: f'col_{x + 1}') 
        for f in files]
df_v0 = pd.concat(dfs)
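
For files that really have no header row (as described in the question), skiprows=1 would drop the first row of data, so it can be left out. Below is a minimal, self-contained sketch of that variant. The helper name `read_headerless_csvs` and the two sample files are made up for illustration and are not part of the original code.

```python
import glob
import os
import tempfile

import pandas as pd


def read_headerless_csvs(folder):
    """Read every *.csv in `folder` (files without header rows) and stack them."""
    frames = []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        # header=None: the first line is data; columns are labelled 0, 1, 2, ...
        frame = pd.read_csv(path, header=None)
        # Rename 0, 1, 2, ... to col_1, col_2, col_3, ...
        frame = frame.rename(columns=lambda i: f"col_{i + 1}")
        frames.append(frame)
    # Columns that a narrower file lacks are filled with NaN.
    return pd.concat(frames, ignore_index=True)


# Hypothetical sample data: one two-column file and one three-column file.
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "a.csv"), "w") as fh:
        fh.write("fruit,grape\nfruit,apple\n")
    with open(os.path.join(folder, "b.csv"), "w") as fh:
        fh.write("fruit,watermelon,red\n")
    df_v0 = read_headerless_csvs(folder)

print(df_v0)
```

Because each file is renamed before concatenation, files with different column counts line up on col_1, col_2, ... and the shorter ones get NaN in the extra columns.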
jezrael
  • One more request for help. files is a list of file names. I have an issue where a few file names are written in upper case (e.g. FILE1.CSV) and a few are in lower case (e.g. file2.csv). How could we make them all lower case? Could you please assist with that? Thanks. – dark horse Nov 02 '18 at 09:47
  • @darkhorse - not sure if I understand. `files` returns a list of file names in both upper and lower case. The loop then goes over them and a DataFrame is created from each one. If you change the file names to lowercase, errors will be raised because the file does not exist. But if you really need it, use `pd.read_csv(f.lower(), ...` – jezrael Nov 02 '18 at 09:50
  • If I got that right, are file names case-sensitive when read in pandas? For example, if the file name is FILE1.CSV and I pass in file1.csv, will it fail because they are case-sensitive? – dark horse Nov 02 '18 at 09:53
  • I've got another question. I am trying to search for a specific text in the entire DataFrame (df_v0), so I need to scan through all rows and columns. I am able to filter by a specific column but am not sure how to extend this to the entire DataFrame. – dark horse Nov 02 '18 at 11:26
  • @darkhorse - do you mean a substring match or not? – jezrael Nov 02 '18 at 11:30
  • Basically I am trying to search for a string / integer, and if any rows have that value, I want to filter them out into a new DataFrame. Hope this makes sense. – dark horse Nov 02 '18 at 11:31
  • @darkhorse - I think need [this solution](https://stackoverflow.com/q/51168105/2901002) – jezrael Nov 02 '18 at 11:33
  • I tried the reference shared but it didn't work. I have updated the initial message with a sample DataFrame and the code I tried. Could you please assist? – dark horse Nov 02 '18 at 11:45
  • @darkhorse - You are really close, just remove the `~`, like `df[df.apply(lambda x: x.astype(str).str.contains('red')).any(axis=1)]` – jezrael Nov 02 '18 at 11:45
  • Thank you for that. – dark horse Nov 02 '18 at 11:48
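
To illustrate the fix from the comments, here is a minimal sketch that rebuilds the sample DataFrame from the question and splits it into matching and non-matching rows. The variable names `cell_hits`, `row_hits`, `matching` and `remaining` are made up for illustration, and `regex=False` is an added assumption that the search term should be treated as a literal string.

```python
import pandas as pd

# Sample DataFrame from the question.
df = pd.DataFrame(
    {
        "col_1": ["fruit", "fruit", "fruit", "fruit"],
        "col_2": ["grape", "watermelon", "orange", "apple"],
        "col_3": ["green", "red", "orange", "red"],
    }
)

# True wherever a cell, converted to text, contains the substring "red".
# regex=False treats the search term as a literal string, not a pattern.
cell_hits = df.apply(lambda col: col.astype(str).str.contains("red", regex=False))

# A row matches if any of its cells matched.
row_hits = cell_hits.any(axis=1)

matching = df[row_hits]     # rows that contain "red"
remaining = df[~row_hits]   # all other rows; this is what the `~` version returns

print(matching)
```

Note that this is a substring match, so a cell such as "redcurrant" would also count as a hit. The `~` operator inverts the boolean mask, which is why the original attempt returned the rows without red instead of the rows with it.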