0

Lets say I have a (badly formed) csv like this:

header1, header2, header3
value1, value2, value3, value4

I'd like to load this into a dataframe. However

pd.read_csv(file_data, index = False) 

drops value4:

header1 header2 header3
value1 value2 value3

and

pd.read_csv(file_data)

leaves me with no way to differentiate if the value of the index came from value1 in the csv file, or was autoassigned by pandas.

Is there a way to have pandas just create dummy columns on the end based on the row with the maximum number of delimiters?

Maile Cupo
  • 636
  • 1
  • 10
  • 26
  • Does this answer your question? [import csv with different number of columns per row using Pandas](https://stackoverflow.com/questions/27020216/import-csv-with-different-number-of-columns-per-row-using-pandas) – Dušan Maďar May 06 '21 at 17:49
  • @Gusti Adli's response below seemed the most elegant, as at least several of the top comments on that thread imply that the user knows the maximum number of columns beforehand. – Maile Cupo May 06 '21 at 18:59

1 Answers1

1

If the problem is only the header, setting the header parameter as None will solve your problem.

pd.read_csv(file_data, header=None)

If the number of delimiters on each row is different, you need to read each line using open() function.

with open('test.csv', 'r') as f:
    df = [i.strip().split(',') for i in f.readlines()]
    
df = pd.DataFrame(df)
print(df)

Output: (I added "1,2,3,4,5,6\n" and "11,22,33\n" after the last row)

         0        1        2       3     4     5
0  header1  header2  header3          None  None
1   value1   value2   value3  value4  None  None
2        1        2        3       4     5     6
3       11       12       13    None  None  None
Gusti Adli
  • 1,225
  • 4
  • 13