
I want to read a CSV file into a pandas DataFrame.

My CSV file has the following format:

a b c d
0 1 2 3 4 5
1 2 3 4 5 6

When I read the CSV with pandas, I get the following DataFrame:

    a b c d
0 1 2 3 4 5
1 2 3 4 5 6

When I execute print df.columns, I get something like:

Index([u'a', u'b', u'c', u'd'], dtype='object')

And when I execute print df.iloc[0], I get:

a  2
b  3
c  4
d  5
Name: (0, 1)

I would like to have a DataFrame like this:

a b c d col1 col2
0 1 2 3 4    5
1 2 3 4 5    6

I don't know in advance how many columns I will have to add, but I need as many columns as there are values in the first line after the header. How can I achieve that?

  • This [answer](https://stackoverflow.com/questions/34358196/read-csv-with-missing-incomplete-header-or-irregular-number-of-columns) could help – floatingpurr Sep 14 '17 at 16:00

1 Answer


One way to do this is to read the file twice: once with the first row (the original header) skipped, so that only the data is read, and once with only the column names read (all data rows skipped).

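import pandas as pd

# 'data.csv' is a placeholder path; substitute the path to your own file
df = pd.read_csv('data.csv', header=None, skiprows=1)
columns = pd.read_csv('data.csv', nrows=0).columns.tolist()
columns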

Output

['a', 'b', 'c', 'd']

Now find the number of missing columns and use a list comprehension to build names for them:

num_missing_cols = len(df.columns) - len(columns)
new_cols = ['col' + str(i+1) for i in range(num_missing_cols)]
df.columns = columns + new_cols
df

   a  b  c  d  col1  col2
0  0  1  2  3     4     5
1  1  2  3  4     5     6
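
For anyone who wants to try this without a file on disk, here is a minimal self-contained sketch of the same two-pass idea. The sample text, the helper name read_with_padded_header, and the prefix parameter are all made up for illustration, and it assumes a comma-separated file read through io.StringIO.

```python
import io

import pandas as pd

# Hypothetical sample data: a header with 4 names, then rows with 6 values each.
csv_text = "a,b,c,d\n0,1,2,3,4,5\n1,2,3,4,5,6\n"


def read_with_padded_header(text, prefix="col"):
    """Read CSV text whose header is shorter than its data rows.

    Illustrative helper, not part of pandas.
    """
    # Pass 1: skip the header line and read only the data rows.
    data = pd.read_csv(io.StringIO(text), header=None, skiprows=1)
    # Pass 2: read only the header line to recover the existing names.
    names = pd.read_csv(io.StringIO(text), nrows=0).columns.tolist()
    # Generate a name for every data column that has no header entry.
    num_missing = len(data.columns) - len(names)
    extra = [f"{prefix}{i + 1}" for i in range(num_missing)]
    data.columns = names + extra
    return data


result = read_with_padded_header(csv_text)
print(result)
```

With a real file you would pass the path to pd.read_csv in both calls instead of wrapping a string in io.StringIO. Note that the width is inferred from the first data row, so this sketch assumes no later row is wider than the first one.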
Ted Petrou