4

The following is example of data I have in excel sheet.

A    B   C 
1    2   3 
4    5   6

I am trying to get the columns name using the following code:

p1 = list(df1t.columns.values)

the output is like this

[A, B, C, 'Unnamed: 3', 'unnamed 4', 'unnamed 5', .....] 

I checked the excel sheet, there is only three columns named A, B, and C. Other columns are blank. Any suggestion?

Mary
  • 1,142
  • 1
  • 16
  • 37

2 Answers2

4

Just in case anybody stumbles over this problem: The issue can also arise if the excel sheet contains empty cells that are formatted with a background color:

Screenshot of Excel File

import pandas as pd
df1t = pd.read_excel('test.xlsx')
print(df1t)
   A  B  C  Unnamed: 3
0  1  2  3         NaN
1  4  5  6         NaN

One option is to drop the 'Unnamed' columns as described here:
https://stackoverflow.com/a/44272830/11826257

df1t = df1t[df1t.columns.drop(list(df1t.filter(regex='Unnamed:')))]
print(df1t)
   A  B  C
0  1  2  3
1  4  5  6
Snoeren01
  • 343
  • 3
  • 12
3

There is problem some cells are not empty but contains some whitespaces.

If need columns names with filtering Unnamed:

cols = [col for col in df if not col.startswith('Unnamed:')]
print (cols)
['A', 'B', 'C']

Sample with file:

df = pd.read_excel('https://dl.dropboxusercontent.com/u/84444599/file_unnamed_cols.xlsx')
print (df)
     A    B    C Unnamed: 3 Unnamed: 4 Unnamed: 5 Unnamed: 6 Unnamed: 7
0  4.0  6.0  8.0        NaN        NaN        NaN        NaN        NaN
1  NaN  NaN  NaN                   NaN        NaN        NaN        NaN
2  NaN  NaN  NaN        NaN                   NaN        NaN           
3  NaN  NaN  NaN        NaN        NaN                              NaN

cols = [col for col in df if not col.startswith('Unnamed:')]
print (cols)
['A', 'B', 'C']

Another solution:

cols = df.columns[~df.columns.str.startswith('Unnamed:')]
print (cols)
Index(['A', 'B', 'C'], dtype='object')

And for return all columns by cols use:

print (df[cols])
     A    B    C
0  4.0  6.0  8.0
1  NaN  NaN  NaN
2  NaN  NaN  NaN
3  NaN  NaN  NaN

And if necessary remove all NaNs rows:

print (df[cols].dropna(how='all'))
     A    B    C
0  4.0  6.0  8.0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • thank you. I am going to merge two excel sheet with the same column names. So for the sheet having "unnamed" columns, I need to drop the "unnamed" column first. Shall I change the code to cols = [col for col in df if col.startswith('Unnamed:')] and then drop it ? – Mary Jun 24 '17 at 09:17
  • I think drop is not necessary, rather select only columns not unnamed like `cols = [col for col in df if col.startswith('Unnamed:')] ` and then `pd.merge(df1, df2[cols])` – jezrael Jun 24 '17 at 09:19
  • Glad can help you! – jezrael Jun 24 '17 at 09:23
  • Would you please answer this question:https://stackoverflow.com/questions/44741136/how-to-remove-a-row-a-specific-row-with-null-value – Mary Jun 24 '17 at 22:30