So I have a data frame where the headings I want do not currently line up:

    In [1]: df = pd.read_excel('example.xlsx')
            print (df.head(10))

    Out [1]:                                 Portfolio  Asset        Country   Quantity  
         Unique Identifier Number of fund       B24     B65             B35      B44   
          456               2                General  Type A  UNITED KINGDOM        1   
          123               3                General  Type B              US        2   
          789               2                General  Type C  UNITED KINGDOM        4   
          4852              4                General  Type C  UNITED KINGDOM        4   
          654               1                General  Type A          FRANCE        3   
          987               5                General  Type B  UNITED KINGDOM        2   
          321               1                General  Type B         GERMANY        1   
          951               3                General  Type A  UNITED KINGDOM        2   
          357               4                General  Type C  UNITED KINGDOM        3   

As we can see, above the first 2 column headings there are 2 blank cells, and below the next 4 column headings are "B" numbers which I don't care about.

So, 2 questions. How can I shift up the first 2 columns without having a column heading to identify them with (due to the blank cells above)?

And how can I delete just Row 2 of the remaining columns and have the data below move up to take the place of the "B" numbers?

I found some similar questions already asked, such as "python: shift column in pandas dataframe up by one", but nothing that solves the particular intricacies above, I don't think.

Also I'm quite new to Python and Pandas so if this is really basic I apologise!

yenoolnairb
  • It looks like it's read the first 2 cols as indices, call `df.reset_index()` to restore them as columns – EdChum Mar 01 '16 at 10:18
  • What is `df.columns` ? – jezrael Mar 01 '16 at 10:21
  • OK, so after doing `df.reset_index()` and then `df.columns`, they now have the titles "Unnamed: 0" and "Unnamed: 1", which gives me something to call those 2 by to shift them up. Cheers! – yenoolnairb Mar 01 '16 at 10:26
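
The `reset_index()` suggestion from the comments can be sketched on a small stand-in frame. This is only an illustration: the data below is a hypothetical cut-down version of `example.xlsx`, and it assumes the two identifier columns ended up in an unnamed two-level index. The names pandas gives the restored columns depend on how the file was read, so the sketch renames them by position rather than by label.

```python
import pandas as pd

# Hypothetical stand-in for the frame read from example.xlsx:
# the two identifier columns sit in an unnamed two-level index.
index = pd.MultiIndex.from_tuples([(456, 2), (123, 3), (789, 2)])
df = pd.DataFrame(
    {"Portfolio": ["General"] * 3, "Asset": ["Type A", "Type B", "Type C"]},
    index=index,
)

# Move the index levels back into ordinary columns.
restored = df.reset_index()

# Rename the two restored columns by position, leaving the rest untouched.
restored.columns = ["Unique Identifier", "Number of fund"] + list(restored.columns[2:])
print(restored)
```

Renaming by position avoids depending on whatever placeholder labels the restored columns happen to receive.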

1 Answer

IIUC you can use:

# create a DataFrame from the MultiIndex in the columns
df1 = pd.DataFrame([x for x in df.columns.values])
print(df1)
           0                  1
0             Unique Identifier
1                Number of fund
2  Portfolio                B24
3      Asset                B65
4    Country                B35
5   Quantity                B44

# where the string in column 1 is shorter than 4 characters,
# overwrite it with the value from column 0
df1.loc[df1.iloc[:, 1].str.len() < 4, 1] = df1.iloc[:, 0]
print(df1)
           0                  1
0             Unique Identifier
1                Number of fund
2  Portfolio          Portfolio
3      Asset              Asset
4    Country            Country
5   Quantity           Quantity

# set the column names from column 1 of df1
df.columns = df1.iloc[:, 1]
print(df)
0  Unique Identifier  Number of fund Portfolio   Asset         Country  \
0                456               2   General  Type A  UNITED KINGDOM   
1                123               3   General  Type B              US   
2                789               2   General  Type C  UNITED KINGDOM   
3               4852               4   General  Type C  UNITED KINGDOM   
4                654               1   General  Type A          FRANCE   
5                987               5   General  Type B  UNITED KINGDOM   
6                321               1   General  Type B         GERMANY   
7                951               3   General  Type A  UNITED KINGDOM   
8                357               4   General  Type C  UNITED KINGDOM   

0  Quantity  
0         1  
1         2  
2         4  
3         4  
4         3  
5         2  
6         1  
7         2  
8         3  
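
For reference, here is a self-contained sketch of the same idea (flattening a two-level header into one row of names). It is not the code above: instead of the `str.len() < 4` test it keeps the top-level label whenever that label is non-empty. The small frame is hypothetical, and it assumes the blank header cells come through as empty strings, which may not match what `read_excel` produces for your file.

```python
import pandas as pd

# Hypothetical two-level header: blank cells above the first two columns,
# "B" codes below the others.
columns = pd.MultiIndex.from_tuples([
    ("", "Unique Identifier"),
    ("", "Number of fund"),
    ("Portfolio", "B24"),
    ("Asset", "B65"),
])
df = pd.DataFrame(
    [[456, 2, "General", "Type A"], [123, 3, "General", "Type B"]],
    columns=columns,
)

# Keep the top label when it is non-empty, otherwise fall back to the lower one.
df.columns = [top if top else bottom for top, bottom in df.columns]
print(df)
```

Testing for an empty top label avoids relying on the length of the "B" codes, at the cost of assuming how the blank cells are represented.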

EDIT (following the comments):

print(df.columns)
Index(['Portfolio', 'Asset', 'Country', 'Quantity'], dtype='object')

# set the first row to the column names
df.iloc[0, :] = df.columns

# move the index levels back into ordinary columns
df = df.reset_index()
# set the column names from the first row
df.columns = df.iloc[0, :]
df.columns.name = None
# print everything except the first row
print(df.iloc[1:, :])
  Unique Identifier Number of fund Portfolio   Asset         Country Quantity
1               456              2   General  Type A  UNITED KINGDOM        1
2               123              3   General  Type B              US        2
3               789              2   General  Type C  UNITED KINGDOM        4
4              4852              4   General  Type C  UNITED KINGDOM        4
5               654              1   General  Type A          FRANCE        3
6               987              5   General  Type B  UNITED KINGDOM        2
7               321              1   General  Type B         GERMANY        1
8               951              3   General  Type A  UNITED KINGDOM        2
9               357              4   General  Type C  UNITED KINGDOM        3
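
The steps in this edit can be tried end to end on a small stand-in frame. The data below is hypothetical: it assumes the first row of the index holds the two real headings and the first data row holds the "B" codes, as in the question's output.

```python
import pandas as pd

# Hypothetical stand-in: headings for the first two columns sit in the
# first index entry, and the "B" codes sit in the first data row.
index = pd.MultiIndex.from_tuples([
    ("Unique Identifier", "Number of fund"),
    (456, 2),
    (123, 3),
])
df = pd.DataFrame(
    [["B24", "B65"], ["General", "Type A"], ["General", "Type B"]],
    index=index,
    columns=["Portfolio", "Asset"],
)

df.iloc[0, :] = df.columns         # overwrite the "B" codes with the header names
df = df.reset_index()              # index levels become ordinary columns
df.columns = df.iloc[0, :]         # first row now holds all six headings
df.columns.name = None
df = df.iloc[1:, :].reset_index(drop=True)  # drop the heading row for good
print(df)
```

Unlike the `print` above, the last line assigns the sliced frame back to `df`, so the heading row is actually removed rather than just hidden in the output.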
jezrael
  • This looks like exactly what I want to do; however, it's throwing up some errors. First it gave me `IndexError: single positional indexer is out-of-bounds`, which I've managed to fix, but now I get `AttributeError: 'DataFrame' object has no attribute 'str'`, and this I do not know how to fix. – yenoolnairb Mar 01 '16 at 10:43
  • `df1` after the first line only has the column labelled `0` in your example; it doesn't have the column labelled `1`. Then the next line throws up the 'str' error. – yenoolnairb Mar 01 '16 at 10:57
  • And what is `print df1.dtypes` ? – jezrael Mar 01 '16 at 10:59
  • `0 object` `dtype: object` – yenoolnairb Mar 01 '16 at 11:02
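
One possible reading of the last few comments, offered as a guess rather than a diagnosis: a `df1` with only column `0` is what you get when `df.columns` is a flat index rather than a two-level one. The sketch below builds both cases with hypothetical data and compares the shape of the resulting `df1`.

```python
import pandas as pd

# Hypothetical frame with a flat, single-level header.
flat = pd.DataFrame({"Portfolio": ["General"], "Asset": ["Type A"]})

# Hypothetical frame with a two-level header.
multi = pd.DataFrame(
    [["General", "Type A"]],
    columns=pd.MultiIndex.from_tuples([("Portfolio", "B24"), ("Asset", "B65")]),
)

# Same construction as df1 in the answer, applied to both frames.
flat_labels = pd.DataFrame([x for x in flat.columns.values])
multi_labels = pd.DataFrame([x for x in multi.columns.values])

print(flat_labels.shape)    # one column (0) only
print(multi_labels.shape)   # two columns (0 and 1)
print(isinstance(flat.columns, pd.MultiIndex), isinstance(multi.columns, pd.MultiIndex))
```

Checking `isinstance(df.columns, pd.MultiIndex)` first shows which case applies before choosing between the two approaches in the answer.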