I have a DataFrame analogous to this one:

import numpy as np
import pandas

dd = pandas.DataFrame({'name' : ['foo', 'foo', 'foo', 'bar',
                                 'bar', 'bar', 'bar', 'bar'],
                       'year' : ['1900', '1903', '1904', '1900',
                                 '1901', '1902', '1903', '1904'],
                       'value' : np.arange(8)
                       })

Further along the pipeline I will need to compare foo and bar in terms of magnitudes derived from value. This is why I would like to add rows for the missing years in foo and fill them with NaN.

So the final dd should have additional rows and look like this:

   value name  year
0    0.0  foo  1900
1    NaN  foo  1901
2    NaN  foo  1902
3    1.0  foo  1903
4    2.0  foo  1904
5    3.0  bar  1900
6    4.0  bar  1901
7    5.0  bar  1902
8    6.0  bar  1903
9    7.0  bar  1904

I tried using this solution but it doesn't work in this case because I have duplicate values in the year column.

I realize I have to add rows grouping by 'name' but I couldn't see how.

What should I do?

Federico Barabas

1 Answer

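If I understand correctly, you can unstack the years into columns and then stack them back with dropna=False: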

dd.set_index(['name','year']).value.unstack().stack(dropna=False).reset_index()
Out[983]: 
  name  year    0
0  bar  1900  3.0
1  bar  1901  4.0
2  bar  1902  5.0
3  bar  1903  6.0
4  bar  1904  7.0
5  foo  1900  0.0
6  foo  1901  NaN
7  foo  1902  NaN
8  foo  1903  1.0
9  foo  1904  2.0
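
A possible alternative, sketched here under the assumption that every name should get every year that appears anywhere in the data: build the full set of (name, year) pairs and reindex against it. The names `full_index` and `filled` are illustrative and do not come from the question. This keeps the `value` column name and follows the order in which names first appear in `dd`.

```python
import numpy as np
import pandas as pd

dd = pd.DataFrame({
    'name': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar', 'bar'],
    'year': ['1900', '1903', '1904', '1900', '1901', '1902', '1903', '1904'],
    'value': np.arange(8),
})

# Every (name, year) combination that should exist in the result.
# Assumption: the target years are the union of all years seen in dd.
full_index = pd.MultiIndex.from_product(
    [dd['name'].unique(), sorted(dd['year'].unique())],
    names=['name', 'year'],
)

# Reindexing adds a row for each missing pair and fills 'value' with NaN.
filled = (
    dd.set_index(['name', 'year'])
      .reindex(full_index)
      .reset_index()
)
print(filled)
```

This relies on each (name, year) pair being unique, which holds for the sample data; `reindex` raises an error on a duplicated index.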
BENY
  • @Federico unstack reshapes your df so that the index is name and the columns are year. Any year that is missing for a name in your original df becomes NaN in that table. Stacking afterwards with dropna=False keeps those NaN cells and converts the table back to the format of your original df – BENY Mar 04 '18 at 18:35
  • For optimization I tried using `.reset_index(inplace=True)` but I get `TypeError: Cannot reset_index inplace on a Series to create a DataFrame`. Any ideas? – Federico Barabas Mar 10 '18 at 20:28
  • @Federico use to_frame() before reset_index – BENY Mar 10 '18 at 20:37
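
The comments above can be sketched roughly as follows. This is a minimal illustration on the sample data from the question; the variable names `s`, `wide` and `out` are made up for the example. It shows the intermediate wide table that unstack produces, and why the in-place reset only works after converting the Series to a DataFrame.

```python
import numpy as np
import pandas as pd

dd = pd.DataFrame({
    'name': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar', 'bar'],
    'year': ['1900', '1903', '1904', '1900', '1901', '1902', '1903', '1904'],
    'value': np.arange(8),
})

# Selecting a single column after set_index yields a Series, not a DataFrame.
s = dd.set_index(['name', 'year'])['value']

# unstack() pivots 'year' into columns: one row per name, one column per year.
# (foo, 1901) and (foo, 1902) have no source row, so those cells are NaN.
wide = s.unstack()
print(wide)

# Resetting the index of a Series in place fails, because the result
# would have to be a DataFrame.
try:
    s.reset_index(inplace=True)
except TypeError as err:
    print(err)

# Converting to a one-column DataFrame first makes the in-place call valid.
out = s.to_frame()
out.reset_index(inplace=True)
print(out)
```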