1

I would like to replace the values in a pandas dataframe from another series based on the column names. I have the foll. dataframe:

Y2000   Y2001   Y2002    Y2003    Y2004    Item    Item Code
34        43      0      0          25     Test      Val

and I have another series:

Y2000    41403766
Y2001    45283735
Y2002    47850796
Y2003    38639101
Y2004    45226813

How do I replace the values in the first dataframe based on the values in the 2nd series?

--MORE EDITS: To recreate the proble, code and data is here: umd.box.com/s/hqd6oopj6vvp4qvpwnj8r4lm3z7as4i3

Instructions to run teh code:

To run this code:

  1. Replace data_dir in config_rotations.txt with the path to the input directory i.e. where the files are kept

  2. Replace out_dir in config_rotations.txt with whatever output path you want

  3. Run python code\crop_stats.py. The problem is in line 133 of crop_stats.py

--EDIT:

Based on @Andy's query, here's the result I want:

Y2000      Y2001   Y2002     Y2003      Y2004          Item    Item Code
41403766  45283735 47850796  38639101  45226813     Test      Val

I tried

df_a.replace(df_b)

but this does not change any value in df_a

user308827
  • 21,227
  • 87
  • 254
  • 417

1 Answers1

1

You can construct a df from the series after reshaping and overwrite the columns:

In [85]:
df1[s.index] = pd.DataFrame(columns = s.index, data = s.values.reshape(1,5))
df1

Out[85]:
      Y2000     Y2001     Y2002     Y2003     Y2004  Item Item  Code
0  41403766  45283735  47850796  38639101  45226813  Test        Val

So this uses the series index values to sub-select from the df and then constructs a df from the same series, here we have to reshape the array to make a single row df

EDIT

The reason my code above won't work on your real code is firstly when assigning you can't do this:

df.loc[(df['Country Code'] == replace_cnt) & (df['Item'] == crop)][s.index]

This is called chained indexing and raises a warning, see the docs.

So to correct this you can put the columns inside the []:

df.loc[(df['Country Code'] == replace_cnt) & (df['Item'] == crop),s.index]

Additionally pandas tries to align along index values and column names, if they don't match then you'll get NaN values so you can get around this by calling .values to get a np array which just becomes anonymous data that has no index or column labels, so long as the data shape is broadcast-able then it will do what you want:

df.loc[(df['Country Code'] == replace_cnt) & (df['Item'] == crop),s.index] = pd.DataFrame(columns=s.index, data=s.values.reshape(1, len(s.index))).values
Community
  • 1
  • 1
EdChum
  • 376,765
  • 198
  • 813
  • 562
  • thanks @EdChum, this should work but somehow my original dataframe i.e. df1 in your answer is same as before – user308827 Sep 22 '15 at 21:10
  • specifically, I am getting this warning; C:\Anaconda64\lib\site-packages\pandas\core\frame.py:2148: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy self[k1] = value[k2] --- – user308827 Sep 22 '15 at 21:13
  • that might be because my L.H.S is as follows: df1[condition1 & condition2][s.index] = ... – user308827 Sep 22 '15 at 21:14
  • Can you post raw data and code to reproduce your error – EdChum Sep 22 '15 at 21:24
  • thanks @EdChum, code and data is here: https://umd.box.com/s/hqd6oopj6vvp4qvpwnj8r4lm3z7as4i3 – user308827 Sep 22 '15 at 21:35
  • To run it, do python code\crop_stats.py. The problem is in line 133 of crop_stats.py – user308827 Sep 22 '15 at 21:36
  • Also, updated question to add instructions to run the code. You will need to update paths in config_rotations.txt – user308827 Sep 22 '15 at 21:45
  • Can you try this: `df.loc[(df['Country Code'] == replace_cnt) & (df['Item'] == crop),s.index]` for lhs this should get rid of the warning – EdChum Sep 22 '15 at 21:48
  • thanks @EdChum, it gets rid of the warnings, but replaces all the values by NaN – user308827 Sep 22 '15 at 21:50
  • 1
    That means there is no alignment between either the row values or the column names, if the column names definitely match then try `df.loc[(df['Country Code'] == replace_cnt) & (df['Item'] == crop),s.index] = pd.DataFrame(columns=s.index, data=s.values.reshape(1, len(s.index))).values` can you edit your question with what the row/columns are for the lhs and rhs, thanks – EdChum Sep 22 '15 at 21:51
  • thanks!! my R.H.S was lacking the .values, it works great now – user308827 Sep 22 '15 at 21:53
  • 1
    The reason that works is because when you assign values, pandas will try to align the lhs and rhs index and columns so if for instance the index values don't match then `NaN` will be assigned, the same thing happens if the column names don't match. By calling the attribute `.values` you return a numpy array this has no index or column labels so it's anonymous data so as long as the shape is broadcast-able it will assign as desired – EdChum Sep 22 '15 at 21:55
  • thanks @EdChum, does this mean that I have to be careful else values might be assigned to wrong columns. Since the column names are no longer being matched? – user308827 Sep 22 '15 at 21:57
  • 1
    if the column names don't match then it won't overwrite, I answered something about broadcasting wrt to pandas [here](http://stackoverflow.com/questions/29954263/what-does-the-term-broadcasting-mean-in-pandas-documentation/29955358#29955358) which should demonnstrate what I'm talking about – EdChum Sep 22 '15 at 22:01