I have the following DataFrame:

position     a_0     a_1     a_2     a_3     a_4     new_value
2             10     13                                100
3             12     16       13                       120
2             14     12                                140
4             15     11       16      16               150

I would like to create the following:

position     a_0     a_1     a_2     a_3     a_4     new_value
2             10     13      100                       100
3             12     16       13     120               120
2             14     12      140                       140
4             15     11       16      16     150       150

Essentially, for each row, set the column at index position (i.e. a_<position>) to that row's new_value. Ideally without using a for loop.

The difficulty is referring to a different column to set a value for each row. The only idea I've had is to break up the original dataframe into smaller dataframes (based on the value of position) and then just use the apply function.

Any other ideas would be super helpful!

Thanks

alwayscurious
  • Please add your data as text, not images. – Quang Hoang Nov 02 '20 at 04:51
  • Not sure how to do that while keeping the tabular format? – alwayscurious Nov 02 '20 at 04:52
  • Yes, you can, at least in your case: do `print(df)` and copy/paste. You can also do `print(df.to_dict())`... – Quang Hoang Nov 02 '20 at 04:53
  • This isn't the real data. It's a mock example. Unable to share the data. I hope that's still fine? – alwayscurious Nov 02 '20 at 04:54
  • SO is generally [against image data](https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-errors-when-asking-a-question), especially for [Pandas-related questions](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). You don't need to share your real data. What you have there should be good, even in csv form. – Quang Hoang Nov 02 '20 at 04:57

1 Answer

From your data it is unclear whether the blank cells are np.nan or empty strings '', and what the data types are. A print(df.to_dict()) would have been better. That said, let's assume the blanks are empty strings '':

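import numpy as np

# work only on the columns of interest
s = df.loc[:, 'a_0':'a_4']

# mask the first empty cell in each row; use `s.isna()` instead of
# `s.eq('')` if the blanks are `None` or `NaN`
# (`new_val` matches the output below; the question calls it `new_value`)
df.loc[:, 'a_0':'a_4'] = np.where(s.eq('').cumsum(1).eq(1),
                                  df['new_val'].values[:, None], s)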

Output:

   position  a_0  a_1    a_2  a_3  a_4  new_val
0         2   10   13  100.0                100
1         3   12   16   13.0  120           120
2         2   14   12  140.0                140
3         4   15   11   16.0   16  150      150
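
To try this end to end, here is a minimal self-contained sketch. It rebuilds the mock frame from the question under the same assumption as above (blanks are empty strings, and the value column is named `new_val`). The names `cols`, `first_blank` and `filled` are only illustrative, and the result is assigned by column list rather than through `.loc`, so each column simply takes whatever dtype the new values have.

```python
import numpy as np
import pandas as pd

# Mock data from the question; blanks are ASSUMED to be empty strings.
df = pd.DataFrame({
    "position": [2, 3, 2, 4],
    "a_0": [10, 12, 14, 15],
    "a_1": [13, 16, 12, 11],
    "a_2": ["", 13, "", 16],
    "a_3": ["", "", "", 16],
    "a_4": ["", "", "", ""],
    "new_val": [100, 120, 140, 150],
})

cols = ["a_0", "a_1", "a_2", "a_3", "a_4"]
s = df[cols]

# True only at the first empty cell of each row
# (this works here because the blanks are all trailing).
first_blank = s.eq("").cumsum(axis=1).eq(1)

# Where the mask is True take new_val (broadcast across the row),
# otherwise keep the existing cell.
filled = np.where(
    first_blank.to_numpy(),
    df["new_val"].to_numpy()[:, None],
    s.to_numpy(dtype=object),
)

# Replace the a_* columns with the filled values.
df[cols] = filled
print(df)
```
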
Quang Hoang
  • amazing, could you perhaps explain the mechanics of what's happening in the code? specifically in the np.where statement – alwayscurious Nov 02 '20 at 05:32
  • `s.eq('').cumsum(1).eq(1)` masks the first blank cell (or, with `s.isna()`, the first `NaN`) in each row; print it out to see the details. `np.where` checks the condition: where it is `True` it picks the first argument, otherwise the second. – Quang Hoang Nov 02 '20 at 05:34
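
The intermediate steps of that mask can be printed one at a time. The sketch below uses the `isna()` variant on a small two-row frame; the frame and the names `blank`, `running`, `first_blank` and `filled` are made up for illustration and are not taken from the answer.

```python
import numpy as np
import pandas as pd

# Two illustrative rows with trailing NaN blanks (hypothetical data).
s = pd.DataFrame({
    "a_0": [10.0, 12.0],
    "a_1": [13.0, 16.0],
    "a_2": [np.nan, 13.0],
    "a_3": [np.nan, np.nan],
})
new_val = np.array([100.0, 120.0])

blank = s.isna()                # True wherever a cell is missing
running = blank.cumsum(axis=1)  # running count of blanks along each row
first_blank = running.eq(1)     # count is exactly 1 at the first blank

print(blank)
print(running)
print(first_blank)

# new_val[:, None] has shape (2, 1), so it broadcasts across each row:
# every row can only pick up its own new value.
filled = np.where(first_blank.to_numpy(), new_val[:, None], s.to_numpy())
print(filled)
```

Note that the running count stays at 1 until a second blank appears, so a non-blank cell sitting after the first blank would also be overwritten. The mask picks out a single cell per row only when the blanks are contiguous at the end of the row, as they are in the question's data.
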