Conditional dataframe manipulation by row

Question

Say I have a df like

and I want a df like this

How would I do this in Python or R? This would be easy in Excel with a simple IF statement, for example: c5 =IF(c2 = "X", "ccc", c4).

I thought this would be simple in R too, but I tried df <- df %>% mutate(c4 = ifelse(c2 == 'X', paste(c3, c3, c3), c4)), and it fills all the other values with NAs:

Why is this happening and how would I fix it?

Ideally, though, I'd like to do this in Python. I've tried dfply's mutate and ifelse in the same way as above, and also pandas' .loc indexer, but neither has worked.

This feels like it should be really simple - is there something obvious that I'm missing?

akrun · Answer 1 · 2022-12-27T19:29:23.290

In R, we may need strrep to repeat the c3 string, together with %in% for the comparison:

library(dplyr)
df %>%
   mutate(c4 = ifelse(c2 %in% "X", strrep(c3, nchar(c4)), c4))

-output

  id c2 c3  c4
1  1     a aaa
2  2     b bbb
3  3  X  c ccc

data

df <- structure(list(id = 1:3, c2 = c("", "", "X"), c3 = c("a", "b", 
"c"), c4 = c("aaa", "bbb", "zzz")), class = "data.frame", row.names = c(NA, 
-3L))

edited Dec 27 '22 at 19:29

answered Dec 27 '22 at 19:19


Ok, that handles the repeating string piece (thank you!), but not the rest. It is still yielding NA's for me, which I don't understand. I'm not getting the same output you are. – user276238 Dec 27 '22 at 19:26
@user276238 I am assuming that you have `NA` in your c2 column instead of blanks (`""`). Just change the `==` to `%in%` and it should work. Please find the update – akrun Dec 27 '22 at 19:29

Mustafa Aydın · Answer 2 · 2022-12-27T19:41:17.840

df.c4.where(df.c2.ne("X"), other=df.c3 * 3)

This reads as

"for c4 column: where the c2 values are not equal to "X", keep them as is; otherwise, put the 3-times repeated c3 values".

Example run:

In [182]: df
Out[182]:
   id c2 c3   c4
0   1     a  aaa
1   2     b  bbb
2   3  X  c  zzz

In [183]: df.c4 = df.c4.where(df.c2.ne("X"), other=df.c3 * 3)

In [184]: df
Out[184]:
   id c2 c3   c4
0   1     a  aaa
1   2     b  bbb
2   3  X  c  ccc
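
For anyone who wants to reproduce the run above, here is a minimal self-contained sketch. The frame is rebuilt by hand from the output shown, and it assumes the blank c2 cells are empty strings rather than missing values.

```python
import pandas as pd

# Sample data rebuilt from the frame shown above
# (assumption: the blank c2 cells are empty strings, not NaN)
df = pd.DataFrame({
    "id": [1, 2, 3],
    "c2": ["", "", "X"],
    "c3": ["a", "b", "c"],
    "c4": ["aaa", "bbb", "zzz"],
})

# Keep c4 where c2 is not "X"; everywhere else take c3 repeated 3 times
df["c4"] = df["c4"].where(df["c2"].ne("X"), other=df["c3"] * 3)
print(df)
```

Note that where returns a new Series, so the result has to be assigned back to the column for the change to stick.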

SomeDude · Answer 3 · 2022-12-27T20:32:06.637

In pandas, I think you can just do:

m = df['c2'] == 'X'
df.loc[m, 'c4'] = df.loc[m, 'c3'].str.repeat(3)

This builds a boolean mask for the rows whose 'c2' is 'X', takes the 'c3' values of those rows, repeats each one 3 times, and writes the result into the 'c4' column in place with .loc.
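
A runnable sketch of the same idea follows. The sample frame is hypothetical; it adds a row with a missing c2 value (None) on the assumption that the real data may contain missing values instead of blanks, as suggested in the comments on the R answer.

```python
import pandas as pd

# Hypothetical sample data; the None in c2 stands in for a missing value (an assumption)
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "c2": ["", None, "X", "X"],
    "c3": ["a", "b", "c", "d"],
    "c4": ["aaa", "bbb", "zzz", "yyy"],
})

# Boolean mask: True only where c2 is exactly "X"; the missing value compares as False
m = df["c2"] == "X"

# Overwrite c4 on the masked rows only; every other row is left untouched
df.loc[m, "c4"] = df.loc[m, "c3"].str.repeat(3)
print(df)
```

Because the comparison yields False for the missing c2 entry here, that row keeps its original c4 value and is not turned into a missing value.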

edited Dec 27 '22 at 20:32

answered Dec 27 '22 at 20:24

