
I have a dataframe in pandas with four columns. The data consists of strings. Sample:

          A                  B                C      D
0         2          asicdsada          v:cVccv      u
1         4     ascccaiiidncll     v:cVccv:ccvc      u
2         9                sca              V:c      u
3        11               lkss             v:cv      u
4        13              lcoao            v:ccv      u
5        14           wuduakkk         V:ccvcv:      u

I want to replace the string 'u' in Col D with the string 'a' if Col C in that row contains the substring 'V' (case sensitive). Desired outcome:

          A                  B                C      D
0         2          asicdsada          v:cVccv      a
1         4     ascccaiiidncll     v:cVccv:ccvc      a
2         9                sca              V:c      a
3        11               lkss             v:cv      u
4        13              lcoao            v:ccv      u
5        14           wuduakkk         V:ccvcv:      a

I prefer to overwrite the value already in Column D, rather than assign two different values, because I'd like to selectively overwrite some of these values again later, under different conditions.

It seems like this should have a simple solution, but I cannot figure it out, and haven't been able to find a fully applicable solution in other answered questions.

df.ix[1]["D"] = "a"

changes an individual value.

df.ix[:]["C"].str.contains("V")

returns a Series of booleans, but I am not sure what to do with it. I have tried many combinations of .loc, apply, contains, re.search, and for loops, and I either get errors or end up replacing every value in column D. I'm a novice with pandas/Python, so it's hard to know whether my syntax, my methods, or my idea of what I even need to do is off (probably all of the above).

largercat

1 Answer


As you've already found, str.contains gives you a boolean Series. Pass it to .loc together with the column label to say "change these rows, in the D column". For example:

In [5]: df.loc[df["C"].str.contains("V"), "D"] = "a"

In [6]: df
Out[6]: 
    A               B             C  D
0   2       asicdsada       v:cVccv  a
1   4  ascccaiiidncll  v:cVccv:ccvc  a
2   9             sca           V:c  a
3  11            lkss          v:cv  u
4  13           lcoao         v:ccv  u
5  14        wuduakkk      V:ccvcv:  a
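A self-contained sketch of the same approach, rebuilding the sample frame from the question. The mask is given its own name so it can be inspected, and the `regex=False` / `na=False` arguments are optional additions, not part of the answer above: `regex=False` matches "V" as a literal substring, and `na=False` keeps the mask purely boolean if column C ever contains missing values. The second assignment uses a made-up condition (C ending in ":"), purely to illustrate the later, selective overwrites the question mentions.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    "A": [2, 4, 9, 11, 13, 14],
    "B": ["asicdsada", "ascccaiiidncll", "sca", "lkss", "lcoao", "wuduakkk"],
    "C": ["v:cVccv", "v:cVccv:ccvc", "V:c", "v:cv", "v:ccv", "V:ccvcv:"],
    "D": ["u"] * 6,
})

# Boolean mask: True where C contains an uppercase "V".
# case=True is already the default; regex=False treats "V" as a literal substring;
# na=False counts missing values in C as "no match" instead of putting NaN in the mask.
has_V = df["C"].str.contains("V", case=True, regex=False, na=False)
print(has_V.tolist())  # [True, True, True, False, False, True]

# .loc[rows, column]: overwrite D only in the rows where the mask is True
df.loc[has_V, "D"] = "a"
after_first_pass = df["D"].tolist()  # ['a', 'a', 'a', 'u', 'u', 'a']

# Hypothetical later condition, for illustration only: set D to "b" where C ends
# with ":". Rows outside this mask keep their current value, including the "a"s above.
ends_with_colon = df["C"].str.endswith(":", na=False)
df.loc[ends_with_colon, "D"] = "b"
print(df["D"].tolist())  # ['a', 'a', 'a', 'u', 'u', 'b']
```

Because each .loc assignment only touches the rows its mask selects, conditions can be applied one after another, and the last one to match a row wins.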

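(Avoid using .ix: it was deprecated in pandas 0.20 and removed entirely in pandas 1.0. Use .loc for label-based indexing and .iloc for position-based indexing instead.)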
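For the single-value assignment from the question (`df.ix[1]["D"] = "a"`), here is a minimal sketch of the current equivalents, using a small two-row frame chosen only for illustration. Doing the lookup in one indexing call also avoids chained indexing (`[...][...]`), where the assignment can end up on a temporary copy rather than on `df` itself.

```python
import pandas as pd

# Small illustrative frame (not the full sample from the question)
df = pd.DataFrame({"C": ["v:cVccv", "v:cv"], "D": ["u", "u"]})

# Label-based: row label 1, column "D", in a single .loc call
df.loc[1, "D"] = "a"

# .at is the scalar-only accessor for the same label-based lookup
df.at[0, "D"] = "a"

# Position-based alternative with .iloc: look up the integer position of column "D"
col_pos = df.columns.get_loc("D")
df.iloc[1, col_pos] = "a"

print(df["D"].tolist())  # ['a', 'a']
```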

DSM