Replace every value in a pandas dataframe series

Question

So I have a DataFrame in which I want to replace every value in a column with a new string.

(Usually I would just do df["col1"] = "string"; however, I need to use loc first, which creates a copy and does not modify the series in place.)
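A minimal sketch of the copy-versus-in-place distinction, assuming a hypothetical boolean `mask` that stands in for the real row filter. Chained indexing such as `df.loc[mask]["col1"] = ...` assigns into a temporary copy, whereas passing the row mask and the column label to a single `.loc` call writes into `df` itself:

```python
import warnings

import pandas as pd

df = pd.DataFrame({"col1": ["aaaa", "b", "c", "d", "e"]})

# Hypothetical row filter standing in for the real mask: single-character values.
mask = df["col1"].str.len() == 1

# Chained indexing: df.loc[mask] builds a new DataFrame, so the assignment
# lands on that temporary object and df is left unchanged.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence SettingWithCopy / chained-assignment warnings
    df.loc[mask]["col1"] = "string"
print(df["col1"].tolist())  # ['aaaa', 'b', 'c', 'd', 'e']

# Row mask and column label in a single .loc call: assigns into df itself.
df.loc[mask, "col1"] = "string"
print(df["col1"].tolist())  # ['aaaa', 'string', 'string', 'string', 'string']
```

With the single `.loc[rows, column]` form, the masked rows can be overwritten directly, so no regex is needed just to set a constant.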

So currently I have a df like this: df = pd.DataFrame({'A': ['aaaa', 'b', 'c', 'd', 'e']})

And when I replace the values with the replace function

df.A.replace(".*", "test", regex=True, inplace=True)

I get something like this, with every value turned into "testtest":

However, what I want is "test" in every row:

Why does it give me "test" twice? And how can I fix it?

Edit: To show you what the actual problem was, here is an example that gives the whole picture. Basically, I have these two things:

df = pd.DataFrame({'A': ['aaaa', 'bbbb', 'c', 'd', 'e']})
replace_list = ["aa","bb"]

Now I want to replace every entry in the df that contains an item from the list with that item. So the df would look like this:
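One possible sketch of this requirement, not the asker's own function. It assumes that entries containing none of the list items stay unchanged, and that if several items match, the last one in `replace_list` wins; the question does not specify either case.

```python
import pandas as pd

df = pd.DataFrame({"A": ["aaaa", "bbbb", "c", "d", "e"]})
replace_list = ["aa", "bb"]

for item in replace_list:
    # Rows whose value contains item as a plain substring (regex=False, so
    # characters like "." in an item are not treated as regex syntax).
    contains_item = df["A"].str.contains(item, regex=False)
    # Overwrite only those rows of column A, in place.
    df.loc[contains_item, "A"] = item

print(df["A"].tolist())  # ['aa', 'bb', 'c', 'd', 'e']
```

Because each pass uses a boolean mask with `.loc`, the same pattern also works when the rows must first be restricted by another mask: combine the masks with `&` before assigning.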

You are matching between zero and unlimited times, which results in two matches: one that consumes the characters (aaaa etc.) and one for the empty position right after them. This is because you haven't specified a starting position. If you include a start-of-string anchor, it will work well – JvdV, Jul 26 '20 at 09:36
Yeah, that's probably it. But what do you mean by start-of-string anchor? – Hans Geber, Jul 26 '20 at 09:40
Well, as you can see, it does not work. In my original df I have a list of places, but many of them have incorrect names. I want to replace them with common names if their name is part of a common one. – Hans Geber, Jul 26 '20 at 09:59

DaveR · Answer 1 · 2020-07-26T10:06:13.883


I think you can just use

import pandas as pd

df = pd.DataFrame({'A': ['aaaaa', 'b', 'c', 'd', 'e']})
# in case you want to substitute only a subset of rows
df.loc[df['A'] == 'b', 'A'] = 'test'
# in case you want the whole column
df['A'] = 'test'

Regex is probably overkill here =).

edited Jul 26 '20 at 10:06

answered Jul 26 '20 at 09:36

DaveR


If the df contains something like this, df = pd.DataFrame({'A': ['aaaaaa', 'b', 'c', 'd', 'e']}), I get "test" even more often when I use "." as the regex – Hans Geber Jul 26 '20 at 09:38

So you want to return 'test' for every match, or do you have a special pattern to match? Then give more examples of your pattern and adjust your question accordingly – DaveR Jul 26 '20 at 09:49
I want "test" as an output for every row, no matter what was there before, correct. I will adjust the post to make it clear – Hans Geber Jul 26 '20 at 09:54

edited, see if it helps – DaveR Jul 26 '20 at 10:06
Mhh, not sure. The problem is that I want this to apply only to certain rows, which is why I build a mask in my real code. Let me edit the post again so that you see what I mean. I didn't want to make it that complicated initially. – Hans Geber Jul 26 '20 at 10:10

mmm I still do not get why anyone would want to use a `regex` for any string when you can just replace the column in simpler ways. – DaveR Jul 26 '20 at 14:09
I am sorry I couldn't explain it to you better. I now wrote a function that does the job and works a lot faster – Hans Geber Jul 26 '20 at 15:36

score 0 · Accepted Answer · answered Jul 26 '20 at 10:14


Your pattern matches at multiple positions: once from the start of the string (consuming all the characters) and once more at the empty position right after them. You can test it here.
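To see the two matches, here is a short sketch using Python's `re` module directly (pandas compiles regex patterns with `re`). It assumes Python 3.7 or later, where `re.sub` also replaces an empty match that directly follows a non-empty one:

```python
import re

# Every match of ".*" in "aaaa", as (start, end) positions.
spans = [m.span() for m in re.finditer(r".*", "aaaa")]
print(spans)  # [(0, 4), (4, 4)]

# Both matches are replaced, so the result contains "test" twice.
print(re.sub(r".*", "test", "aaaa"))  # testtest
```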

If you include a start-of-string anchor, the pattern matches anything (even an empty string) exactly once, and each value is replaced with "test":

^.*
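A minimal sketch of the anchored pattern applied with `Series.replace`, as in the question. The extra empty-string row is an illustrative addition, not part of the original data. As a design choice of this sketch, the result is assigned back to the column instead of calling `inplace=True` on `df.A`, which avoids relying on an in-place change propagating through a column accessor.

```python
import pandas as pd

df = pd.DataFrame({"A": ["aaaa", "b", "c", "d", "e", ""]})

# "^" anchors the match to the start of the string, so ".*" can match
# only once per value, including the empty string.
df["A"] = df["A"].replace(r"^.*", "test", regex=True)

print(df["A"].tolist())  # ['test', 'test', 'test', 'test', 'test', 'test']
```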

answered Jul 26 '20 at 10:14

JvdV

