-5

I have a DataFrame in which I want to replace every value with a new string.

(Usually I would just do df["col1"] = "string". However, I need to use loc first, which creates a copy and does not modify the series in place.)

Currently I have a df like this:

df = pd.DataFrame({'A': ['aaaa', 'b', 'c', 'd', 'e']})

And when I replace it with the replace function

df.A.replace(".*", "test", regex=True, inplace=True)

I get something like this

enter image description here

However what I want is something like this:

enter image description here

Why does it give me "test" twice, and how can I fix it?

Edit: To show what the actual problem was, here is an example that gives the whole picture. Basically, I have these two things:

df = pd.DataFrame({'A': ['aaaa', 'bbbb', 'c', 'd', 'e']})
replace_list = ["aa","bb"] 

Now I want to replace every entry of the df that contains an item from the list with that item. So the df would look like this:

enter image description here

Hans Geber
  • 1
    You are matching between zero and unlimited times, resulting in two matches: one for the characters themselves (a etc.) and an empty one right after them. This is because you haven't specified a starting position. If you include a start-of-string anchor it will work well – JvdV Jul 26 '20 at 09:36
  • Yeah, that's probably it. But what do you mean by start-of-string anchor? – Hans Geber Jul 26 '20 at 09:40
  • 1
    With that I meant try `^.*` – JvdV Jul 26 '20 at 09:59
  • well as you can see it does not work. In my original df I have a list of places. But many of them have incorrect names. I want to replace them with common names, if their name is part of a common one. – Hans Geber Jul 26 '20 at 09:59
  • @JvdV if you make a new answer out of that I can accept it – Hans Geber Jul 26 '20 at 10:04

2 Answers

0

I think you can just use

import pandas as pd

df = pd.DataFrame({'A': ['aaaaa', 'b', 'c', 'd', 'e']})
# in case you want to substitute only a subset of rows (selected by a boolean mask)
df.loc[df['A'] == 'b', :] = 'test'
# in case you want to overwrite the whole column
df['A'] = 'test'

Regex is probably overkill here =).
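For the case described in the question's edit, the same mask-plus-loc idea could be sketched as below. This is only a sketch under assumptions: it assumes the desired result is aa, bb, c, d, e (the screenshot of the expected output is not available), and that list items should be treated as literal substrings rather than regex patterns.

```python
import pandas as pd

df = pd.DataFrame({'A': ['aaaa', 'bbbb', 'c', 'd', 'e']})
replace_list = ["aa", "bb"]

for item in replace_list:
    # regex=False treats the item as a literal substring, not a pattern
    mask = df['A'].str.contains(item, regex=False)
    # assign through loc on the original frame so the change is made in place
    df.loc[mask, 'A'] = item

print(df['A'].tolist())  # ['aa', 'bb', 'c', 'd', 'e']
```

Note that if an entry contained more than one item from the list, the last matching item would win with this loop.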

DaveR
  • If the df contains something like this, df = pd.DataFrame({'A': ['aaaaaa', 'b', 'c', 'd', 'e']}), I get "test" more often when I use "." as the regex – Hans Geber Jul 26 '20 at 09:38
  • 1
    So do you want to return 'test' for every match, or do you have a special pattern to match? If so, give more examples of your pattern and adjust your question accordingly – DaveR Jul 26 '20 at 09:49
  • I want "test" as an output for every row, no matter what was there before, correct. I will adjust the post to make it clear – Hans Geber Jul 26 '20 at 09:54
  • 1
    edited, see if it helps – DaveR Jul 26 '20 at 10:06
  • mhh not sure. The problem is, that I want this only to apply in certain rows, which is why I build a mask in my real code. Let me edit the post again, so that you see what I mean. I didn't want to make it that complicated initially. – Hans Geber Jul 26 '20 at 10:10
  • 1
    mmm I still do not get why anyone would want to use a `regex` for any string when you can just replace the column in simpler ways. – DaveR Jul 26 '20 at 14:09
  • I am sorry I couldn't explain it to you better. I now wrote a function that does the job and works a lot faster – Hans Geber Jul 26 '20 at 15:36
0

Your pattern matches at multiple positions: once for the characters of the string, and once more as an empty match right after them. You can test it here.

If you include a start-of-string anchor, the pattern matches anything (even empty strings) only once and replaces it with "test":

^.*
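
To make the two matches visible, here is a minimal sketch using Python's built-in re module rather than pandas. It assumes Python 3.7 or later, where an empty match directly after a non-empty one is also replaced.

```python
import re

value = "aaaa"

# ".*" first consumes the whole string, then matches once more as an
# empty string at the very end, so two replacements are made.
unanchored = re.sub(r".*", "test", value)

# "^" can only match at the start of the string, so the empty match
# at the end is rejected and only one replacement is made.
anchored = re.sub(r"^.*", "test", value)

# Start and end position of every match of the unanchored pattern.
spans = [m.span() for m in re.finditer(r".*", value)]

print(unanchored)  # testtest
print(anchored)    # test
print(spans)       # [(0, 4), (4, 4)]
```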
JvdV