
In the image above, I am attempting to change the string "200,000.00" to the string "200000.00". As you can see from the image, I successfully changed the string value, but the change didn't show up in my data frame. Why is this the case?

I expect that when I return data the values in the data frame will be updated.

Jeb Dean
  • edit your question and remove the pic and paste code/errors/sample data as text – eshirvana Jul 26 '23 at 23:16
  • `data['SalaryUSD'] = data['SalaryUSD'].str.replace(',', '')` – Barmar Jul 26 '23 at 23:25
  • Please post code, data, and results as text, not screenshots ([how to format code in posts](https://stackoverflow.com/help/formatting)). [Why should I not upload images of code/data/errors?](https://meta.stackoverflow.com/questions/285551/why-should-i-not-upload-images-of-code-data-errors) http://idownvotedbecau.se/imageofcode – Barmar Jul 26 '23 at 23:25
  • Why do you expect it to change the dataframe? You only changed the `string` variable, you never stored it back into the df. – Barmar Jul 26 '23 at 23:26
  • Please provide enough code so others can better understand or reproduce the problem. – Community Jul 26 '23 at 23:40
  • Welcome to Stack Overflow. I have found multiple existing questions to explain and solve the problem for you. "Why is this the case?" Because the name `string` in the `for` loop is a name for *the string that was pulled out* from the DataFrame, not a name for that "cell" of the DataFrame (Please see the first linked duplicate), and because the string's `replace` *creates a new* string (second duplicate). The others are to explain more standard techniques for updating a DataFrame in-place. You want to let Pandas do the looping for you. – Karl Knechtel Jul 27 '23 at 00:40
  • Going forward, please read [mre], and show code examples [as text, not an image](https://meta.stackoverflow.com/questions/285551/), using proper [formatting](https://stackoverflow.com/help/formatting). – Karl Knechtel Jul 27 '23 at 00:45

2 Answers


It seems you are only changing the loop variable, which holds a value pulled out of the column, so the actual data frame is never changed. One way to fix this is to keep track of the index as you iterate and assign the cleaned string back into the data frame at that index. You can see my solution below:

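def clean_data(data, bad_values):
    # Walk the index labels so each cleaned string can be written back to its cell.
    for idx in data.index:
        string = data.at[idx, "SalaryUSD"]
        for ele in bad_values:
            # str.replace returns a new string, so rebind the name each time.
            string = string.replace(ele, "")
        # .at writes into the DataFrame itself and avoids chained assignment.
        data.at[idx, "SalaryUSD"] = string
    return data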

Also, after going through the code you provided, I noticed that the counter parameter is never used. I just wanted to make sure you put it back in if your full code relies on it.
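To illustrate why the original loop had no effect, here is a minimal sketch. The sample values are hypothetical, since the real data was only posted as a screenshot. It shows that rebinding the loop variable leaves the column alone, while assigning a result back to the column updates it.

```python
import pandas as pd

# Hypothetical sample data; the real DataFrame was only shown as a screenshot.
data = pd.DataFrame({"SalaryUSD": ["200,000.00", "85,500.00"]})

# Rebinding the loop variable creates a new string and leaves the DataFrame untouched.
for string in data["SalaryUSD"]:
    string = string.replace(",", "")

unchanged = data["SalaryUSD"].tolist()

# Assigning the cleaned column back is what actually updates the DataFrame.
data["SalaryUSD"] = data["SalaryUSD"].str.replace(",", "", regex=False)
updated = data["SalaryUSD"].tolist()
```

After the loop, `unchanged` still holds the comma-separated strings; only the column assignment produces the cleaned values in `updated`.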

Simon Champney

The @Barmar approach can be extended so you can remove all the characters in your bad_value list using:

bad_value = [r'\$', ',', ' ', r'\[']   # raw strings keep the regex backslashes intact; extend to include all 'bad' values
pattern = '|'.join(bad_value)          # alternation: match any one of the bad values
data['Salary'] = data['SalaryUSD'].str.replace(pattern, '', regex=True)

Note that characters that have a special meaning in regex have to be 'escaped' with a backslash \. You can look these up. With Pandas DataFrames you should generally prefer this kind of vectorized approach over looping.
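If looking up which characters need escaping is a nuisance, one possible variation is to let Python's `re.escape` add the backslashes. This is a sketch under assumptions: the sample values and the unescaped `bad_value` list are made up for illustration, and the output column name `Salary` follows the snippet above.

```python
import re

import pandas as pd

# Hypothetical sample data for illustration only.
data = pd.DataFrame({"SalaryUSD": ["$200,000.00", "[85,500.00", " 1,000 "]})

# Plain characters, written without manual escaping.
bad_value = ["$", ",", " ", "["]

# re.escape adds the backslashes that regex metacharacters such as $ and [ need.
pattern = "|".join(re.escape(ch) for ch in bad_value)

# Remove every match of the pattern from each string in the column.
data["Salary"] = data["SalaryUSD"].str.replace(pattern, "", regex=True)
cleaned = data["Salary"].tolist()
```

This keeps the list readable and avoids forgetting an escape when new characters are added.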

user19077881