I have a pandas DataFrame named df, and its df['salary'] column contains 400 entries that all hold the same placeholder value, -999. I want to replace each -999 with a number between 200 and 500, and every one of the 400 replacements should be a different number. So far I have written this code:

df['salary'] = df['salary'].replace(-999, random.randint(200, 500))

but this code replaces every -999 with the same value. I want all the replaced values to be different from each other. How can I do this?

Sticky

1 Answer

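You can use Series.mask with np.random.randint. This draws one random integer per row and keeps it only where the value is -999. Note that the upper bound of np.random.randint is exclusive, so the call below draws from 200 to 499, and the drawn values may repeat: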

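import numpy as np
import pandas as pd

df = pd.DataFrame({"salary":[0,1,2,3,4,5,-999,-999,-999,1,3,5,-999]})

# One candidate per row; only rows equal to -999 take the random value
df['salary'] = df["salary"].mask(df["salary"].eq(-999), np.random.randint(200, 500, size=len(df)))

print(df)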

    salary
0        0
1        1
2        2
3        3
4        4
5        5
6      413
7      497
8      234
9        1
10       3
11       5
12     341

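If you want non-repeating numbers instead, shuffle the range and mask with it. The shuffled Series is aligned to df by index and holds only the 300 integers from 200 to 499, so this assumes a default integer index and at most 300 rows; a -999 in any row beyond that would become NaN: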

s = pd.Series(range(200, 500)).sample(frac=1).reset_index(drop=True)

df['salary'] = df["salary"].mask(df["salary"].eq(-999), s)
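
The approach above can also be sketched so that it draws exactly as many distinct values as there are -999 entries and assigns them by position rather than by index alignment. This is only a minimal sketch: the helper name fill_sentinel_unique and its parameters are hypothetical, and it assumes the replacements should be integers with both 200 and 500 allowed.

```python
import numpy as np
import pandas as pd


def fill_sentinel_unique(series, sentinel=-999, low=200, high=500, seed=None):
    """Return a copy of `series` where each `sentinel` entry is replaced
    by a distinct random integer in [low, high] (both ends included).

    Hypothetical helper for illustration; not part of pandas.
    """
    mask = series.eq(sentinel)
    n_missing = int(mask.sum())
    if n_missing == 0:
        return series.copy()

    # Every integer that may be used as a replacement.
    pool = np.arange(low, high + 1)
    if n_missing > pool.size:
        raise ValueError(
            f"need {n_missing} distinct values but only {pool.size} "
            f"integers exist in [{low}, {high}]"
        )

    rng = np.random.default_rng(seed)
    out = series.copy()
    # replace=False guarantees that no value is drawn twice.
    out.loc[mask] = rng.choice(pool, size=n_missing, replace=False)
    return out


df = pd.DataFrame({"salary": [0, 1, 2, 3, 4, 5, -999, -999, -999, 1, 3, 5, -999]})
df["salary"] = fill_sentinel_unique(df["salary"], seed=0)
print(df)
```

Keep in mind that there are only 301 integers from 200 to 500, so 400 distinct integer replacements cannot fit in that range and the sketch raises a ValueError in that case. If non-integer salaries are acceptable, drawing floats with rng.uniform(200, 500, size=n_missing) is one alternative, since repeated values are then extremely unlikely.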
Henry Yik