0

For the following screenshot, I want to change the NaN values under the total_claim_count_ge65 to a 5 if the values of the ge65_suppress_flag have the # symbol.

screenshot

I want to use a for loop to go through the ge65_suppress_flag column and every time it encounters a # symbol, it will change the NaN value in the very next column (total_claim_count_ge65) to a 5.

Jérôme Richard
  • 41,678
  • 6
  • 29
  • 59
Eric Blair
  • 13
  • 2
  • Related question: https://stackoverflow.com/questions/41985063/cannot-convert-nan-to-int-but-there-are-no-nans did it help you? – Donka May 05 '20 at 20:59

4 Answers4

0

Try something like:

df[df['ge65_suppress_flag'] == '#']['total_claim_count_ge65'].fillna(5, inplace=True)
nik7
  • 806
  • 3
  • 12
  • 20
0

Creating a similar data frame

import pandas
df1 = pd.DataFrame({"ge65_suppress_flag": ['bla', 'bla', '#', 'bla'], "total_claim_count_ge65": [1.0, 2.0, None, 4.0]})

Filling in 5.0 in rows where ge65_suppress_flag column value equals to '#'

df1.loc[df1['ge65_suppress_flag']=="#", 'total_claim_count_ge65'] = 5.0
PermanentPon
  • 702
  • 5
  • 10
  • Thanks Permanent Pon. This worked pretty well. What I had for the screen shot was a fraction of the data I am looking at and I used your idea with a small modification. I turned the 2 columns into lists and I used them in the data frame method to do the entirety of the columns which consist of about 1 millions rows. I wonder if there is a way to do this without having to create a new data frame. – Eric Blair May 06 '20 at 18:12
0

I'll explain step-by-step after giving you a solution.

Here's a one-liner that will work.

df[df[0]=='#'] = df[df[0]=='#'].fillna(5)

To make the solution more general, I used the column's index based on your screenshot. You can change the index number, or specify by name like so:

df['name_of_column']

Step-by-step explanation:

First, you want to use the variable attributes in your first column df[0] to select only those equal to string '#':

df[df[0]=='#']

Next, use the pandas fillna method to replace all variable attributes that are np.NaN with 5:

df[df[0]=='#'].fillna(5)

According to the fillna documentation, this function returns a new dataframe. So, to avoid this, you want to set the subsection of your dataframe to what is returned by the function:

df[df[0]=='#'] = df[df[0]=='#'].fillna(5)
nik7
  • 806
  • 3
  • 12
  • 20
0

Using df.apply with a lambda:

d = {'ge65_suppress_flag': ['not_supressed','not_supressed','#'], 'total_claim_count_ge65': [516.03, 881.0, np.nan]}
df = pd.DataFrame(data=d)
df['total_claim_count_ge65'] = df.apply(lambda x: 5 if x['ge65_suppress_flag']=='#' else x['total_claim_count_ge65'], axis=1)
print(df)

prints:

  ge65_suppress_flag  total_claim_count_ge65
0      not_supressed                  516.03
1      not_supressed                  881.00
2                  #                    5.00
pink spikyhairman
  • 2,391
  • 1
  • 16
  • 13
  • Thank-you pink spikyhairman. This worked really well. The screenshot came from a data set that has about 1 million columns. I turned the columns into lists and applied your code which was able to do all of the rows. I wonder if there is a way to do this without having to create a new data frame. Say these two columns were 2 of 70 columns in a big data frame You would have to create this new data frame and then go back into the main bigger data frame and merge these 2 new columns and delete the 2 old columns. – Eric Blair May 06 '20 at 18:07
  • if you have the original 70 columns of data in a dataframe ```df```, with the column names the same, then the line ```df['total_claim_count_ge65'] = df.apply(lambda x: 5 if x['ge65_suppress_flag']=='#' else x['total_claim_count_ge65'], axis=1)``` of my solution will work on that dataframe as-is. – pink spikyhairman May 06 '20 at 18:20