
I am cleaning the data in a 4 x 4 dataframe in Python whose elements are 'a' and '?'. I want to replace every '?' with NA.

In R, I write:

for (i in 1:4){
    DATA[DATA[,i]=='?',i]=NA}

I tried to write the same thing in Python:

for i in range(3):
    DATA[DATA.iloc[:,i]=='?'].iloc[:,i]=np.nan

Nothing changes in the dataframe when I run this in Python. How should I write the command? Thanks.

Danny
  • In R: `Data[Data=="?"] <- NA`. You don't need a for loop. In Python, `.replace` is the way to go, as outlined in the answers. Generally, when a vectorized solution is available, loops are not advised. – M-- May 01 '19 at 15:21
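
For reference, the loop in the question most likely has no effect because `DATA[mask]` (boolean indexing) returns a copy, so the assignment through `.iloc` changes that temporary copy rather than `DATA`. Note also that `range(3)` only visits three of the four columns. If a loop is still wanted, a minimal sketch is to do the row selection and the assignment in a single `.loc` call. The 4 x 4 frame and its column names below are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical 4 x 4 frame containing only 'a' and '?', as described in the question.
DATA = pd.DataFrame(
    [["a", "?", "a", "a"],
     ["?", "a", "a", "?"],
     ["a", "a", "?", "a"],
     ["a", "?", "a", "a"]],
    columns=["c0", "c1", "c2", "c3"],
)

# Loop over every column (not just the first three).
for col in DATA.columns:
    # Select rows and column in ONE .loc call so the assignment
    # writes into DATA itself rather than into a temporary copy.
    DATA.loc[DATA[col] == "?", col] = np.nan

n_missing = int(DATA.isna().sum().sum())
```

The vectorized approaches in the answers are still shorter and usually preferable.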

2 Answers

1

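In Python you can use `DATA.replace({'?': None})` directly. Note that `replace` returns a new dataframe by default, so assign the result back, e.g. `DATA = DATA.replace({'?': None})`.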
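
As a rough, self-contained sketch of this approach: the frame below is a made-up 4 x 4 example, and `np.nan` is used as the replacement value instead of `None` (an assumption, chosen because NaN is pandas' usual missing-value marker).

```python
import numpy as np
import pandas as pd

# Hypothetical 4 x 4 frame containing only 'a' and '?'.
DATA = pd.DataFrame(
    [["a", "?", "a", "a"],
     ["?", "a", "a", "?"],
     ["a", "a", "?", "a"],
     ["a", "?", "a", "a"]],
    columns=["c0", "c1", "c2", "c3"],
)

# replace() returns a new DataFrame; DATA itself is left untouched
# unless the result is assigned back to it.
cleaned = DATA.replace("?", np.nan)

n_missing = int(cleaned.isna().sum().sum())        # cells that became NaN
n_question_marks = int((DATA == "?").sum().sum())  # '?' cells still in DATA
```

Here `cleaned` holds the result while `DATA` keeps its '?' cells, which is why the assignment back to a name matters.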

yatu
aseem bhartiya
  • May I further ask what the brackets mean? I mean the function of each of them. I am not familiar with this notation. Thanks. – Danny May 02 '19 at 00:25
  • `DATA` is your dataframe object. `.replace()` is a pandas DataFrame method, and `DATA.replace()` calls that method on `DATA`. `{'?': None}` is dictionary notation mapping '?' to None. So in essence `DATA.replace({'?': None})` is a request to replace '?' with None in `DATA`. – aseem bhartiya May 02 '19 at 13:48
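
To illustrate the dictionary notation described in the comment above, here is a small sketch in which the dictionary holds more than one old-value to new-value pair. The frame, the second mapping (`'a'` to `'A'`), and the use of `np.nan` rather than `None` are assumptions made only for this example:

```python
import numpy as np
import pandas as pd

# Hypothetical small frame; column names are illustrative.
DATA = pd.DataFrame({"x": ["a", "?", "a"], "y": ["?", "a", "a"]})

# Each key is a value to look for, each dict value is what replaces it.
mapping = {"?": np.nan, "a": "A"}

# The parentheses call the method; the braces build the dictionary argument.
cleaned = DATA.replace(mapping)

first_x = cleaned["x"].tolist()[0]
n_missing = int(cleaned.isna().sum().sum())
```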
0

In R, we can do this without a loop as well:

DATA[1:4][DATA[1:4] == '?'] <- NA
akrun
  • @Danny The `[]` can take a row index and a column index separated by `,`, i.e. `df1[2:4, 2]` gives rows 2 to 4 from the 2nd column, and similarly `df1[1:3, 2:5]` gives a subset of the dataset with rows `1:3` and columns 2 to 5. An index without a `,` is evaluated as a column index by default, so here `DATA[1:4]` selects the first four columns. – akrun May 02 '19 at 00:26
  • But what about the second pair of brackets after DATA[1:4]? I mean the part that used to hold the criterion. – Danny May 02 '19 at 00:32
  • @Danny Here, `DATA[1:4]` is a subset of columns; then we do a comparison (`==`) to get a logical matrix (`DATA[1:4] == '?'`). This logical matrix is used to subset `DATA[1:4]`, i.e. `DATA[1:4][logicalmatrix]`, and the elements where it is TRUE are set to `NA`. – akrun May 02 '19 at 00:34
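
For readers coming from the Python side of the question, the logical-matrix idea described above has a close pandas counterpart. This is only a sketch under assumptions: the frame is made up, and `DataFrame.mask` is used here as the analogue of the R assignment, not as part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical 4 x 4 frame containing only 'a' and '?'.
DATA = pd.DataFrame(
    [["a", "?", "a", "a"],
     ["?", "a", "a", "?"],
     ["a", "a", "?", "a"],
     ["a", "?", "a", "a"]],
    columns=["c0", "c1", "c2", "c3"],
)

# Element-wise comparison gives a boolean DataFrame of the same shape,
# the counterpart of R's logical matrix.
is_question_mark = DATA == "?"

# mask() swaps in np.nan wherever the boolean frame is True
# and returns a new DataFrame.
cleaned = DATA.mask(is_question_mark, np.nan)

n_flagged = int(is_question_mark.sum().sum())
same_positions = bool((cleaned.isna() == is_question_mark).all().all())
```

`same_positions` checks that the missing cells in `cleaned` sit exactly where the boolean frame was True.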