I am a beginner in Python and have a large array to process. One of the columns (loan_status) holds all of its entries as strings (not numbers), and I would like to convert them to numbers. There are several types of entries, but I am only interested in "Fully Paid" and "Current": I would like to change those to 1 and all other entries to 0.

1 import numpy as np
2 import pandas as pd
3
4 data_file = pd.read_csv('loan.csv')
5 loan_stat = data_file.loan_status
6 for i in range(len(loan_stat)):
7    if loan_stat[i]=='Fully Paid':
8        loan_stat[i]=1
9    elif loan_stat[i]=='Current':
10        loan_stat[i]=1
11    else:
12        loan_stat[i]=0
13
14 print(loan_stat)

When I run this, I get the following message: "A value is trying to be set on a copy of a slice from a DataFrame". It points to lines 8, 10, and 12.

Thank you very much for the help

Jon Clements
Blazej Kowalski

2 Answers

The easiest way to implement an if-else on a pandas Series is probably np.where:

5 data_file['loan_status'] = np.where(data_file['loan_status'].isin(['Fully Paid', 'Current']), 1, 0)
6 print(data_file['loan_status'])

Note that this excludes the assignment

loan_stat = data_file.loan_status

operating on the assumption that you want to modify the column data_file['loan_status'] in the dataframe.
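As a minimal, self-contained sketch of the in-place replacement above: the small DataFrame below is a made-up stand-in for loan.csv, and the status values other than 'Fully Paid' and 'Current' are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for loan.csv; these rows are invented for illustration.
data_file = pd.DataFrame(
    {"loan_status": ["Fully Paid", "Current", "Charged Off", "Late (31-120 days)"]}
)

# isin() builds a boolean mask; np.where maps True -> 1 and False -> 0.
is_good = data_file["loan_status"].isin(["Fully Paid", "Current"])

# Assigning through data_file[...] writes to the DataFrame itself,
# so no intermediate slice is modified.
data_file["loan_status"] = np.where(is_good, 1, 0)

print(data_file["loan_status"].tolist())
```

Because the assignment targets the column by name on the DataFrame, there is no separate loan_stat object whose relationship to data_file pandas has to guess at.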

If you instead want a separate series holding a 'Fully Paid'/'Current' indicator variable, leaving the original column untouched while still avoiding the SettingWithCopyWarning (which is elaborated on in @Parth Chaudhary's excellent link), then

5 loan_stat = np.where(data_file['loan_status'].isin(['Fully Paid', 'Current']), 1, 0)

would do it. Note that np.where returns a NumPy array rather than a pandas Series.
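If a pandas Series is needed for the separate indicator, one possible sketch is to wrap the result, reusing the DataFrame's index so the rows stay aligned. The sample data and the name "loan_stat" given to the Series are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for loan.csv.
data_file = pd.DataFrame({"loan_status": ["Fully Paid", "Late", "Current"]})

# np.where produces a plain NumPy array of 0/1 flags.
flags = np.where(data_file["loan_status"].isin(["Fully Paid", "Current"]), 1, 0)

# Wrap it in a Series that shares the DataFrame's index.
loan_stat = pd.Series(flags, index=data_file.index, name="loan_stat")

print(loan_stat.tolist())
print(data_file["loan_status"].tolist())  # the original column is unchanged
```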

Alternatively, just replacing your line #5 with

5 loan_stat = data_file.loan_status.copy()

will also avoid the issue that triggers the warning. I wouldn't recommend it, though, if only because looping over a pandas Series/DataFrame or a NumPy array is usually a lot slower than the vectorized alternatives.
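To illustrate the "other options" mentioned here, a pandas-only variant (not part of the original answer) is to cast the boolean mask directly to integers. This is a sketch on made-up data; it should give the same flags as the np.where version.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for loan.csv.
data_file = pd.DataFrame(
    {"loan_status": ["Current", "Default", "Fully Paid", "In Grace Period"]}
)

mask = data_file["loan_status"].isin(["Fully Paid", "Current"])

# Casting booleans to int turns True into 1 and False into 0, with no Python loop.
flags_astype = mask.astype(int)

# Same result as the np.where approach, computed for comparison.
flags_where = np.where(mask, 1, 0)

print(flags_astype.tolist())
```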

EFT
  • Thank you!! I used your np.where solution and it works perfectly although I still don't quite get why I got the error in my code in the first place. I will try to understand that link from @Parth Chaudhary. Thank you one more time – Blazej Kowalski May 26 '17 at 15:34

You can create a separate list to store the 0s and 1s instead of overwriting loan_stat:

import numpy as np
import pandas as pd

data_file = pd.read_csv('loan.csv')
loan_stat = data_file.loan_status
loan_n = []
for i in range(len(loan_stat)):
    if loan_stat[i] == 'Fully Paid':
        #loan_stat[i]=1
        loan_n.append(1)
    elif loan_stat[i] == 'Current':
        #loan_stat[i]=1
        loan_n.append(1)
    else:
        #loan_stat[i]=0
        loan_n.append(0)

print(loan_n)
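The same separate-list idea can be written more compactly with a list comprehension. This is only a sketch on made-up data; the column name "loan_flag" is a hypothetical choice for storing the result next to the original column.

```python
import pandas as pd

# Hypothetical sample data standing in for loan.csv.
data_file = pd.DataFrame({"loan_status": ["Current", "Default", "Fully Paid"]})

good_statuses = {"Fully Paid", "Current"}

# Build the 0/1 list by iterating over the values; nothing is written
# back into the original column, so no warning is triggered.
loan_n = [1 if status in good_statuses else 0 for status in data_file["loan_status"]]

# Optionally attach the list as a new column ("loan_flag" is an assumed name).
data_file["loan_flag"] = loan_n

print(loan_n)
```

Iterating over the values directly also avoids positional lookups such as loan_stat[i], which depend on the Series having a default integer index.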