
I have a DataFrame with 80 columns. Some columns should be integers, but Python treats them as floats. Rather than changing the data types manually, I am trying to write a loop that identifies the datatype of each column and changes it accordingly. I tried the following two options, but neither worked:

1) I tried looping over the columns and converting a column to integer if its datatype is float:

for x in data1.columns:
    if isinstance(data1.columns,float):
        data1[x]=data1[x].astype('int')

2) I also tried this:

for x in data1.columns:
    if x isinstance(x,float):
        data1=data1.astype(int)
    else:
        break

My general question is: is it possible to change column datatypes with a loop, a condition, a function, etc.?

Before posting, I searched the web, but most of the questions I found are about changing an individual column's datatype.

Thank you in advance for your answers.

1 Answer


Use:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': list('abcdef'),
    'B': [4, 5., 4, 5, 5, 4],
    'C': [7, 8, 9, 4, 2, 3],
    'D': [1.8, 3.3, 5, 7, 1, 0],
    'E': [5.0, 3, 6, 9, 2, 4],
    'F': list('aaabbb')
})
print(df)
   A    B  C    D    E  F
0  a  4.0  7  1.8  5.0  a
1  b  5.0  8  3.3  3.0  a
2  c  4.0  9  5.0  6.0  a
3  d  5.0  4  7.0  9.0  b
4  e  5.0  2  1.0  2.0  b
5  f  4.0  3  0.0  4.0  b

The idea is to first keep only the numeric columns with DataFrame.select_dtypes, then find the integer columns and the float columns that hold only whole numbers (0 after the decimal point) by comparing each column with its conversion to integers and checking with DataFrame.all that every value matches. The matching column names are used to build a dictionary, which is passed to DataFrame.astype.

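# keep only the numeric columns
df1 = df.select_dtypes(np.number)
# columns whose values all equal their integer conversion, each mapped to 'int'
d = dict.fromkeys(df1.columns[df1.eq(df1.astype(int)).all()], 'int')

df = df.astype(d)
print(df)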
   A  B  C    D  E  F
0  a  4  7  1.8  5  a
1  b  5  8  3.3  3  a
2  c  4  9  5.0  6  a
3  d  5  4  7.0  9  b
4  e  5  2  1.0  2  b
5  f  4  3  0.0  4  b

Details:

print (df1.eq(df1.astype(int)))
      B     C      D     E
0  True  True  False  True
1  True  True  False  True
2  True  True   True  True
3  True  True   True  True
4  True  True   True  True
5  True  True   True  True

print (df1.columns[df1.eq(df1.astype(int)).all()])
Index(['B', 'C', 'E'], dtype='object')

print (d)
{'B': 'int', 'C': 'int', 'E': 'int'}
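The same idea can be wrapped in a small reusable function. The sketch below uses a hypothetical helper, `whole_number_columns_to_int` (not a pandas function), that also tolerates missing values, which survey data often contains: `df1.astype(int)` raises an error on NaN, so this version compares each value with `np.floor` instead and casts to pandas' nullable `'Int64'` dtype. It assumes the data contains no infinities.

```python
import numpy as np
import pandas as pd


def whole_number_columns_to_int(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: cast numeric columns whose non-missing values are
    all whole numbers to pandas' nullable 'Int64' dtype. Assumes no infinities."""
    num = df.select_dtypes(np.number)
    # True where a value is whole (equals its floor) or missing.
    is_whole = num.eq(np.floor(num)) | num.isna()
    # Require at least one real value so all-NaN columns are left alone.
    cols = num.columns[is_whole.all() & num.notna().any()]
    return df.astype(dict.fromkeys(cols, 'Int64'))


df = pd.DataFrame({
    'A': list('abc'),
    'B': [4.0, np.nan, 5.0],
    'C': [7, 8, 9],
    'D': [1.8, 3.3, 5.0],
})
out = whole_number_columns_to_int(df)
print(out.dtypes)
```

Using `'Int64'` rather than `'int'` is a deliberate choice here: NumPy integer dtypes cannot hold NaN, whereas the nullable extension dtype stores missing entries as `<NA>`.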

If you want to convert all float columns to integers, here is your loop solution:

for x in data1.columns:
    # iat[0] gets the column's first value; convert the column if it is a float
    if isinstance(data1[x].iat[0], float):
        data1[x] = data1[x].astype(int)

print (data1)
   A  B  C  D  E  F
0  a  4  7  1  5  a
1  b  5  8  3  3  a
2  c  4  9  5  6  a
3  d  5  4  7  9  b
4  e  5  2  1  2  b
5  f  4  3  0  4  b
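Because the loop above inspects only the first value of each column, an object column whose first entry happens to be a float would also match. A slightly sturdier variant, sketched here on the example frame from this answer, checks each column's dtype with `pandas.api.types.is_float_dtype` instead:

```python
import pandas as pd
from pandas.api.types import is_float_dtype

data1 = pd.DataFrame({
    'A': list('abcdef'),
    'B': [4, 5., 4, 5, 5, 4],
    'C': [7, 8, 9, 4, 2, 3],
    'D': [1.8, 3.3, 5, 7, 1, 0],
    'E': [5.0, 3, 6, 9, 2, 4],
    'F': list('aaabbb')
})

for x in data1.columns:
    # Decide by the column's dtype rather than by its first value
    if is_float_dtype(data1[x]):
        data1[x] = data1[x].astype(int)  # drops decimals, e.g. 1.8 -> 1

print(data1)
```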

Non-loop solution:

data1 = data1.astype(dict.fromkeys(data1.select_dtypes(np.floating), 'int'))
print (data1)
   A  B  C  D  E  F
0  a  4  7  1  5  a
1  b  5  8  3  3  a
2  c  4  9  5  6  a
3  d  5  4  7  9  b
4  e  5  2  1  2  b
5  f  4  3  0  4  b
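Both conversions above cast with `astype(int)`, which drops the fractional part; that is why column D's 1.8 and 3.3 became 1 and 3. If rounding to the nearest integer is wanted instead, a minimal sketch is to round before casting:

```python
import pandas as pd

d = pd.Series([1.8, 3.3, 5, 7, 1, 0])

truncated = d.astype(int)        # drops the fraction: 1.8 -> 1
rounded = d.round().astype(int)  # rounds first: 1.8 -> 2

print(truncated.tolist())  # [1, 3, 5, 7, 1, 0]
print(rounded.tolist())    # [2, 3, 5, 7, 1, 0]
```

`Series.round` follows NumPy and rounds exact halves to the nearest even number, so 2.5 becomes 2.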
jezrael
  • Thank you for the suggestion. The data is a household-level survey that we conducted to measure COVID-19's financial impact on refugee households. I thought about filtering the numeric columns, but if a loop is possible I would prefer not to create another data frame, change the datatypes and then merge it back into the original data frame. – Alperen Açıkol Apr 17 '20 at 05:34
  • @AlperenAçıkol - Hmmm, so you need loops? What is the reason? Loops are really not recommended in pandas. [link](https://stackoverflow.com/a/55557758/2901002). – jezrael Apr 17 '20 at 05:40
  • @jezrael It is not necessarily a need. I was trying to solve the problem with a loop because I did not want to deal with creating a dictionary for 40+ columns, but after reading the post you pointed out I decided to stick with the proposed solution. – Alperen Açıkol Apr 17 '20 at 05:52
  • Sorry for the late reply. I checked the code and it solved my problem, thank you, man. I just have one question: you introduced iat[0], what does it do exactly? – Alperen Açıkol Apr 17 '20 at 06:53
  • @AlperenAçıkol - It gets the first value of the column and checks whether it is a float. – jezrael Apr 17 '20 at 06:54
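To illustrate the reply above: `Series.iat[0]` is fast positional access to a single value, here the first one. For a float64 column it returns a `numpy.float64`, which is a subclass of Python's `float`, so `isinstance(..., float)` is True; an int64 column returns a `numpy.int64`, which is not. A minimal check on a small frame:

```python
import pandas as pd

df = pd.DataFrame({'A': list('abc'), 'B': [4, 5., 4], 'C': [7, 8, 9]})

first_b = df['B'].iat[0]  # first value of column B by position
first_c = df['C'].iat[0]  # first value of column C by position

print(type(first_b).__name__, isinstance(first_b, float))  # float64 True
print(type(first_c).__name__, isinstance(first_c, float))  # int64 False
```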