Convert a subset of pandas columns to int

Question

I have a dataframe with a bunch of int columns plus four additional columns. I melt the dataframe, which works as expected, and then pivot_table it back, which also works fine. The only issue is that the combined melt/pivot_table operations convert all the integer columns to float64. NOTE: every value in the affected columns is simply a zero (0) or a one (1), so I end up with 1.0 or 0.0. I want to convert them back to int.

This is the code block with the issue.

exclude = ['Title', 'Votes', 'Rating', 'Revenue_Millions']
for col in re_reshaped_df.columns:
    if ~col.isin(exclude):
        re_reshaped_df[col] = re_reshaped_df[col].astype('int')

But I am getting this: AttributeError: 'str' object has no attribute 'isin'

The goal is to convert all columns NOT in the 'exclude' list above to int.

I was following this post: How to implement 'in' and 'not in' for Pandas dataframe

These are the columns and types:

Title                object
Rating              float64
Votes                 int64
Revenue_Millions    float64
Action              float64
Adventure           float64
Animation           float64
Biography           float64
Comedy              float64
Crime               float64
Drama               float64
Family              float64
Fantasy             float64
History             float64
Horror              float64
Music               float64
Musical             float64
Mystery             float64
Romance             float64
Sci-Fi              float64
Sport               float64
Thriller            float64
War                 float64
Western             float64

Ayoub ZAROU · Accepted Answer · 2019-07-25T12:51:22.137

You could do this instead:

exclude = ['Title', 'Votes', 'Rating', 'Revenue_Millions']
for col in re_reshaped_df.columns:
    if col not in exclude:
        re_reshaped_df[col] = re_reshaped_df[col].astype('int')

This works because your col variable is a column name, so it is a string rather than a Series, and pandas methods such as isin are not available on it. Another way to go about this, which should be faster, is:

exclude = ['Title', 'Votes', 'Rating', 'Revenue_Millions']
ix = re_reshaped_df.columns.drop(exclude)
re_reshaped_df.loc[:,ix] = re_reshaped_df.loc[:,ix].astype(int)
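
For reference, here is a minimal sketch of a variation on the same idea. It is not part of the original answer. It builds the column selection with isin on the column Index (what the question was originally reaching for) and casts with an astype mapping. The small frame below is a hypothetical stand-in for re_reshaped_df, and it assumes the columns contain no missing values, since casting NaN to an integer dtype raises an error. Depending on the pandas version, assigning through .loc[:, ix] may write into the existing float64 columns instead of replacing them, in which case the dtype stays float64; returning a new frame from astype avoids that question.

```python
import pandas as pd

# Hypothetical stand-in for re_reshaped_df after the melt/pivot_table round trip.
re_reshaped_df = pd.DataFrame({
    "Title": ["A", "B"],
    "Rating": [7.5, 6.1],
    "Votes": [1200, 340],
    "Revenue_Millions": [10.5, 3.2],
    "Action": [1.0, 0.0],
    "Comedy": [0.0, 1.0],
})

exclude = ["Title", "Votes", "Rating", "Revenue_Millions"]

# isin is a method of the column Index, not of each individual column name,
# so build one boolean mask over all column labels at once.
to_convert = re_reshaped_df.columns[~re_reshaped_df.columns.isin(exclude)]

# astype with a {column: dtype} mapping returns a new frame in which only
# the listed columns are cast; the excluded columns keep their dtypes.
re_reshaped_df = re_reshaped_df.astype({col: "int64" for col in to_convert})

print(re_reshaped_df.dtypes)
```

If the genre columns could contain NaN, they would need a fillna(0) first (or a nullable dtype such as "Int64") before the cast.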

edited Jul 25 '19 at 12:51

answered Jul 25 '19 at 12:36

Ayoub ZAROU

I edited my second method, because I used intersection instead of drop on my first try – Ayoub ZAROU Jul 25 '19 at 12:47
FYI, the second method errors out. TypeError: '(slice(None, None, None), Index(['Action', 'Adventure', 'Animation', 'Biography', 'Comedy', 'Crime', 'Drama', 'Family', 'Fantasy', 'History', 'Horror', 'Music', 'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Sport', 'Thriller', 'War', 'Western'], dtype='object', name='Genre'))' is an invalid key – MarkS Jul 25 '19 at 12:50
Sorry, I just re-edited it; I forgot the `loc` thing – Ayoub ZAROU Jul 25 '19 at 12:51
That was it. Much obliged. – MarkS Jul 25 '19 at 12:53
