Converting string to int in Pandas column

Question

I have a .csv with US Congress biographical data that I read into a pandas DataFrame:

df = pd.read_csv('congress100.csv', delimiter = ';', names = ['Name', 'Position', 'Party', 'State', 'Congress'], header = 0)

My dataframe looks like this:

0                   'ACKERMAN, Gary Leonard'        'Representative'    'Democrat'  'NY'  '100(1987-1988)'
1                  'ADAMS, Brockman (Brock)'               'Senator'    'Democrat'  'WA'  '100(1987-1988)'
2                   'AKAKA, Daniel Kahikina'        'Representative'    'Democrat'  'HI'  '100(1987-1988)'
3    'ALEXANDER, William Vollie (Bill), Jr.'        'Representative'    'Democrat'  'AR'  '100(1987-1988)'
4                  'ANDERSON, Glenn Malcolm'        'Representative'    'Democrat'  'CA'  '100(1987-1988)'
5                   'ANDREWS, Michael Allen'        'Representative'    'Democrat'  'TX'  '100(1987-1988)'
6                          'ANNUNZIO, Frank'        'Representative'    'Democrat'  'IL'  '100(1987-1988)'
7             'ANTHONY, Beryl Franklin, Jr.'        'Representative'    'Democrat'  'AR'  '100(1987-1988)'
8                  'APPLEGATE, Douglas Earl'        'Representative'    'Democrat'  'OH'  '100(1987-1988)'
9            'ARCHER, William Reynolds, Jr.'        'Representative'  'Republican'  'TX'  '100(1987-1988)'
10                    'ARMEY, Richard Keith'        'Representative'  'Republican'  'TX'  '100(1987-1988)'

I want to convert the data in the 'Congress' column to an integer. Right now, I am first converting it to a simpler string:

df['Congress'] = df['Congress'].str.replace(r'100\(1987-1988\)', '1987', regex=True)

This is successful. But, I am then trying to convert that simpler string to an integer:

df['Congress'] = df['Congress'].pd.to_numeric(errors='ignore')

I am getting an error:

AttributeError: 'Series' object has no attribute 'pd'

Please help me resolve this error and simplify my code.

Dani Mesejo · Accepted Answer · 2018-11-04T16:12:39.513

6

You need to call pd.to_numeric as a function and pass the column to it, like this:

import pandas as pd

df = pd.DataFrame(data=[str(i + 1980) for i in range(10)], columns=['Congress'])
df['Congress'] = pd.to_numeric(df['Congress'], errors='ignore')
print(df)

The code above is only a toy example. In your own code, you just need to change this line:

df['Congress'] = df['Congress'].pd.to_numeric(errors='ignore')

to:

df['Congress'] = pd.to_numeric(df['Congress'], errors='ignore')
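As a minimal sketch of that corrected call on a frame with more than one column: the two sample rows below are made up for illustration, and 'Congress' is assumed to already hold the simplified year string. Only the assigned column changes; the other columns are left alone. The sketch uses the default errors='raise' rather than errors='ignore', since recent pandas releases deprecate the 'ignore' option.

```python
import pandas as pd

# Hypothetical two-row sample; 'Congress' is already simplified to a year string.
df = pd.DataFrame({
    'Name': ['ACKERMAN, Gary Leonard', 'ADAMS, Brockman (Brock)'],
    'Congress': ['1987', '1987'],
})

# pd.to_numeric is a top-level pandas function, not a Series method:
# pass the Series to it and assign the result back to the same column.
df['Congress'] = pd.to_numeric(df['Congress'])

print(df.dtypes)
```

If some values might not parse, errors='coerce' turns them into NaN instead of raising, at the cost of the column becoming float.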

edited Nov 04 '18 at 16:12

answered Nov 04 '18 at 15:43

Dani Mesejo


This is actually replacing my entire dataframe, when I only want to change the values in the 'Congress' column. – Charlie Goldberg Nov 04 '18 at 16:00
@CharlieGoldberg are you sure? I just ran it with an added dummy column, and the dummy column did not change. – Dani Mesejo Nov 04 '18 at 16:07
When I print my new df after performing the code you suggested, I get this: Congress 0 1980 1 1981 2 1982 3 1983 4 1984 5 1985 6 1986 7 1987 8 1988 9 1989 – Charlie Goldberg Nov 04 '18 at 16:10
The code is meant as an example; you need to skip the part that creates the dataframe. Updated the answer! – Dani Mesejo Nov 04 '18 at 16:11
I see! I'm a novice at this. Thanks for your clarification. – Charlie Goldberg Nov 04 '18 at 16:16

jimmy · Answer 2 · 2018-11-04T16:17:31.013

-1

Here is one more way to do it. It only works if every value in the column consists solely of digits:

df['Congress'] = df['Congress'].astype(int)
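To illustrate the digits-only caveat, here is a rough sketch using made-up values in the question's original format. It assumes every value looks like 'NNN(YYYY-YYYY)' and that the starting year is the number you want. The regex and the names raw_converts and start_year are illustrative, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample in the question's original format.
s = pd.Series(['100(1987-1988)', '100(1987-1988)'], name='Congress')

# astype(int) fails on the raw strings because they contain
# parentheses and a hyphen, not just digits.
try:
    s.astype(int)
    raw_converts = True
except ValueError:
    raw_converts = False

# Capture the four digits right after '(' as the starting year,
# then convert the now digits-only strings to integers.
start_year = s.str.extract(r'\((\d{4})', expand=False).astype(int)

print(raw_converts)
print(start_year.tolist())
```

This folds the replace-then-convert steps from the question into one expression, so it may also serve as a simplification when all rows follow the same pattern.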

edited Nov 04 '18 at 16:17

answered Nov 04 '18 at 16:07

jimmy

