2

I want to replace a column in a pandas dataframe with a portion of another column. What I have so far is:

for index, row in df.iterrows():
  File = df.at[row, 'FileName']
  df.at[row, 'NUMBER'] = File.split(".")[1]

Ideally, this will iterate through the rows of the dataframe and replace the NUMBER column with a portion of the FileName column.

I am getting the error:

ValueError: At based indexing on an integer index can only have integer indexers

and I think it has to do with the misuse of df.at[], but I am not sure how to fix it.

Joe S

3 Answers

3

Don't loop with iterrows, because it is slow. It is better to use str.split and select the second element of each resulting list by indexing with str[1]:

df['NUMBER'] = df['FileName'].str.split(".").str[1]

Or use a list comprehension if you need better performance (this assumes the column has no missing values):

df['NUMBER'] = [x.split(".")[1] for x in df['FileName']]
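As a rough sketch of how the two approaches compare, here is a small made-up frame (the file names are hypothetical, not taken from the question). Both give the same strings, but they differ on missing values: str.split passes them through as missing, while a plain list comprehension needs an explicit guard, because a missing value has no .split() method.

```python
import pandas as pd

# Hypothetical sample data; the question does not show the real file names.
df = pd.DataFrame({'FileName': ['report.001.csv', 'scan.042.txt', None]})

# Vectorised string method: missing values stay missing instead of raising.
via_str = df['FileName'].str.split('.').str[1]

# List comprehension: guard against non-string entries such as None/NaN.
via_list = [x.split('.')[1] if isinstance(x, str) else None
            for x in df['FileName']]

df['NUMBER'] = via_str
print(df)
```

If the column is known to contain only strings, the guard can be dropped and the comprehension reduces to the one-liner above.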
jezrael
    Your solution is objectively better than what I was doing. Thanks so much for your help! – Joe S Aug 23 '18 at 14:05
1

In case you are wondering about the error:

Change df.at[row, 'NUMBER'] to df.at[index, 'NUMBER']. The first argument must be the row label (index), not row, which is a Series holding the whole row.

It should look like this:

for index, row in df.iterrows():
  df.at[index, 'NUMBER'] = row['FileName'].split(".")[1]
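To see the fix in action, here is a minimal sketch on made-up data (the file names and the empty NUMBER column are assumptions for illustration). iterrows() yields (index, row) pairs, where index is the row label and row is a Series, so only index is a valid first argument to df.at.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's frame.
df = pd.DataFrame({'FileName': ['a.10.txt', 'b.20.txt'],
                   'NUMBER': ['', '']})

for index, row in df.iterrows():
    # index is the row label (0, 1, ...); row is a Series of that row's values.
    df.at[index, 'NUMBER'] = row['FileName'].split('.')[1]

print(df['NUMBER'].tolist())  # ['10', '20']
```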


For the actual solution, I prefer jezrael's answer.

Nihal
0

I believe what you are looking for is "split" in combination with "expand=True". Working example:

import pandas as pd
col_1 = ['abc', 'abc', 'bcd', 'bcd']
col_2 = ['james.25', 'jane.23', 'andrew.15', 'jim.22']
data = pd.DataFrame({'NUMBER': col_1, 'FileName': col_2})

data['NUMBER'] = data['FileName'].str.split('.', expand=True)[1]
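To make clearer what expand=True does, the sketch below reuses the same sample names and inspects the intermediate result. With expand=True, str.split returns a DataFrame with one integer-labelled column per piece, so [1] selects the second piece as a column. The variable name parts is only for illustration.

```python
import pandas as pd

data = pd.DataFrame({'FileName': ['james.25', 'jane.23', 'andrew.15', 'jim.22']})

# expand=True returns a DataFrame with one column per piece, labelled 0, 1, ...
parts = data['FileName'].str.split('.', expand=True)
print(parts)

# Column 1 holds the text after the first dot.
data['NUMBER'] = parts[1]
print(data)
```

Without expand=True, the result is a single Series of lists, which is why the other answer indexes it with .str[1] instead.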