3

I am fairly new to Python and currently I am trying to build a function that searches for the first 2 digits of the elements in a column and if true, return the result with a new header such as region

For example,

   Adres  AreaCode Region
0  SArea    123191      A
1  BArea    122929      A
2  AArea    132222      B

I want the function to search for just the first 2 digits of the AreaCode which would give me the result of along with a new header of Region which classifies the Region based on the first 2 digits of the AreaCode. So in this case 12 would give me A and 13 would give me B

I already tried this

df.loc[df.AreaCode == 123191, 'Region'] = 'A'

and this worked for the entire AreaCode but I have no idea how to modify it so that I would be able to search based on the first 2 digits.

and I tried this

df.loc[df.AreaCode.str.contains == 12, 'Region' ] = 'A' 

but it gives me the error:

AttributeError: Can only use .str accessor with string values,
                which use np.object_ dtype in pandas

How do I fix this and thanks a lot for helping!

w-m
  • 10,772
  • 1
  • 42
  • 49
John
  • 51
  • 1
  • 4

5 Answers5

2

I tried this df.loc[df.AreaCode.str.contains == 12, 'Region' ] = 'A' but it gives me the error: AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas

You could simply convert it to a string, then use the same code:

df.loc[df.AreaCode.astype(str).str.startswith('12'), 'Region' ] = 'A'
w-m
  • 10,772
  • 1
  • 42
  • 49
  • Thank you so much this really helped! – John Sep 26 '18 at 13:13
  • @w-m Can you elaborate why astype is required here as without that explicit conversion I got an error – mad_ Sep 26 '18 at 13:14
  • @mad_: Look at the `AttributeError` already noted in the question. The `.str` accessor can only be used on columns that already hold strings. `AreaCode` was represented with ints in this case. Thus they have to be converted to strings first. – w-m Sep 26 '18 at 13:16
  • @w-m is it possible to add another condition such as df.loc[df.AreaCode.astype(str).str.startswith('12' or '13'), 'Region' ] = 'A' Thanks! – John Sep 26 '18 at 13:48
  • 1
    @Gary sure. Use `|` as or: https://stackoverflow.com/a/22591267/463796. You'll have to repeat the whole query though, can't put the or just between the values: `df.loc[...startswith('12') | ...startswith('13'), ...]` – w-m Sep 26 '18 at 13:53
2

Try this

df.loc[df.AreaCode.astype(str).str.startswith("12") == True, 'Region' ]

The line below will give you a series with True/False for each row and what becomes the filter for the dataframe.

df.AreaCode.astype(str).str.startswith("12")

Assigning a equals test makes it a filter.

mirmo
  • 376
  • 3
  • 9
2

See if this helps -

First convert Area code column dtype to string with

df.AreaCode = df.AreaCode.astype(str)

And then do filtering with first characters of the column

df.loc[(df.AreaCode.str.startswith('12')) & (df.Region=='A')]

Rhae
  • 94
  • 4
1

This will work I guess.

df.loc[df.AreaCode.str.startswith('12'), 'Region' ] = 'A'

  • This still gives me the error which i think because it isn't a string.w-m's answer solved it for me. Thank you for your help! – John Sep 26 '18 at 13:12
0

First convert the data type to str like this

df.AreaCode = df.AreaCode.astype('str')

Then check for the number in beginning like this

df.loc[df.AreaCode.startswith('12'), 'Region' ] = 'A' 

Assuming you need nan in the rows which dont start with A, you can do a map like this

df['Region'] = df['AreaCode'].map(lambda x : 'A' if x.startswith('12') else np.nan )
Sreeram TP
  • 11,346
  • 7
  • 54
  • 108