1

I have a pandas dataframe with two street address columns. I would like to check the value in each column to see if it starts with a number. Then I want to create a third column that returns the field value that starts with a number.

Consider the following df:

df = pd.DataFrame({"A":["123 Fake St","456 Fake St","Crown Building","Other Building"], 
                   "B":["Dorm","12 Dorm","34 Dorm","Other Dorm"]})

If both fields or neither field starts with a number then it should return column A. So the third column would be:

123 Fake St
456 Fake St
34 Dorm
Other Building

I tried using np.where:

df['C'] = np.where(df['A'][0].isdigit(), df['A'], df['B'])

I guess that doesn't take into account returning 'A' if neither starts with a number. The .isdigit part of the statement didn't seem to work anyway.

Thanks for any help!

eyllanesc
  • 235,170
  • 19
  • 170
  • 241
Paul
  • 135
  • 2
  • 11

1 Answers1

3

You need to use the .str method to slice each cell value as a string rather than slicing the column as a whole.

Then to handle the case where neither column value starts with a digit, you need to add this additional condition.

Here's an example:

a_is_digit = df.A.str[0].str.isdigit()
neither_is_digit = ~df.A.str[0].str.isdigit() & ~df.B.str[0].str.isdigit()
mask = a_is_digit | neither_is_digit
df['C'] = np.where(mask, df.A, df.B)

With result:

                A           B               C
0     123 Fake St        Dorm     123 Fake St
1     456 Fake St     12 Dorm     456 Fake St
2  Crown Building     34 Dorm         34 Dorm
3  Other Building  Other Dorm  Other Building
Henry Woody
  • 14,024
  • 7
  • 39
  • 56
  • Thanks! This worked with a slight edit, the last line is: df['C'] = np.where(mask, df['A'], df['B']) – Paul Oct 18 '18 at 20:18