Let's say I have a pandas DataFrame (df) with a column of Python strings:

  string
0 this house has 3 beds inside 
1 this is a house with 2 beds in it
2 the house has 4 beds

I want to extract how many beds each house has. I felt a good way to do this would be to just find the word that comes right before "beds".

While attempting to complete this problem, I of course noticed strings are indexed by character. That means I would have to turn the strings into a list with str.split(' ').

Then, I can find the index of 'beds' in each of the strings, and return the previous index. I tried both a list comprehension and df.iterrows() for this and can't seem to figure out the right way to do it. My desired output is:

  string                            beds
0 this house has 3 beds inside        3
1 this is a house with 2 beds in it   2
2 the house has 4 beds                4
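
The split-and-look-back idea described above could be sketched roughly as follows. The helper name word_before is hypothetical, and the sketch assumes the count is always the whitespace-separated word directly before "beds":

```python
import pandas as pd

df = pd.DataFrame({
    "string": [
        "this house has 3 beds inside",
        "this is a house with 2 beds in it",
        "the house has 4 beds",
    ]
})

def word_before(text, target="beds"):
    """Return the word just before `target`, or None if it cannot be found."""
    words = text.split()          # split on any run of whitespace
    if target not in words:
        return None
    position = words.index(target)
    # Guard against `target` being the very first word.
    return words[position - 1] if position > 0 else None

# apply() calls the helper once per row and collects the results in a new column.
df["beds"] = df["string"].apply(word_before)
print(df)
```

The values in the new column are still strings at this point; they can be converted afterwards if numbers are needed.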
Henry Ecker
bismo

1 Answer

Have a look at the related question "efficient way to get words before and after substring in text (python)".

In your case, you could do

for index, row in df.iterrows():
    # write to df itself; assigning to row would only change a copy
    df.loc[index, 'beds'] = row['string'].partition('bed')[0].strip()[-1]

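The partition function splits the string at the first occurrence of a separator and returns a tuple of three parts: the text before the separator, the separator itself, and the text after it. The strip function is just used to remove surrounding white space. If everything works, the number you are looking for will be the last character of the first value of the tuple. Hence the [0] followed by [-1].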
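
To make the behaviour of partition concrete, here is a small standalone sketch using one of the strings from the question:

```python
text = "this house has 3 beds inside"

# partition splits at the FIRST occurrence of the separator.
before, separator, after = text.partition("bed")

print(repr(before))        # 'this house has 3 '
print(repr(separator))     # 'bed'
print(repr(after))         # 's inside'
print(before.strip()[-1])  # 3

# If the separator is missing, the whole string comes back as the first
# element and the other two elements are empty strings.
print("no match here".partition("bed"))  # ('no match here', '', '')
```

The last line shows why the loop needs a check: when "bed" is absent, [0].strip()[-1] silently returns the last character of the whole string.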

The loop, broken down for better readability:

for index, row in df.iterrows():
    split_str = row['string'].partition('bed')  # (before, 'bed', after)
    text_before_bed = split_str[0].strip()      # drop the trailing space
    number_of_beds = text_before_bed[-1]        # last character before 'bed'
    df.loc[index, 'beds'] = number_of_beds      # write the value into the beds column of df

print(df.head())

The output df will have two columns, string and beds, in addition to the index.

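Note: this is a quick "hack". There is no error checking in the loop, so you should add some, as you never know whether the word "bed" shows up in a given row at all. It also keeps only the single character before "bed", so a count of 10 or more would be cut down to its last digit.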
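
One possible way to add that error checking, swapping the loop for a regular expression with pandas' Series.str.extract, is sketched below. The pattern and the two extra sample rows are assumptions made for illustration; rows without a match simply end up as missing values:

```python
import pandas as pd

df = pd.DataFrame({
    "string": [
        "this house has 3 beds inside",
        "this is a house with 2 beds in it",
        "the house has 4 beds",
        "a studio with no bedroom count",  # hypothetical row without a number
        "the house has 12 beds",           # hypothetical multi-digit row
    ]
})

# Capture one or more digits followed by whitespace and "bed" or "beds".
# expand=False returns a Series; rows that do not match become NaN.
extracted = df["string"].str.extract(r"(\d+)\s+beds?\b", expand=False)

# Convert to a nullable integer column so missing values stay missing.
df["beds"] = pd.to_numeric(extracted).astype("Int64")
print(df)
```

This avoids the explicit loop and handles both the "no match" and the multi-digit cases, at the cost of assuming the count is always written with digits.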

sagar1025