I am analyzing accident reports about roof falls in underground mines using Python pandas, and I want to extract the dimensions of each roof fall from a column called Narrative into a new column called roof fall dimension. If a number in Narrative is followed by a measuring unit or a symbol for one (ft, feet, inch, meter, or '), I want to extract that number, the unit, and the word that follows the unit.

Narrative

Fall was 19' wide X 20' long x 7' thick                       
Fall was approx. 19' W. x 80' L. x 10' H                      
fall was approx. 5 ft thick, 10 ft wide x 4 ft long           
fall is 35 feet long X 5 feet wide X 16 feet in height       

roof fall dimension

19' wide, 20' long, 7' thick
19' W, 80' L, 10' H
5 ft thick, 10 ft wide, 4 ft long
35 feet long, 5 feet wide, 16 feet height

I am having trouble figuring out how to check whether the number I want to extract is followed by one of the units (ft, feet, inch, meter, or '), and my experience with regular expressions is limited.

Thanks a lot for helping out.

Karn Kumar
Eng_GR
  • You could improve this question by including the code you have tried, or clarifying what the input & expected output is. – Tom Jun 26 '21 at 16:36
  • Take a look at [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/15497888) for some more information about community expectations in these types of questions. – Henry Ecker Jun 26 '21 at 16:44

1 Answer

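You'd need to extend it to cover all your scenarios, but the code would go something like this:

import re
text = "Fall was 19' wide X 20' long x 7' thick"
a = re.findall(r"(\d+)\s*(?:'|ft|feet|inch|meter)", text)
print(a)

This returns every number that is followed, after optional whitespace, by one of ', ft, feet, inch, or meter; for the sample text it prints ['19', '20', '7']. The alternatives have to go in a group separated by |. Square brackets would not work here, because they match any single character from the set rather than a whole word.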

Check this out for more: https://www.w3schools.com/python/python_regex.asp
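To get from a single `re.findall` call to the new column described in the question, the pattern can also capture the unit and the word after it, and then be applied to each row. The following is only a sketch under some assumptions: the names `DIM` and `extract_dimensions` are made up for illustration, the unit list is extended with `foot`, `inches`, and `meters`, and an `in` between the unit and the word is skipped so that "16 feet in height" becomes "16 feet height" as in the expected output.

```python
import re
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({"Narrative": [
    "Fall was 19' wide X 20' long x 7' thick",
    "Fall was approx. 19' W. x 80' L. x 10' H",
    "fall was approx. 5 ft thick, 10 ft wide x 4 ft long",
    "fall is 35 feet long X 5 feet wide X 16 feet in height",
]})

# Hypothetical pattern: number, optional space, unit, optional "in", next word.
# Word units must end at a word boundary; the ' symbol needs no boundary.
DIM = re.compile(
    r"(\d+(?:\.\d+)?)\s*"
    r"((?:feet|foot|ft|inch(?:es)?|meters?)\b|')"
    r"\s*(?:in\s+)?([A-Za-z]+)?",
    flags=re.IGNORECASE,
)

def extract_dimensions(narrative):
    """Return 'number unit word' pieces joined by commas ('' if none found)."""
    parts = []
    for number, unit, word in DIM.findall(narrative):
        sep = "" if unit == "'" else " "  # write 19' rather than 19 '
        parts.append(f"{number}{sep}{unit} {word}".strip())
    return ", ".join(parts)

df["roof fall dimension"] = df["Narrative"].apply(extract_dimensions)
print(df["roof fall dimension"].tolist())
```

Numbers that are not followed by a unit, such as a crew size or a date, produce no match, so rows without any dimensions end up with an empty string. Real narratives will likely need more unit spellings than the ones assumed here.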

Harsh Sharma
  • Thanks a lot, Harsh, for helping out. When I tried your code, I found that it extracted any number in the "Narrative" column, which is not what I need. I want to extract a number only IF it is FOLLOWED by one of the following: ft, feet, inch, meter, or '. – Eng_GR Jun 26 '21 at 17:15
  • Do you mean something like ft 20 or meter 18? – Harsh Sharma Jun 26 '21 at 17:20
  • Anything you put inside round brackets ( () ) is captured and returned by findall. Square brackets ( [] ) match one character from a set, so for "or" conditions between whole words use a group with |, such as (?:ft|feet). – Harsh Sharma Jun 26 '21 at 17:22
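The behaviour reported in the comments, where numbers are extracted even though no unit follows them, can be reproduced with a short sketch. The sample sentence below is made up for illustration. It compares a square-bracket character class against a group of alternatives separated by `|`.

```python
import re

# Made-up sentence: "3" is a head count, "19'" and "20 ft" are dimensions
text = "Crew of 3 men reported a fall 19' wide and 20 ft long"

# Square brackets form a character class: any ONE of
# ' | space f t i n c h m e r, so "3 " (digit then space) also matches
char_class = re.findall(r"([0-9]+)['| ft| inch| meter]+", text)

# A non-capturing group with | matches whole alternatives instead
group = re.findall(r"([0-9]+)\s*(?:'|ft|feet|inch|meter)", text)

print(char_class)  # ['3', '19', '20']
print(group)       # ['19', '20']
```

With the character class, a space after the digits is enough to count as a match, which is why every number followed by a space gets picked up.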