0

I am trying to select a range of numbers from one column 'Description' and then move this pattern to a new column called 'Seating' however the new column is not returning any values and is just populated with values equalling to 'none'. I have used a for loop to iterate through the columns to locate any rows with this pattern but as i said this returns values equal to none. Maybe I have defined the pattern incorrectly.

import re
import pandas as pd

# Defined the indexes
data = pd.read_csv('Inspections.csv').set_index('ACTIVITY DATE')

# Created a new column for seating which will be populated with pattern
data['SEATING'] = None
# Defining indexes for desired columns
index_description = data.columns.get_loc('PE DESCRIPTION')
index_seating = data.columns.get_loc('SEATING')

# Creating a pattern to be extracted
seating_pattern = r' \d([1-1] {1} [999-999] {3}\/[61-61] {2} [150-150] {3})'

# For loop to iterate through rows to find and extract pattern to 'Seating' column
for row in range(0, len(data)):
    score = re.search(seating_pattern, data.iat[row, index_description])
    data.iat[row, index_seating] = score

data

Output of code showing table where the columns are populated:

Output of code showing table where the columns are populated

Following code populates seating column

I have tried .group() and it returns the following error AttributeError: 'NoneType' object has no attribute 'group'

What am I doing wrong in that it shows <re.Match object; span=(11, 17), match='(0-30)'> instead of the result from the pattern.

CoderMan
  • 37
  • 5
  • Are you sure your `re.search` actually produces matches? Also: You might want to use [extract](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.extract.html?) instead of looping over rows. – Timus Jan 02 '21 at 12:10

2 Answers2

0

Selecting columns in pandas can be much easier than this first take a copy of the dataframe to apply the changes safely and then select values as the following

data_copied = data.copy()
data_copied['SEATING'] = data_copied[(data_copied['Description'] <= start_range_value) & (data_copied['Description'] >= end_range_value)]

this link is helpful on building column by selecting based on rows of another column without changing values https://www.geeksforgeeks.org/how-to-select-rows-from-a-dataframe-based-on-column-values/ this question to dive into the same topic with more customization , it will make u solve similar more complex issues pandas create new column based on values from other columns / apply a function of multiple columns, row-wise

0

It's not completely clear to me what you want to extract with your pattern. But here's a suggestion that might help. With this small sample frame

df = pd.DataFrame({'Col1': ['RESTAURANT (0-30) SEATS MODERATE RISK',
                            'RESTAURANT (31-60) SEATS HIGH RISK']})
                                      Col1
0    RESTAURANT (0-30) SEATS MODERATE RISK
1       RESTAURANT (31-60) SEATS HIGH RISK

this

df['Col2'] = df['Col1'].str.extract(r'\((\d+-\d+)\)')

gives you

                                      Col1   Col2
0    RESTAURANT (0-30) SEATS MODERATE RISK   0-30
1       RESTAURANT (31-60) SEATS HIGH RISK  31-60
Timus
  • 10,974
  • 5
  • 14
  • 28
  • Ok it appears my pattern was wrong. Now the seating column is populated but I have the following now populating that column, how do I get it to show the digits? – CoderMan Jan 02 '21 at 18:54
  • @CoderMan What do you mean by _"show the digits"_? Can you define exactly how your expected output should look like? – Timus Jan 02 '21 at 18:58
  • Sorry i am new to this. I updated the original post to reflect the changes if that helps you understand what i am trying to get at. I am hoping for an output to extract the 61-150 results to a new column. It shows restaurants (61-150) i am trying to output all the examples of the (61-150) to the new column 'seating – CoderMan Jan 02 '21 at 19:04
  • @CoderMan `re.search(r'\((\d+-\d+)\)', 'RESTAURANT (0-30) SEATS MODERATE RISK').group(1)` gives you `0-30`. Is that what you are looking for? If so, again, I'd use `.str.extract` to create the column directly. Looping over rows is only for cases where there is no other method available. – Timus Jan 02 '21 at 19:13
  • Yes actually what you say makes much more sense, i was just wondering why it would not work in my for loop. I am curious, how i can get a return of just values 61-150 from the re.search, the following returns a Naan value (r'\((\d61-\d150)\)' To clarify, i mean i would like to return just values 61-150 and have 0-30 etc. stay behind in the original column – CoderMan Jan 02 '21 at 22:40
  • @CoderMan For fixed numbers you don't need the `\d`, just `r'(61-150)'` should do it. – Timus Jan 02 '21 at 23:18
  • Ok thanks, if i would like to also capture and extract other information can i just append it onto the end of this like so? r'(61-150)''(1-1,999 SF)' to also extract the last part? – CoderMan Jan 03 '21 at 14:02
  • @CoderMan I'm not sure I completely understand, but that most likely won't work. If you want to extract a set of different matches at once you have to (1) combine the patterns through `|` (logical or) and use `str.extractall` instead of `str.extract`. As an example take the sample frame I've used in my answer and try `print(df['Col1'].str.extractall(r'(0-30)|(31-60)'))` to see how it works. – Timus Jan 03 '21 at 14:15