0

I have a dataframe column signified df['Q2'] with all the responses from a users survey for the question 'which indicators do you use'? There are 36 statistical indicators users can choose from and one additional answer which is simply 'All of the indicators'. There are multiple string answers in each row as an artefact of the survey designed by someone else.

How do I simply pattern match/cross refer between respondents' answers of multiple indicators across multiple lines in the rows of column 2, with the indicators in my neat list?

indicators = ["All of the indicators",
                "A1 / Eng13",
                "A2 / Eng14",
                "Eng14b",
                "A4 / Eng17",
                "A5",
                "Eng16",
                "B1a / Eng22a",
                "B1b / Eng22b",
                "B2 / Eng23",
                "B4 / Eng18",
                "B5a / Eng19a",
                "B5b / Eng19b",
                "B6 / Eng20",
                "B7 / Eng21",
                "C1 / Eng1",
                "C2 / Eng3",
                "Eng2a",
                "C3a / Eng2b",
                "C3b / Eng4c",
                "C4a / Eng4a",
                "C4b / Eng4b",
                "C5",
                "C6",
                "C7",
                "C8",
                "Eng5",
                "Eng6",
                "Eng7",
                "Eng8",
                "C9a / Eng12a",
                "C9b / Eng12b",
                "D1a / Eng11",
                "D1b / Eng9",
                "D1c / Eng10",
                "E1 / Eng24",
                "E2 / Eng15"]

indicators_mentioned = df['Q2'] #all the responses in column 2 of dataframe
indicators_mentioned_as_string = indicators_mentioned.to_string() # convert responses to string

And regex/pattern matching attempt:

indicator_regex = re.compile(r'All of the indicators') #set a pattern to match for each indicator
instances_of_all_indicators = indicator_regex.findall(indicators_mentioned_as_string)
#assigns a variable to store all instances of indicators returned by find all function which finds all #instances within the bracketed dataset
instances_of_all_indicators_summed = (len(instances_of_all_indicators)) #list that is 6 items long contains 6 instances!
type(instances_of_all_indicators_summed) #check the value in the variable is int to go into the dictionary below as value
print(instances_of_all_indicators_summed)
#regex for second indicator
indicator_regex = re.compile(r'A1 / Eng13')
instances_of_A1 = indicator_regex.findall(indicators_mentioned_as_string)
instances_of_A1_summed = len(instances_of_A1)
print(instances_of_A1_summed)

I would like all the responses per indicator to go into a dictionary from which I can then make a nice chart.

indicator_by_reponse = {
                "All of the indicators": instances_of_all_indicators_summed,
                "A1 / Eng13": instances_of_A1_summed,
                "A2 / Eng14": instances_of_A2_summed,
                "Eng14b": instances_of_Eng14b_summed,
#                 "A4 / Eng17": instances_of_A4_summed,
#                 "A5": instances_of_A5_summed,
#                 "Eng16": instances_of_Eng16_summed,
#                 "B1a / Eng22a": instances_of_B1a_summed,
#                 "B1b / Eng22b": instances_of_B1b_summed,
#                 "B2 / Eng23": instances_of_B2_summed,
#                 "B4 / Eng18": instances_of_B4_summed,
#                 "B5a / Eng19a": instances_of_B5a_summed,
#                 "B5b / Eng19b": instances_of_B5b_summed,
#                 "B6 / Eng20": instances_of_B6_summed,
#                 "B7 / Eng21": instances_of_B7_summed,
#                 "C1 / Eng1": instances_of_C1_summed,
#                 "C2 / Eng3": instances_of_C2_summed,
#                 "Eng2a": instances_of_Eng2a_summed,
#                 "C3a / Eng2b": instances_of_C3a_summed,
#                 "C3b / Eng4c": instances_of_C3b_summed,
#                 "C4a / Eng4a": instances_of_C4a_summed,
#                 "C4b / Eng4b": instances_of_C4b_summed,
#                 "C5": instances_of_C5_summed,
#                 "C6": instances_of_C6_summed,
#                 "C7": instances_of_C7_summed,
#                 "C8": instances_of_C8_summed,
#                 "Eng5": instances_of_Eng5_summed,
#                 "Eng6": instances_of_Eng6_summed,
#                 "Eng7": instances_of_Eng7_summed,
#                 "Eng8": instances_of_Eng8_summed,
#                 "C9a / Eng12a": instances_of_C9a_summed,
#                 "C9b / Eng12b": instances_of_C9b_summed,
#                 "D1a / Eng11": instances_of_D1a_summed,
#                 "D1b / Eng9": instances_of_D1b_summed,
#                 "D1c / Eng10": instances_of_D1c_summed,
#                 "E1 / Eng24": instances_of_E1_summed,
#                 "E2 / Eng15": instances_of_E2_summed,
                       }

I converted my pandas series to a string and then used findall() but that still returns the responses in a messy way when what I want to do is group instances of the string for each indicator in the list to make a neat dictionary and a chart. To pattern match I used regex after importing re but this is a completely inelegant way of doing it which requires me to manually count the responses to put into a dictionary; there must be a simpler way.

Xiddoc
  • 3,369
  • 3
  • 11
  • 37
Eddie
  • 1
  • 2
  • Your question needs a minimal reproducible example consisting of sample input, expected output, actual output, and only the relevant code necessary to reproduce the problem. See [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) for best practices related to Pandas questions. – itprorh66 Dec 28 '22 at 15:38

0 Answers0