
I have a DataFrame with a string column that contains one or more 4-character codes. The codes may be separated by | or &, but not every row has a separator. I am trying to map a dictionary onto each individual 4-character code but am running into issues. I am using pandas version 0.23.4.

The basic code I am trying to use:

df = df.replace(dict, regex=True)

or, when selecting a specific column:

df['Col'] = df['Col'].replace(dict, regex=True)

Both raise the following error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

The values of the dictionary are lists. Could that be what causes .replace to fail?

Update With Sample df and dict

 ID       Code
ABCD      00FQ
JKFA    8LK9|4F5H
QWST    2RLA|R1T5&8LK9


dict={'00FQ':['A','B'], '8LK9':['X'], '4F5H':['U','Z'], '2RLA':['H','K'], 'R1T5':['B','G'] }

The dict will have more elements in it than in the dataframe.

Update with expected output

 ID       Code           Logic
ABCD      00FQ          ['A','B']
JKFA    8LK9|4F5H       ['X'] | ['U','Z']
QWST    2RLA|R1T5&8LK9  ['H','K'] | ['B','G'] & ['X']

The overall goal is to perform this replacement on two dataframes and then compare the IDs on both sides for equivalence.

MaxB

2 Answers


The regex defined in your dict might be matching more than one row of the dataframe, in which case it is unclear which replacement value should be taken from the dict.

In addition, when a NumPy array is evaluated for its boolean value, this error is raised so that users do not have to guess the result. Would you consider an array of elements to be True if

  • any of its elements is True, or
  • all of its elements are True, or
  • something else?

NumPy therefore raises this error so that the programmer has to state explicitly which meaning is intended.
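The ambiguity described above can be reproduced in isolation. The following is a minimal sketch using a small NumPy array (the name `flags` is made up for illustration). It only demonstrates the error message itself, not the place inside pandas where the check is triggered:

```python
import numpy as np

# Hypothetical two-element boolean array, used only for illustration
flags = np.array([True, False])

try:
    # Asking for the truth value of a multi-element array is ambiguous
    bool(flags)
except ValueError as err:
    message = str(err)
    print(message)

# Stating the intent explicitly avoids the error
any_true = bool(flags.any())  # is at least one element True?
all_true = bool(flags.all())  # are all elements True?
print(any_true, all_true)
```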

Go Here for more clarification.

amitgcse

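Here's a function that parses the relevant values out of your strings:

def string_to_list(string):
    """
    Parse a parent string into its 4-character child codes.

    The separators '|' and '&' are skipped. Returns a list of child
    strings, or an empty list if the input is shorter than 4 characters.
    """
    child = ''
    children = []

    # Too short to contain a single code; return an empty list so that
    # later steps can still iterate over the result
    if len(string) < 4:
        return children

    for char in string:
        # Skip the separators
        if char in ('|', '&'):
            continue

        child += char
        # A complete 4-character code has been collected
        if len(child) == 4:
            children.append(child)
            child = ''

    return children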

Apply it to extract a list of values as follows:

df['Code_List'] = df['Code'].apply(string_to_list)
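The same extraction can probably be written more compactly with the standard `re` module. This is a sketch rather than a drop-in replacement: `split_codes` is a hypothetical name, and it assumes every code is exactly four characters that are neither `|` nor `&`. Unlike `string_to_list`, it does not join characters across a separator, so the two differ on malformed input such as `AB|CDEF`.

```python
import re

def split_codes(expression):
    """Return every run of four non-separator characters, in order."""
    return re.findall(r'[^|&]{4}', expression)

print(split_codes('2RLA|R1T5&8LK9'))
print(split_codes('00FQ'))
```

It can be applied to the column in the same way, for example with `df['Code'].apply(split_codes)`.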

Map to relevant logic values:

# Instantiate the dictionary of logic rules
logic_dict = {'00FQ':['A','B'], '8LK9':['X'], '4F5H':['U','Z'], '2RLA':['H','K'], 'R1T5':['B','G'] }

# Map the logic rules
df['Logic_List'] = df['Code_List'].apply(lambda arr: [logic_dict[x] for x in arr])

# Final output
    ID      Code            Code_List           Logic_List
0   ABCD    00FQ            [00FQ]              [[A, B]]
1   JKFA    8LK9|4F5H       [8LK9, 4F5H]        [[X], [U, Z]]
2   QWST    2RLA|R1T5&8LK9  [2RLA, R1T5, 8LK9]  [[H, K], [B, G], [X]]
Yaakov Bressler
  • I must update the question. There will be more elements in the dictionary than in the `Code` col of the dataframe. The logic of the expression must remain intact. – MaxB Nov 20 '19 at 16:01
  • This solution can be applied to any column containing 4 string substrings. – Yaakov Bressler Nov 20 '19 at 16:16
  • Did you consider modifying and applying to your situation @MaxB ? – Yaakov Bressler Nov 21 '19 at 02:19
  • I believe this solution will not work because the boolean logic associated with the `Code` expression must remain intact – MaxB Nov 21 '19 at 13:05
  • Perhaps modify your Q to include *your desired output* @MaxB – Yaakov Bressler Nov 21 '19 at 20:05
  • @MaxB there is enough information in this response for it to satisfy your Q, though I acknowledge that it does not solve your desired output to perfection. Do keep in mind that SO is not a code review or homework guide site. Be mindful of the work you're asking of people – and the work they've provided. – Yaakov Bressler Nov 21 '19 at 22:49
  • Your answer does not maintain the logic, I do not believe that it satisfies the question. – MaxB Nov 21 '19 at 23:54
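
The objection raised in the comments is that splitting the string discards the `|` and `&` operators. One possible way to keep them is to substitute each code in place and leave the rest of the expression alone. The sketch below works on plain strings with the standard `re` module. The names `CODE_PATTERN` and `codes_to_logic` are hypothetical, and the mapping is the sample from the question. It assumes every code is exactly four characters long and contains no `|`, `&`, or whitespace. Codes missing from the mapping are left unchanged, which is an assumption about the desired behaviour.

```python
import re

# Sample mapping copied from the question
logic_dict = {
    '00FQ': ['A', 'B'],
    '8LK9': ['X'],
    '4F5H': ['U', 'Z'],
    '2RLA': ['H', 'K'],
    'R1T5': ['B', 'G'],
}

# A code is assumed to be any run of four characters that are not
# operators or whitespace
CODE_PATTERN = re.compile(r'[^|&\s]{4}')

def codes_to_logic(expression, mapping=logic_dict):
    """Replace each 4-character code with its list, keeping | and &."""
    def substitute(match):
        code = match.group(0)
        # Assumption: unknown codes are kept as they are
        if code not in mapping:
            return code
        return str(mapping[code])

    # Put a single space around each operator for readability
    spaced = re.sub(r'\s*([|&])\s*', r' \1 ', expression)
    return CODE_PATTERN.sub(substitute, spaced)

print(codes_to_logic('2RLA|R1T5&8LK9'))
```

On a dataframe this could be applied with `df['Logic'] = df['Code'].map(codes_to_logic)`. The result is a string, so comparing two dataframes afterwards would be a string comparison of the rewritten expressions.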