Sorting data between 2 lists into a new list, and formatting a list of strings using the data saved in the new list

Question

Apologies if this isn't very clear; this is my first time asking a question here, so I hope I can explain my problem correctly.

I have the following lists with different values:

A_list = ['A', 'A', 'B', ['C', 'D'] ]
B_list = ['A1', 'W5', 'X6', 'A2', 'A3', 'T5', 'B0', 'Z9', 'C1', 'W3', 'D1']
C_list = []
string_list = ["{0} in Alpha", "{0} in Apple", "{0} in Bee", "{0} in Cheese and {1} in Dice"]

I need to find the elements of A_list in B_list, append them to C_list, and have the output be a formatted string from string_list with the elements in C_list.

So after looking for A_list[i] in B_list, C_list would end up like this:

C_list = ['A1', 'A2', 'A3', 'B0', ['C1', 'D1'] ]

And the output would be something like this:

A1 in Alpha,
A1 in Apple,
A2 in Alpha,
A2 in Apple,
A3 in Alpha,
A3 in Apple,
B0 in Bee,
C1 in Cheese and D1 in Dice

I've been wrecking my head with nested lists and getting them in a similar order as A_list to be able to format the output with something like:

output = string_list[i].format(*C_list[i])  # just an example

I've been trying to solve this problem using a mix of for-loops and if statements. I can search for the elements of A_list in B_list in a simple for-loop:

for a in A_list:
    for b in B_list:
        if a in b:
            print(str(a) + " found in " + str(b))

What's wrecking me is how to add the found elements of B_list into a similar format as A_list so that I may end up with

C_list = ['A1', 'A2', 'A3', 'B0', ['C1', 'D1']]

and not this:

C_list = ['A1', 'A2', 'A3', 'B0', 'C1', 'D1']

hi, is there only supposed to be one `'A'` in `A_list = ['A', 'A', 'B', ['C', 'D'] ]`? — mcursa-jwt, Apr 07 '23 at 15:24

score 1 · Answer 1 · answered Apr 07 '23 at 15:20

The problem is much more manageable if you normalize A_list as you process it so that it's always a list of strings:

for a in A_list:
    # Normalize a to a list[str]
    a = a if isinstance(a, list) else [a]
    # Pop all matches from B_list into C_list.
    while True:
        c = []
        for i in a:
            for b in B_list.copy():
                if b.startswith(i):
                    c.append(b)
                    B_list.remove(b)
                    break
            if len(c) == len(a):
                break  # append this c and scan B again
        else:
            break  # no more matches, continue to next a
        # Convert c back to a str|list[str]
        C_list.append(c[0] if len(c) == 1 else c)

print(C_list)
# ['A1', 'A2', 'A3', 'B0', ['C1', 'D1']]

I might suggest leaving c as a list of strings in all cases, since it might make your formatting part easier, but hopefully the above gets you over the initial hurdle of how to process the data out of this tricky nested format (while still having the option to convert it back into the original tricky format if needed).
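To illustrate the suggestion of keeping every entry as a list, here is a minimal sketch. The names `find_groups` and `remaining` are made up for this example, and unlike the loop above it works on a copy of the candidates and only removes matches once a whole group has been found. With uniform groups, `str.format(*group)` needs no special-casing:

```python
def find_groups(prefixes, candidates):
    """Return match groups; every group is a list, even for a single prefix."""
    remaining = list(candidates)  # copy, so the caller's list is left untouched
    groups = []
    for entry in prefixes:
        # Normalize each entry to a list of prefix strings.
        keys = entry if isinstance(entry, list) else [entry]
        while True:
            group = []
            for key in keys:
                # First unused candidate that starts with this prefix, if any.
                match = next(
                    (c for c in remaining if c.startswith(key) and c not in group),
                    None,
                )
                if match is None:
                    break
                group.append(match)
            if len(group) != len(keys):
                break  # no complete group left for this entry
            for m in group:
                remaining.remove(m)
            groups.append(group)
    return groups


A_list = ['A', 'A', 'B', ['C', 'D']]
B_list = ['A1', 'W5', 'X6', 'A2', 'A3', 'T5', 'B0', 'Z9', 'C1', 'W3', 'D1']

groups = find_groups(A_list, B_list)
print(groups)
# [['A1'], ['A2'], ['A3'], ['B0'], ['C1', 'D1']]

# Every group unpacks the same way, whether it has one element or several.
print("{0} in Alpha".format(*groups[0]))
print("{0} in Cheese and {1} in Dice".format(*groups[-1]))
```

How each group gets paired with its format strings is a separate decision that this sketch leaves open.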

hey great answer, however i realised that if there are multiple 'C's, e.g. 'C1', 'C2', 'C3', you would get `['A1', 'A2', 'A3', 'B0', ['C1', 'C2']]`. How can one work around this? — mcursa-jwt, Apr 07 '23 at 17:50
Depends on what output you want to get out of it. I had to infer the rules based on your example and it's not obvious how you'd apply them in that instance. (Note: I'm not going to rewrite the code based on an updated question, only the first one is free.) — Samwise, Apr 07 '23 at 17:52

score 1 · Accepted Answer · answered Apr 07 '23 at 18:30

Part 1: getting C_list

You will have to create the nested lists yourself before appending them to C_list. Since an item of A_list can be either a string or a list of strings, you have 2 cases.

def get_A_in_B(a_list: "list[str | list[str]]", b_list: "list[str]"):
    c_list = []  # result list; the helper below extends it or a nested sublist

    # helper: extend out_list with every item of b_list that starts with a_str
    def process_base_item(a_str: "str", out_list: "list"):
        matches = sorted(b_str for b_str in b_list if b_str.startswith(a_str))
        out_list.extend(matches)

    for a_item in a_list:
        if isinstance(a_item, list):  # case 1 - is list, build a nested list
            nested_list = []
            for sub_item in a_item:
                process_base_item(sub_item, nested_list)
            if nested_list:
                c_list.append(nested_list)
        else:  # case 2 - is string, extend c_list directly
            process_base_item(a_item, c_list)
    return c_list

usage:

A_list = ['A', 'B', ['C', 'D'] ]
B_list = ['A1', 'W5', 'X6', 'A2', 'A3', 'T5', 'B0', 'Z9', 'C1', 'W3', 'D1']
C_list = get_A_in_B(A_list, B_list)
print(C_list)

output:

['A1', 'A2', 'A3', 'B0', ['C1', 'D1']]

Part 2: formatting

This will work if 2 assumptions are upheld:

Assumption 1: there is only one of each type of letter in the format strings.
Assumption 2: you want to cycle through all possibilities if a nested list is uneven, e.g. ["C1", "C2", "D1"] => "C1"+"D1", "C2"+"D1".

This was the really tricky part. I matched each letter to a format string by checking whether the substring " in <letter>" occurs in it.

For C_list's nested lists, I split them into further sublists by their first letter, and then took their Cartesian product to pass as multiple arguments to the format string.
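As a small standalone illustration of the grouping and Cartesian product step, here is a sketch using the uneven example from assumption 2. The helper name `combos_by_letter` is made up for this example and is not part of the answer's code:

```python
import itertools


def combos_by_letter(nested):
    """Group items by first letter, then return one tuple per combination."""
    groups = {}
    for item in nested:
        groups.setdefault(item[0], []).append(item)
    # Dicts keep insertion order, so letters stay in first-seen order.
    return list(itertools.product(*groups.values()))


nested = ["C1", "C2", "D1"]
fmt = "{0} in Cheese and {1} in Dice"

combos = combos_by_letter(nested)
print(combos)
# [('C1', 'D1'), ('C2', 'D1')]

lines = [fmt.format(*combo) for combo in combos]
print("\n".join(lines))
# C1 in Cheese and D1 in Dice
# C2 in Cheese and D1 in Dice
```

Each tuple from `itertools.product` holds one item per letter, so it can be unpacked straight into the positional placeholders.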

As before, you have 2 cases.

import itertools

def format_string_list(c_list, string_list):
    formatted_string_list = []
    for c_item in c_list:
        for fmt_str in string_list:
            if isinstance(c_item, list):  # case 1 - is list, match multiple
                c_sublist = c_item
                # assumption 1: letters are unique
                first_letters = sorted(set(c_str[0] for c_str in c_sublist))
                matched_letters = []
                for letter in first_letters:
                    pat = f" in {letter}"
                    if pat in fmt_str:
                        matched_letters.append(letter)

                # only format when every letter in the sublist has a slot in fmt_str
                if first_letters == matched_letters:
                    # get dictionary of lists, indexed by first letter
                    c_str_d = {}
                    for letter in first_letters:
                        c_str_d[letter] = [c_str for c_str in c_sublist if c_str[0] == letter]

                    # assumption 2: get all combinations
                    for c_str_list in itertools.product(*c_str_d.values()):
                        c_fmtted = fmt_str.format(*c_str_list)
                        formatted_string_list.append(c_fmtted)
            else:  # case 2 - is string, match single
                c_str = c_item
                first_letter = c_str[0]
                pat = f" in {first_letter}"

                if pat in fmt_str:
                    c_fmtted = fmt_str.format(c_str)
                    formatted_string_list.append(c_fmtted)

    return formatted_string_list

usage:

C_list = ['A1', 'A2', 'A3', 'B0', ['C1', 'D1'] ]
string_list = ["{0} in Alpha", "{0} in Apple", "{0} in Bee", "{0} in Cheese and {1} in Dice"]
formatted_string_list = format_string_list(C_list,string_list)
# print output
print("\n".join(formatted_string_list))

output:

A1 in Alpha
A1 in Apple
A2 in Alpha
A2 in Apple
A3 in Alpha
A3 in Apple
B0 in Bee
C1 in Cheese and D1 in Dice

This works on more complex cases too.

It doesn't go beyond one level of nesting, but I don't think you need that for your case.

A_list = ['A', 'B', ['C', 'D', 'E']]
B_list = ['A1', 'W5', 'X6', 'D2', 'E1', 'A2', 'A3', 'T5', 'E2', 'B0', 'Z9', 'C1', 'W3', 'D1']
string_list = ["{0} in Alpha", "{0} in Apple", "{0} in Bee", "{0} in Cheese and {1} in Dice {2} in Egg"]

output:

['A1', 'A2', 'A3', 'B0', ['C1', 'D1', 'D2', 'E1', 'E2']]
A1 in Alpha
A1 in Apple
A2 in Alpha
A2 in Apple
A3 in Alpha
A3 in Apple
B0 in Bee
C1 in Cheese and D1 in Dice E1 in Egg
C1 in Cheese and D1 in Dice E2 in Egg
C1 in Cheese and D2 in Dice E1 in Egg
C1 in Cheese and D2 in Dice E2 in Egg
