How to separate word in python dataframe list

Question

I want to separate the conditions that exist in the ID column. What should I do in this case?

I wrote it as below, but there is a problem that it is not separated in the desired direction.

df['and'] = df['ID'].str.extract(r'\((.*?)\)', expand = False)
df['test'] = np.where(df['ID'].str.contains('or'), df['ID'].str.split('or'), None)

                  ID Note    and                 test
  A or B or (C & D)  NaN  C & D  [A ,  B ,  (C & D)]
   E or F & (G & H)  NaN  G & H   [E ,  F & (G & H)]
   W or X or Y or Z  NaN    NaN   [W ,  X ,  Y ,  Z]

What should I do for to get this result?


| ID                | Note     | And      | Or         |
| --------          | -------- | -------- | --------   |
| A or B or (C & D) |          |  C,  D   | A, B       |
| E or F & (G & H)  |          | F, G, H  | E          |
| W or X or Y or Z  |          |          | W, X, Y, Z |

score 0 · Answer 1 · answered Sep 01 '23 at 06:51

0

Maybe you can use Pandas and numpy to do that :

# Sample DataFrame
data = {'ID': ['A or B or (C & D)', 'E or F & (G & H)', 'W or X or Y or Z'],
        'Note': [np.nan, np.nan, np.nan]}
df = pd.DataFrame(data)

# Extract "And" column
df['And'] = df['ID'].str.extract(r'\((.*?)\)', expand=False)

# Convert "And" to a NumPy array
df['And'] = df['And'].apply(lambda x: np.array(x.split(' & ')) if pd.notna(x) else np.nan)

# Extract "Or" column
df['Or'] = df['ID'].str.replace(r'\s*\(.*?\)\s*', ',').str.split(' or ')

# Drop the original "ID" column
df.drop(columns=['ID'], inplace=True)

# Reorder the columns as you specified
df = df[['And', 'Note', 'Or']]

answered Sep 01 '23 at 06:51

PortorogasDS

51
6

Thanks for answering. However, # Extract "And" column - in this part have an issue. Because the logic didn't contain the 'F'. 'F &' is go on df['Or']. – Olive Sep 01 '23 at 07:24
Sorry, I hadn’t noticed this small issue. Can I share a corrected version? – PortorogasDS Sep 02 '23 at 09:17
Hi @PortorogasDS, don't mind. I'm really appreciate your help to solve this problem. I used the version that andexte's – Olive Sep 02 '23 at 09:48

score 0 · Accepted Answer · answered Sep 01 '23 at 10:46

Based on @PortorogasDS idea, I made a code for you problem. It is not so optimized, but works for me. Is that what you wanted?

# Importing the libraries
import numpy as np
import pandas as pd
from collections import Counter

# Sample DataFrame
data = {'ID': ['A or B or (C & D)', 'E or F & (G & H)', 'W or X or Y or Z'],
        'Note': [np.nan, np.nan, np.nan]}
df = pd.DataFrame(data)

# Defining a function - The idea for the function below was coming from https://stackoverflow.com/questions/72554378/pandas-iterate-list-within-a-dataframe-cell
def value_reduction(row, col_i):
    return_list = []
    col_name = df.columns[col_i]
    for i in row[col_name]:
        if len(i) > 1:
            return_list = i
    return return_list

# Adding the "And" column
df['And'] = df['ID'].str.split(' or ')
# The idea for the code below was coming from https://stackoverflow.com/questions/17006641/single-line-nested-for-loops
df['And'] = [[[i[i.rfind('(')+1:] if i.rfind('(') >= 0 else i[:i.find(')')] if i.find(')') >= 0 else i for i in splitorand.split(' & ')] for splitorand in splitor] for splitor in df['And']]
arr = df.apply(value_reduction, args=(len(df.columns) - 1,), axis=1)
df = df.assign(And=arr)

# Adding the "Or" column
df['Or'] = df['ID'].str.replace(' or ', ',').str.replace(' & ', ',').str.replace(')', '').str.replace('(', '').str.split(',')
# The idea for the code below was coming from https://stackoverflow.com/questions/56957654/how-to-compare-two-lists-of-strings-in-a-dataframe-for-any-matches-to-get-a-true
# and https://stackoverflow.com/questions/41125909/find-elements-in-one-list-that-are-not-in-the-other
df['Or'] = df.apply(lambda row: [key for key, counts in Counter(row['Or']).items() if Counter(row['And'])[key] != counts], axis=1)

# Reordering the columns as you specified
df = df[['ID', 'Note', 'And', 'Or']]

# Displaying the dataframe
df

Output:

Hi andexte. Yes, that's what I want to. Thanks for solve this problem and very detailed annotate. — Olive, Sep 02 '23 at 04:16

How to separate word in python dataframe list

2 Answers2