1

I have a dataset of several items, and how many quantities are present.
(x used to indicate quantity, x never appears in the item name)

                   items
    0                abc
    1  efg x 2, abc, def
    2            abc x 2
    3   efg x 3, def x 7
    4            abc x 5

I want to extract the items as individual components, and relate their counts:

         abc    def    efg
    0      1      0      0
    1      1      1      2
    2      2      0      0
    3      0      7      3
    4      5      0      0

What I've tried so far; from Pandas split Column into multiple columns by comma

pd.concat([df, df[1].str.split(',', expand=True)], axis=1)

This gives

                   items         0         1         2
    0                abc        abc      None      None
    1  efg x 2, abc, def    efg x 2       abc       def
    2            abc x 2    abc x 2      None      None
    3   efg x 3, def x 7    efg x 3   def x 7      None
    4            abc x 5    abc x 5      None      None  

I have no idea how to proceed :/ and have been stuck at this for days. Any suggestion is greatly appreciated, even if the end product is not what I asked for.

1 Answers1

0

How about the following:

def extract_components(s):
    components = {}
    for sub in s.split(','):
        if 'x' in sub:
            prod, count = sub.split('x')
            components[prod.strip()] = int(count)
        else:
            components[sub.strip()] = 1

    return components

print(df['items'].apply(extract_components).apply(pd.Series).fillna(0))

  abc efg def
0   1   0   0
1   1   2   1
2   2   0   0
3   0   3   7
4   5   0   0
Jan Trienes
  • 2,501
  • 1
  • 16
  • 28