I have a large dataset (an Excel file of around 2 GB) in which I need to split one column into multiple columns. My code produces the correct result, but it takes too long to create the new columns, and I often get a memory error. Is there a more efficient way to get the same result? A code sample is below:

import pandas as pd

data = {'product_name': ['laptop-active', 'printer-active', 'tablet-active', 'desk-passive', 'chair-passive'],
        'price': [1200, 150, 300, 450, 200]
        }

df = pd.DataFrame(data)

print (df)

def namefun(s):
    y=s.split("-")
    return y[0],y[1]
df[['A','B']]=df.apply(
    lambda row: pd.Series(namefun(row['product_name'])), axis=1)
DD08

1 Answer

You can use str.split with expand=True to create multiple columns in one vectorized call. Passing n=1 limits the split to the first hyphen, so at most two columns are produced:

df[['A','B']] = df['product_name'].str.split('-', n=1, expand=True)

Output:

     product_name  price        A        B
0   laptop-active   1200   laptop   active
1  printer-active    150  printer   active
2   tablet-active    300   tablet   active
3    desk-passive    450     desk  passive
4   chair-passive    200    chair  passive
  • @enke, thank you for the solution. I tried this, but some of the cells contain a single value, which gave me "ValueError: Columns must be the same length as key", so I switched to the method shown in my question. – DD08 Apr 16 '22 at 18:53
  • @DD08 I see what the issue is now. Some rows contain more than one `-`, so splitting on it yields three columns, which can't be assigned to two. I edited the answer to account for this. See if it works now. –  Apr 17 '22 at 02:21
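
The two edge cases raised in the comments (a value with more than one hyphen, and a value with no hyphen at all) can be checked with a minimal sketch like the one below. The sample rows are hypothetical and not taken from the original dataset, and the reindex step is an added safeguard that is not part of the answer above: it assumes you always want exactly two output columns, even if no row happens to contain a hyphen.

```python
import pandas as pd

# Hypothetical sample data covering the edge cases from the comments:
# one normal value, one with two hyphens, and one with no hyphen.
df = pd.DataFrame({
    'product_name': ['laptop-active', 'desk-lamp-passive', 'chair'],
    'price': [1200, 450, 200],
})

# n=1 splits on the first hyphen only, so at most two columns come back.
parts = df['product_name'].str.split('-', n=1, expand=True)

# Added safeguard (an assumption, not in the answer): force exactly two
# columns so the assignment also works if no row contains a hyphen.
parts = parts.reindex(columns=[0, 1])

df[['A', 'B']] = parts
print(df)
```

With this input, the row with two hyphens keeps everything after the first hyphen in column B, and the row with no hyphen gets a missing value in B.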