Split the values in lists into rows in pandas

Question

I have a problem in splitting the values in a list in a series into columns

Background

I made a table in postgresql and read the data in pandas

# Read the table from postgresql 
sql = "select * from music_store;"
dat = pd.read_sql_query(sql, conn)
dat

And the data looks like this:

The data type is as follows:

print(dat.dtypes)

I would like to normalize the table and what I want to achieve is:

Issue

To tackle this, I tried a couple of methods First, I tried to use set_index and explode method as follows

dat.set_index(['transaction_id', 'customer_name', 'cashier_name', 'year']).\
    apply(lambda x: x.str.split(',').explode()).reset_index()
dat

It returns the same result as original

Then I tried to use the stack methods as follows:

dat.set_index(['transaction_id', 'customer_name', 'cashier_name', 'year']).stack()
dat

And I got the same result.

Finally, since it is shown as string value by default, I suspected that the issue is related to the brackets, so I tried to remove the brackets first by using the replace function as follows:

dat['albums_purchased'] = dat['albums_purchased'].str.replace('[', '').replace(']', '')
dat

Interesting, I got a nan result

Questions

What is my issue here? And How can I solve the issue?
Why would I get a NaN when I tried to replace a char in the string value?
The task is actually to normalize the tables in the database. In case there are more then 10 million roles, would this approach still works or how to I proceed in order to speed up the process?

Thank you for all of your help in advance!

what is the output of `print(type(dat.loc[0,'albums_purchased']))` — Anurag Dabas, Jul 18 '21 at 09:40
Because it is not needed as we are passing `ignore_index` parameter in `explode()` so index starts with 0,1,2..... — Anurag Dabas, Jul 18 '21 at 10:00

Split the values in lists into rows in pandas

0 Answers0