I am having an issue expanding the values of certain cells into multiple rows. The data I'm using is from a CSV and is being imported using the following code to make a DataFrame.

import pandas as pd
import numpy as np

df = pd.read_csv("path/to/file.csv")

A small sample of the CSV data is below.

Test User,2020/09/14,Apple
Test User,2020/09/16,Apple
Test User,2020/09/23,Apple
Test User,2020/09/30,['Apple' 'Banana']
Test User,2020/10/02,Banana
Test User,2020/10/05,Apple
Test User,2020/10/07,Banana
Test User,2020/10/09,Banana

I want to take any values like the one in the 4th row and separate them. I've tried some different ways to do this, but nothing has worked so far.

Here is the df of the above data for reference.

array([['Test User', '2020/09/14', 'Apple'],
       ['Test User', '2020/09/16', 'Apple'],
       ['Test User', '2020/09/23', 'Apple'],
       ['Test User', '2020/09/30', "['Apple' 'Banana']"],
       ['Test User', '2020/10/02', 'Banana'],
       ['Test User', '2020/10/05', 'Apple'],
       ['Test User', '2020/10/07', 'Banana'],
       ['Test User', '2020/10/09', 'Banana']], dtype=object)

Some of the ways I have tried that didn't work:

1. df = df.explode('Column name')
2. df = df.apply(pd.Series)

And different methods talked about here

To go into detail, when I use the explode command nothing happens. The cells with multiple values stay the same.

I would like to take all of the rows that have cells with multiple values in them and put them each in their own row. Here is an example of what I am trying to accomplish.

Test User,2020/09/14,Apple
Test User,2020/09/16,Apple
Test User,2020/09/23,Apple
Test User,2020/09/30,Apple
Test User,2020/09/30,Banana
Test User,2020/10/02,Banana
Test User,2020/10/05,Apple
Test User,2020/10/07,Banana
Test User,2020/10/09,Banana

Does anyone have an idea as to how I could separate those values into different rows?

cpuser
  • Can you please show what errors you faced with `df.explode`? Also, post your expected output based on the sample data. – Mayank Porwal Jan 12 '21 at 07:47
  • It's not an error. The issue is that nothing changes. The output of df.explode is literally the same as the input. I have updated my question to clarify some details. – cpuser Jan 13 '21 at 00:47

1 Answer

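I guess the problem is that you're trying to explode a list that is embedded in a string inside a pandas Series. `explode` only acts on cells that hold actual list-like objects, so you must make sure your list is on its own, as a real list, within a DataFrame column or Series. Exploding a Series of plain strings changes nothing: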

>>> pd.Series(raw.splitlines()).explode()
0                  Test User,2020/09/14,Apple
1                  Test User,2020/09/16,Apple
2                  Test User,2020/09/23,Apple
3     Test User,2020/09/30,['Apple' 'Banana']
4                 Test User,2020/10/02,Banana
5                  Test User,2020/10/05,Apple
6                 Test User,2020/10/07,Banana
7                 Test User,2020/10/09,Banana
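
One way to confirm this is to look at the Python type of each cell before exploding. The snippet below is only a sketch: it rebuilds a few rows of the sample CSV in memory, and the column names `User`, `Date` and `Fruit` are assumed, since the file in the question appears to have no header row.

```python
import io

import pandas as pd

# A few rows of the sample data from the question, held in memory.
csv_text = """Test User,2020/09/23,Apple
Test User,2020/09/30,['Apple' 'Banana']
Test User,2020/10/02,Banana
"""

# Column names are assumed; the CSV in the question has no header row.
df = pd.read_csv(io.StringIO(csv_text), header=None, names=["User", "Date", "Fruit"])

# Every cell read from a CSV text column is a plain string, including the
# one that merely looks like a list.
all_strings = all(isinstance(value, str) for value in df["Fruit"])
print(all_strings)

# explode() has no list-like cells to split, so the row count is unchanged.
exploded = df.explode("Fruit")
print(len(df), len(exploded))
```

If every cell is a `str`, `explode` has nothing to split, which would match the "nothing changes" behaviour described in the question.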

If you put that in a dataframe (or just the fruits) you should get this:

>>> df.explode('Fruit')
           User        Date   Fruit
0  1. Test User  2020/09/14   Apple
1  2. Test User  2020/09/16   Apple
2  3. Test User  2020/09/23   Apple
3  4. Test User  2020/09/30   Apple <--
3  4. Test User  2020/09/30  Banana <--
4  5. Test User  2020/10/02  Banana
5  6. Test User  2020/10/05   Apple
6  7. Test User  2020/10/07  Banana
7  8. Test User  2020/10/09  Banana

>>> fruits = pd.Series(df.Fruit)
>>> fruits

0              Apple
1              Apple
2              Apple
3    [Apple, Banana]
4             Banana
5              Apple
6             Banana
7             Banana
Name: Fruit, dtype: object

>>> fruits.explode()

0     Apple
1     Apple
2     Apple
3     Apple
3    Banana
4    Banana
5     Apple
6    Banana
7    Banana

Later on, if you still need everything back in the same "shape", there are various ways to convert that DataFrame back to a Series (if that's what you really want or need).
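
For data that comes straight from `read_csv`, the string cells have to be turned into real lists before `explode` can do anything. Here is a minimal sketch of that step, not necessarily how the output above was produced. The helper `split_fruits` is hypothetical, the column names are assumed, and the code assumes the column has no missing values and that item names contain no single quotes.

```python
import io
import re

import pandas as pd

# Sample data from the question, held in memory instead of a file.
csv_text = """Test User,2020/09/14,Apple
Test User,2020/09/16,Apple
Test User,2020/09/23,Apple
Test User,2020/09/30,['Apple' 'Banana']
Test User,2020/10/02,Banana
Test User,2020/10/05,Apple
Test User,2020/10/07,Banana
Test User,2020/10/09,Banana
"""

# Column names are assumed; the CSV in the question has no header row.
df = pd.read_csv(io.StringIO(csv_text), header=None, names=["User", "Date", "Fruit"])


def split_fruits(cell):
    """Hypothetical helper: return the single-quoted names found in a
    list-like string, or a one-item list for a plain value."""
    found = re.findall(r"'([^']*)'", cell)
    return found if found else [cell]


# Turn every cell into a real Python list, then give each item its own row.
df["Fruit"] = df["Fruit"].map(split_fruits)
result = df.explode("Fruit").reset_index(drop=True)
print(result)
```

With the sample data this should give nine rows, with the 2020/09/30 entry appearing once for Apple and once for Banana.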

Danail Petrov
  • Sorry for not being more clear. I am importing the data from a CSV so the variable is a DataFrame. I'm not sure how you were able to explode correctly because I never get that type of output from an explode. For me, the output of an explode is the same as an input. – cpuser Jan 13 '21 at 00:40
  • When I do fruits.explode() after a fruits=pd.Series(df.Fruit), I get the following output, which is the same as the fruits variable. ['Apple', 'Apple', 'Apple', "['Apple' 'Banana']", 'Banana', 'Apple', 'Banana', 'Banana'] – cpuser Jan 13 '21 at 00:45
  • Check [this](https://stackoverflow.com/questions/23111990/pandas-dataframe-stored-list-as-string-how-to-convert-back-to-list) post. – Danail Petrov Jan 13 '21 at 08:31
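
A caveat when converting stored strings back to lists with `ast.literal_eval`: the cells in this question look like a NumPy array printout, with no commas between the items. Python treats adjacent string literals as a single string, so `literal_eval` would silently merge the names. The sketch below contrasts that with a regex that pulls out each quoted name. It assumes the items are always wrapped in single quotes.

```python
import ast
import re

cell = "['Apple' 'Banana']"

# Adjacent string literals are concatenated at parse time, so this yields
# a one-element list rather than two separate fruits.
parsed = ast.literal_eval(cell)
print(parsed)      # ['AppleBanana']

# Extracting each single-quoted name keeps the items separate.
extracted = re.findall(r"'([^']*)'", cell)
print(extracted)   # ['Apple', 'Banana']
```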