My dataframe looks like this:

ID fields
1 [egg, apple, toy..]
2 [orange, bear, red..]

where each cell in the `fields` column holds a list.

I want to output a separate dataframe for each ID, so the number of dataframes equals the number of rows. The first dataframe would look like this:

Value
egg
apple
toy

Is there a way I can do this? Thanks

Panagiotis Kanavos
siusiusiu
  • Does this answer your question? [Pandas column of lists, create a row for each list element](https://stackoverflow.com/questions/27263805/pandas-column-of-lists-create-a-row-for-each-list-element) – Panagiotis Kanavos Aug 22 '22 at 06:32
  • This can be done using the `explode` method for [dataframes](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.explode.html#pandas.DataFrame.explode) or [series](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.explode.html?highlight=explode#pandas.Series.explode), e.g. `df.explode('fields')` or `df['fields'].explode()` – Panagiotis Kanavos Aug 22 '22 at 06:33
  • How are you going to call all these dataframes? Do you want a list of dataframes? Do you want to store them in a dictionary (maybe using the index as keys)? – Ignatius Reilly Aug 22 '22 at 06:37
  • Do you know how to create a DataFrame using a list? – Ignatius Reilly Aug 22 '22 at 06:39

1 Answer

I'd approach it like this, though I would avoid storing lists in cells to begin with: tidy data makes most pandas functionality much easier to work with. I'm also not sure about the use case for many separate dataframes; normally we'd keep one dataframe and use e.g. groupby to access the rows for each ID.

import pandas as pd

df = pd.DataFrame({"ID": [1, 2], "fields": [["egg", "apple", "toy"], ["orange", "bear", "red"]]})

# One row per list element; the ID is repeated for every element of its list.
df = df.explode("fields")

# Collect one dataframe per ID, keeping only the exploded values.
dataframes = []
for f_id in df["ID"].unique():
    df_id = df.loc[df["ID"] == f_id]
    dataframes.append(df_id.drop(columns="ID"))
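
If you would rather look the dataframes up by ID than by list position (as one of the comments on the question suggests), a minimal sketch with `groupby` could look like the following. The names `exploded` and `frames_by_id` are made up for illustration, and renaming the column to `Value` is only an assumption based on the output shown in the question.

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 2], "fields": [["egg", "apple", "toy"], ["orange", "bear", "red"]]})

# Explode the lists into rows and rename the column to match the desired output.
exploded = df.explode("fields").rename(columns={"fields": "Value"})

# Split on ID in a single groupby pass; each value is a one-column dataframe.
frames_by_id = {
    f_id: group[["Value"]].reset_index(drop=True)
    for f_id, group in exploded.groupby("ID")
}

print(frames_by_id[1])
```

`frames_by_id[1]` then holds the values for ID 1 and `frames_by_id[2]` those for ID 2. Dropping the `reset_index` call keeps the original row label on every row instead.
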
ivanp
  • You're right that list cells are not ideal. However, you're basically re-inventing `explode`, just slower and less concise. As a rule of thumb, looping over a dataframe is almost always wrong. – fsimonjetz Aug 22 '22 at 07:35
  • Fair comment - the lists in cells and use case of many dataframe is a bit of an odd one. I've edited the answer above to use explode. – ivanp Aug 22 '22 at 07:48