I feel like this is a super simple question; I just don't have the vocabulary to articulate it in Google. Here goes:

I have a DataFrame that I want to slice and split into several DataFrames, so I wrote a function and a for loop for this.

Sample table

     col1 col2 col3 col4 col5
row1 A    Hi   my   name is
row2 A    Bye  see  you  later
row3 B    Bike on   side walk
row4 B    Car  on   str  drive
row5 C    Dog  on   grs  poop

My code looks like this:

list_ = list(df['col1'].drop_duplicates())
for i in list_:
    dataframe_creator(i)

My function looks like this:

def dataframe_creator(i):
    df = df[df['col1'] == i]
    return df

The result is that it just creates a DataFrame for each slice and assigns it to the same variable, which isn't what I want. I want a separate variable for each iteration. Basically, I'd like to end up with three DataFrames named dfA, dfB, and dfC, each holding one slice.

David 54321
  • How about a dict: `{f'df{k}':v for k, v in df.groupby('col1')}` with keys `dfA`, `dfB`... etc and the values being the associated DataFrame slices – Chris Adams Mar 11 '20 at 15:40
  • How about a list comprehension to generate a list of DataFrames? `[dataframe_creator(i) for i in list_]`? – dspencer Mar 11 '20 at 15:41
  • Check out [this post](https://stackoverflow.com/questions/1373164/how-do-i-create-a-variable-number-of-variables) for why `dict` is best for this sort of thing – Chris Adams Mar 11 '20 at 15:45
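
The `groupby` suggestion from the comments can be sketched roughly as follows. This is a minimal illustration, not the commenter's exact code: the sample table is rebuilt by hand, and the name `frames` is an assumption introduced here.

```python
import pandas as pd

# Sample data rebuilt from the question's table (an assumption for illustration)
df = pd.DataFrame(
    [
        ["A", "Hi", "my", "name", "is"],
        ["A", "Bye", "see", "you", "later"],
        ["B", "Bike", "on", "side", "walk"],
        ["B", "Car", "on", "str", "drive"],
        ["C", "Dog", "on", "grs", "poop"],
    ],
    columns=["col1", "col2", "col3", "col4", "col5"],
    index=["row1", "row2", "row3", "row4", "row5"],
)

# Iterating over a groupby yields (group value, sub-DataFrame) pairs,
# so each distinct value in col1 becomes one dictionary entry.
frames = {f"df{key}": group for key, group in df.groupby("col1")}

print(sorted(frames))   # names of the slices
print(frames["dfB"])    # the rows where col1 == "B"
```

Each slice is then reachable by name, for example `frames["dfA"]`, without creating a variable number of variables.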

2 Answers

A dictionary is ideal for this case:

df_slicer = {}
# Loop over the distinct values only, so each slice is built once
for i in df.col1.unique():
    df_slicer[i] = df[df.col1 == i]

# dfA:
df_slicer['A']
Martijniatus
  • Thank you for the code, this helped out a lot. Can you extend it further? How can I iterate through the dictionary to create dfA, dfB, and dfC to have 3 separate dataframes outside of a dictionary? – David 54321 Mar 11 '20 at 16:01
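
On iterating through the dictionary: one possible sketch is to loop over `.items()` and work with each slice directly, which often removes the need for separate variables. The two-column sample frame and the name `row_counts` are assumptions made up for this illustration.

```python
import pandas as pd

# Small stand-in for the question's table (assumed data, fewer columns)
df = pd.DataFrame(
    {
        "col1": ["A", "A", "B", "B", "C"],
        "col2": ["Hi", "Bye", "Bike", "Car", "Dog"],
    }
)

# Build the dictionary of slices, one entry per distinct value in col1
df_slicer = {key: df[df["col1"] == key] for key in df["col1"].unique()}

# .items() gives each key together with its DataFrame slice
row_counts = {}
for key, frame in df_slicer.items():
    row_counts[key] = len(frame)
    print(f"df{key}: {len(frame)} row(s)")
```

Anything you would have done with `dfA`, `dfB`, or `dfC` can be done with `frame` inside the loop, or with `df_slicer['A']` outside it.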
Here is what I ultimately did to go from slices of a DataFrame to separate DataFrames stored in their own variables.

Create the sample data:

import pandas as pd

# Rows match the sample table in the question
data = [['A', 'Hi', 'my', 'name', 'is'],
        ['A', 'Bye', 'see', 'you', 'later'],
        ['B', 'Bike', 'on', 'side', 'walk'],
        ['B', 'Car', 'on', 'str', 'drive'],
        ['C', 'Dog', 'on', 'grs', 'poop']]

Turn it into a DataFrame:

test_df = pd.DataFrame(data)

Create a list of the unique values in the first column:

list_ = list(test_df[0].drop_duplicates())

Create the dictionary of slices

df_slicer = {}
for i in list_:
    df_slicer[i] = test_df[test_df[0] == i]

Create a variable for each key in the dictionary:

for key, val in df_slicer.items():
    exec('df' + key + '=val')

At the end, dfA, dfB, and dfC are each a DataFrame holding its respective slice.
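
If the group names are known in advance, the same three variables can also be assigned without `exec`, by unpacking the dictionary explicitly. This is a hedged alternative sketch rather than a replacement for the steps above; it assumes the keys are exactly 'A', 'B', and 'C'. The `exec` version fails when a key is not a valid identifier, and it does not reliably create local variables when run inside a function.

```python
import pandas as pd

data = [['A', 'Hi', 'my', 'name', 'is'],
        ['A', 'Bye', 'see', 'you', 'later'],
        ['B', 'Bike', 'on', 'side', 'walk'],
        ['B', 'Car', 'on', 'str', 'drive'],
        ['C', 'Dog', 'on', 'grs', 'poop']]
test_df = pd.DataFrame(data)

# Same dictionary of slices as above, keyed by the value in column 0
df_slicer = {key: test_df[test_df[0] == key] for key in test_df[0].unique()}

# Explicit unpacking: assumes the keys 'A', 'B', 'C' are known up front
dfA, dfB, dfC = (df_slicer[key] for key in ('A', 'B', 'C'))

print(dfA)
```

A missing key raises a `KeyError` here instead of silently skipping a variable.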

David 54321