I have a DataFrame like the one below. I want to split it into separate DataFrames, one per unique value in the Rows column.

     Rows            col1    value1      value2
0    row_1            var1     12         3434           
1    row_1            var2     212       546        
2    row_1            var3     340       8686       
3    row_2            var1     226        55      
4    row_2            var2     323        878        
97   row_33           var1     592        565        
98   row_33           var2     282       343    
99   row_33           var3     455        764      
100  row_34           var1     457        24        
101  row_34           var2     617        422          

Expected DataFrames:

Df1

     Rows            col1    value1      value2
0    row_1            var1     12         3434           
1    row_1            var2     212       546        
2    row_1            var3     340       8686                

Df2

     Rows            col1    value1      value2
0    row_2            var1     226        55     
1    row_2            var2     323        878     
2    row_2            var3     453        78               

Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
Rrptm
  • 329
  • 2
  • 12

1 Answer

  • You can use .groupby on 'Rows' and build a dict of DataFrames with a dict comprehension, using the unique 'Rows' values as keys.
    • .groupby returns a groupby object; iterating over it yields (g, d) pairs, where g is the unique value in 'Rows' for the group and d is the DataFrame for that group.
  • Each value in df_dict is a DataFrame, which can be accessed in the standard way, df_dict['key'].
import pandas as pd

# setup data and dataframe
data = {'Rows': ['row_1', 'row_1', 'row_1', 'row_2', 'row_2', 'row_33', 'row_33', 'row_33', 'row_34', 'row_34'],
        'col1': ['var1', 'var2', 'var3', 'var1', 'var2', 'var1', 'var2', 'var3', 'var1', 'var2'],
        'value1': [12, 212, 340, 226, 323, 592, 282, 455, 457, 617],
        'value2': [3434, 546, 8686, 55, 878, 565, 343, 764, 24, 422]}

df = pd.DataFrame(data)

# split the dataframe by looping over the groupby object
df_dict = dict()

for g, d in df.groupby('Rows'):
    df_dict[g] = d


# or as a dict comprehension: the unique Row value will be the key
df_dict = {g: d for g, d in df.groupby('Rows')}


# or generate the key names ('df0', 'df1', ...) with enumerate
df_dict = {f'df{i}': d for i, (_, d) in enumerate(df.groupby('Rows'))}

df_dict['df0'] (enumerate keys) or df_dict['row_1'] ('Rows' value keys), depending on which dict was built

    Rows  col1  value1  value2
0  row_1  var1      12    3434
1  row_1  var2     212     546
2  row_1  var3     340    8686

df_dict['df1'] (enumerate keys) or df_dict['row_2'] ('Rows' value keys)

    Rows  col1  value1  value2
3  row_2  var1     226      55
4  row_2  var2     323     878

df_dict['df2'] (enumerate keys) or df_dict['row_33'] ('Rows' value keys)

     Rows  col1  value1  value2
5  row_33  var1     592     565
6  row_33  var2     282     343
7  row_33  var3     455     764
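
The outputs above keep the original row labels (3 and 4 for row_2), while the expected DataFrames in the question are numbered from 0 and named Df1, Df2. Below is a minimal sketch of one way to get closer to that. It assumes the key names 'df1', 'df2', ... (illustrative, not required by pandas) and that the groups should appear in their original order. groupby sorts group keys by default, so sort=False is passed to keep first-appearance order; with string keys, a value like 'row_10' would otherwise sort before 'row_2'.

```python
import pandas as pd

# same sample data as in the answer above
data = {'Rows': ['row_1', 'row_1', 'row_1', 'row_2', 'row_2', 'row_33', 'row_33', 'row_33', 'row_34', 'row_34'],
        'col1': ['var1', 'var2', 'var3', 'var1', 'var2', 'var1', 'var2', 'var3', 'var1', 'var2'],
        'value1': [12, 212, 340, 226, 323, 592, 282, 455, 457, 617],
        'value2': [3434, 546, 8686, 55, 878, 565, 343, 764, 24, 422]}

df = pd.DataFrame(data)

# sort=False keeps groups in order of first appearance
# start=1 makes the (hypothetical) key names begin at 'df1'
# reset_index(drop=True) renumbers each group's rows from 0
df_dict = {
    f'df{i}': d.reset_index(drop=True)
    for i, (_, d) in enumerate(df.groupby('Rows', sort=False), start=1)
}

print(df_dict['df2'])
```

If the original row labels are needed later (for example, to map rows back to df), drop the reset_index call.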
Trenton McKinney