2

I have a dataframe that looks like below. I want to split this dataframe into two dataframe because these are different reports. I want split on 'break' index.

                                            Fri      Sat      Sun      Mon      Tue
Metric Group    Metric Type
Productive Time % Available                 82.7%     88.9%   85.0%   82.8%    65.2%   
Labor Move      % Hours Lost                72.6%     70.9%   84.0%   49.0%    75.4%  
Break           % Failed Break              85.0%     50.5%   25.0%   72.9%    65.0%  
Productive Time % Available                 52.8%     90.9%   65.0%   56.9%    45.2%        
Labor Move      % Hours Lost                62.5%     80.9%   55.0%   65.9%    95.7%  
Break           % Failed Break              83.7%     85.9%   95.0%   71.9%    45.5%  

The output should look something like.Is it possible to achieve this type of output.Any help is appreciated

df1 =                                              Fri      Sat      Sun      Mon      Tue
       Metric Group    Metric Type
       Productive Time % Available                 82.7%     88.9%   85.0%   82.8%    65.2%   
       Labor Move      % Hours Lost                72.6%     70.9%   84.0%   49.0%    75.4%  
       Break           % Failed Break              85.0%     50.5%   25.0%   72.9%    65.0%

df2= 

     Productive Time % Available                 52.8%     90.9%   65.0%   56.9%    45.2%        
     Labor Move      % Hours Lost                62.5%     80.9%   55.0%   65.9%    95.7%  
     Break           % Failed Break              83.7%     85.9%   95.0%   71.9%    45.5%   
Learner
  • 335
  • 3
  • 16

4 Answers4

3

you can create a conditional column by using cumcount against your index.

this counts each unique variable which we will use to create our key.

dfs = {g : data for g,data in df.groupby(df.groupby(level=0).cumcount())}

print(dfs)

{0:                                            Fri    Sat    Sun    Mon    Tue
 Metric Group      Metric Type                                             
 Productive Time %  Available             82.7%  88.9%  85.0%  82.8%  65.2%
 Labor Move      %  Hours Lost            72.6%  70.9%  84.0%  49.0%  75.4%
 Break           %  Failed Break          85.0%  50.5%  25.0%  72.9%  65.0%,
 1:                                            Fri    Sat    Sun    Mon    Tue
 Metric Group      Metric Type                                             
 Productive Time %  Available             52.8%  90.9%  65.0%  56.9%  45.2%
 Labor Move      %  Hours Lost            62.5%  80.9%  55.0%  65.9%  95.7%
 Break           %  Failed Break          83.7%  85.9%  95.0%  71.9%  45.5%}

print(dfs[0])

                                           Fri    Sat    Sun    Mon    Tue
Metric Group      Metric Type                                             
Productive Time %  Available             82.7%  88.9%  85.0%  82.8%  65.2%
Labor Move      %  Hours Lost            72.6%  70.9%  84.0%  49.0%  75.4%
Break           %  Failed Break          85.0%  50.5%  25.0%  72.9%  65.0%
Umar.H
  • 22,559
  • 7
  • 39
  • 74
  • @Learner step through the code one part at a time, `df.groupby(level=0)` is your index (the first one) [read about cumcount() here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.GroupBy.cumcount.html) and the last part is a simple dictionary comprehension. – Umar.H Jul 03 '20 at 14:57
0

Have you tried:

df1 = df.iloc(:2,:)
df2 = df.iloc(2:,:)

?

duff18
  • 672
  • 1
  • 6
  • 19
0

Do a df.index.get_loc('splitword') to get the index location of the split word.

splitindex=df.index.get_loc('Break')

Then perform the slicing:

df1=df.iloc[:(splitindex+1)]

df2=df.iloc[(splitindex+1):]
0

Try this:

ix = df.index
break_locations = [i for i, x in enumerate(ix) if x=='Break']
break_locations = [0]+break_locations+[len(df)]
df_list = []
for i in range(len(break_locations)-1):
    bl1 = break_locations[i] if  break_locations[i]==0 else  break_locations[i] +1
    bl2 = break_locations[i+1]+1
    split = df[bl1:bl2]
    if not split.empty:
        df_list.append(df[bl1:bl2])

What this snippet does is,

  • takes the index,
  • finds out where 'Break' is in the index,
  • adds the start(0) and end of the dataframe (len(df)) to that list of break points,
  • then, splits the data frame apart with two consecutive breakpoints,
  • stores each split in an array df_list

so for a dataframe df that looks like:

print(df)
                 Fri  Sat  Sun
Metric Group     NaN  NaN  NaN
Productive Time  NaN  NaN  NaN
Labor Move       NaN  NaN  NaN
Break            NaN  NaN  NaN
Productive Time  NaN  NaN  NaN
Labor Move       NaN  NaN  NaN
Break            NaN  NaN  NaN
Metric Group     NaN  NaN  NaN
Productive Time  NaN  NaN  NaN
Labor Move       NaN  NaN  NaN
Break            NaN  NaN  NaN
Productive Time  NaN  NaN  NaN
Labor Move       NaN  NaN  NaN
Break            NaN  NaN  NaN

you get a list of dataframes that look like:

for dd in df_list:
    print(dd, end="\n\n")
                 Fri  Sat  Sun
Metric Group     NaN  NaN  NaN
Productive Time  NaN  NaN  NaN
Labor Move       NaN  NaN  NaN
Break            NaN  NaN  NaN

                 Fri  Sat  Sun
Productive Time  NaN  NaN  NaN
Labor Move       NaN  NaN  NaN
Break            NaN  NaN  NaN

                 Fri  Sat  Sun
Metric Group     NaN  NaN  NaN
Productive Time  NaN  NaN  NaN
Labor Move       NaN  NaN  NaN
Break            NaN  NaN  NaN

                 Fri  Sat  Sun
Productive Time  NaN  NaN  NaN
Labor Move       NaN  NaN  NaN
Break            NaN  NaN  NaN

lytseeker
  • 305
  • 4
  • 10