Splitting a big dataframe into multiple dataframes

Question

I have a list of power plants. I split them in a for loop and do some processing like this:

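list_of_pp = [v for k, v in pp_data.groupby('filename')]
processed = []
for pp in list_of_pp:
    # Sum per timestamp, then align to the full hourly range (missing hours become NaN)
    pp = pp.groupby(['Date', 'filename']).sum().reset_index().set_index('Date').reindex(YF_date_range)
    # Restore the plant name on the rows introduced by reindex
    pp['filename'] = pp['filename'].replace('', np.nan).ffill().bfill()
    pp.fillna(0, inplace=True)
    processed.append(pp)
# DataFrame.append was removed in pandas 2.0; concatenate once instead
dataframes = pd.concat(processed)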


Output:

Date                filename     teklifId    fiyat miktar       SST      SAT
2019-11-10 00:00:00  bergama    205379348   620,68  -3,4    1055,15        0
2019-11-10 01:00:00  bergama    205385090   622,18  -2,9    902,161        0
2019-11-10 02:00:00  bergama    205392261   622,24  -0,8    248,896        0
2019-11-10 03:00:00  bergama    205398901   559,78  -0,6    307,879   -139,9
2019-11-10 04:00:00  bergama    205407003   559,98  -1,9    615,978   -83,99
2019-11-10 05:00:00  bergama    205414086   620,38  -2,8    1147,70   -279,1
2019-11-10 06:00:00  bergama    205420617   630,24  -2,9    913,848        0
2019-11-10 07:00:00  bergama    205426123   623,28  -2,6    1184,23   -373,9
2019-11-10 08:00:00  bergama    205432679   397,98    -4     795,96        0
2019-11-10 09:00:00  bergama    205440561      336 -10,3     1730,4        0
2019-11-10 10:00:00  bergama    205450946      400 -10,9       2180        0
2019-11-10 11:00:00  bergama    205460808      350  -3,5     1242,5     -630
2019-11-10 12:00:00  bergama    205468765   335,98  -2,5    587,965  -167,99
2019-11-10 13:00:00  bergama    205476320   335,98    -1    419,975  -251,98
2019-11-10 14:00:00  bergama    205482691   396,92  -1,2    238,152        0
2019-11-10 15:00:00  bergama    205488983      336   2,7          0     -453
2019-11-10 16:00:00  bergama    205495848    592,3     6          0    -1776
2019-11-10 17:00:00  bergama    205503077    623,9   5,6     218,36    -1965
2019-11-10 18:00:00  bergama    205511694    653,8     5     424,97    -2059
2019-11-10 19:00:00  bergama    205520491    656,9   1,3     164,22   -591,2
2019-11-10 20:00:00  bergama    205531685   650,98  -0,1     585,88   -553,3
2019-11-10 21:00:00  bergama    205545909    643,5  -1,1     804,37   -450,4
2019-11-10 22:00:00  bergama    205557633    638,2     4          0    -1276
2019-11-10 23:00:00  bergama    205567413    622,9   0,3     685,25   -778,7
2019-11-10 00:00:00  irmak      102689118    310,3    -1     310,34        0
2019-11-10 01:00:00  irmak              0        0     0          0        0
2019-11-10 02:00:00  irmak              0        0     0          0        0
2019-11-10 03:00:00  irmak      102699101   279,89  -0,6          1        0
                                      .
                                      .
                                      .
2019-11-10 23:00:00 tekirdag    302699101        0     0          0        0

Every filename represents a power plant, and every power plant's data has the same index ('2019-11-10 00:00:00' to '2019-11-10 23:00:00'), but the plants are stacked one below the other. There are almost 50 power plants in this dataframe, and I want to split them by name, which is the filename column, so that I can access each dataframe separately.

For example, when I print 'bergama', I want to see only one dataframe that contains bergama's information.

Because this big dataframe is created in a for loop, I cannot assign a name to each small dataframe, so I cannot refer to them after the loop. That is why I combined the data; I thought it might be easier to separate it afterwards.

How can I split this dataframe and assign a name to each part?

So, if I understand correctly, you want to split by `filename` and create a dictionary of dataframes to access? — Umar.H, Nov 21 '19 at 10:32
Possible duplicate of [Splitting dataframe into multiple dataframes](https://stackoverflow.com/questions/19790790/splitting-dataframe-into-multiple-dataframes) — magraf, Nov 21 '19 at 13:31

magraf · Answer 1 · 2019-11-21T13:32:33.933


From what I understood, you have the records of all power plants in one dataframe, and now you'd like to access a single plant by its name, which is in the filename column?

You can select the rows of a single power plant, e.g. 'bergama', by simply writing:

print(df[df.filename=='bergama'])

Also, this question has been raised multiple times. Could someone please flag it?

edited Nov 21 '19 at 13:32

answered Nov 21 '19 at 10:47


I know this way but I do not want to do this for 50 power plants, thank you anyway :) – JuniorESE Nov 21 '19 at 11:10
Then use df.groupby('filename') – magraf Nov 21 '19 at 11:12
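
The groupby suggestion in the comments can be turned into a dictionary of dataframes keyed by filename, which avoids writing one filter per plant. This is a minimal sketch, assuming the combined dataframe is called df; the small sample data and the name plants are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical stand-in for the combined dataframe from the question.
df = pd.DataFrame(
    {
        "filename": ["bergama", "bergama", "irmak", "irmak"],
        "fiyat": [620.68, 622.18, 310.3, 0.0],
    },
    index=pd.to_datetime(["2019-11-10 00:00", "2019-11-10 01:00"] * 2),
)

# One entry per power plant: key = filename, value = that plant's rows.
plants = {name: group for name, group in df.groupby("filename")}

# Access a single plant by its name.
print(plants["bergama"])
```

A dictionary is used here instead of 50 separate variables, so every plant can still be reached by name after the loop and the whole set can be iterated with plants.items().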

score 0 · Answer 2 · answered Nov 22 '19 at 12:52

1. Get the unique values from the filename column and store them in a list or set (dataframe.filename.unique()).
2. Pass each value in the list or set as a filter to get a separate dataframe for each power plant.
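
The two steps above could look roughly like this. It is only a sketch: the dataframe contents are invented for illustration, and the dictionary name plants is an assumption, not something from the question.

```python
import pandas as pd

# Hypothetical sample with the plants interleaved, to show that order does not matter.
df = pd.DataFrame(
    {
        "filename": ["bergama", "irmak", "bergama", "tekirdag"],
        "miktar": [-3.4, -1.0, -2.9, 0.0],
    }
)

plants = {}
# Step 1: collect the distinct plant names.
for name in df["filename"].unique():
    # Step 2: filter the rows that belong to this plant.
    plants[name] = df[df["filename"] == name]

print(plants["bergama"])
```

This scans the dataframe once per plant, so for large data a single groupby is likely to be faster, but with about 50 plants the difference should be small.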

Another observation:

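The rows are stacked in fixed-size blocks of 24 hourly rows (rows 0-23 are bergama, rows 24-47 are irmak, and so on), so you can also get a separate dataframe for each power plant by position: use a for loop over the row positions with initial=0, step=24 and final=50*24.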
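
A sketch of this positional approach follows. It assumes every plant has exactly 24 consecutive rows (one per hour), which the reindex step in the question should guarantee; the sample data, the hour column and the constant ROWS_PER_PLANT are made up for illustration.

```python
import pandas as pd

ROWS_PER_PLANT = 24  # assumption: one row per hour, 00:00 to 23:00

# Hypothetical stacked dataframe: 24 rows for each of three plants.
names = ["bergama", "irmak", "tekirdag"]
df = pd.DataFrame(
    {
        "filename": [n for n in names for _ in range(ROWS_PER_PLANT)],
        "hour": list(range(ROWS_PER_PLANT)) * len(names),
    }
)

plants = {}
# Walk over the rows in blocks of 24 and slice each block by position.
for start in range(0, len(df), ROWS_PER_PLANT):
    chunk = df.iloc[start:start + ROWS_PER_PLANT]
    # Use the plant name found in the block as the dictionary key.
    plants[chunk["filename"].iloc[0]] = chunk

print(plants["irmak"].head())
```

If any plant has a different number of rows, the blocks will be misaligned, so splitting by the filename column is the safer choice when the layout is not guaranteed.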
