I am sorry, I am aware the title is somewhat fuzzy.
Context
I am using a `DataFrame` to keep track of files because a pandas `DataFrame` features several relevant functions to do all kinds of filtering a dict cannot do, with `loc`, `pd.IndexSlice`, `.index`, `.columns`, `pd.MultiIndex`...
OK, so this may not look like the best choice to expert developers (which I am not), but all these functions have been so handy that I have come to use a `DataFrame` for this.
And, cherry on the cake, the `__repr__` of a MultiIndex `DataFrame` is just perfect when I want to know what is inside my file list.
Quick introduction to the `Summary` class, inheriting from `DataFrame`
Because my `DataFrame`, which I call 'Summary', has some specific functions, I would like to make it a class inheriting from the pandas `DataFrame` class.
It also has 'fixed' MultiIndexes, for both rows and columns.
Finally, because my `Summary` class is defined outside the `Store` class, which is what actually manages file organization, the `Summary` class needs a function from `Store` to be able to retrieve that organization.
Questions
The trouble with `pd.DataFrame` is that (AFAIK) you cannot append rows without creating a new `DataFrame`.
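For reference, pandas does let `.loc` add a row to an existing object when it is given a label that is not yet in the index ("setting with enlargement"), although pandas may still reallocate the underlying data internally. A minimal sketch with a plain, single-level index; `df` and `alias` are illustrative names, and whether this behaves identically with the MultiIndex used in this question is not checked here:

```python
import pandas as pd

df = pd.DataFrame({'First': [1], 'Last': [2]}, index=['a'])
alias = df  # a second name for the very same object

# 'b' is not in the index yet, so this assignment appends a new row
df.loc['b'] = [3, 4]

# The change is visible through the other name: the object was modified,
# not replaced by a new DataFrame
print(alias)
```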
As `Summary` has a `refresh` function so that it can recreate itself by reading folder content, a `refresh` somehow 'resets' the `Summary` object.
To manage the `Summary` refresh, I came up with a first version of the code (not working) and then a second one (working).
```python
import pandas as pd
import numpy as np

# Dummy function
def summa(a, b):
    return a + b

# Does not work
class DatF1(pd.DataFrame):
    def __init__(self, meth, data=None):
        cmidx = pd.MultiIndex.from_arrays([['Index', 'Index'], ['First', 'Last']])
        rmidx = pd.MultiIndex(levels=[[], []], codes=[[], []],
                              names=['Component', 'Interval'])
        # Explicit unit: a bare np.datetime64 dtype may be rejected by recent pandas
        super().__init__(data=data, index=rmidx, columns=cmidx,
                         dtype=np.dtype('datetime64[ns]'))
        self.meth = meth

    def refresh(self):
        values = [[pd.Timestamp('2020/02/10 8:00'), pd.Timestamp('2020/02/10 8:00')],
                  [pd.Timestamp('2020/02/11 8:00'), pd.Timestamp('2020/02/12 8:00')]]
        rmidx = pd.MultiIndex.from_arrays([['Comp1', 'Comp1'], ['1h', '1W']],
                                          names=['Component', 'Interval'])
        self = pd.DataFrame(values, index=rmidx, columns=self.columns)

ex1 = DatF1(summa)
```

```
In [10]: ex1.meth(3,4)
Out[10]: 7

ex1.refresh()

In [11]: ex1
Out[11]: Empty DatF1
Columns: [(Index, First), (Index, Last)]
Index: []
```
After `refresh()`, `ex1` is still empty: `refresh` has not worked correctly.
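Inside a method, `self` is an ordinary local variable that refers to the object the method was called on. `self = pd.DataFrame(...)` in `DatF1.refresh` therefore only rebinds that local name to a new object, which is thrown away when the method returns; `ex1` itself is never touched. The same effect can be reproduced without pandas, using a hypothetical `Box` class for illustration:

```python
class Box:
    def __init__(self):
        self.items = []

    def rebind(self):
        # Only rebinds the local name 'self'; the caller's object is untouched
        self = Box()
        self.items.append(1)

    def mutate(self):
        # Modifies the object the caller holds
        self.items.append(1)


b = Box()
b.rebind()
print(b.items)  # []
b.mutate()
print(b.items)  # [1]
```

`DatF2` below avoids the problem because `super().__init__(...)` runs on the existing object instead of creating a new one.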
```python
# Works
class DatF2(pd.DataFrame):
    def __init__(self, meth, data=None):
        cmidx = pd.MultiIndex.from_arrays([['Index', 'Index'], ['First', 'Last']])
        rmidx = pd.MultiIndex(levels=[[], []], codes=[[], []],
                              names=['Component', 'Interval'])
        # Explicit unit: a bare np.datetime64 dtype may be rejected by recent pandas
        super().__init__(data=data, index=rmidx, columns=cmidx,
                         dtype=np.dtype('datetime64[ns]'))
        self.meth = meth

    def refresh(self):
        values = [[pd.Timestamp('2020/02/10 8:00'), pd.Timestamp('2020/02/10 8:00')],
                  [pd.Timestamp('2020/02/11 8:00'), pd.Timestamp('2020/02/12 8:00')]]
        rmidx = pd.MultiIndex.from_arrays([['Comp1', 'Comp1'], ['1h', '1W']],
                                          names=['Component', 'Interval'])
        # Re-runs DataFrame initialization on the existing object
        super().__init__(values, index=rmidx, columns=self.columns)

ex2 = DatF2(summa)
```

```
In [10]: ex2.meth(3,4)
Out[10]: 7

ex2.refresh()

In [11]: ex2
Out[11]:
                                  Index
                                  First                 Last
Component Interval
Comp1     1h        2020-02-10 08:00:00  2020-02-10 08:00:00
          1W        2020-02-11 08:00:00  2020-02-12 08:00:00
```
This code works!
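Calling `super().__init__` a second time on an existing object evidently works here, but it is an unusual pattern. One commonly used alternative, sketched here under the assumption that inheriting from `DataFrame` is not essential, is composition: the class holds a `DataFrame` as an attribute and `refresh` simply reassigns that attribute. `SummaryHolder` and its `df` attribute are hypothetical names:

```python
import pandas as pd


def summa(a, b):
    return a + b


class SummaryHolder:
    """Hypothetical wrapper that holds a DataFrame instead of inheriting from it."""

    def __init__(self, meth):
        self.meth = meth
        self.columns = pd.MultiIndex.from_arrays([['Index', 'Index'], ['First', 'Last']])
        index = pd.MultiIndex(levels=[[], []], codes=[[], []],
                              names=['Component', 'Interval'])
        self.df = pd.DataFrame(index=index, columns=self.columns)

    def refresh(self):
        values = [[pd.Timestamp('2020/02/10 8:00'), pd.Timestamp('2020/02/10 8:00')],
                  [pd.Timestamp('2020/02/11 8:00'), pd.Timestamp('2020/02/12 8:00')]]
        index = pd.MultiIndex.from_arrays([['Comp1', 'Comp1'], ['1h', '1W']],
                                          names=['Component', 'Interval'])
        # Reassigning an attribute of self is visible to the caller,
        # unlike reassigning self itself
        self.df = pd.DataFrame(values, index=index, columns=self.columns)

    def __repr__(self):
        # Keep the MultiIndex DataFrame display
        return repr(self.df)


ex3 = SummaryHolder(summa)
ex3.refresh()
print(ex3)
```

The trade-off is that filtering goes through `ex3.df` (e.g. `ex3.df.loc[...]`) rather than directly through the object.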
I have 2 questions:

1. Why is the first code not working? (I am sorry, this may be obvious, but I have no idea why it does not work.)
2. Is calling `super().__init__` in my `refresh` method acceptable coding practice? (Rephrased differently: is it acceptable to call `super().__init__` in places other than the `__init__` of my subclass?)
Thanks a lot for your help and advice. The world of class inheritance is quite new to me, and the fact that `DataFrame` content cannot be directly modified, so to speak, seems to make it a step more difficult to handle.
Have a good day. Best regards,
Error message when adding a new row
```python
import pandas as pd
import numpy as np

# Dummy function
def new_rows():
    return [['Comp1', 'Comp1'], ['1h', '1W']]

# Does not work
class DatF1(pd.DataFrame):
    def __init__(self, meth, data=None):
        cmidx = pd.MultiIndex.from_arrays([['Index', 'Index'], ['First', 'Last']])
        rmidx = pd.MultiIndex(levels=[[], []], codes=[[], []],
                              names=['Component', 'Interval'])
        # Explicit unit: a bare np.datetime64 dtype may be rejected by recent pandas
        super().__init__(data=data, index=rmidx, columns=cmidx,
                         dtype=np.dtype('datetime64[ns]'))
        self.meth = meth

    def refresh(self):
        values = [[pd.Timestamp('2020/02/10 8:00'), pd.Timestamp('2020/02/10 8:00')],
                  [pd.Timestamp('2020/02/11 8:00'), pd.Timestamp('2020/02/12 8:00')]]
        rmidx = self.meth()
        self[rmidx] = values

ex1 = DatF1(new_rows)
ex1.refresh()
```

```
KeyError: "None of [MultiIndex([('Comp1', 'Comp1'),\n ( '1h', '1W')],\n names=['Component', 'Interval'])] are in the [index]"
```
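Two things seem to be involved here. `self[key] = value` on a `DataFrame` is primarily meant for setting columns, not for adding rows; and the plain lists returned by `new_rows()` are not turned into `(Component, Interval)` pairs automatically: the error shows pandas looking for the labels `('Comp1', 'Comp1')` and `('1h', '1W')`. A minimal sketch that builds the intended row labels explicitly with `pd.MultiIndex.from_arrays` and passes them as the index of a new frame; combining that frame with the existing one (for instance with `pd.concat`) is left out and not tested here:

```python
import pandas as pd


def new_rows():
    # Same dummy function as above
    return [['Comp1', 'Comp1'], ['1h', '1W']]


cmidx = pd.MultiIndex.from_arrays([['Index', 'Index'], ['First', 'Last']])
values = [[pd.Timestamp('2020/02/10 8:00'), pd.Timestamp('2020/02/10 8:00')],
          [pd.Timestamp('2020/02/11 8:00'), pd.Timestamp('2020/02/12 8:00')]]

# from_arrays reads the lists level by level, giving the row labels
# ('Comp1', '1h') and ('Comp1', '1W')
rmidx = pd.MultiIndex.from_arrays(new_rows(), names=['Component', 'Interval'])

# The labels go into the index of a new frame instead of through df[...]
rows = pd.DataFrame(values, index=rmidx, columns=cmidx)
print(rows)
```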