I would like to add rows to a global dataframe from within instances of different classes. Can I somehow pass the global dataframe as an argument when creating each class instance? In the code below, self.history ends up pointing to a new dataframe after add_row, while I would like to change the global df.

I could give the global dataframe as argument in the add_row function directly and it would work but I would like to avoid that.

I'm not sure if this is the right way to do it anyway. My goal is that I can change the same dataframe from within different classes.

import pandas as pd

history_1 = pd.DataFrame()

class ClassA:
    def __init__(self,  history):
        self.history = history

    def add_row(self, row):
        self.history = pd.concat([self.history, pd.DataFrame([row])])


class ClassB:
    def __init__(self,  history):
        self.history = history

    def add_row(self, row):
        self.history = pd.concat([self.history, pd.DataFrame([row])])


class_a = ClassA(history_1)
new_row = {'r1':1, 'r2':2, 'r3':3}
class_a.add_row(new_row)

class_b = ClassB(history_1)
new_row = {'r1':1, 'r2':2, 'r3':3}
class_b.add_row(new_row)
marc_s
noskule
    Well, `pd.concat` produces a new object. You'd rather want something like `self.history.append(...)` or so to modify the existing dataframe (not sure what the appropriate DF method is there). – deceze Jun 21 '22 at 09:47

1 Answer

I'm not sure what your use case is, but pd.concat returns a new object; it is not an in-place operation. Assigning its result to self.history only rebinds that attribute, so history_1 is left unchanged. To modify the DataFrame history_1 in place, you can try the following approach:

import pandas as pd

history_1 = pd.DataFrame()

class ClassA:
    def __init__(self,  history):
        self.history = history

    def add_row(self, row):
        addition = pd.DataFrame([row])
        self.history[addition.columns] = addition


class ClassB:
    def __init__(self,  history):
        self.history = history

    def add_row(self, row):
        addition = pd.DataFrame([row])
        self.history[addition.columns] = addition


class_a = ClassA(history_1)
new_row = {'r1':1, 'r2':2, 'r3':3}
class_a.add_row(new_row)

class_b = ClassB(history_1)
new_row = {'r1':1, 'r2':2, 'r3':3}
class_b.add_row(new_row)

print(history_1)
# >>>    r1  r2  r3
# >>> 0   1   2   3
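
To see why the original pd.concat version left history_1 untouched, here is a minimal sketch (the names history and alias are made up for illustration, and the frame starts with one row just to keep the example simple). It contrasts rebinding a name with enlarging the existing object through .loc:

```python
import pandas as pd

# Illustrative starting frame with a single row.
history = pd.DataFrame([{"r1": 0, "r2": 0, "r3": 0}])
alias = history  # a second name for the same object

# pd.concat builds a new DataFrame; the assignment only rebinds `alias`.
alias = pd.concat([alias, pd.DataFrame([{"r1": 1, "r2": 2, "r3": 3}])])
rebound = alias is not history
rows_after_concat = len(history)

# Assigning to a new label with .loc enlarges the existing object in place.
alias = history
alias.loc[len(alias)] = [1, 2, 3]
rows_after_loc = len(history)

print(rebound, rows_after_concat, rows_after_loc)
```

The same thing happens inside a class: self.history = pd.concat(...) changes what self.history refers to, not the object that was passed in.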

Note that since you're trying to add rows, and not columns, you would want a row-wise operation such as:

self.history.append(addition)

EDIT: Upon reviewing the question a bit more, the .append() function is deprecated (it was removed in pandas 2.0), and it also returns a new DataFrame rather than modifying the existing one. To add rows to a dataframe in place, you can use the following approach instead:

import pandas as pd

cols = ['r1', 'r2', 'r3']
history_1 = pd.DataFrame(columns = cols)

class ClassA:
    def __init__(self,  history):
        self.history = history

    def add_row(self, row):
        self.history.loc[self.history.shape[0]] = [row.get(i) for i in cols]

class ClassB:
    def __init__(self,  history):
        self.history = history

    def add_row(self, row):
        self.history.loc[self.history.shape[0]] = [row.get(i) for i in cols]

class_a = ClassA(history_1)
new_row = {'r1':1, 'r2':2, 'r3':3}
class_a.add_row(new_row)

class_b = ClassB(history_1)
new_row = {'r1':1, 'r2':2, 'r3':3}
class_b.add_row(new_row)

print(history_1)
# >>>    r1  r2  r3
# >>> 0   1   2   3
# >>> 1   1   2   3

Final edit: Apparently self.history.loc[self.history.shape[0]] is faster than self.history.loc[len(self.history)], according to "How to add an extra row to a pandas dataframe".
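
If relying on a module-level cols list feels fragile, one possible alternative is to have both classes share a small holder object that owns the DataFrame. This is only a sketch of that idea: the History class and its df attribute are hypothetical names, not part of pandas or of the question's code.

```python
import pandas as pd


class History:
    """Hypothetical holder that owns the shared DataFrame."""

    def __init__(self, columns):
        self.df = pd.DataFrame(columns=columns)

    def add_row(self, row):
        # Enlarge the frame in place; keys missing from `row` become None.
        self.df.loc[len(self.df)] = [row.get(c) for c in self.df.columns]


class ClassA:
    def __init__(self, history):
        self.history = history  # reference to the shared holder

    def add_row(self, row):
        self.history.add_row(row)


class ClassB:
    def __init__(self, history):
        self.history = history

    def add_row(self, row):
        self.history.add_row(row)


shared = History(["r1", "r2", "r3"])
ClassA(shared).add_row({"r1": 1, "r2": 2, "r3": 3})
ClassB(shared).add_row({"r1": 1, "r2": 2, "r3": 3})
print(shared.df)
```

Because ClassA and ClassB only keep a reference to the holder, the holder is free to change how it stores rows later without either class noticing.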

  • With your code it does change the global history_1 var. But it does not add a row, it replaces it. If I run your code I only have one row, but it should have 2: one added by class_a and one by class_b. – noskule Jun 21 '22 at 10:08
  • I've reviewed the code a few times. Hopefully the one at the bottom of the answer now works in a way that fits your use case! Let me know if you would like a more in-depth explanation. – Steinn Hauser Magnússon Jun 21 '22 at 10:10
  • @noskule let me know if the most recent solution works – Steinn Hauser Magnússon Jun 21 '22 at 10:17
  • Yep, that works. I'm wondering if this is the right approach anyway. I get a websocket stream dict every second and store it in this df. I also do some calculations like a MovingAverage and add the result to a column in the history df. – noskule Jun 21 '22 at 10:27
  • Interesting. If you want to take the conversation further feel free to tag me in a related new question :) – Steinn Hauser Magnússon Jun 21 '22 at 10:49
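
For the streaming case mentioned in the comments, one option worth considering is to collect incoming dicts in a plain list and build the DataFrame only when it is needed. The sketch below assumes that pattern; RowBuffer, the price column and the ma_3 column are made-up names for illustration, and the loop stands in for the websocket messages.

```python
import pandas as pd


class RowBuffer:
    """Hypothetical buffer: cheap list appends, DataFrame built on demand."""

    def __init__(self):
        self.rows = []

    def add_row(self, row):
        self.rows.append(dict(row))  # store a copy of the incoming dict

    def to_frame(self):
        return pd.DataFrame(self.rows)


buffer = RowBuffer()
for price in [10.0, 11.0, 12.0, 13.0]:  # stand-in for streamed messages
    buffer.add_row({"price": price})

frame = buffer.to_frame()
# Moving average over the last 3 rows; the first two rows have no full window.
frame["ma_3"] = frame["price"].rolling(window=3).mean()
print(frame)
```

Several classes can share one RowBuffer instance in the same way they would share the DataFrame, since list.append mutates the list in place.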