
I am running analytics on an edge device, and the computation needs pandas DataFrames. Here is my problem: every 10 seconds I update a master pandas DataFrame with a new set of rows. Some disagree with this approach, since it might hurt performance, but `append` is the only way I know to add the rows. Is there a more efficient way to update a pandas DataFrame? All I need is something like the `list.append(x)` or `list.extend(x)` API in pandas. I hope I am using the right API; is there a more efficient alternative? I do not have a memory issue, since I discard the data after some time.

Snippet:

df = df.append(self.__get_pd_frame(tracker_data), ignore_index=True)
# append returns a new DataFrame, so the result must be assigned back
# tracker_data - another pandas DataFrame containing 100-200 rows
ajayramesh
  • The fastest way is to use a dictionary instead, https://stackoverflow.com/questions/57000903/what-is-the-fastest-and-most-efficient-way-to-append-rows-to-a-dataframe. From the pandas append documentation: `Iteratively appending rows to a DataFrame can be more computationally intensive than a single concatenate. A better solution is to append those rows to a list and then concatenate the list with the original DataFrame all at once.` – Shijith Sep 03 '20 at 15:17
  • @Shijith - something like this, `pd.DataFrame.from_dict(dictionary_list)`? From the link you mentioned. – ajayramesh Sep 03 '20 at 15:23
  • Thinking of an easy way - I can convert a 2d array to a dict easily. – ajayramesh Sep 03 '20 at 15:24
  • pandas.DataFrame.from_records – ajayramesh Sep 03 '20 at 15:26
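A minimal sketch of the pattern the comments describe: accumulate incoming rows in a plain Python list and build (or concatenate) the DataFrame once per cycle, instead of calling `append` for every batch. The row contents here are made-up stand-ins for the tracker data.

```python
import pandas as pd

# Accumulate rows as plain dicts (cheap list.append), then build the
# DataFrame in a single call instead of growing it row by row.
rows = []
for i in range(3):
    rows.append({"a": i, "b": i * 2})  # stand-in for one tracker row

df = pd.DataFrame.from_records(rows)
print(df.shape)  # (3, 2)
```

`pd.DataFrame(rows)` works equally well here; `from_records` is convenient when the input is a list of dicts or tuples.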

1 Answer


I changed from the `append` method to the `from_records` API, something like below:

import numpy as np
import pandas as pd

data = np.array([[1, 3], [2, 4], [4, 5]])
pd.DataFrame.from_records(data, columns=("a", "b"))
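For the periodic-update part of the question, one way to combine this with the advice from the comments is to collect each batch as its own frame and concatenate once, rather than appending to the master frame every cycle. This is a hedged sketch with made-up batch data, not the asker's actual tracker pipeline:

```python
import numpy as np
import pandas as pd

# Collect each 10-second batch as a small DataFrame in a list,
# then concatenate them in one call when the data is needed.
batches = []
for _ in range(3):  # stand-in for three update cycles
    data = np.array([[1, 3], [2, 4], [4, 5]])
    batches.append(pd.DataFrame.from_records(data, columns=("a", "b")))

master = pd.concat(batches, ignore_index=True)
print(len(master))  # 9
```

A single `pd.concat` over N small frames avoids the repeated copying that N successive `append` calls would do, since `append` copies the whole master frame each time.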
ajayramesh