I have a Pandas dataframe read from a CSV file that is structured like this:

x_column    y_column    number_column  
---         ----        ----
---         ----        ----
xxx         yyyy        1
xxx         yyyy        2
xxx         yyyy        35
xxx         yyyy        42

The rows with the dashes represent extra data at the start of the CSV file that I want to keep.

I have a list of numbers, 500,000 values long, that I want to append to 'number_column'. The existing values in number_column should stay in the same place, unaltered.

I also want the values for x_column and y_column to be the same for every row that has just been added as shown in the example. My current approach is just a simple for loop that appends the values one at a time:

for num in number_list:
    data_df = data_df.append(pd.DataFrame({'x_column': 'xxx', 'y_column': 'yyy', 'number_column': num}, index=[0]), ignore_index=True)

Is there a faster way of doing this? The current approach takes a long time to complete.

GreenGodot

1 Answer

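Don't call data_df = data_df.append(...) in a loop: each call copies the entire DataFrame, so the total work grows quadratically with the number of rows, which is very bad for performance. Instead, build a single DataFrame from the whole list (the scalar values for x_column and y_column are broadcast to every row), then concatenate it to your original DataFrame once: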

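tmp = pd.DataFrame({'x_column': 'xxx', 'y_column': 'yyy', 'number_column': number_list})
# ignore_index=True renumbers the rows, matching the ignore_index=True used in the question's loop
data_df = pd.concat([data_df, tmp], ignore_index=True)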
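
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. The small data_df and the five-element number_list are made-up stand-ins for the CSV data and the 500,000-value list from the question, and result is just an illustrative name. Note also that DataFrame.append was removed in pandas 2.0, so pd.concat is the way to go on current versions in any case.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame read from the CSV file.
data_df = pd.DataFrame({
    'x_column': ['xxx'] * 4,
    'y_column': ['yyyy'] * 4,
    'number_column': [1, 2, 35, 42],
})

# Hypothetical stand-in for the 500,000-value list.
number_list = [100, 101, 102, 103, 104]

# The scalar strings are broadcast to the length of number_list,
# so every new row gets the same x_column and y_column values.
tmp = pd.DataFrame({
    'x_column': 'xxx',
    'y_column': 'yyyy',
    'number_column': number_list,
})

# A single concatenation: existing rows stay first and unchanged,
# new rows follow, and the index is renumbered from 0.
result = pd.concat([data_df, tmp], ignore_index=True)
```

Since the whole list is turned into a DataFrame in one step and copied once, the cost grows roughly linearly with the length of the list instead of quadratically.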
unutbu