
I have 180 000 pandas Series that I need to combine into one DataFrame. Adding them one by one takes a long time, apparently because appending gets increasingly slow as the DataFrame grows. The same problem persists even if I use numpy, which is faster than pandas at this.

What could be an even better way to create a DataFrame from the Series?

Edit: Some more background info. The Series were stored in a list. It is sports data, and the list, called player_library, has 180 000+ items. I didn't realise that it is enough to write just

pd.concat(player_library, axis=1) 

instead of listing all the individual items. Now it works fast and nicely.
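For reference, a minimal end-to-end sketch of this approach (the player_library name comes from the question; the small Series here are made up for illustration):

import pandas as pd

# Stand-in for the real data: a list of named Series, one per player.
player_library = [
    pd.Series({"goals": i, "assists": 2 * i}, name=f"player_{i}")
    for i in range(5)
]

# A single concat combines all Series at once; each becomes a column.
df = pd.concat(player_library, axis=1)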

MattiH
  • Must you have them as a DataFrame? Could you modify your later code to take separate Series? Or perhaps go back to where the Series were assigned and try to populate a DataFrame from the start – RichieV Sep 05 '20 at 18:20

2 Answers


You could try pd.concat instead of append.

If you want each Series to be a column, then:

df = pd.concat(list_of_series_objects, axis=1)

For more detail on why it is expensive to iterate and append, read this question.
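A quick sketch of the difference (the Series and names here are made up for illustration): growing a DataFrame inside a loop copies the accumulated data on every iteration, while a single concat over the whole list builds the result once.

import pandas as pd

series_list = [pd.Series(range(3), name=f"s{i}") for i in range(1000)]

# Slow: each pd.concat copies the entire accumulated frame again.
df_slow = series_list[0].to_frame()
for s in series_list[1:]:
    df_slow = pd.concat([df_slow, s], axis=1)

# Fast: one concat over the whole list, a single allocation.
df_fast = pd.concat(series_list, axis=1)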

RichieV

Input:

series = pd.Series(["BMW", "Toyota", "Honda"])
series

Output:

0       BMW
1    Toyota
2     Honda
dtype: object

Input:

colours = pd.Series(["Red", "Blue", "White"])
colours

Output:

0      Red
1     Blue
2    White
dtype: object

Input:

car_data = pd.DataFrame({"Car make": series, "Colour": colours})
car_data

Output:

  Car make Colour
0      BMW    Red
1   Toyota   Blue
2    Honda  White
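If there are many Series rather than two, the same dict-based constructor can be driven programmatically instead of listing every column by hand. A minimal sketch, assuming each Series carries a name attribute (the variable names here are made up):

import pandas as pd

# Hypothetical list of named Series standing in for a larger collection.
series_list = [pd.Series(["BMW", "Toyota", "Honda"], name="Car make"),
               pd.Series(["Red", "Blue", "White"], name="Colour")]

# Build the column dict from each Series' name, then construct once.
car_data = pd.DataFrame({s.name: s for s in series_list})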
  • Thanks a bunch for contributing by answering this question! As it stands right now, your post needs some formatting improvements. Try to edit your post to make it more readable with the information you can find [here](https://stackoverflow.com/help/formatting). – Koedlt Dec 13 '22 at 12:39