I'm doing some web scraping and storing the variables of interest in the form of:
a = {'b':[100, 200],'c':[300, 400]}
This is for one page, where there were two b's and two c's. The next page could have three of each, which I'd store as:
b = {'b':[300, 400, 500],'c':[500, 600, 700]}
When I go to create a DataFrame from the list of dicts, I get:
import pandas as pd
df = pd.DataFrame([a, b])
df
                 b                c
0       [100, 200]       [300, 400]
1  [300, 400, 500]  [500, 600, 700]
What I'm expecting is:
df
     b    c
0  100  300
1  200  400
2  300  500
3  400  600
4  500  700
I could create a DataFrame each time I store a page and concat the list of DataFrames at the end. However, in my experience this is very expensive: constructing thousands of DataFrames costs much more than creating one DataFrame from a lower-level structure (i.e., a list of dicts).
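A minimal sketch of one way to get the expected output with a single constructor call, rather than one DataFrame per page. It assumes every page uses the same keys and that, within a page, each key holds the same number of values (otherwise the constructor raises a ValueError about unequal lengths). The names `pages` and `merged` are hypothetical and only used for illustration.

```python
from collections import defaultdict

import pandas as pd

# The two example pages from above.
a = {'b': [100, 200], 'c': [300, 400]}
b = {'b': [300, 400, 500], 'c': [500, 600, 700]}
pages = [a, b]  # hypothetical name for the list of scraped pages

# Flatten in plain Python first: extend one list per key across all pages.
merged = defaultdict(list)
for page in pages:
    for key, values in page.items():
        merged[key].extend(values)

# Build the DataFrame once, from a dict of equal-length lists.
df = pd.DataFrame(dict(merged))
print(df)
```

The idea is that the per-page work stays in cheap list operations, and pandas is only involved once at the end.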