I want to create a multi-index DataFrame by reading a text file. Is it faster to create the MultiIndex first and then assign data to it from the text file using df.loc[[],[]], or to concatenate rows to the DataFrame and set its index at the end? Or is it faster to store the data in a list or dict as it is read from the file, and then create a DataFrame from that? Is there a more Pythonic or faster option?
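To make the "concatenate rows" versus "collect in a list" comparison concrete, here is a minimal sketch (not a benchmark). It assumes the file has already been parsed into a hypothetical list of (A, B, C, data) tuples. The hypothetical build_by_concat grows the DataFrame with pd.concat on every row, while build_once constructs it in a single call. Both yield the same MultiIndex DataFrame. The pandas documentation for pd.concat advises against the row-by-row loop, because each call copies all of the data accumulated so far.

```python
import pandas as pd

COLUMNS = ["A", "B", "C", "data"]

# Hypothetical already-parsed records standing in for the file contents.
records = [(1, 1, 0, 1), (1, 1, 1, 2), (1, 2, 1, 3),
           (1, 2, 2, 4), (2, 1, 0, 5), (2, 1, 2, 6)]


def build_by_concat(rows):
    """Grow the DataFrame one row at a time; every concat copies all prior rows."""
    df = None
    for row in rows:
        new = pd.DataFrame([row], columns=COLUMNS)
        df = new if df is None else pd.concat([df, new], ignore_index=True)
    # Set the MultiIndex only once, at the end.
    return df.set_index(["A", "B", "C"])


def build_once(rows):
    """Collect everything first, then build the DataFrame in one constructor call."""
    return pd.DataFrame(rows, columns=COLUMNS).set_index(["A", "B", "C"])


print(build_once(records))
```

Timing both functions with timeit on a realistic number of rows would settle the speed question for your own data; no timings are claimed here.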
Example text file:
A = 1
B = 1
C data
0 1
1 2
A = 1
B = 2
C data
1 3
2 4
A = 2
B = 1
C data
0 5
2 6
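One common way to parse a file in this format is sketched below, using the list option from the question. It assumes that header lines always look like "name = value", that the literal "C data" line is a column header to skip, and that every other non-blank line holds two whitespace-separated integers. parse_blocks and SAMPLE are hypothetical names. Each data row is stored as a dict carrying the current A and B values, and the DataFrame and its MultiIndex are created once at the end.

```python
import io

import pandas as pd

# The example file from above, inlined so the sketch is self-contained.
SAMPLE = """\
A = 1
B = 1
C data
0 1
1 2
A = 1
B = 2
C data
1 3
2 4
A = 2
B = 1
C data
0 5
2 6
"""


def parse_blocks(lines):
    """Return one dict per data row, tagged with the most recent A and B values."""
    rows = []
    current = {}
    for line in lines:
        line = line.strip()
        if not line or line.split() == ["C", "data"]:
            # Skip blank lines and the 'C data' column header.
            continue
        if "=" in line:
            # Header line such as 'A = 1': remember the value for following rows.
            key, value = (part.strip() for part in line.split("=", 1))
            current[key] = int(value)
        else:
            c, data = map(int, line.split())
            rows.append({"A": current["A"], "B": current["B"], "C": c, "data": data})
    return rows


rows = parse_blocks(io.StringIO(SAMPLE))
df = pd.DataFrame(rows).set_index(["A", "B", "C"])
print(df)
```

For a real file, pass the open file object instead: with open(path) as f: rows = parse_blocks(f). Iterating the file streams it line by line, so only the list of rows is held in memory before the single constructor call.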
Desired output DataFrame:
A B C  data
1 1 0     1
    1     2
1 2 1     3
    2     4
2 1 0     5
    2     6
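The remaining two options from the question, preallocating the MultiIndex and filling it with .loc versus keeping a dict of lists, can be compared in the same way. The sketch below uses hypothetical values already collected column-wise. Note that preallocation requires every (A, B, C) key before any data is written, which for a text file usually means an extra pass. Each .loc assignment also performs a separate index lookup, so it is typically slower than one constructor call, although that is not measured here.

```python
import pandas as pd

# Hypothetical parsed values, kept column-wise (a dict of lists).
columns = {
    "A": [1, 1, 1, 1, 2, 2],
    "B": [1, 1, 2, 2, 1, 1],
    "C": [0, 1, 1, 2, 0, 2],
    "data": [1, 2, 3, 4, 5, 6],
}

# Dict-of-lists approach: build the MultiIndex and the frame in one go.
index = pd.MultiIndex.from_arrays(
    [columns["A"], columns["B"], columns["C"]], names=["A", "B", "C"]
)
from_dict = pd.DataFrame({"data": columns["data"]}, index=index)

# Preallocate-then-.loc approach: the full index must be known up front,
# then each cell is written individually.
prealloc = pd.DataFrame(0, index=index, columns=["data"])
for a, b, c, value in zip(columns["A"], columns["B"], columns["C"], columns["data"]):
    prealloc.loc[(a, b, c), "data"] = value

print(from_dict.equals(prealloc))
```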
Update Jan 18: This is related to the question "How to parse complex text files using Python?". I also wrote a blog article explaining to beginners how to parse complex files.