I'm trying to build a number of dataframes from data whose content (in terms of variables, not values) can change from row to row within the same dataframe.
The way I'm currently doing it is to build a new one-row dataframe for each new row and then add it to the existing dataframe with the append method. This takes care of creating any new columns and setting the value to NaN for the existing rows.
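For reference, the row-by-row approach just described can be sketched as follows. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so this sketch uses `pd.concat` instead. The rows are the example data shown further down. Each `pd.concat` call copies the whole accumulated frame, which is why this pattern slows down badly on large files.

```python
import pandas as pd

# Rows whose keys differ from one another (the example data from this question).
rows = [{"A": 10, "B": 2}, {"A": 20, "B": 3}, {"A": 30, "C": "Batman"}]

df = None
for row in rows:
    # Build a one-row frame for the new row ...
    one_row = pd.DataFrame([row])
    # ... and concatenate it; columns missing on either side are filled with NaN.
    # ignore_index=True gives 0, 1, 2 instead of repeating index 0 for every row.
    df = one_row if df is None else pd.concat([df, one_row], ignore_index=True)

print(df)
```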
I also tried the loc method, as suggested here, but it raises a ValueError.
In addition, there can be more than a thousand possible labels, so I would like to avoid declaring all the columns up front. It's almost impossible to know which columns a particular file needs without reading the whole file at least once.
I know, however, that building a dataframe line-by-line is considered a bad (if not deprecated) practice.
So, let's say my data comes from a text file similar to this:
A=10,B=2
A=20,B=3
A=30,C=Batman
and I want to create a dataframe that looks like this:
    A    B       C
0  10  2.0     NaN
1  20  3.0     NaN
2  30  NaN  Batman
How would you suggest doing this?
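A common alternative to growing the dataframe is to parse every line into a plain dict, collect the dicts in a list, and call the `pd.DataFrame` constructor once at the end. The sketch below assumes the `KEY=value` format from the example above. `parse_line` is a hypothetical helper, and `io.StringIO` stands in for the real file.

```python
import io

import pandas as pd

# Stand-in for the text file from the question.
text = io.StringIO("A=10,B=2\nA=20,B=3\nA=30,C=Batman\n")


def parse_line(line):
    """Turn 'A=10,B=2' into {'A': '10', 'B': '2'} (values stay strings)."""
    return dict(pair.split("=", 1) for pair in line.strip().split(","))


# Collect one dict per line; the columns do not need to be known in advance.
records = [parse_line(line) for line in text if line.strip()]

# pandas uses the union of all keys as columns and fills the gaps with NaN.
df = pd.DataFrame(records)

# Values were parsed as strings; convert the columns that turn out to be numeric.
for col in df.columns:
    try:
        df[col] = pd.to_numeric(df[col])
    except (ValueError, TypeError):
        pass  # leave non-numeric columns (e.g. 'C') as strings

print(df)
```

Because the columns come from the keys actually seen, the file is read only once, and the frame is built in a single step instead of being copied on every row.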
EDIT: The data comes from a very messy fixed-width text file. Each line of the file is a continuous sequence of characters with no delimiters. Within a line, a three-letter identifier marks the beginning of each section and is followed by all of that section's values run together. I have a document, which I translated into a Python dict, that tells me for each identifier how many characters to read after the beginning of the section and how they are split up.
For example, one line could be
AAA1234BBB789aa78CCC123456
I would then know that section AAA is followed by 3 values: one 2-digit int and two 1-digit ints. Section BBB is followed by a 3-digit int, a 2-character string and two 1-digit ints.
I have a piece of code that translates this into a dict that looks like
{'AAA_1': 12, 'AAA_2': 3, 'AAA_3': 4, 'BBB_1': 789, 'BBB_2': 'aa', 'BBB_3': 7, 'BBB_4': 8, ...}
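The line-to-dict translation described above could look roughly like the sketch below. The spec format (identifier mapped to a list of `(width, converter)` pairs), the helper name `parse_line`, and the layout of section CCC are all assumptions made for illustration; the question only describes AAA and BBB. The real NOAA files may not follow this simple scheme exactly, so check the format document linked below before relying on it.

```python
import pandas as pd

# Hypothetical spec: section identifier -> list of (width, converter) pairs.
# AAA and BBB follow the question; the CCC layout is made up for this example.
SPEC = {
    "AAA": [(2, int), (1, int), (1, int)],
    "BBB": [(3, int), (2, str), (1, int), (1, int)],
    "CCC": [(3, int), (3, int)],  # assumption: not described in the question
}


def parse_line(line, spec, id_len=3):
    """Split one delimiter-free line into {'<ID>_<n>': value} using spec."""
    line = line.rstrip("\n")
    record = {}
    pos = 0
    while pos < len(line):
        ident = line[pos:pos + id_len]
        if ident not in spec:
            raise ValueError(f"unknown section {ident!r} at position {pos}")
        pos += id_len
        # Read each field of the section in order, numbering them from 1.
        for i, (width, convert) in enumerate(spec[ident], start=1):
            chunk = line[pos:pos + width]
            if len(chunk) != width:
                raise ValueError(f"line ends inside section {ident!r}")
            record[f"{ident}_{i}"] = convert(chunk)
            pos += width
    return record


lines = ["AAA1234BBB789aa78CCC123456\n", "AAA5678CCC000111\n"]
records = [parse_line(line, SPEC) for line in lines]

# Sections missing from a line (BBB in the second one) simply become NaN.
df = pd.DataFrame(records)
print(records[0])
print(df)
```

A line that lacks a section just yields a shorter dict, and the DataFrame constructor fills the missing columns with NaN, as it does for any list of dicts with differing keys.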
EDIT2: If you want a glimpse of an original file, you can look here (any of them will work):
ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2017/
And to understand how to read one, look here (I didn't want to ask that much of you):
ftp://ftp.ncdc.noaa.gov/pub/data/noaa/ish-format-document.pdf