I'm reading in a lot of fixed-width data where each line is an encoded record containing a field for each month:

from io import StringIO
import pandas as pd

year1 = "AK00101Y90010002A003J0049005X006007B008B009I01A00110012\n" \
       "CA01100N600100020003A00410050006007B008B009I01A00110012"

# 6-character location, two 1-character flags, then twelve 4-character month fields
colspecs = [(0, 6), (6, 7), (7, 8)] + [(4*i + 8, 4*i + 12) for i in range(12)]
names = ['location', 'something', 'bool'] + [f"month_{i:02}" for i in range(12)]
# read the month fields as strings so leading zeros are kept
df = pd.read_fwf(StringIO(year1), colspecs=colspecs, names=names, dtype={f"month_{i:02}": str for i in range(12)})

Resulting in something like:

[image: the df]

I would then parse each month into its own data, maybe like this:

pd.read_fwf(StringIO(df.iloc[0]['month_00']), colspecs=((0,1),(1,2),(2,3),(3,4)), names=('something','dogs','cats','cows'))
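Calling read_fwf on a single cell means one parser call per value. As a rough sketch of an alternative, the same four one-character fields could be sliced out of a whole column at once with the `.str` accessor. The three codes below are the first three month fields of the AK sample line; the `months` and `parsed` names are only illustrative.

```python
import pandas as pd

# First three 4-character month fields of the AK sample line (illustrative input)
months = pd.Series(["9001", "0002", "A003"], name="month_00")

# Field names taken from the read_fwf call above
field_names = ["something", "dogs", "cats", "cows"]

# Take character i of every value in the column in one vectorized step
parsed = pd.DataFrame({name: months.str[i] for i, name in enumerate(field_names)})
```

Each value stays a string here, so any numeric conversion would still have to be done per column afterwards.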

The real data will have potentially hundreds of lines, and I would like to be able to query it by location and/or month. How would I best construct such a frame? Should I use a Panel? Should I de-normalize each month into its own row?
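To make the de-normalized option concrete, here is a minimal sketch of one possible layout, not necessarily the best one: melt the month columns into rows and index by (location, month). The small `wide` frame stands in for the read_fwf result and holds only the first two month fields of each sample line; `tidy` and the query variable names are hypothetical. Panel has since been removed from pandas, so a MultiIndex is used instead.

```python
import pandas as pd

# Stand-in for the read_fwf result, cut down to two month columns
wide = pd.DataFrame({
    "location": ["AK0010", "CA0110"],
    "month_00": ["9001", "6001"],
    "month_01": ["0002", "0002"],
})

# One row per (location, month): the de-normalized "long" layout
tidy = wide.melt(id_vars="location", var_name="month", value_name="code")
tidy["month"] = tidy["month"].str.replace("month_", "", regex=False).astype(int)

# Split each 4-character code into one column per character
for i, name in enumerate(["something", "dogs", "cats", "cows"]):
    tidy[name] = tidy["code"].str[i]

tidy = tidy.drop(columns="code").set_index(["location", "month"]).sort_index()

# Query by location, by month, or by both
ak_all = tidy.loc["AK0010"]               # all months for one location
month_1 = tidy.xs(1, level="month")       # one month across all locations
ak_month_0 = tidy.loc[("AK0010", 0)]      # a single location and month
```

With a few hundred lines and twelve months each, the long frame stays at a few thousand rows, so this layout should remain cheap to build and query.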

Vetsin