2

I'm relatively new to python and have been using Pandas to manipulate scientific data. I have 79 datasets in CSV format of inconsistent satellite imagery of pixel values (lots of NaNs) that have been averaged to bi-monthly values (two months averaged together). The data is formatted similar to the the example data frame "df". The actual time series data extends from 1985-2020 with a screen shot at the bottom showing it's actual format for reference.

df =  pd.DataFrame({'grouping': ['F-M', 'A-M', 'J-J', 'A-S', 'O-N', 'D-J', 'F-M', 'A-M', 'J-J', 'A-S', 'O-N', 'D-J'],
                    'year': ['1985', '1985','1985','1985','1985','1985', '1986','1986','1986','1986','1986','1986'],
                    'region_1': ['NaN', 0.264, 0.339, 0.321, 0.305, 'NaN', 'NaN', 0.404, 0.206, 0.217, 0.266, 0.217 ],
                    'region_2': ['NaN', 0.457, 0.649,  0.625, 0.531, 'NaN', 0.503, 0.656, 0.437, 0.568, 0.547, 'NaN' ]})

I need to reformat the data so each row is just one year with each two month grouping as a column header. However, each dataset has two regions that need to be compared to each other. "April-May region 1" and "April-May region 2". The final data set would look something like this:

df2 = pd.DataFrame({'year':['1985', '1986'],
                    'F-M reg_1': ['NaN', 'NaN'],
                    'A-M reg_1': [0.264, 0.404],
                    'J-J reg_1': [0.339, 0.206],
                    'A-S reg_1': [0.321, 0.217],
                    'O-N reg_1': [0.305, 0.266],
                    'D-J reg_1': ['NaN', 0.217],
                    'F-M reg_2': ['NaN', 0.503],
                    'A-M reg_2': [0.457, 0.656],
                    'J-J reg_2': [0.649, 0.437],
                    'A-S reg_2': [0.635, 0.568],
                    'O-N reg_2': [0.531, 0.547],
                    'D-J reg_2': ['NaN', 'NaN']})

I've tried using the following code, but I dont know how to include the region_2 data within the data frame. It also creates an index value and calls it "grouping" and shuffles the order of the bi-monthly grouping.

df.pivot(index='year', columns = 'grouping', values = ('region_1')).reset_index()

Would it be better to create two separate data frames for each region?

I also can't seem to find any posts that show how to do this.

enter image description here

jjkennedy
  • 81
  • 7
  • 1
    See [How to make good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and show sample input and expected output, and code for what you've tried so far based on your own research, to make a [mcve] so that we can better understand how to help – G. Anderson Dec 09 '20 at 19:29
  • 1
    If you can read this into a dataframe, then `df.to_csv("test.csv")` writes it. You can trim that file up and post it. In fact, `df = pd.read_csv(io.StringIO("""csv content"""))` using a multiline string of the csv content works. – tdelaney Dec 09 '20 at 19:36
  • 1
    Probably `df.groupby()` would help. but hard to tell. See @G.Anderson comment for how to help us helping you. – itaishz Dec 09 '20 at 19:44
  • Thanks for the suggestions on creating a better question @G.Anderson Post has been edited and updated with example script. – jjkennedy Dec 09 '20 at 20:47

1 Answers1

2

I think all you need to do is use a list for the values parameter:

bimonths = ['F-M', 'A-M', 'J-J', 'A-S', 'O-N', 'D-J']
df.pivot(index='year', columns = 'grouping', values = ['region_1','region_2']).reindex(bimonths, axis=1, level=1)

Output (column alignment messed up by the cut and paste):

    region_1    region_2
grouping    F-M A-M J-J A-S O-N D-J F-M A-M J-J A-S O-N D-J
year                                                
1985    NaN 0.264   0.339   0.321   0.305   NaN NaN 0.457   0.649   0.625   0.531   NaN
1986    NaN 0.404   0.206   0.217   0.266   0.217   0.503   0.656   0.437   0.568   0.547   NaN
RootTwo
  • 4,288
  • 1
  • 11
  • 15