Pandas: reformatting CSV data from single column into multiple new columns

Question

I'm relatively new to python and have been using Pandas to manipulate scientific data. I have 79 datasets in CSV format of inconsistent satellite imagery of pixel values (lots of NaNs) that have been averaged to bi-monthly values (two months averaged together). The data is formatted similar to the the example data frame "df". The actual time series data extends from 1985-2020 with a screen shot at the bottom showing it's actual format for reference.

df =  pd.DataFrame({'grouping': ['F-M', 'A-M', 'J-J', 'A-S', 'O-N', 'D-J', 'F-M', 'A-M', 'J-J', 'A-S', 'O-N', 'D-J'],
                    'year': ['1985', '1985','1985','1985','1985','1985', '1986','1986','1986','1986','1986','1986'],
                    'region_1': ['NaN', 0.264, 0.339, 0.321, 0.305, 'NaN', 'NaN', 0.404, 0.206, 0.217, 0.266, 0.217 ],
                    'region_2': ['NaN', 0.457, 0.649,  0.625, 0.531, 'NaN', 0.503, 0.656, 0.437, 0.568, 0.547, 'NaN' ]})

I need to reformat the data so each row is just one year with each two month grouping as a column header. However, each dataset has two regions that need to be compared to each other. "April-May region 1" and "April-May region 2". The final data set would look something like this:

df2 = pd.DataFrame({'year':['1985', '1986'],
                    'F-M reg_1': ['NaN', 'NaN'],
                    'A-M reg_1': [0.264, 0.404],
                    'J-J reg_1': [0.339, 0.206],
                    'A-S reg_1': [0.321, 0.217],
                    'O-N reg_1': [0.305, 0.266],
                    'D-J reg_1': ['NaN', 0.217],
                    'F-M reg_2': ['NaN', 0.503],
                    'A-M reg_2': [0.457, 0.656],
                    'J-J reg_2': [0.649, 0.437],
                    'A-S reg_2': [0.635, 0.568],
                    'O-N reg_2': [0.531, 0.547],
                    'D-J reg_2': ['NaN', 'NaN']})

I've tried using the following code, but I dont know how to include the region_2 data within the data frame. It also creates an index value and calls it "grouping" and shuffles the order of the bi-monthly grouping.

df.pivot(index='year', columns = 'grouping', values = ('region_1')).reset_index()

Would it be better to create two separate data frames for each region?

I also can't seem to find any posts that show how to do this.

See [How to make good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and show sample input and expected output, and code for what you've tried so far based on your own research, to make a [mcve] so that we can better understand how to help — G. Anderson, Dec 09 '20 at 19:29
If you can read this into a dataframe, then `df.to_csv("test.csv")` writes it. You can trim that file up and post it. In fact, `df = pd.read_csv(io.StringIO("""csv content"""))` using a multiline string of the csv content works. — tdelaney, Dec 09 '20 at 19:36
Probably `df.groupby()` would help. but hard to tell. See @G.Anderson comment for how to help us helping you. — itaishz, Dec 09 '20 at 19:44
Thanks for the suggestions on creating a better question @G.Anderson Post has been edited and updated with example script. — jjkennedy, Dec 09 '20 at 20:47

RootTwo · Accepted Answer · 2020-12-09T22:05:19.810

2

I think all you need to do is use a list for the values parameter:

bimonths = ['F-M', 'A-M', 'J-J', 'A-S', 'O-N', 'D-J']
df.pivot(index='year', columns = 'grouping', values = ['region_1','region_2']).reindex(bimonths, axis=1, level=1)

Output (column alignment messed up by the cut and paste):

    region_1    region_2
grouping    F-M A-M J-J A-S O-N D-J F-M A-M J-J A-S O-N D-J
year                                                
1985    NaN 0.264   0.339   0.321   0.305   NaN NaN 0.457   0.649   0.625   0.531   NaN
1986    NaN 0.404   0.206   0.217   0.266   0.217   0.503   0.656   0.437   0.568   0.547   NaN

edited Dec 09 '20 at 22:05

answered Dec 09 '20 at 21:01

RootTwo

4,288
1
11
15

Is there a way to make sure the order of the columns do not get rearranged? – jjkennedy Dec 09 '20 at 21:05
1

Reindex that level of the multiindex. I added it to the solution. – RootTwo Dec 09 '20 at 22:05
Thanks @RootTwo! works nicely and it's really simple. – jjkennedy Dec 09 '20 at 22:23

Pandas: reformatting CSV data from single column into multiple new columns

1 Answers1