4

I have forecast data that I grouped by month. The original dataframe something like this:

>>clean_table_grouped[0:5]
       STYLE    COLOR    SIZE   FOR
MONTH                           01/17    10/16   11/16    12/16
    0 #######   ######   ####   0.0      15.0    15.0     15.0
    1 #######   ######   ####   0.0      15.0    15.0     15.0
    2 #######   ######   ####   0.0      15.0    15.0     15.0
    3 #######   ######   ####   0.0      15.0    15.0     15.0
    4 #######   ######   ####   0.0      15.0    15.0     15.0

>>clean_table_grouped.ix[0:,"FOR"][0:5] 
 MONTH  01/17  10/16  11/16  12/16
0        0.0   15.0   15.0   15.0
1        0.0   15.0   15.0   15.0
2        0.0   15.0   15.0   15.0
3        0.0   15.0   15.0   15.0
4        0.0   15.0   15.0   15.0

I simply want reorder these 4 columns in the follow way:

(keeping the rest of the dataframe untouched)

MONTH    10/16  11/16  12/16  01/17
0        15.0   15.0   15.0   0.0
1        15.0   15.0   15.0   0.0
2        15.0   15.0   15.0   0.0
3        15.0   15.0   15.0   0.0
4        15.0   15.0   15.0   0.0

My attempted solution was to reorder the columns of the subset following the post below: How to change the order of DataFrame columns?

I went about it by grabbing the column list and sorting it first

 >>for_cols = clean_table_grouped.ix[:,"FOR"].columns.tolist()
 >>for_cols.sort(key = lambda x: x[0:2])   #sort by month ascending
 >>for_cols.sort(key = lambda x: x[-2:])   #then sort by year ascending

Querying the dataframe works just fine

>>clean_table_grouped.ix[0:,"FOR"][for_cols]
MONTH   10/16   11/16  12/16  01/17
0        15.0    15.0    15.0    0.0
1        15.0    15.0    15.0    0.0
2        15.0    15.0    15.0    0.0
3        15.0    15.0    15.0    0.0
4        15.0    15.0    15.0    0.0

However, when I try to set values in the original table, I get a table of "NaN":

>>clean_table_grouped.ix[0:,"FOR"] = clean_table_grouped.ix[0:,"FOR"][for_cols]
>>clean_table_grouped.ix[0:,"FOR"]
MONTH  01/17  10/16  11/16  12/16
0        NaN    NaN    NaN    NaN
1        NaN    NaN    NaN    NaN
2        NaN    NaN    NaN    NaN
3        NaN    NaN    NaN    NaN
4        NaN    NaN    NaN    NaN
5        NaN    NaN    NaN    NaN

I have also tried zipping to avoid chained syntax (.ix[][]). This avoids the NaN, however, it doesn't change the dataframe -__-

>>for_cols = zip(["FOR", "FOR", "FOR", "FOR"], for_cols)
>>clean_table_grouped.ix[0:,"FOR"] = clean_table_grouped.ix[0:,for_cols]
>>clean_table_grouped.ix[0:,"FOR"]
 MONTH  01/17  10/16  11/16  12/16
 0        0.0   15.0   15.0   15.0
 1        0.0   15.0   15.0   15.0
 2        0.0   15.0   15.0   15.0
 3        0.0   15.0   15.0   15.0
 4        0.0   15.0   15.0   15.0

I realize I'm using ix to reassign values. However, I've used this technique in the past on dataframes that are not grouped and it has worked just fine.

If this question as already been answered in another post (in a CLEAR way), please provide the link. I searched but could not find anything similar.

EDIT: I have found a solution. Manually reindex by creating a new multiindex dataframe in the order you want your columns sorted. I posted the solution below.

Community
  • 1
  • 1
xdzzz
  • 71
  • 9

2 Answers2

1

Sort the column names containing date strings and later use it as a subset to return the columns in that particular order:

from datetime import datetime
df[sorted(df.columns, key=lambda x: datetime.strptime(x, '%m/%y'))]

Image


Toy Data:

from datetime import datetime
np.random.seed(42)

cols = [['STYLE', 'COLOR', 'SIZE', 'FOR', 'FOR', 'FOR', 'FOR'],
        ['', '', '', '01/17', '10/16', '11/16', '12/16']]
tups = list(zip(*cols))
index = pd.MultiIndex.from_tuples(tups, names=[None, 'MONTH'])
clean_table_grouped = pd.DataFrame(np.random.randint(0, 100, (100, 7)), 
                                   index=np.arange(100), columns=index)
clean_table_grouped = clean_table_grouped.head()
clean_table_grouped

Image

Split the multi-index DF into two with the one containing the forecast values and the other the remaining DF.

for_df = clean_table_grouped[['FOR']]
clean_table_grouped = clean_table_grouped.drop(['FOR'], axis=1, level=0)

Forecast DF:

for_df

Image

Remaining DF:

clean_table_grouped

Image

Sorting the columns in the forecast DF by applying the same procedure as done in the pre-edited post.

order = sorted(for_df['FOR'].columns.tolist(), key=lambda x: datetime.strptime(x, '%m/%y'))

Making the DF in the same order by subsetting the sorted list of columns.

for_df = for_df['FOR'][order]

Concatenate the forecast DF with itself to create a multi-index like column.

for_df = pd.concat([for_df, for_df], axis=1, keys=['FOR'])

Finally, join them on the common index.

clean_table_grouped.join(for_df)

Image

Nickil Maveli
  • 29,155
  • 8
  • 82
  • 85
  • This works to swap the order of columns in the *subset* of the data frame and it is a more elegant solution to swapping the column order than my own. However, my issue was in substituting that swapped order INTO the original dataframe without affecting the other columns (style, color, size). Since I did not provide the original structure of the dataframe in which I wished to update, I did not down vote this answer. Thanks! – xdzzz Oct 05 '16 at 17:43
  • Got that. Now that I know what you were asking after you've provided your starting `DF`, I've recreated a similar `DF` to give you the desired output you were looking for. *See edit* – Nickil Maveli Oct 06 '16 at 10:22
  • Nickil, solved my friend, solved. Well played. I have to say, I wasn't expecting such a convoluted solution. However, your solution works perfectly :). I'd like to note that I had other grouped data besides "FOR"; the join appends the for_df table to the very end of the original table but that's not really an issue. It does swap the "FOR" columns appropriately and joins it back to the original table in a seamless way. – xdzzz Oct 06 '16 at 20:04
  • Glad that I was able to solve it completely. But I must admit that it took me a while to figure it out and finally arrive at the solution and really enjoyed the entire process as a result of it. Thanks for the question and all the best! – Nickil Maveli Oct 06 '16 at 20:13
0

My own solution was based upon the below post's second answer: How can I reorder multi-indexed dataframe columns at a specific level

Pretty much... just create a new dataframe with the multiindex you want. Trying to insert values using .ix,.loc,.iloc isn't well supported with multiindexed dataframes. If you're looking to completely change the values of the subset of columns (not just swap), Nickil's solution of separating and re-joining the tables is definitely the way to go. However, if you're only looking to swap the columns, the below works perfectly fine. I selected this as the answer over Nickil's solution because this solution worked better for me as I had other data besides 'FOR' grouped by month and it gave me more flexibility in reordering the columns.

First, store the lists IN THE ORDER YOU WANT IT:

>>reindex_list = ['STYLE','COLOR','SIZE','FOR'] #desired order
>>month_list = clean_table_grouped.ix[0:,"FOR"].columns.tolist()
>>month_list.sort(key = lambda x: x[0:2]) #sort by month ascending
>>month_list.sort(key = lambda x: x[-2:]) #sort by year ascending

Then create a zipped listed where style, color, size get zipped with '', and 'FOR' gets zipped with each month. Like so:

[('STYLE',''),('COLOR',''),..., ('FOR','10/16'), ('FOR','11/16'), ...]

Here is an algorithm that does it automagically:

>>zip_list = []
>>
for i in reindex_list:
if i in ['FOR']:
    for j in month_list:
        if j != '':
            zip_list.append(zip([i],[j])[0])
else:
    zip_list.append(zip([i],[''])[0])

Then create a multi index from the tuple list you just zipped:

>>multi_cols = pd.MultiIndex.from_tuples(zip_list, names=['','MONTH'])

And finally, create a new dataframe from the old with the new multiindex:

>>clean_table_grouped_ordered = pd.DataFrame(clean_table_grouped, columns=multi_cols)
>>clean_table_grouped_ordered[0:5]
       STYLE COLOR SIZE FOR
 MONTH                  10/16   11/16   12/16  01/17
       ####  ####  ###  15.0    15.0    15.0    0.0
       ####  ####  ###  15.0    15.0    15.0    0.0
       ####  ####  ###  15.0    15.0    15.0    0.0
       ####  ####  ###  15.0    15.0    15.0    0.0
       ####  ####  ###  15.0    15.0    15.0    0.0
       ####  ####  ###  15.0    15.0    15.0    0.0
Community
  • 1
  • 1
xdzzz
  • 71
  • 9