How to use Pandas dataframe transform 2D dataset into 3D?

Question

I have a tropical cyclone dataset like this: TC dataset

I want to group the dataset according to the column 'TC_name', i.e. one TC one group, but 'TC_name' may be identical for different years and the same TC may travel interannually, so groupby() maybe doesn't work.

I also tried to loop line by line and find where the 'TC_name' changes, cut the dataset there to get a dataframe for each TC. However, this gives me over 1000 dataframes and I don't know how to represent this 2D information in a 3D formation.

I guess multi index might work through assigning a high level index to each TC, but I cannot reach it after trying many times.

An example is provided below for clarification: (I need seperate 3 tropical cyclones)

import pandas as pd
data = {'TC_name':['RITA','RITA','IDA','IDA','IDA','IDA','RITA','RITA','RITA'],
        'Year':[2020,2020,2020,2020,2020,2021,2021,2021,2021]}
df = pd.DataFrame(data)
df

    TC_name Year
0   RITA    2020
1   RITA    2020
2   IDA     2020
3   IDA     2020
4   IDA     2020
5   IDA     2021
6   RITA    2021
7   RITA    2021
8   RITA    2021

And it would be better to get this format, or any other efficient 3D data structure representation.

         TC_name    Year
   1   1    RITA    2020
       2    RITA    2020
   2   1    IDA     2020
       2    IDA     2020
       3    IDA     2020
       4    IDA     2021
   3   1    RITA    2021
       2    RITA    2021
       3    RITA    2021

Thanks a lot for your help!

Why does `RITA (2020)` get number 1 assigned and `IDA (2020)` number 2 and not vice-versa? Is there some kind of implied order in the dataset? — notiv, Sep 22 '22 at 17:41
The data is initially sorted by the time of occurrence. Actually, the order is not important. I just give a possible output by which I can access one TC track quickly. — Feng Hu, Sep 22 '22 at 17:53

notiv · Accepted Answer · 2022-09-22T19:04:42.427

1

Edit

Here another possibility. We reset the index to keep the original sorting:

df = df.reset_index()

Then we assign a new id within each TC_name group:

df['id2'] = df.groupby('TC_name')['index'].transform(lambda x: pd.factorize(x)[0] + 1)

Here the intermediate output:

	index	TC_name	Year	id2
0	0	RITA	2020	1
1	1	RITA	2020	2
2	2	IDA	2020	1
3	3	IDA	2020	2
4	4	IDA	2020	3
5	5	IDA	2021	4

Then we assign a new id every time the TC_name changes:

df['id'] = (df["TC_name"] != df["TC_name"].shift(1)).cumsum()

And finally we drop the index and set the multiindex:

df = df.drop('index', axis=1).set_index(['id', 'id2'])

Here the final output, which looks like yours:

		TC_name	Year
1	1	RITA	2020
	2	RITA	2020
2	1	IDA	2020
	2	IDA	2020
	3	IDA	2020
	4	IDA	2021
3	3	RITA	2021
	4	RITA	2021
	5	RITA	2021

edited Sep 22 '22 at 19:04

answered Sep 22 '22 at 17:56

notiv

441
5
12

Thank you very much for your answer. However, please also note that there are `IDA(2020)` and `IDA(2021)`, so `groupby` is not applicable for this issue. Any other insights on how to split this dataset are also appreciated. – Feng Hu Sep 22 '22 at 18:09
In fact, I tried `groupby()` before asking. Adding `sort = False` in `groupby` will give the output in the original sort, which is almost there. – Feng Hu Sep 22 '22 at 18:16
I edited the answer to provide another way. Do you need a separate sorting for the second level of the index (reset the "counter")? – notiv Sep 22 '22 at 18:22
Thanks, but there are actually only 3 groups. Although `IDA` begins in 2020 and ends in 2021, it is still 'one' cyclone. – Feng Hu Sep 22 '22 at 18:33
I think I have it (finally) :) – notiv Sep 22 '22 at 19:06
1

This is the first time I use Stackoverflow and I am just a beginner in python programming. Really cannot believe I find a fantastic platform to support my programming study. Thanks again. Have a nice day! :) – Feng Hu Sep 22 '22 at 19:33

How to use Pandas dataframe transform 2D dataset into 3D?

1 Answers1

Edit