
I have a df that looks like this, and I want exactly one row per unique datetime index. There are 5 unique cat_idx values and 3 unique dt_idx values, so the new df will need 15 columns (5 unique cat_idx * 3 unique dt_idx).

            cat_idx         dt_idx  val
per_idx         
2002-01-01  APERMITE        TOTAL   1665.0
2002-01-01  APERMITE        SINGLE  1285.0
2002-01-01  APERMITE        MULTI   311.0
2002-01-01  AUTHNOTSTD      TOTAL   139.0
2002-01-01  AUTHNOTSTD      SINGLE  89.0
2002-01-01  AUTHNOTSTD      MULTI   46.0
2002-01-01  ASTARTS         TOTAL   1698.0
2002-01-01  ASTARTS         SINGLE  1318.0
2002-01-01  ASTARTS         MULTI   311.0
2002-01-01  UNDERCONST      TOTAL   996.0
2002-01-01  UNDERCONST      SINGLE  669.0
2002-01-01  UNDERCONST      MULTI   297.0
2002-01-01  ACOMPLETIONS    TOTAL   1632.0
2002-01-01  ACOMPLETIONS    SINGLE  1324.0
2002-01-01  ACOMPLETIONS    MULTI   273.0
2002-02-01  APERMITE        TOTAL   1787.0
2002-02-01  APERMITE        SINGLE  1401.0

My desired output df would have 1 unique datetime per row.

                      APERMITE_SINGLE  APERMITE_MULTI  APERMITE_TOT  AUTHNOTSTD_SINGLE  AUTHNOTSTD_MULTI  AUTHNOTSTD_TOT  ASTARTS_SINGLE  ASTARTS_MULTI  ASTARTS_TOT  UNDERCONST_SINGLE  UNDERCONST_MULTI UNDERCONST_TOT  ACOMPLETIONS_SINGLE  ACOMPLETIONS_MULTI  ACOMPLETIONS_TOT 
    per_idx       
    2002-01-01        1285.0           311.0           1665.0        89.0               46.0              139.0           1318.0          311.0          1698.0       669.0              297.0            996.0           1324.0               273.0               1632.0

This sample shows only one complete datetime stamp; the real data has many, and I need a similar row for each of them. I'm not sure how to go about this. After searching for similar questions, this one and this question came close, but they reshape onto existing values or names, whereas I need new columns to be created.

Also, if the title can be improved please let me know as I'm not entirely sure how to phrase this question.

test-3879

2 Answers


I created a DataFrame from the string you posted at the top. It isn't identical to yours, because per_idx ends up as a regular column here rather than as the index:

import re

import numpy as np
import pandas as pd

data="""per_idx   cat_idx        dt_idx   val
2002-01-01  APERMITE        TOTAL   1665.0
2002-01-01  APERMITE        SINGLE  1285.0
2002-01-01  APERMITE        MULTI   311.0
2002-01-01  AUTHNOTSTD      TOTAL   139.0
2002-01-01  AUTHNOTSTD      SINGLE  89.0
2002-01-01  AUTHNOTSTD      MULTI   46.0
2002-01-01  ASTARTS         TOTAL   1698.0
2002-01-01  ASTARTS         SINGLE  1318.0
2002-01-01  ASTARTS         MULTI   311.0
2002-01-01  UNDERCONST      TOTAL   996.0
2002-01-01  UNDERCONST      SINGLE  669.0
2002-01-01  UNDERCONST      MULTI   297.0
2002-01-01  ACOMPLETIONS    TOTAL   1632.0
2002-01-01  ACOMPLETIONS    SINGLE  1324.0
2002-01-01  ACOMPLETIONS    MULTI   273.0
2002-02-01  APERMITE        TOTAL   1787.0
2002-02-01  APERMITE        SINGLE  1401.0"""
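lines = data.split("\n")
# Split each data row on runs of two or more spaces; the header row splits on any whitespace.
df = pd.DataFrame(np.array([re.split(r'\s{2,}', line) for line in lines[1:]]),
                  columns=lines[0].split())
# Everything parsed from the text is a string, so convert val to numbers.
df['val'] = df['val'].astype(float)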

Now that we have the DataFrame, the first step is to create the new column names by concatenating the cat_idx and dt_idx columns:

df['col_names'] = df["cat_idx"]+ "_" + df["dt_idx"]
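
As a quick check of what that concatenation produces, here is a minimal sketch on a three-row sample. The `sample` frame and its contents are assumptions made for illustration, not the full data from the question.

```python
import pandas as pd

# Hypothetical three-row sample, built only to show the concatenation step.
sample = pd.DataFrame(
    {
        "cat_idx": ["APERMITE", "APERMITE", "ASTARTS"],
        "dt_idx": ["TOTAL", "SINGLE", "MULTI"],
    }
)

# "+" on string columns concatenates element-wise, one row at a time.
sample["col_names"] = sample["cat_idx"] + "_" + sample["dt_idx"]

col_names = sample["col_names"].tolist()
print(col_names)
```

Each row gets a label of the form cat_idx, underscore, dt_idx, which is what later becomes a column header.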

We keep only the columns that are still needed:

df = df.loc[:, ["per_idx", "col_names", "val"]]

Finally, we use DataFrame.pivot to get the output:

df.pivot(index='per_idx', columns='col_names', values='val')
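
If your frame still has per_idx as its index, as in the question, the same steps could be chained after a `reset_index()`. The following is only a sketch under that assumption; it uses a four-row subset of the posted values, and the names `df` and `wide` are illustrative.

```python
import pandas as pd

# Subset of the values posted in the question, with per_idx as the index.
df = pd.DataFrame(
    {
        "cat_idx": ["APERMITE", "APERMITE", "APERMITE", "APERMITE"],
        "dt_idx": ["TOTAL", "SINGLE", "TOTAL", "SINGLE"],
        "val": [1665.0, 1285.0, 1787.0, 1401.0],
    },
    index=pd.Index(
        ["2002-01-01", "2002-01-01", "2002-02-01", "2002-02-01"], name="per_idx"
    ),
)

wide = (
    df.reset_index()  # turn per_idx back into a regular column
    .assign(col_names=lambda d: d["cat_idx"] + "_" + d["dt_idx"])
    .pivot(index="per_idx", columns="col_names", values="val")
)
wide.columns.name = None  # drop the leftover "col_names" label on the header

print(wide)
```

Note that pivot raises an error if the same per_idx and col_names pair occurs more than once, so this assumes each combination is unique.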
ESDAIRIM
  • This also helped, thank you. I prefer the above solution as I will be using this for a variety of different looking dataframes. Much appreciated! – test-3879 Mar 07 '21 at 11:35

Just use pivot_table:

newdf=df.pivot_table(index='per_idx',columns=['cat_idx','dt_idx'],values='val').dropna()

Then flatten the resulting two-level column index with a list comprehension and f-strings:

newdf.columns = [f"{x}_{y}" for x, y in newdf.columns]

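Printing newdf now gives your desired output. Note that .dropna() removes every date that is missing a value in any of the new columns, such as the partial 2002-02-01 rows in your sample.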
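
To see how `.dropna()` affects incomplete dates, here is a small sketch using only the APERMITE rows from the question, where 2002-02-01 has no MULTI value. The names `table` and `complete_only` are illustrative assumptions.

```python
import pandas as pd

# APERMITE rows from the question; 2002-02-01 has no MULTI entry.
df = pd.DataFrame(
    {
        "per_idx": ["2002-01-01"] * 3 + ["2002-02-01"] * 2,
        "cat_idx": ["APERMITE"] * 5,
        "dt_idx": ["TOTAL", "SINGLE", "MULTI", "TOTAL", "SINGLE"],
        "val": [1665.0, 1285.0, 311.0, 1787.0, 1401.0],
    }
)

# pivot_table aggregates duplicate entries (mean by default) instead of raising.
table = df.pivot_table(index="per_idx", columns=["cat_idx", "dt_idx"], values="val")

# Flatten the two-level column index into single "cat_dt" labels.
table.columns = [f"{cat}_{dt}" for cat, dt in table.columns]

# Keep only the dates that have a value in every column.
complete_only = table.dropna()

print(table)
print(complete_only)
```

If you would rather keep partially filled dates, leaving out `.dropna()` keeps them with NaN in the missing columns.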

Anurag Dabas