-1

I have a multi indexed data frame as the attached image. Now I need to create a new column on a different data frame where each column name will be the unique room numbers.

Existing Dataframe

For example, one expected output from the code will be as following:

Expected Outcome for one column

N.B. I want to avoid for loops to save memory space and time. What would be the optimal way to get desired output ?

I have tried using for loops and could get desired output but I am not sure if it s a good idea for a large dataset. Here is the code snippet :

import numpy as np
import pandas as pd
d = np.array(['624: COUPLE , 507: DELUXE+ ,301: HONEYMOON','624:FAMILY , 
      507: FAMILY+','621:FAMILY , 517: FAMILY+','696:FAMILY , 585: 
      FAMILY+,624:FAMILY , 507: DELUXE'])
df = pd.Series(d)
df= df.str.extractall(r'(?P<room>[0-9]+):\s*(?P<grd>[^\s,]+)')

gh = df[df['room'] == '507'].index

rf = pd.DataFrame(index=range(0,4),columns=['room#507','room#624'], 
      dtype='float')


for i in range(0,rf.shape[0]):
   for j in range(0,gh.shape[0]):
     if (i == gh[j][0]):
        rf['room#507'][i] = df.grd[gh[j][0]][gh[j][1]]
  • In order for us to help you, it is necessary that you show your effort and submit data to be used to reproduce your problem. While providing an image is helpful, it doesn't allow for reproducing the issue. Please edit your question to show a minimal reproducible set. See [Minimal Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example "Minimal Reproducible Example") for details. – itprorh66 Feb 21 '21 at 20:35
  • please find the reproduceable code provided. – BN production Feb 22 '21 at 06:48
  • 1
    Does this answer your question? [How to pivot a dataframe?](https://stackoverflow.com/questions/47152691/how-to-pivot-a-dataframe) – Wiktor Stribiżew Feb 22 '21 at 09:51

1 Answers1

1

Use DataFrame.reset_index with DataFrame.pivot:

df= df.str.extractall(r'(?P<room>[0-9]+):\s*(?P<grd>[^\s,]+)')

df = df.reset_index(level=1, drop=True).reset_index()

df = df.pivot('index','room','grd').add_prefix('room_')
print (df)
room    room_301 room_507 room_517 room_585 room_621 room_624 room_696
index                                                                 
0      HONEYMOON  DELUXE+      NaN      NaN      NaN   COUPLE      NaN
1            NaN  FAMILY+      NaN      NaN      NaN   FAMILY      NaN
2            NaN      NaN  FAMILY+      NaN   FAMILY      NaN      NaN
3            NaN   DELUXE      NaN  FAMILY+      NaN   FAMILY   FAMILY

Or DataFrame.set_index with Series.unstack:

df= df.str.extractall(r'(?P<room>[0-9]+):\s*(?P<grd>[^\s,]+)')

df = (df.reset_index(level=1, drop=True)
        .set_index('room', append=True)['grd']
        .unstack()
        .add_prefix('room_'))

print (df)
room   room_301 room_507 room_517 room_585 room_621 room_624 room_696
0     HONEYMOON  DELUXE+      NaN      NaN      NaN   COUPLE      NaN
1           NaN  FAMILY+      NaN      NaN      NaN   FAMILY      NaN
2           NaN      NaN  FAMILY+      NaN   FAMILY      NaN      NaN
3           NaN   DELUXE      NaN  FAMILY+      NaN   FAMILY   FAMILY
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252