0

Edit: Response to being marked a duplicate question - A pivot() may be able to do something like what I want as discussed in other questions, but it seems to require a parameter column=, which gets used for the new column names. My dataframe doesn't have an appropriate column to pass as the column parameter. I don't see a way to generate the new column names sharename1, sharename2, sharename..., from the existing column name.

I have a dataframe in this format

    usernames_id    sharename
0   2.0             APPLICATIONS
1   2.0             HOME
2   2.0             SYSVOL
3   2.0             IT
4   14.0            IT
5   14.0            DOWNLOAD
6   14.0            DATA2$
7   14.0            PRINT$
8   34.0            HOME
9   34.0            COL RECEIPT
10  34.0            DEPARTMENTS
11  34.0            APPLICATIONS
12  35.0            SYSVOL
13  36.0            DEPARTMENTS
14  36.0            APPLICATIONS

I would like to convert it into this format, having a single row for each usernames_id with a column for each sharename like this:

    usernames_id    sharename1   sharename2  sharename3  sharename4   sharename...
0   2.0             APPLICATIONS HOME        SYSVOL      IT
1   14.0            IT           DOWNLOAD    DATA2$      PRINT$
2   34.0            HOME         COL RECEIPT DEPARTMENTS APPLICATIONS
3   35.0            SYSVOL
4   36.0            DEPARTMENTS  APPLICATIONS

I have been trying with pandas functions like groupby, pivot, unstack, etc and not getting anywhere.

How does one accomplish this with pandas?

Here is code to reproduce the DataFrame:

df = pd.DataFrame({
'usernames_id': {
    0: 2.0,
    1: 2.0,
    2: 2.0,
    3: 2.0,
    4: 14.0,
    5: 14.0,
    6: 14.0,
    7: 14.0,
    8: 34.0,
    9: 34.0,
    10: 34.0,
    11: 34.0,
    12: 35.0,
    13: 36.0,
    14: 36.0
},
'sharename': {
    0: 'APPLICATIONS',
    1: 'HOME',
    2: 'SYSVOL',
    3: 'IT',
    4: 'IT',
    5: 'DOWNLOAD',
    6: 'DATA2$',
    7: 'PRINT$',
    8: 'HOME',
    9: 'COL RECEIPT',
    10: 'DEPARTMENTS',
    11: 'APPLICATIONS',
    12: 'SYSVOL',
    13: 'DEPARTMENTS',
    14: 'APPLICATIONS'
 }
})
wl2776
  • 4,099
  • 4
  • 35
  • 77
saujosai
  • 79
  • 9

1 Answers1

0

This can be achieved with first concatenating these values based on username_id into a single row that is comma separated. Then we can split these into multiple columns and then rename the columns later:

df = df.groupby('usernames_id')['sharename'].apply(",".join).reset_index()
df = pd.concat([df, df.sharename.str.split(',', expand=True)], axis=1)
cols = ["sharename_" + str(col) if type(col) == int else col for col in df.columns]
df.columns = cols
df
    usernames_id                                  sharename   sharename_0   sharename_1  sharename_2   sharename_3
0           2.0                APPLICATIONS,HOME,SYSVOL,IT  APPLICATIONS          HOME       SYSVOL            IT
1          14.0                  IT,DOWNLOAD,DATA2$,PRINT$            IT      DOWNLOAD       DATA2$        PRINT$
2          34.0  HOME,COL RECEIPT,DEPARTMENTS,APPLICATIONS          HOME   COL RECEIPT  DEPARTMENTS  APPLICATIONS
3          35.0                                     SYSVOL        SYSVOL                             
4          36.0                   DEPARTMENTS,APPLICATIONS   DEPARTMENTS  APPLICATIONS
Asad Rauf
  • 743
  • 9
  • 17