Converting single column to multiple columns based on unique values

Question

I have the following pandas.DataFrame with shape (1464, 2):

df = pd.DataFrame()
for name in list('ABCD'):
    temp_df = pd.DataFrame(np.random.randint(0,100,size=(len(date_rng), 1)), columns=['value'], index=date_rng)
    temp_df['name'] = name
    df = df.append(temp_df)

The index column has each data duplicated 4 times: one for each string ('ABCD') in the name column.

The dataframe head and tail look like so:

Head

value   name
2018-01-01  47  A
2018-01-02  22  A
2018-01-03  13  A
2018-01-04  66  A
2018-01-05  19  A

Tail

    value   name
2018-12-28  32  D
2018-12-29  1   D
2018-12-30  5   D
2018-12-31  50  D
2019-01-01  75  D

I would like to convert this (1464, 2) dataframe to shape (366, 4), such that each of the 4 columns are the 4 unique values in df.name.unique() (i..e A, B, C, D). The values for each column are the respective integers in the df.value column.

The final DataFrame should look something like this:

            A   B   C   D
2018-12-28  32  22  21  4
2018-12-29  1   16  2   12
2018-12-30  5   1   65  26
2018-12-31  50  92  21  75
2019-01-01  75  55  33  34

I am sure there must be a nice reindex function or something of this sort to perform the task efficiently, as opposed to looping and recreating the dataframe.

score 2 · Accepted Answer · answered Sep 27 '19 at 12:46

2

You can use this:

df.pivot(columns='name',values='value')

answered Sep 27 '19 at 12:46

Joe

12,057
5
39
55

Nice. Can also specify the `index="colname"` parameter to specify what shows up on the rows. – s2t2 Jun 21 '23 at 20:23

Dev Khadka · Answer 2 · 2019-09-27T13:03:11.013

you can do it using pivot

df = pd.DataFrame([('2018-01-01', '47', 'A'), ('2018-01-02', '22', 'A'), ('2018-01-03', '13', 'A'), ('2018-01-04', '66', 'A'), ('2018-01-05', '19', 'A'), ('2018-01-01', '32', 'D'), ('2018-01-02', '1', 'D'), ('2018-01-03', '5', 'D'), ('2018-01-04', '50', 'D'), ('2018-01-05', '75', 'D')], columns=('id', 'value', 'name'))
df["id"] = pd.to_datetime(df.id)
df.set_index("id")

df.pivot("id", columns="name")

Output

    id  value   name
0   2018-01-01  47  A
1   2018-01-02  22  A
2   2018-01-03  13  A
3   2018-01-04  66  A
4   2018-01-05  19  A
5   2018-01-01  32  D
6   2018-01-02  1   D
7   2018-01-03  5   D
8   2018-01-04  50  D
9   2018-01-05  75  D


value
name    A   D
id      
2018-01-01  47  32
2018-01-02  22  1
2018-01-03  13  5
2018-01-04  66  50
2018-01-05  19  75

It is possible to do this without ending up with a multi level index dataframe? — Newskooler, Sep 27 '19 at 13:01
it is not multiindex, it is showing column names, you can set that to blank like ```df.columns.rename("")``` — Dev Khadka, Sep 27 '19 at 13:12

Converting single column to multiple columns based on unique values

2 Answers2