0

I want to pivot my dataframe to create boxplots:

long_dropindex
Out[183]: 
                           experiment round   kali   pH   sensor  measurement
0     4.01-1 Duracal older R1/4-1_old    R1  kali1  pH1  sensor1        313.2
1     4.01-1 Duracal older R1/4-1_old    R1  kali1  pH1  sensor1        313.2
2     4.01-1 Duracal older R1/4-1_old    R1  kali1  pH1  sensor1        313.2
3     4.01-1 Duracal older R1/4-1_old    R1  kali1  pH1  sensor1        313.1
4     4.01-1 Duracal older R1/4-1_old    R1  kali1  pH1  sensor1        313.2
                              ...   ...    ...  ...      ...          ...
7775  4.01-1 Duracal older R1/4-1_old    R1  kali5  pH9  sensor5       -126.0
7776  4.01-1 Duracal older R1/4-1_old    R1  kali5  pH9  sensor5       -126.0
7777  4.01-1 Duracal older R1/4-1_old    R1  kali5  pH9  sensor5       -126.0
7778  4.01-1 Duracal older R1/4-1_old    R1  kali5  pH9  sensor5       -126.0
7779  4.01-1 Duracal older R1/4-1_old    R1  kali5  pH9  sensor5       -126.0

[7780 rows x 6 columns]

to do something like this:

df = pd.DataFrame(np.random.randn(10, 2),columns=['Col1', 'Col2'])
df['X'] = pd.Series(['A', 'A', 'A', 'A', 'A','B', 'B', 'B', 'B', 'B'])
boxplot = df.boxplot(by='X')

df
Out[184]: 
       Col1      Col2  X
0  0.618665 -2.296295  A
1  0.325424  0.004359  A
2  0.084808 -0.429033  A
3  1.010595  0.806139  A
4  0.383201 -0.253919  A
5 -0.621024 -0.201437  B
6 -1.341940  0.086131  B
7  0.508194  0.531064  B
8 -0.111557  0.632146  B
9  0.045853  0.364231  B

enter image description here

I want to pivot my sensor column to have 5 columns (sensor1, sensor2, ..., sensor5) and my measurement to be the values. I want to keep the pH column, like the X column in the example.

I think I tried all wrong combinations..I know supplying multiple values will result in a MultiIndex in the columns. from the pandas API. But I only need the columns pH, sensor and measurement and I only want to pivot sensor..

1. gives me multiindex with pH, measurement as level 0, sensor level 1

In [160]: long_dropindex[['pH', 'sensor', 'measurement']].pivot(columns='sensor')
Out[160]: 
            pH                          ... measurement                        
sensor sensor1 sensor2 sensor3 sensor4  ...     sensor2 sensor3 sensor4 sensor5
0          pH1     NaN     NaN     NaN  ...         NaN     NaN     NaN     NaN
1          pH1     NaN     NaN     NaN  ...         NaN     NaN     NaN     NaN
2          pH1     NaN     NaN     NaN  ...         NaN     NaN     NaN     NaN
3          pH1     NaN     NaN     NaN  ...         NaN     NaN     NaN     NaN
4          pH1     NaN     NaN     NaN  ...         NaN     NaN     NaN     NaN
       ...     ...     ...     ...  ...         ...     ...     ...     ...
7775       NaN     NaN     NaN     NaN  ...         NaN     NaN     NaN  -126.0
7776       NaN     NaN     NaN     NaN  ...         NaN     NaN     NaN  -126.0
7777       NaN     NaN     NaN     NaN  ...         NaN     NaN     NaN  -126.0
7778       NaN     NaN     NaN     NaN  ...         NaN     NaN     NaN  -126.0
7779       NaN     NaN     NaN     NaN  ...         NaN     NaN     NaN  -126.0

[7780 rows x 10 columns]

2. gives me multiindex with measurement as level 0, sensor as level 1 and pH level 2

In [175]: long_dropindex[['pH', 'sensor', 'measurement']].pivot(columns=['sensor', 'pH'])
Out[175]: 
       measurement                           ...                                   
sensor     sensor1                  sensor2  ... sensor4 sensor5                   
pH             pH1 pH4 pH7 pH9 pH10     pH4  ...     pH7    pH10 pH1 pH4 pH7    pH9
0            313.2 NaN NaN NaN  NaN     NaN  ...     NaN     NaN NaN NaN NaN    NaN
1            313.2 NaN NaN NaN  NaN     NaN  ...     NaN     NaN NaN NaN NaN    NaN
2            313.2 NaN NaN NaN  NaN     NaN  ...     NaN     NaN NaN NaN NaN    NaN
3            313.1 NaN NaN NaN  NaN     NaN  ...     NaN     NaN NaN NaN NaN    NaN
4            313.2 NaN NaN NaN  NaN     NaN  ...     NaN     NaN NaN NaN NaN    NaN
           ...  ..  ..  ..  ...     ...  ...     ...     ...  ..  ..  ..    ...
7775           NaN NaN NaN NaN  NaN     NaN  ...     NaN     NaN NaN NaN NaN -126.0
7776           NaN NaN NaN NaN  NaN     NaN  ...     NaN     NaN NaN NaN NaN -126.0
7777           NaN NaN NaN NaN  NaN     NaN  ...     NaN     NaN NaN NaN NaN -126.0
7778           NaN NaN NaN NaN  NaN     NaN  ...     NaN     NaN NaN NaN NaN -126.0
7779           NaN NaN NaN NaN  NaN     NaN  ...     NaN     NaN NaN NaN NaN -126.0

[7780 rows x 25 columns]

3. now I am missing my pH column..

In[176]:long_dropindex[['pH', 'sensor', 'measurement']].pivot(columns=['sensor'], values='measurement')
Out[176]: 
sensor  sensor1  sensor2  sensor3  sensor4  sensor5
0         313.2      NaN      NaN      NaN      NaN
1         313.2      NaN      NaN      NaN      NaN
2         313.2      NaN      NaN      NaN      NaN
3         313.1      NaN      NaN      NaN      NaN
4         313.2      NaN      NaN      NaN      NaN
        ...      ...      ...      ...      ...
7775        NaN      NaN      NaN      NaN   -126.0
7776        NaN      NaN      NaN      NaN   -126.0
7777        NaN      NaN      NaN      NaN   -126.0
7778        NaN      NaN      NaN      NaN   -126.0
7779        NaN      NaN      NaN      NaN   -126.0

[7780 rows x 5 columns]

please help :( I need:

sensor1 sensor2 pH
-126 -125 9
-127 -127 9
-170 -178 10
-172 -180 10

------ EDIT ------------------------------------------------

Sample Data (I cannot make it much shorter because I need the different groups for the boxplot)

In[34]: df.to_dict()
Out[34]:
{'round': {498: 'R1',
  563: 'R1',
  628: 'R1',
  693: 'R1',
  758: 'R1',
  823: 'R1',
  888: 'R1',
  953: 'R1',
  1018: 'R1',
  1083: 'R1',
  1148: 'R1',
  1213: 'R1',
  1278: 'R1',
  1343: 'R1',
  1408: 'R1',
  1473: 'R1',
  1538: 'R1',
  1603: 'R1',
  1668: 'R1',
  1733: 'R1',
  1798: 'R1',
  1863: 'R1',
  1928: 'R1',
  1993: 'R1',
  2058: 'R1',
  2123: 'R1',
  2188: 'R1',
  2253: 'R1',
  2318: 'R1',
  2383: 'R1',
  2448: 'R1',
  2513: 'R1',
  2578: 'R1'},
 'kali': {498: 'kali2',
  563: 'kali2',
  628: 'kali2',
  693: 'kali3',
  758: 'kali3',
  823: 'kali3',
  888: 'kali4',
  953: 'kali4',
  1018: 'kali4',
  1083: 'kali4',
  1148: 'kali4',
  1213: 'kali4',
  1278: 'kali4',
  1343: 'kali4',
  1408: 'kali5',
  1473: 'kali5',
  1538: 'kali5',
  1603: 'kali1',
  1668: 'kali1',
  1733: 'kali1',
  1798: 'kali1',
  1863: 'kali2',
  1928: 'kali2',
  1993: 'kali2',
  2058: 'kali2',
  2123: 'kali2',
  2188: 'kali2',
  2253: 'kali3',
  2318: 'kali3',
  2383: 'kali3',
  2448: 'kali4',
  2513: 'kali4',
  2578: 'kali4'},
 'pH': {498: 'pH4',
  563: 'pH4',
  628: 'pH4',
  693: 'pH7',
  758: 'pH7',
  823: 'pH7',
  888: 'pH9',
  953: 'pH9',
  1018: 'pH9',
  1083: 'pH9',
  1148: 'pH9',
  1213: 'pH9',
  1278: 'pH9',
  1343: 'pH9',
  1408: 'pH10',
  1473: 'pH10',
  1538: 'pH10',
  1603: 'pH4',
  1668: 'pH4',
  1733: 'pH4',
  1798: 'pH4',
  1863: 'pH7',
  1928: 'pH7',
  1993: 'pH7',
  2058: 'pH7',
  2123: 'pH7',
  2188: 'pH7',
  2253: 'pH9',
  2318: 'pH9',
  2383: 'pH9',
  2448: 'pH10',
  2513: 'pH10',
  2578: 'pH10'},
 'sensor': {498: 'sensor1',
  563: 'sensor1',
  628: 'sensor1',
  693: 'sensor1',
  758: 'sensor1',
  823: 'sensor1',
  888: 'sensor1',
  953: 'sensor1',
  1018: 'sensor1',
  1083: 'sensor1',
  1148: 'sensor1',
  1213: 'sensor1',
  1278: 'sensor1',
  1343: 'sensor1',
  1408: 'sensor1',
  1473: 'sensor1',
  1538: 'sensor1',
  1603: 'sensor2',
  1668: 'sensor2',
  1733: 'sensor2',
  1798: 'sensor2',
  1863: 'sensor2',
  1928: 'sensor2',
  1993: 'sensor2',
  2058: 'sensor2',
  2123: 'sensor2',
  2188: 'sensor2',
  2253: 'sensor2',
  2318: 'sensor2',
  2383: 'sensor2',
  2448: 'sensor2',
  2513: 'sensor2',
  2578: 'sensor2'},
 'measurement': {498: 176.4,
  563: 176.4,
  628: 176.4,
  693: -24.1,
  758: -24.1,
  823: -24.1,
  888: -128.8,
  953: -128.8,
  1018: -128.8,
  1083: -128.8,
  1148: -128.8,
  1213: -128.9,
  1278: -128.9,
  1343: -128.9,
  1408: -176.9,
  1473: -176.9,
  1538: -176.9,
  1603: 173.4,
  1668: 173.3,
  1733: 173.3,
  1798: 173.3,
  1863: -27.4,
  1928: -27.3,
  1993: -27.3,
  2058: -27.3,
  2123: -27.3,
  2188: -27.3,
  2253: -131.9,
  2318: -131.9,
  2383: -131.9,
  2448: -180.1,
  2513: -180.2,
  2578: -180.1}}

Will look like this:

df.head(5)
Out[36]: 
     round   kali   pH   sensor  measurement
498     R1  kali2  pH4  sensor1        176.4
563     R1  kali2  pH4  sensor1        176.4
628     R1  kali2  pH4  sensor1        176.4
693     R1  kali3  pH7  sensor1        -24.1
758     R1  kali3  pH7  sensor1        -24.1

Expected df:

import numpy as np

df_expected = pd.DataFrame(
{'pH4' : np.random.default_rng().normal(loc=4.01, scale=0.01, size=(6)),
'pH7' : np.random.default_rng().normal(loc=7, scale=0.01, size=(6)),
'sensor' : ['sensor1', 'sensor1', 'sensor1', 'sensor2', 'sensor2', 'sensor2'],
'round' :['R1', 'R1', 'R1', 'R1', 'R1', 'R1'],
'kali' :['kali1', 'kali1', 'kali1', 'kali2', 'kali2', 'kali2']
})

In[64]: df_expected
Out[64]: 
        pH4       pH7   sensor round   kali
0  4.003389  6.988479  sensor1    R1  kali1
1  3.987230  6.994636  sensor1    R1  kali1
2  4.000740  7.002190  sensor1    R1  kali1
3  4.010275  7.009123  sensor2    R1  kali2
4  4.008694  7.009335  sensor2    R1  kali2
5  4.015832  6.990716  sensor2    R1  kali2

and the plot:

In[65]: df_expected.boxplot(by='sensor', sharey=False)
Out[65]: 
array([<AxesSubplot: title={'center': 'pH4'}, xlabel='[sensor]'>,
       <AxesSubplot: title={'center': 'pH7'}, xlabel='[sensor]'>],
      dtype=object)

Expected plot

Shannas M
  • 11
  • 1
  • 1
    Hi! I know you spent the time to do a thorough post, but could you add an example easier to replicate? I.e. a [mcve] ? For example, give us a small sample of your input `df`, and a small sample of the expected output. For example, provide `long_dropindex.head(3).to_dict()` and a small expected output – rafaelc May 04 '23 at 16:40
  • It's almost impossible to help since the structure of your data isn't really clear. Please provide a **small** self-contained example that reproduces your problem (see [How to create a Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example) and [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/14311263)). – Timus May 05 '23 at 07:48
  • added some sample data and expected data and plot – Shannas M May 09 '23 at 12:13

0 Answers0