
The tidyr::unnest method from the R language has an equivalent in pandas called explode, as explained in this very detailed answer. I would like to know whether there is an equivalent to the `tidyr::nest` method.

Example R code:

library(tidyr)
iris_nested <- as_tibble(iris) %>% nest(data=-Species)

The data column is a list-column that contains data frames (this is useful for modelling, for example when running many models).

iris_nested
# A tibble: 3 x 2
  Species              data
  <fct>      <list<df[,4]>>
1 setosa           [50 × 4]
2 versicolor       [50 × 4]
3 virginica        [50 × 4]

To access one element inside the data column:

iris_nested[1,'data'][[1]]
[...]
# A tibble: 50 x 4
   Sepal.Length Sepal.Width Petal.Length Petal.Width
          <dbl>       <dbl>        <dbl>       <dbl>
 1          5.1         3.5          1.4         0.2
 2          4.9         3            1.4         0.2
 3          4.7         3.2          1.3         0.2
 4          4.6         3.1          1.5         0.2
 5          5           3.6          1.4         0.2
 6          5.4         3.9          1.7         0.4
 7          4.6         3.4          1.4         0.3
 8          5           3.4          1.5         0.2
 9          4.4         2.9          1.4         0.2
10          4.9         3.1          1.5         0.1
# … with 40 more rows

Example python code:

import seaborn
iris = seaborn.load_dataset("iris")

How can I nest this data frame in pandas:

  1. firstly, in a less complex way (on par with the pandas explode functionality), where the data column contains a simple list
  2. secondly, where the data column contains data frames, as illustrated in the example above
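For reference, variant 1 is simply the inverse of explode. A minimal sketch of that round trip, assuming a small hypothetical toy frame in place of the downloaded iris data (column names follow seaborn's lowercase spelling): `groupby(...).agg(list)` collapses each group into one list-valued cell, and `explode` expands it back.

```python
import pandas as pd

# Hypothetical toy data standing in for iris (not the real measurements)
df = pd.DataFrame({
    "species": ["setosa", "setosa", "virginica", "virginica", "virginica"],
    "sepal_length": [5.1, 4.9, 6.3, 5.8, 7.1],
})

# "Nest": collapse each group's values into a single list-valued cell
nested = df.groupby("species", as_index=False).agg(list)
print(nested)

# explode undoes it: one row per list element again
unnested = nested.explode("sepal_length", ignore_index=True)
print(unnested)
```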
Paul Rougieux

2 Answers


I think this is the closest:

import pandas as pd
df = iris.groupby("Species").apply(lambda x: dict(x))  # seaborn's copy of iris names this column "species"

Output:

Species
setosa        {'Sepal.Length': [5.1, 4.9, 4.7, 4.6, 5.0, 5.4...
versicolor    {'Sepal.Length': [7.0, 6.4, 6.9, 5.5, 6.5, 5.7...
virginica     {'Sepal.Length': [6.3, 5.8, 7.1, 6.3, 6.5, 7.6...

To access one of the Species:

pd.DataFrame(df['setosa'])


     Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species
100           5.1          3.5           1.4          0.2  setosa
101           4.9          3.0           1.4          0.2  setosa
102           4.7          3.2           1.3          0.2  setosa
103           4.6          3.1           1.5          0.2  setosa
104           5.0          3.6           1.4          0.2  setosa
105           5.4          3.9           1.7          0.4  setosa
106           4.6          3.4           1.4          0.3  setosa
107           5.0          3.4           1.5          0.2  setosa
108           4.4          2.9           1.4          0.2  setosa
109           4.9          3.1           1.5          0.1  setosa
110           5.4          3.7           1.5          0.2  setosa
111           4.8          3.4           1.6          0.2  setosa
112           4.8          3.0           1.4          0.1  setosa
113           4.3          3.0           1.1          0.1  setosa
114           5.8          4.0           1.2          0.2  setosa
115           5.7          4.4           1.5          0.4  setosa
116           5.4          3.9           1.3          0.4  setosa
117           5.1          3.5           1.4          0.3  setosa
118           5.7          3.8           1.7          0.3  setosa
119           5.1          3.8           1.5          0.3  setosa
120           5.4          3.4           1.7          0.2  setosa
121           5.1          3.7           1.5          0.4  setosa
122           4.6          3.6           1.0          0.2  setosa
123           5.1          3.3           1.7          0.5  setosa
124           4.8          3.4           1.9          0.2  setosa
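A closer match to tidyr's nest (variant 2) keeps each group as an actual DataFrame inside an object-dtype column instead of a dict. The sketch below is plain pandas, not this answer's exact code; it assumes a small hypothetical toy frame instead of the full iris data, and the "slope" column is only an illustration of the "many models" use case from the question, done with `numpy.polyfit`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy stand-in for iris (three rows per species)
df = pd.DataFrame({
    "species": ["setosa"] * 3 + ["virginica"] * 3,
    "sepal_length": [5.1, 4.9, 4.7, 6.3, 5.8, 7.1],
    "petal_length": [1.4, 1.5, 1.3, 6.0, 5.1, 5.9],
})

# Nest: one row per species; the remaining columns go into a DataFrame
# stored in an object-dtype "data" column, similar to nest(data = -Species)
keys, frames = [], []
for key, group in df.groupby("species"):
    keys.append(key)
    frames.append(group.drop(columns="species").reset_index(drop=True))
nested = pd.DataFrame({"species": keys, "data": frames})

# "Many models": fit a straight line per nested frame and keep the slope
nested["slope"] = nested["data"].map(
    lambda d: np.polyfit(d["petal_length"], d["sepal_length"], deg=1)[0]
)
print(nested[["species", "slope"]])
print(nested.loc[0, "data"])  # the setosa sub-frame
```

Storing DataFrames in cells works because the column has object dtype, but most vectorised pandas operations will not look inside those cells, so per-group work goes through `map` or a loop as shown.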
Billy Bonaros

This is easy to do with datar:

>>> from datar.all import f, nest
>>> from datar.datasets import iris
>>> iris_nested = iris >> nest(data=~f.Species)
>>> iris_nested
      Species       data
     <object>   <object>
0      setosa  <DF 50x4>
1  versicolor  <DF 50x4>
2   virginica  <DF 50x4>
>>> iris_nested.iloc[0, 1]
    Sepal_Length  Sepal_Width  Petal_Length  Petal_Width
       <float64>    <float64>     <float64>    <float64>
0            5.1          3.5           1.4          0.2
1            4.9          3.0           1.4          0.2
2            4.7          3.2           1.3          0.2
3            4.6          3.1           1.5          0.2
4            5.0          3.6           1.4          0.2
5            5.4          3.9           1.7          0.4
6            4.6          3.4           1.4          0.3
7            5.0          3.4           1.5          0.2
8            4.4          2.9           1.4          0.2
9            4.9          3.1           1.5          0.1
10           5.4          3.7           1.5          0.2
11           4.8          3.4           1.6          0.2
12           4.8          3.0           1.4          0.1
13           4.3          3.0           1.1          0.1
14           5.8          4.0           1.2          0.2
15           5.7          4.4           1.5          0.4
16           5.4          3.9           1.3          0.4
17           5.1          3.5           1.4          0.3
18           5.7          3.8           1.7          0.3
19           5.1          3.8           1.5          0.3
20           5.4          3.4           1.7          0.2
21           5.1          3.7           1.5          0.4
22           4.6          3.6           1.0          0.2
23           5.1          3.3           1.7          0.5
24           4.8          3.4           1.9          0.2
25           5.0          3.0           1.6          0.2
26           5.0          3.4           1.6          0.4
27           5.2          3.5           1.5          0.2
28           5.2          3.4           1.4          0.2
29           4.7          3.2           1.6          0.2
30           4.8          3.1           1.6          0.2
31           5.4          3.4           1.5          0.4
32           5.2          4.1           1.5          0.1
33           5.5          4.2           1.4          0.2
34           4.9          3.1           1.5          0.2
35           5.0          3.2           1.2          0.2
36           5.5          3.5           1.3          0.2
37           4.9          3.6           1.4          0.1
38           4.4          3.0           1.3          0.2
39           5.1          3.4           1.5          0.2
40           5.0          3.5           1.3          0.3
41           4.5          2.3           1.3          0.3
42           4.4          3.2           1.3          0.2
43           5.0          3.5           1.6          0.6
44           5.1          3.8           1.9          0.4
45           4.8          3.0           1.4          0.3
46           5.1          3.8           1.6          0.2
47           4.6          3.2           1.4          0.2
48           5.3          3.7           1.5          0.2
49           5.0          3.3           1.4          0.2

It aligns with dplyr/tidyr APIs.
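For comparison, the reverse operation (the counterpart of tidyr's unnest) can be done in plain pandas when the nested column holds pandas DataFrames. A minimal sketch, assuming a hypothetical hand-built nested frame; this uses `pd.concat` with `keys`, not datar's API:

```python
import pandas as pd

# Hypothetical nested frame: one DataFrame per species in an object column
nested = pd.DataFrame({
    "species": ["setosa", "virginica"],
    "data": [
        pd.DataFrame({"sepal_length": [5.1, 4.9], "petal_length": [1.4, 1.4]}),
        pd.DataFrame({"sepal_length": [6.3, 5.8, 7.1], "petal_length": [6.0, 5.1, 5.9]}),
    ],
})

# Unnest: stack the sub-frames, using each species as the outer index key,
# then turn that index level back into an ordinary column
unnested = (
    pd.concat(list(nested["data"]), keys=list(nested["species"]), names=["species", "row"])
    .reset_index(level="species")
    .reset_index(drop=True)
)
print(unnested)
```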

I am the author of the package. Feel free to submit issues if you have any questions.

Panwen Wang