76

Is there a Python function similar to the expand.grid() function in R ? Thanks in advance.

(EDIT) Below are the description of this R function and an example.

Create a Data Frame from All Combinations of Factors

Description:

     Create a data frame from all combinations of the supplied vectors
     or factors.  

> x <- 1:3
> y <- 1:3
> expand.grid(x,y)
  Var1 Var2
1    1    1
2    2    1
3    3    1
4    1    2
5    2    2
6    3    2
7    1    3
8    2    3
9    3    3

(EDIT2) Below is an example with the rpy package. I would like to get the same output object but without using R :

>>> from rpy import *
>>> a = [1,2,3]
>>> b = [5,7,9]
>>> r.assign("a",a)
[1, 2, 3]
>>> r.assign("b",b)
[5, 7, 9]
>>> r("expand.grid(a,b)")
{'Var1': [1, 2, 3, 1, 2, 3, 1, 2, 3], 'Var2': [5, 5, 5, 7, 7, 7, 9, 9, 9]}

EDIT 02/09/2012: I'm really lost with Python. Lev Levitsky's code given in his answer does not work for me:

>>> a = [1,2,3]
>>> b = [5,7,9]
>>> expandgrid(a, b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in expandgrid
NameError: global name 'itertools' is not defined

However the itertools module seems to be installed (typing from itertools import * does not return any error message)

Stéphane Laurent
  • 7
    The people most likely to help are Python users. Since they may not be familiar with R, perhaps you could provide a summary of what `expand.grid` does? Maybe even a small example? – GSee Aug 26 '12 at 14:26
  • Also, the `expand.grid` function operates on factors and returns a data frame, neither of which is a built-in data type in Python. What are the equivalents you're interested in working with (for example, does it take 1d lists and return a 2d list)? – David Robinson Aug 26 '12 at 14:35
  • 1
    @DavidRobinson The pandas Python package handles objects very close to R dataframes. Ideally, I would like such an object. – Stéphane Laurent Aug 26 '12 at 14:39
  • 1
    Looks like it's basically a Cartesian product, so if you don't find a standard solution, it shouldn't be too hard to implement it with [`itertools.product`](http://docs.python.org/library/itertools.html#itertools.product). – Lev Levitsky Aug 26 '12 at 14:43
  • @LevLevitsky: `product` appears to be a "standard solution", and would probably make a good answer to OP's question. – Joel Cornett Aug 26 '12 at 14:51
  • @JoelCornett: I meant "standard" as directly applicable to the desired kind of data structures and also returning a specific kind of structure. – Lev Levitsky Aug 26 '12 at 15:16
  • 2
    One bummer is that this question used a two variable example, but R's `expand.grid` is so much more powerful. I'd use it to quickly spit out huge arrays of complex factor levels. As a result, several answers are geared toward solving the `(x, y)` output case vs. something that works for any `n` inputs. – Hendy Jan 22 '19 at 16:20
  • 1
    @Hendy `itertools.product` also works with 3 or more vectors. See my example under @Ahmed's answer. – Paul Rougieux Feb 24 '20 at 17:07

11 Answers

53

Just use list comprehensions:

>>> [(x, y) for x in range(5) for y in range(5)]

[(0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (1, 0), (1, 1), (1, 2), (1, 3), (1, 4), (2, 0), (2, 1), (2, 2), (2, 3), (2, 4), (3, 0), (3, 1), (3, 2), (3, 3), (3, 4), (4, 0), (4, 1), (4, 2), (4, 3), (4, 4)]

convert to numpy array if desired:

>>> import numpy as np
>>> x = np.array([(x, y) for x in range(5) for y in range(5)])
>>> x.shape
(25, 2)

I have tested for up to 10000 x 10000 and performance of python is comparable to that of expand.grid in R. Using a tuple (x, y) is about 40% faster than using a list [x, y] in the comprehension.

OR...

Around 3x faster with np.meshgrid and much less memory intensive.

%timeit np.array(np.meshgrid(range(10000), range(10000))).reshape(2, 100000000).T
1 loops, best of 3: 736 ms per loop

in R:

> system.time(expand.grid(1:10000, 1:10000))
   user  system elapsed 
  1.991   0.416   2.424 

Keep in mind that R has 1-based arrays whereas Python is 0-based.
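
For more than two inputs, the meshgrid idea above can be generalised. The following is a minimal sketch rather than part of the original timings; the helper name expand_grid_array is hypothetical. It assumes indexing="ij" so that the rows come out in nested-loop order (last input varying fastest), which is the reverse of R's expand.grid ordering. As with any NumPy array, all columns are coerced to a common dtype.

```python
import numpy as np

def expand_grid_array(*arrays):
    """Return a (prod(lengths), n) array holding every combination of the inputs."""
    # indexing="ij" keeps axis k of each grid aligned with input k
    grids = np.meshgrid(*arrays, indexing="ij")
    # Stack to shape (N1, ..., Nn, n), then flatten everything but the last axis
    return np.stack(grids, axis=-1).reshape(-1, len(arrays))

grid = expand_grid_array([1, 2, 3], [5, 7, 9], [0, 1])
print(grid.shape)  # (18, 3)
print(grid[:4])    # rows [1 5 0], [1 5 1], [1 7 0], [1 7 1]
```

Reversing the columns of the inputs and of the result would give R's ordering if that matters.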

Thomas Browne
  • 3
    One of the best answers so far: Pythonic, fast, and doesn't require defining custom functions! – Jealie Jan 19 '17 at 16:33
  • This turned my boolean array into an int column (the other column is int). – Max Ghenis Dec 19 '17 at 05:37
  • @Max Ghenis for heterogenous data types in the arrays you'll have to stick to simple list comprehensions. Pandas was basically invented for having columns of different data types (its main data structure was inspired by the eponymous R Dataframe which allows this). Numpy is not so friendly and will coerce your variables to the same type. Or use the itertools solution outlined below. – Thomas Browne Dec 27 '17 at 20:47
  • 3
    I used R's `expand.grid` for more complicated interactions. I like this answer, but it becomes unwieldy for more combinations. Is there a way to keep the gist of this, but abstract for use with any number of inputs? – Hendy Jan 22 '19 at 16:17
  • 1
    @Hendy, use numpy.meshgrid, then https://stackoverflow.com/questions/12864445/how-to-convert-the-output-of-meshgrid-to-the-corresponding-array-of-points – Thomas Browne Jan 14 '21 at 18:27
31

product from itertools is the key to your solution. It produces a cartesian product of the inputs.

from itertools import product
import pandas as pd

def expand_grid(dictionary):
    return pd.DataFrame([row for row in product(*dictionary.values())],
                        columns=dictionary.keys())

dictionary = {'color': ['red', 'green', 'blue'], 
              'vehicle': ['car', 'van', 'truck'], 
              'cylinders': [6, 8]}

>>> expand_grid(dictionary)
    color  cylinders vehicle
0     red          6     car
1     red          6     van
2     red          6   truck
3     red          8     car
4     red          8     van
5     red          8   truck
6   green          6     car
7   green          6     van
8   green          6   truck
9   green          8     car
10  green          8     van
11  green          8   truck
12   blue          6     car
13   blue          6     van
14   blue          6   truck
15   blue          8     car
16   blue          8     van
17   blue          8   truck
Alexander
  • Nice but ultra slow by comparison with numpy meshgrid, and no faster than list comprehensions. For 3000x3000 I get 4.7 seconds for np.array(list(product(range(3000), range(3000)))) whereas np.meshgrid(range(3000), range(3000)) takes 81 milliseconds. List comprehensions are 6.8 seconds. At least it is terminology compatible with linear algebra though, which is nice. – Thomas Browne Aug 23 '16 at 19:38
  • For comparison, what is the timing of `[row for row in product(*dictionary.values())]`? – Alexander Aug 23 '16 at 19:40
  • d = {1: range(3000), 2: range(3000)}; %timeit [r for r in product(*d.values())] ..... answer 1.68 seconds. .... so nice winner on non-numpy! And bonus works with non-numerics. – Thomas Browne Aug 23 '16 at 19:54
  • another bonus: `expand.grid` in R gives you column names, which none of the other answers do. I was actually trying to implement a `dict` version of the accepted answer due to this. Then I scrolled down and found you'd done it! Its quite nice to get column names free. – Hendy Dec 19 '17 at 17:03
22

The pandas documentation defines an expand_grid function:

def expand_grid(data_dict):
    """Create a dataframe from every combination of given values."""
    rows = itertools.product(*data_dict.values())
    return pd.DataFrame.from_records(rows, columns=data_dict.keys())

For this code to work, you will need the following two imports:

import itertools
import pandas as pd

The output is a pandas.DataFrame which is the most comparable object in Python to an R data.frame.
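
As a usage sketch (my own example, not taken from the pandas documentation), the function can be applied to the two vectors from the question. The block repeats the definition so that it runs on its own, and it also shows pd.MultiIndex.from_product as an alternative that should yield the same rows without a helper function.

```python
import itertools
import pandas as pd

def expand_grid(data_dict):
    """Create a dataframe from every combination of given values."""
    rows = itertools.product(*data_dict.values())
    return pd.DataFrame.from_records(rows, columns=data_dict.keys())

df = expand_grid({"a": [1, 2, 3], "b": [5, 7, 9]})
print(df.shape)  # (9, 2)

# Alternative: build the Cartesian product as a MultiIndex, then convert it
alt = pd.MultiIndex.from_product(
    [[1, 2, 3], [5, 7, 9]], names=["a", "b"]
).to_frame(index=False)
print(alt.head())
```

Note that both variants vary the last column fastest, whereas R's expand.grid varies the first column fastest.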

Daniel Himmelstein
21

Here's an example that gives output similar to what you need:

import itertools
def expandgrid(*itrs):
   product = list(itertools.product(*itrs))
   return {'Var{}'.format(i+1):[x[i] for x in product] for i in range(len(itrs))}

>>> a = [1,2,3]
>>> b = [5,7,9]
>>> expandgrid(a, b)
{'Var1': [1, 1, 1, 2, 2, 2, 3, 3, 3], 'Var2': [5, 7, 9, 5, 7, 9, 5, 7, 9]}

The difference is related to the fact that in itertools.product the rightmost element advances on every iteration. You can tweak the function by sorting the product list smartly if it's important.


EDIT (by S. Laurent)

To have the same as R:

def expandgrid(*itrs): # https://stackoverflow.com/a/12131385/1100107
    """
    Cartesian product. The inputs are reversed for compatibility with R's ordering.
    
    """
    product = list(itertools.product(*reversed(itrs)))
    return [[x[i] for x in product] for i in range(len(itrs))][::-1]
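
To check the reversal trick against the rpy output in the question, here is a small self-contained sketch. It is a variant of the function above that returns a dictionary keyed Var1, Var2, ... instead of a list of lists; the name expandgrid_r is hypothetical.

```python
import itertools

def expandgrid_r(*itrs):
    """Cartesian product as a dict of columns, first input varying fastest (R order)."""
    # Reversing the inputs makes the first input the fastest-varying one
    product = list(itertools.product(*reversed(itrs)))
    n = len(itrs)
    # Input i ends up at position n - 1 - i of each tuple
    return {"Var{}".format(i + 1): [row[n - 1 - i] for row in product]
            for i in range(n)}

result = expandgrid_r([1, 2, 3], [5, 7, 9])
print(result)
# {'Var1': [1, 2, 3, 1, 2, 3, 1, 2, 3], 'Var2': [5, 5, 5, 7, 7, 7, 9, 9, 9]}
```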
Stéphane Laurent
Lev Levitsky
  • @StéphaneLaurent Have you done `import itertools` before using `itertools.product`? Sorry, I should have included it from the beginning. – Lev Levitsky Sep 02 '12 at 08:29
18

I've wondered this for a while and I haven't been satisfied with the solutions put forward so far, so I came up with my own, which is considerably simpler (but probably slower). The function uses numpy.meshgrid to make the grid, then flattens the grids into 1d arrays and puts them together:

def expand_grid(x, y):
    xG, yG = np.meshgrid(x, y) # create the actual grid
    xG = xG.flatten() # make the grid 1d
    yG = yG.flatten() # same
    return pd.DataFrame({'x':xG, 'y':yG}) # return a dataframe

For example:

import numpy as np
import pandas as pd

p, q = np.linspace(1, 10, 10), np.linspace(1, 10, 10)

def expand_grid(x, y):
    xG, yG = np.meshgrid(x, y) # create the actual grid
    xG = xG.flatten() # make the grid 1d
    yG = yG.flatten() # same
    return pd.DataFrame({'x':xG, 'y':yG})

print(expand_grid(p, q).head(n=20))

I know this is an old post, but I thought I'd share my simple version!

Nate
  • 3
    And for an arbitrary number of arguments: `def expand_grid(*args): mesh = np.meshgrid(*args); return pd.DataFrame(m.flatten() for m in mesh) ` – Richard Border Sep 19 '18 at 23:52
11

From the above solutions, I did this

import itertools
import pandas as pd

a = [1,2,3]
b = [4,5,6]
ab = list(itertools.product(a,b))
abdf = pd.DataFrame(ab,columns=("a","b"))

and the following is the output

    a   b
0   1   4
1   1   5
2   1   6
3   2   4
4   2   5
5   2   6
6   3   4
7   3   5
8   3   6
Ahmed Attia
  • 1
    Thanks, `itertools.product` also works fine with 3 vectors: `numpy.array(list(itertools.product([0,1], [0,1], [0,1])))`. – Paul Rougieux Feb 24 '20 at 17:04
  • Hi Paul, do you know any solution to use the last list to weight the edge between the first two vectors? Thank you – BlindSide Feb 10 '22 at 20:34
6

The ParameterGrid class from scikit-learn does the same as expand.grid (from R). Example:

from sklearn.model_selection import ParameterGrid
param_grid = {'a': [1,2,3], 'b': [5,7,9]}
expanded_grid = ParameterGrid(param_grid)

You can access the content transforming it into a list:

list(expanded_grid)

output:

[{'a': 1, 'b': 5},
 {'a': 1, 'b': 7},
 {'a': 1, 'b': 9},
 {'a': 2, 'b': 5},
 {'a': 2, 'b': 7},
 {'a': 2, 'b': 9},
 {'a': 3, 'b': 5},
 {'a': 3, 'b': 7},
 {'a': 3, 'b': 9}]

Accessing the elements by index:

list(expanded_grid)[1]

You get something like this:

{'a': 1, 'b': 7}

Just adding some usage...you can use a list of dicts like the one printed above to pass to a function with **kwargs. Example:

def f(a,b): return((a+b, a-b))
list(map(lambda x: f(**x), list(expanded_grid)))

Output:

[(6, -4),
 (8, -6),
 (10, -8),
 (7, -3),
 (9, -5),
 (11, -7),
 (8, -2),
 (10, -4),
 (12, -6)]
4

Here's another version which returns a pandas.DataFrame:

import itertools as it
import pandas as pd

def expand_grid(*args, **kwargs):
    columns = []
    lst = []
    if args:
        # positional inputs get integer column names 0, 1, ...
        columns += range(len(args))
        lst += args
    if kwargs:
        # keyword inputs use the keyword as the column name
        columns += kwargs.keys()
        lst += kwargs.values()
    return pd.DataFrame(list(it.product(*lst)), columns=columns)

print(expand_grid([0,1], [1,2,3]))
print(expand_grid(a=[0,1], b=[1,2,3]))
print(expand_grid([0,1], b=[1,2,3]))
snth
4

pyjanitor's expand_grid() is arguably the most natural solution, especially if you come from an R background.

To use it, pass a dictionary as the others argument. The items in the dictionary can have different lengths and types. The return value is a pandas DataFrame.

import janitor as jn

jn.expand_grid(others = {
    'x': range(0, 4),
    'y': ['a', 'b', 'c'],
    'z': [False, True]
})
Richie Cotton
0

Have you tried product from itertools? In my opinion it is quite a bit easier to use than some of these methods (with the exception of pandas and meshgrid). Keep in mind that this setup pulls all the items from the iterator into a list and then converts it to an ndarray, so be careful with higher dimensions. For higher-dimensional grids, drop the np.asarray(list(points)) step unless you want to run out of memory; you can then draw specific combinations from the iterator directly. I highly recommend meshgrid for this though:

# Generate a square grid from an axis
from itertools import product
import numpy as np
a = np.array(list(range(3))) + 1  # axis shifted from 0-based to 1-based values
points = product(a, repeat=2)     # all ordered pairs (i, j), including i == j
np.asarray(list(points))          # convert to ndarray

And I get the following output from this:

array([[1, 1],
       [1, 2],
       [1, 3],
       [2, 1],
       [2, 2],
       [2, 3],
       [3, 1],
       [3, 2],
       [3, 3]])
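
If the full grid would not fit in memory, the iterator can be consumed lazily instead of being turned into a list. A minimal sketch using only the standard library (the axis size and number of dimensions are arbitrary illustrations):

```python
from itertools import islice, product

axis = range(1, 1001)             # 1000 values per axis
points = product(axis, repeat=3)  # 10**9 combinations, generated on demand

# Take only the first five combinations; the rest are never built
first_five = list(islice(points, 5))
print(first_five)
# [(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 1, 4), (1, 1, 5)]
```

The iterator keeps its position, so a later call to next(points) continues with the sixth combination.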
ThisGuyCantEven
0

Here is a solution for an arbitrary number of heterogeneous column types. It's based on numpy.meshgrid. Thomas Browne's answer works for homogenous column types. Nate's answer works for two columns.

import pandas as pd
import numpy as np

def expand_grid(*xi, columns=None):
    """Expand 1-D arrays xi into a pd.DataFrame
    where each row is a unique combination of the xi.
    
    Args:
        x1, ..., xn (array_like): 1D-arrays to expand.
        columns (list, optional): Column names for the output
            DataFrame.
    
    Returns:
        Given vectors `x1, ..., xn` with lengths `Ni = len(xi)`
        a pd.DataFrame of shape (prod(Ni), n) where rows are:
        x1[0], x2[0], ..., xn-1[0], xn[0]
        x1[1], x2[0], ..., xn-1[0], xn[0]
        ...
        x1[N1 -1], x2[0], ..., xn-1[0], xn[0]
        x1[0], x2[1], ..., xn-1[0], xn[0]
        x1[1], x2[1], ..., xn-1[0], xn[0]
        ...
        x1[N1 - 1], x2[N2 - 1], ..., xn-1[Nn-1 - 1], xn[Nn - 1]
    """
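    if columns is None:
        columns = pd.RangeIndex(0, len(xi))
    elif len(columns) != len(xi):
        raise ValueError(
            f"Expecting {len(xi)} columns but {len(columns)} provided instead."
        )
    # indexing="ij" keeps axis k aligned with xi[k]; flattening in Fortran
    # order then makes x1 vary fastest, as described in the docstring
    # (the default "xy" indexing only gives that order for two inputs).
    grids = np.meshgrid(*xi, indexing="ij")
    return pd.DataFrame({
        coln: arr.flatten(order="F") for coln, arr in zip(columns, grids)
    })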
James Baye