Pandas cross join dataframe and series

Question

I have a sample dataframe

df = pd.DataFrame({'foo': [1, 2, 3], 'bar': ['A', 'B', 'C']})

and a series

s = pd.Series([3, 4, 5, 6], name='buzz')

I want to combine them so that every df row is paired with every series value, giving a dataframe equivalent to this:

pd.DataFrame({
    'foo': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
    'bar': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
    'buzz': [3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6]
})

I cannot use df.merge with outer join because these two do not have common columns.

It is pretty easy to do with regular python but I was wondering if there is a better pandas solution.

jezrael · Accepted Answer · 2020-12-01T13:11:31.923


You can add a common helper column to both objects and merge on it to get a cross join, using Series.to_frame to convert the Series to a one-column DataFrame:

df = s.to_frame().assign(a=1).merge(df.assign(a=1), on='a').drop('a', axis=1)
print (df)
    buzz  foo bar
0      3    1   A
1      3    2   B
2      3    3   C
3      4    1   A
4      4    2   B
5      4    3   C
6      5    1   A
7      5    2   B
8      5    3   C
9      6    1   A
10     6    2   B
11     6    3   C

Or, with df on the left, which changes the row and column order:

df = df.assign(a=1).merge(s.to_frame().assign(a=1), on='a').drop('a', axis=1)
print (df)
    foo bar  buzz
0     1   A     3
1     1   A     4
2     1   A     5
3     1   A     6
4     2   B     3
5     2   B     4
6     2   B     5
7     2   B     6
8     3   C     3
9     3   C     4
10    3   C     5
11    3   C     6
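As an alternative to the helper column, pandas 1.2 and later accept how='cross' in merge, which builds the same Cartesian product directly. The following is a minimal sketch that assumes that pandas version and the df and s from the question; the name out is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'foo': [1, 2, 3], 'bar': ['A', 'B', 'C']})
s = pd.Series([3, 4, 5, 6], name='buzz')

# Assumes pandas >= 1.2, where merge gained how='cross'.
# The left operand varies slowest, so putting the Series on the left
# yields the row order requested in the question.
out = s.to_frame().merge(df, how='cross')[['foo', 'bar', 'buzz']]
print(out)
```

Swapping the operands (df.merge(s.to_frame(), how='cross')) should give the same rows grouped by df row instead.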

Another idea is to use numpy.tile and numpy.repeat:

df = (df.loc[np.tile(df.index, len(s))]
        .reset_index(drop=True)
        .assign(buzz = np.repeat(s.to_numpy(), len(df))))
print (df)
    foo bar  buzz
0     1   A     3
1     2   B     3
2     3   C     3
3     1   A     4
4     2   B     4
5     3   C     4
6     1   A     5
7     2   B     5
8     3   C     5
9     1   A     6
10    2   B     6
11    3   C     6
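To see why the rows line up, it may help to look at the two NumPy calls on their own. In this small sketch, idx and vals are stand-ins for df.index and s.to_numpy(), not names from the answer.

```python
import numpy as np

idx = np.array([0, 1, 2])       # stands in for df.index
vals = np.array([3, 4, 5, 6])   # stands in for s.to_numpy()

# tile repeats the whole array back to back
tiled = np.tile(idx, len(vals))
# repeat duplicates each element in place
repeated = np.repeat(vals, len(idx))

print(tiled)     # [0 1 2 0 1 2 0 1 2 0 1 2]
print(repeated)  # [3 3 3 4 4 4 5 5 5 6 6 6]
```

Position i of tiled and position i of repeated together enumerate every (df row, series value) pair exactly once.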

EDIT: If all columns have the same type (unlike here), it is also possible to use @Ch3steR's solution, thank you:

Using np.tile and np.repeat

df_arr = np.tile(df.to_numpy(), (len(s), 1))
s_arr = np.repeat(s.to_numpy(), len(df))
df = pd.DataFrame(
    np.column_stack([df_arr, s_arr]), columns=[*df.columns, s.name]
)
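The same-type caveat can be checked on the question's data. The sketch below assumes the df and s from the question and shows one visible symptom of mixing types: the numeric columns come back with a generic object dtype. Using infer_objects() afterwards is a suggestion here, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'foo': [1, 2, 3], 'bar': ['A', 'B', 'C']})
s = pd.Series([3, 4, 5, 6], name='buzz')

df_arr = np.tile(df.to_numpy(), (len(s), 1))
s_arr = np.repeat(s.to_numpy(), len(df))
out = pd.DataFrame(
    np.column_stack([df_arr, s_arr]), columns=[*df.columns, s.name]
)

# With mixed column types the stacked array is generic, so the
# numeric columns are no longer stored as integers.
print(out.dtypes)

# infer_objects() is one way to try to recover narrower dtypes.
fixed = out.infer_objects()
print(fixed.dtypes)
```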

edited Dec 01 '20 at 13:11

answered Dec 01 '20 at 12:54


@Ch3steR - The problem with your solution is that all values are strings :( – jezrael Dec 01 '20 at 13:07
No problem reverted changes ;) Nice answer – Ch3steR Dec 01 '20 at 13:09
@Ch3steR - That was the reason for using `np.tile` with the index and `loc` – jezrael Dec 01 '20 at 13:09
@Ch3steR - Thank you, I also reopened the question, because it is only partly a dupe. – jezrael Dec 01 '20 at 13:09
@Ch3steR - I think it is better to include your solution in the answer, thank you. – jezrael Dec 01 '20 at 13:12
Thank you. You gave too many solutions for OP to choose lol. Consider adding them to this [answer too](https://stackoverflow.com/questions/53699012/performant-cartesian-product-cross-join-with-pandas?noredirect=1&lq=1) It would be helpful for others – Ch3steR Dec 01 '20 at 13:14
