I have a sample dataframe

df = pd.DataFrame({'foo': [1, 2, 3], 'bar': ['A', 'B', 'C']})

and a series

s = pd.Series([3, 4, 5, 6], name='buzz')

I want to combine them so that every df row is repeated once for each value of the series, giving a dataframe equivalent to this:

pd.DataFrame({
    'foo': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
    'bar': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
    'buzz': [3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6]
})

I cannot use df.merge with an outer join because the two have no columns in common.

It is pretty easy to do in plain Python, but I was wondering if there is a more idiomatic pandas solution.

taras

1 Answer

You can get a cross join by adding the same constant column to both objects and merging on it. Series.to_frame converts the Series to a one-column DataFrame first:

result = s.to_frame().assign(a=1).merge(df.assign(a=1), on='a').drop('a', axis=1)
print(result)
    buzz  foo bar
0      3    1   A
1      3    2   B
2      3    3   C
3      4    1   A
4      4    2   B
5      4    3   C
6      5    1   A
7      5    2   B
8      5    3   C
9      6    1   A
10     6    2   B
11     6    3   C

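Or swap the merge order to keep the df columns first (the rows are then grouped by df row instead of by series value):

result = df.assign(a=1).merge(s.to_frame().assign(a=1), on='a').drop('a', axis=1)
print(result)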
    foo bar  buzz
0     1   A     3
1     1   A     4
2     1   A     5
3     1   A     6
4     2   B     3
5     2   B     4
6     2   B     5
7     2   B     6
8     3   C     3
9     3   C     4
10    3   C     5
11    3   C     6
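
If you are on pandas 1.2 or newer (an assumption, the question does not state a version), merge also accepts how='cross', so the helper column is not needed. A minimal sketch, where the column reorder is added only to match the layout asked for in the question:

```python
import pandas as pd

df = pd.DataFrame({'foo': [1, 2, 3], 'bar': ['A', 'B', 'C']})
s = pd.Series([3, 4, 5, 6], name='buzz')

# how='cross' pairs every row of the left frame with every row of the right one
# (assumes pandas >= 1.2; older versions need the helper-column trick)
crossed = s.to_frame().merge(df, how='cross')

# Put the df columns first, then the series column
crossed = crossed[[*df.columns, s.name]]
print(crossed)
```

Putting the series on the left keeps the rows grouped by series value, as in the expected output from the question.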

Another idea is to use numpy.tile and numpy.repeat:

result = (df.loc[np.tile(df.index, len(s))]
            .reset_index(drop=True)
            .assign(buzz=np.repeat(s.to_numpy(), len(df))))
print(result)
    foo bar  buzz
0     1   A     3
1     2   B     3
2     3   C     3
3     1   A     4
4     2   B     4
5     3   C     4
6     1   A     5
7     2   B     5
8     3   C     5
9     1   A     6
10    2   B     6
11    3   C     6

EDIT: If all columns have the same dtype (unlike here), it is possible to use @Ch3ster's solution, thank you:

Using np.tile and np.repeat

df_arr = np.tile(df.to_numpy(), (len(s), 1))
s_arr = np.repeat(s.to_numpy(), len(df))
result = pd.DataFrame(
    np.column_stack([df_arr, s_arr]), columns=[*df.columns, s.name]
)
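
To illustrate the caveat about types, here is a rough sketch on the question's mixed-type data. np.column_stack has to pick one dtype for the whole array, so the numeric columns do not come back as integers. Calling infer_objects afterwards is my own suggestion for recovering them, not part of @Ch3ster's solution:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'foo': [1, 2, 3], 'bar': ['A', 'B', 'C']})
s = pd.Series([3, 4, 5, 6], name='buzz')

df_arr = np.tile(df.to_numpy(), (len(s), 1))   # object array, because df mixes int and str
s_arr = np.repeat(s.to_numpy(), len(df))
stacked = pd.DataFrame(
    np.column_stack([df_arr, s_arr]), columns=[*df.columns, s.name]
)
print(stacked.dtypes)   # foo and buzz are no longer integer columns

# Soft-convert object columns back to a better dtype where possible
fixed = stacked.infer_objects()
print(fixed.dtypes)
```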
jezrael