How do I copy rows in a pandas DataFrame and add an id column

Question

I have a dataframe such as:

from pandas import DataFrame
import pandas as pd
x = DataFrame.from_dict({'farm' : ['A','B','A','B'], 
                         'fruit':['apple','apple','pear','pear']})

How can I copy it N times and add an id column, e.g. to produce this output (for N=2):

  farm  fruit  sim
0    A  apple    0
1    B  apple    0
2    A   pear    0
3    B   pear    0
0    A  apple    1
1    B  apple    1
2    A   pear    1
3    B   pear    1

I tried an approach which works on dataframes in R:

from numpy import arange
N = 2
sim_ids = DataFrame(arange(N))
pd.merge(left=x, right=sim_ids, how='left')

but this fails with the error `MergeError: No common columns to perform merge on`.

Thanks.

I found a related solution [here](http://stackoverflow.com/questions/13269890/cartesian-product-in-pandas) too. — Racing Tadpole, May 22 '14 at 00:29
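
For context on the `MergeError` above: `pd.merge` looks for column names shared by both frames, and `x` and `sim_ids` have none. A Cartesian product, as in the linked question, can be sketched by giving both frames a constant helper column to join on. This is a minimal sketch rather than a definitive solution; the helper column name `_key` and the variable `crossed` are hypothetical names chosen for illustration.

```python
import pandas as pd

x = pd.DataFrame({'farm': ['A', 'B', 'A', 'B'],
                  'fruit': ['apple', 'apple', 'pear', 'pear']})
N = 2
# Name the id column explicitly; DataFrame(arange(N)) would label it 0.
sim_ids = pd.DataFrame({'sim': range(N)})

# `_key` is a throwaway helper column (hypothetical name). It holds the same
# value in both frames, so every id row matches every row of x.
crossed = (sim_ids.assign(_key=1)
           .merge(x.assign(_key=1), on='_key')
           .drop(columns='_key'))

# Put the columns in the order shown in the question.
crossed = crossed[['farm', 'fruit', 'sim']]
print(crossed)
```

On pandas 1.2 or later, `sim_ids.merge(x, how='cross')` should give the same pairing without the helper column.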

Phillip Cloud · Answer 1 · 2014-04-28T04:12:40.377

Not sure what R is doing there, but here's a way to do what you want:

In [149]: from numpy import arange, repeat, tile

In [150]: x
Out[150]:
  farm  fruit
0    A  apple
1    B  apple
2    A   pear
3    B   pear

[4 rows x 2 columns]

In [151]: N = 2

In [152]: DataFrame(tile(x, (N, 1)), columns=x.columns).join(DataFrame({'sims': repeat(arange(N), len(x))}))
Out[152]:
  farm  fruit  sims
0    A  apple     0
1    B  apple     0
2    A   pear     0
3    B   pear     0
4    A  apple     1
5    B  apple     1
6    A   pear     1
7    B   pear     1

[8 rows x 3 columns]

In [153]: N = 3

In [154]: DataFrame(tile(x, (N, 1)), columns=x.columns).join(DataFrame({'sims': repeat(arange(N), len(x))}))
Out[154]:
   farm  fruit  sims
0     A  apple     0
1     B  apple     0
2     A   pear     0
3     B   pear     0
4     A  apple     1
5     B  apple     1
6     A   pear     1
7     B   pear     1
8     A  apple     2
9     B  apple     2
10    A   pear     2
11    B   pear     2

[12 rows x 3 columns]
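
The session above assumes `tile`, `repeat` and `arange` are already in the namespace. A self-contained version of the same idea might look like the sketch below; the variable name `tiled` is only illustrative, and `x.values` is used here to make the conversion to a NumPy array explicit.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame({'farm': ['A', 'B', 'A', 'B'],
                  'fruit': ['apple', 'apple', 'pear', 'pear']})
N = 2

# Stack N copies of the underlying values on top of each other,
# then restore the original column names.
tiled = pd.DataFrame(np.tile(x.values, (N, 1)), columns=x.columns)

# 0 repeated len(x) times, then 1 repeated len(x) times, and so on.
tiled['sims'] = np.repeat(np.arange(N), len(x))
print(tiled)
```

One thing to keep in mind with this approach: going through a single NumPy array means a frame with mixed column types is likely to come back with `object` columns, so dtypes may need restoring afterwards.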

Thanks! That works, except the column names are wiped - are they easy to restore? — Racing Tadpole, Apr 28 '14 at 03:35
To get the column names back, use `DataFrame(tile(x, (N, 1)), columns=x.columns).join(...)` — Racing Tadpole, Apr 28 '14 at 03:40

score 1 · Accepted Answer · answered Apr 28 '14 at 04:20

I might do something like this (here `df` is the question's `x`):

>>> df_new = pd.concat([df]*2)
>>> df_new["id"] = df_new.groupby(level=0).cumcount()
>>> df_new
  farm  fruit  id
0    A  apple   0
1    B  apple   0
2    A   pear   0
3    B   pear   0
0    A  apple   1
1    B  apple   1
2    A   pear   1
3    B   pear   1

[8 rows x 3 columns]
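
For reference, the same approach written as a standalone script for a general `N`, using the question's `x`. This is a sketch that assumes the original index of `x` is unique, since `cumcount` is counting how many times each index label has appeared so far; the name `copies` is illustrative.

```python
import pandas as pd

x = pd.DataFrame({'farm': ['A', 'B', 'A', 'B'],
                  'fruit': ['apple', 'apple', 'pear', 'pear']})
N = 2

# Concatenate N copies; the original index labels repeat N times.
copies = pd.concat([x] * N)

# Group by index label and number each occurrence 0..N-1 in order of appearance.
copies['id'] = copies.groupby(level=0).cumcount()
print(copies)
```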

DSM


This looks good - thanks. I get an error on the second line, `AttributeError: 'DataFrameGroupBy' object has no attribute 'cumcount'` - am I doing something wrong? – Racing Tadpole Apr 28 '14 at 04:48
@RacingTadpole: `cumcount` was added relatively recently. What version of pandas are you using? – DSM Apr 28 '14 at 04:52
It works with version 0.13.1. A subtle difference between @Phillip Cloud's answer and yours is the initial index column, which repeats in your answer (0-3 twice, rather than 0-7 once). Is there a way to have a unique index using your approach? – Racing Tadpole Apr 28 '14 at 06:07
@RacingTadpole: sure, add `.reset_index(drop=True)`. But your example output shows a repeating index, so isn't that what you wanted? – DSM Apr 28 '14 at 10:22
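
If a unique index is wanted, one possible variant (not the code from the answer above) tags each copy with its id before concatenating and lets `concat` renumber the rows. This sketch assumes a pandas version that has `DataFrame.assign`; the name `numbered` is illustrative.

```python
import pandas as pd

x = pd.DataFrame({'farm': ['A', 'B', 'A', 'B'],
                  'fruit': ['apple', 'apple', 'pear', 'pear']})
N = 2

# Build one copy per id, each carrying its own `sim` value, then stack them.
# ignore_index=True replaces the repeated labels with a fresh 0..len-1 index.
numbered = pd.concat([x.assign(sim=i) for i in range(N)], ignore_index=True)
print(numbered)
```

Dropping `ignore_index=True` keeps the repeating index shown in the question's example output.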
