
Say I have this dataframe:

Ind  d1  d2  d3
0    x0  x0  x0
1    x1  x1  x1
2    x2  x2  x2
...
n    xn  xn  xn

I want to add a new column in which the pattern 1, 2, 3, 4, 5 repeats until row n is reached, i.e.:

Ind  d1  d2  d3  d4
0    x0  x0  x0  1
1    x1  x1  x1  2
2    x2  x2  x2  3
3    x3  x3  x3  4
4    x4  x4  x4  5
5    x5  x5  x5  1
6    x6  x6  x6  2
...
n    xn  xn  xn  (1, 2, 3, 4 or 5, depending on n)
cs95
Nick123

3 Answers


Setup
Consider the Pandas data frame df

import numpy as np
import pandas as pd

np.random.seed([3,1415])

df = pd.DataFrame(
    np.random.choice(list('abcdefghij'), (12, 3)),
    columns=['d1', 'd2', 'd3']
)

df

   d1 d2 d3
0   a  c  h
1   d  i  h
2   a  g  i
3   g  a  c
4   a  e  j
5   h  d  c
6   e  d  d
7   g  h  h
8   e  f  d
9   h  f  j
10  i  h  g
11  e  h  g

Solution
Use the modulo operator: the row position modulo 5 cycles through 0 to 4, and adding 1 shifts that to 1 to 5.

df.assign(d4=np.arange(len(df)) % 5 + 1)

   d1 d2 d3  d4
0   a  c  h   1
1   d  i  h   2
2   a  g  i   3
3   g  a  c   4
4   a  e  j   5
5   h  d  c   1
6   e  d  d   2
7   g  h  h   3
8   e  f  d   4
9   h  f  j   5
10  i  h  g   1
11  e  h  g   2
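
To see the modulo step on its own, here is a minimal sketch on a made-up 7-row frame. The frame contents and the name cycle_len are assumptions for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical 7-row frame; only its length matters here.
df = pd.DataFrame({"d1": list("abcdefg")})

cycle_len = 5  # length of the repeating pattern (illustrative name)

positions = np.arange(len(df))         # 0, 1, 2, ..., 6
df["d4"] = positions % cycle_len + 1   # remainders 0..4 shifted to 1..5

print(df["d4"].tolist())  # [1, 2, 3, 4, 5, 1, 2]
```

Because the pattern is built from row positions rather than index labels, this should behave the same whatever index the frame carries.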

Extended Solution
The same idea repeats any sequence, not just numbers. Suppose I had an array of words a:

a = np.array(['one', 'six', 'foot', 'red', 'big'])

df.assign(d4=a[np.arange(len(df)) % len(a)])

   d1 d2 d3    d4
0   a  c  h   one
1   d  i  h   six
2   a  g  i  foot
3   g  a  c   red
4   a  e  j   big
5   h  d  c   one
6   e  d  d   six
7   g  h  h  foot
8   e  f  d   red
9   h  f  j   big
10  i  h  g   one
11  e  h  g   six
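
If this comes up often, the indexing trick could be wrapped in a small helper. The function name add_cycled_column and the 7-row frame below are hypothetical; this is only a sketch of one way to package the approach, and it assumes values is non-empty.

```python
import numpy as np
import pandas as pd

def add_cycled_column(frame, name, values):
    """Return a copy of frame with values repeated down a new column.

    Hypothetical helper; values must be a non-empty sequence.
    """
    values = np.asarray(values)
    # Row position modulo the pattern length picks the element to use.
    idx = np.arange(len(frame)) % len(values)
    return frame.assign(**{name: values[idx]})

df = pd.DataFrame({"d1": list("abcdefg")})  # made-up data
out = add_cycled_column(df, "d4", ["one", "six", "foot", "red", "big"])

print(out["d4"].tolist())
# ['one', 'six', 'foot', 'red', 'big', 'one', 'six']
```

Like df.assign in the answer, the helper returns a new frame and leaves the original untouched.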
piRSquared

Using @piRsquared's data,

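# np.put needs an ndarray; values shorter than the index list are repeated
new = np.zeros(len(df), dtype=int)
np.put(new, np.arange(len(df)), [1, 2, 3, 4, 5])
df['new'] = new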

df is now:

   d1 d2 d3  new
0   i  j  i    1
1   a  e  a    2
2   d  j  i    3
3   i  a  a    4
4   c  j  d    5
5   i  a  h    1
6   h  d  c    2
7   a  e  a    3
8   a  f  i    4
9   a  h  f    5
10  d  b  d    1
11  b  c  c    2
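
The cycling here comes from np.put itself: when the value list is shorter than the index list, NumPy repeats the values as needed. The sketch below works on a plain array, since newer pandas versions may reject a Series as the first argument to np.put. The 7-row frame is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"d1": list("abcdefg")})  # hypothetical 7-row frame

# Start from a plain NumPy array of the right length.
new = np.zeros(len(df), dtype=int)

# Seven target positions but only five values: np.put cycles the values.
np.put(new, np.arange(len(df)), [1, 2, 3, 4, 5])

df["new"] = new
print(df["new"].tolist())  # [1, 2, 3, 4, 5, 1, 2]
```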
William Miller
BENY

Option 1
np.tile:

df

    A   B   C
0  13  11   2
1   8   8   6
2   7   6  13
3  13  16   4
4   3   1   3
5   2  27   9
6  20   1   2
7   5   3   9
8   0  10   1
9   1   7   4

np.tile(np.arange(1, 6), len(df) // 5 + 1)[:len(df)]
array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5])

Assign the result to a column, and you're good to go.


Option 2
groupby + cumcount:

df.groupby(df.index // 5 * 5).cumcount() + 1

0    1
1    2
2    3
3    4
4    5
5    1
6    2
7    3
8    4
9    5
dtype: int64
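
Note that df.index // 5 relies on the default 0..n-1 integer index. For a frame with other labels, a variation (not from the answer itself) is to group on row positions instead. The frame and the name block below are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with string labels instead of a default integer index.
df = pd.DataFrame({"A": range(7)}, index=list("tuvwxyz"))

# Block number for each row position: 0 for rows 0-4, 1 for rows 5-9, ...
block = np.arange(len(df)) // 5

# cumcount numbers rows within each block from 0, so add 1.
df["d4"] = df.groupby(block).cumcount() + 1

print(df["d4"].tolist())  # [1, 2, 3, 4, 5, 1, 2]
```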
cs95
  • Why `// 5 + 1` in `np.tile(np.arange(1, 6), len(df) // 5 + 1)[:len(df)]`? – Pyd Nov 13 '17 at 04:46
  • @pyd Sorry? Am I missing something? – cs95 Nov 13 '17 at 04:47
  • No, I want to know why you're using `// 5 + 1`. I'd like to understand what's happening in that line; please explain. – Pyd Nov 13 '17 at 05:01
  • @pyd A little bit of mathemagic. I want to account for situations where the length of the dataframe isn't a perfect multiple of 5. If it is, then `np.tile(np.arange(1, 6), len(df) // 5)` is enough. But if it isn't, then you'll run into length mismatches. I'm trying to account for that. – cs95 Nov 13 '17 at 05:02
  • But `np.tile(np.arange(1,6), len(df))[:len(df)]` gives the same output by itself, right? – Pyd Nov 13 '17 at 05:05
  • @pyd If `len(df)` is 51, then `np.tile` gives me 55 elements (`5 * (51 // 5 + 1)` = `5 * 11`), from which I only take the first 51. Note that I only repeat 11 times (that much is sufficient), not 51 times. Hope it's clear now. – cs95 Nov 13 '17 at 05:08
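
The arithmetic from this comment thread can be checked directly. The sketch below uses a bare length of 51 rather than a real frame; the variable names are illustrative, and the ceiling-division line is a variation not mentioned in the thread.

```python
import numpy as np

n = 51                          # stand-in for len(df)

reps = n // 5 + 1               # 10 full blocks plus one extra = 11
tiled = np.tile(np.arange(1, 6), reps)
pattern = tiled[:n]             # trim the surplus elements

print(reps, len(tiled), len(pattern))  # 11 55 51

# Variation: ceiling division gives the smallest sufficient repeat count,
# which avoids the extra block when n is an exact multiple of 5.
min_reps = -(-n // 5)
print(min_reps)                 # 11
```

Tiling len(df) times, as suggested in the thread, yields the same column after slicing, but it builds an intermediate array five times longer than the frame.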