2

I have a dataframe

City  hits
A     10
B     1
C     22
D     122
E     1
F     165

I want to make a dataframe with a single column called Hit_ID, whose rows are

A_1
A_2
etc.
A_10
B_1
C_1
C_2 
etc.
D_124
D_125
E_1

I tried df['Hit_ID'] = df[['City','hits']].apply(lambda x:'_'.join(x.astype(str)),axis=1) but this just concatenates the columns. How can I modify the apply function to get the desired op?

ASGM
  • 11,051
  • 1
  • 32
  • 53
Ssank
  • 3,367
  • 7
  • 28
  • 34

2 Answers2

4

You need:

#repeat index values by column hits
df = df.loc[df.index.repeat(df['hits'])].reset_index(drop=True)
#create counts
df['hits'] = df.groupby('City').cumcount() + 1
#concatenate columns
df['Hit_ID'] = df['City'] + '_' + df['hits'].astype(str)
print (df.head(20))
   City  hits Hit_ID
0     A     1    A_1
1     A     2    A_2
2     A     3    A_3
3     A     4    A_4
4     A     5    A_5
5     A     6    A_6
6     A     7    A_7
7     A     8    A_8
8     A     9    A_9
9     A    10   A_10
10    B     1    B_1
11    C     1    C_1
12    C     2    C_2
13    C     3    C_3
14    C     4    C_4
15    C     5    C_5
16    C     6    C_6
17    C     7    C_7
18    C     8    C_8
19    C     9    C_9
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • This is not the desired output. The number of rows for City A should equal the number of Hits. i.e. the rows should be A_1,A_2,A_3,A_4,A_5,A_6,A_7,A_8,A_9,A_10 and so on – Ssank Oct 19 '17 at 15:11
  • Sorry, Idont understand it. Now solution was changed. – jezrael Oct 19 '17 at 15:16
  • There is a tiny bug in this solution which actually doesn't give you the desired result. When you create the counter you should `groupby` by City not by hits. In the OP City *B* and *E* had the same number of hits which results in B_1 and E_2. And for joining the strings you could also use `df['Hit_ID'] = df.astype(str).apply("_".join, axis=1)` Which is a bit shorter ;) – Jan Zeiseweis Oct 19 '17 at 15:24
  • @Jan Zeiseweis thanks, solution was repaired. And your solution for joining is shorter, but slowier. Thank you for comment – jezrael Oct 19 '17 at 16:13
0

Try this:

df['hits'] = df.hits.transform(lambda x: range(1, x))
s = df.apply(lambda x: pd.Series(x['hits']),axis=1).stack().reset_index(level=1, drop=True)
s.name = 'hits'
df = pd.concat([df.City, s], axis=1)
df['hit_id'] = df.City + '_' + df.hits.astype(str)
  • Change the df.hits from a column of integers to a column of lists, ranging from 1 to the original value.
  • Use the technique discussed here by @RomanPekar to expand the column of lists into a series of rows.
  • Concatenate that new column with the original df.City column.
  • Join the two columns together with an _.
ASGM
  • 11,051
  • 1
  • 32
  • 53