0

I have the following dataframe

df_in = pd.DataFrame({
'State':['C','B','D','A','C','B'],
'Contact':['alpha a. theta| beta','beta| alpha a. theta| delta','Theta','gamma| delta','alpha|Eta| gamma| delta','beta'],
'Timestamp':[911583000000,912020000000,912449000000,912742000000,913863000000,915644000000]})

How do I transform it so that the second column which has pipe separated data is broken out into different rows as follows:

df_out = pd.DataFrame({
'State':['C','C','B','B','B','D','A','A','C','C','C','C','B'],
'Contact':['alpha a. theta','beta','beta','alpha a. theta','delta','Theta','gamma', 'delta','alpha','Eta','gamma','delta','beta'],
'Timestamp':[911583000000,911583000000,912020000000,912020000000,912020000000,912449000000,912742000000,912742000000,913863000000,913863000000,913863000000,913863000000,915644000000]})

print(df_in)
print(df_out)

I can use pd.melt but for that I already need to have the 'Contact' column broken out into multiple columns and not have all the contacts in one column separated by a delimiter.

Alhpa Delta
  • 3,385
  • 4
  • 16
  • 31

1 Answers1

1

You could split the column, then merge on the index:

  df_in.Contact.str.split('|',expand=True).stack().reset_index()\
    .merge(df_in.reset_index(),left_on ='level_0',right_on='index')\
    .drop(['level_0','level_1','index','Contact'],1)
Out: 
                  0 State     Timestamp
0    alpha a. theta     C  911583000000
1              beta     C  911583000000
2              beta     B  912020000000
3    alpha a. theta     B  912020000000
4             delta     B  912020000000
5             Theta     D  912449000000
6             gamma     A  912742000000
7             delta     A  912742000000
8             alpha     C  913863000000
9               Eta     C  913863000000
10            gamma     C  913863000000
11            delta     C  913863000000
12             beta     B  915644000000
Onyambu
  • 67,392
  • 3
  • 24
  • 53
  • there is a slight problem with this solution. The column "Contact" will split properly only if there are no other spacings. But the names are not like that there. The are like "alpha R. beta". so like real names. The extractall('(\\w+)') part breaks it out into individual words it seems. How do we modify this so that it just separates on the basis of the "|" separator? – Alhpa Delta Sep 13 '19 at 17:33
  • I have modified the input and out by changing the names " "alpha" to alpha a. theta" to make it clearer. I am trying to use str.split('|'). But that does not quite do it. – Alhpa Delta Sep 13 '19 at 17:47
  • @AlhpaDelta check the edit – Onyambu Sep 13 '19 at 18:03
  • Thanks! that looks perfect!! – Alhpa Delta Sep 13 '19 at 19:47