0

I have data in below format:

<table><tbody><tr><th>T_id</th><th>Task_name</th><th>product_id</th></tr><tr><td>1</td><td>A1</td><td>10120</td></tr><tr><td>1</td><td>A2</td><td>10122</td></tr><tr><td>2</td><td>B1</td><td>10111</td></tr><tr><td>2</td><td>B1</td><td>10120</td></tr><tr><td>2</td><td>A1</td><td>10122</td></tr></tbody></table>

I'm able to convert the above format manually using excel in below format

<table><tbody><tr><th>Task_id</th><th> </th><th> </th><th> </th></tr><tr><td>1</td><td>10120</td><td>10122 </td><td> </td></tr><tr><td>2</td><td>10111</td><td>10120</td><td>10122</td></tr></tbody></table>

Can somebody please help me with Python or R program. I have used pivot(), pivot_table() in python & cast, reshape in R. But i couldn't get the required output.

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
Naseer
  • 127
  • 10
  • "I have used pivot(), pivot_table() in python & cast, reshape in R. But i couldn't get the required output." - if you have used these, can you show us what you've tried and where you are getting errors? – SymbolixAU Sep 19 '19 at 01:23
  • Have a look at https://stackoverflow.com/questions/5890584/how-to-reshape-data-from-long-to-wide-format – Ronak Shah Sep 19 '19 at 01:24

2 Answers2

1

Is this what you're looking for in R?

library(tidyverse)

df %>% group_by(T_id) %>% summarise(product_id = paste(product_id, collapse =  " ") )

#   T_id product_id                    
#1     1 10120 10122       
#2     2 10111 10120 10122

Matt
  • 2,947
  • 1
  • 9
  • 21
1

Use groupby and the .apply(list).

Solution

Make Data

html = """<table><tbody><tr><th>T_id</th><th>Task_name</th><th>product_id</th></tr><tr><td>1</td><td>A1</td><td>10120</td></tr><tr><td>1</td><td>A2</td><td>10122</td></tr><tr><td>2</td><td>B1</td><td>10111</td></tr><tr><td>2</td><td>B1</td><td>10120</td></tr><tr><td>2</td><td>A1</td><td>10122</td></tr></tbody></table>"""

Make Dataframe and Apply groupby

import pandas as pd

df = pd.read_html(html)[0]
df.groupby('T_id')['product_id'].apply(list)

Output:

T_id
1           [10120, 10122]
2    [10111, 10120, 10122]
Name: product_id, dtype: object

If you want to change the index,

df.groupby('T_id')['product_id'].apply(list).reset_index(name='Product_IDs')

Output:


T_id    Product_IDs
0   1   [10120, 10122]
1   2   [10111, 10120, 10122]

Optional Stuff

If you are loading an html either directly from a website or as a string, please refer to this article.

References

  1. Loading html into a dataframe
  2. grouping rows in list in pandas groupby
CypherX
  • 7,019
  • 3
  • 25
  • 37