
I have a pandas DataFrame and I would like to duplicate the rows that meet a column condition (i.e. having multiple elements in the CourseID column).

I tried iterating over the DataFrame to identify the rows that should be duplicated, but I don't know how to duplicate them.

Here is the link to the expected output

Zebra
  • Please post the data directly into your question rather than using images or external links – Chris Sep 26 '19 at 23:41
  • There are multiple questions on this topic; also look at df.explode: https://pandas.pydata.org/pandas-docs/version/0.25/reference/api/pandas.DataFrame.explode.html#pandas-dataframe-explode – Vaishali Sep 26 '19 at 23:43
  • Possible duplicate of [Unnest (explode) a Pandas Series](https://stackoverflow.com/questions/48197234/unnest-explode-a-pandas-series) – henrywongkk Sep 27 '19 at 02:40

1 Answer


Using pandas 0.25 or later, it is quite easy:

The first step is to split df.CourseID (turning each element into a list) and then explode it (giving each list item its own row, with the original index label repeated for every item):

course = df.CourseID.str.split(',').explode()

The result is:

0    456
1    456
1    799
2    789
Name: CourseID, dtype: object
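A self-contained version of this step, for anyone who wants to reproduce the output above. The Student column and its values are hypothetical (the original data was only posted as an image); the CourseID strings are chosen to match the output shown. It assumes pandas 0.25 or later, where Series.explode exists.

```python
import pandas as pd

# Hypothetical input: CourseID strings match the output above,
# the "Student" column is invented purely for illustration.
df = pd.DataFrame({
    "Student": ["Ann", "Bob", "Cid"],
    "CourseID": ["456", "456,799", "789"],
})

# Split each comma-separated string into a list, then give every
# list element its own row; the original index label is repeated.
course = df.CourseID.str.split(",").explode()
print(course)
```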

Then all that remains is to join df with course. To avoid a duplicate column name, you first have to drop the original CourseID column. Fortunately, both steps fit in a single expression:

df.drop(columns=['CourseID']).join(course)
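A minimal end-to-end sketch using the same hypothetical data. It also shows why the drop is needed: joining without it raises a ValueError, because both sides carry a CourseID column. The trailing reset_index(drop=True) is an optional extra, not part of the one-liner above; it only renumbers the rows, since the joined result keeps the repeated index labels (0, 1, 1, 2).

```python
import pandas as pd

# Hypothetical data, same as in the previous step.
df = pd.DataFrame({
    "Student": ["Ann", "Bob", "Cid"],
    "CourseID": ["456", "456,799", "789"],
})

course = df.CourseID.str.split(",").explode()

# Joining without dropping fails: both sides have a "CourseID" column
# and no lsuffix/rsuffix is given.
try:
    df.join(course)
except ValueError as err:
    print("join failed:", err)

# Drop the original column, join on the (repeated) index, then renumber.
result = df.drop(columns=["CourseID"]).join(course).reset_index(drop=True)
print(result)
```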

If you have an older version of pandas, this is a good reason to upgrade.
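If upgrading is not an option, here is a sketch that avoids explode entirely (a different technique from the answer above): repeat each row by the number of IDs it contains with Index.repeat, then overwrite CourseID with the flattened list of IDs. It assumes every CourseID is a non-missing, comma-separated string (len fails on NaN) and reuses the same hypothetical data. It only relies on long-standing pandas features, so it should also run on pre-0.25 versions, though it has not been tested on every old release.

```python
import pandas as pd

# Hypothetical data; assumes no missing CourseID values.
df = pd.DataFrame({
    "Student": ["Ann", "Bob", "Cid"],
    "CourseID": ["456", "456,799", "789"],
})

# Split once and count how many IDs each row holds.
lists = df.CourseID.str.split(",")
counts = lists.map(len).values

# Repeat each row label as many times as it has IDs.
out = df.loc[df.index.repeat(counts)].copy()

# Flatten the lists in row order so they line up with the repeated rows.
out["CourseID"] = [cid for ids in lists for cid in ids]
out = out.reset_index(drop=True)
print(out)
```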

Valdi_Bo