
I have a pandas DataFrame in which I need to replicate some rows, depending on whether any of a given list of values appears in certain columns. If a row contains one of these values in the specified columns, that row must be replicated.

import pandas as pd

df = pd.DataFrame({"User": [1, 2], "col_01": ["C", "A"], "col_02": ["A", "A"], "col_03": ["B", "B"], "Block": ["01", "03"]})

    User col_01 col_02 col_03 Block
0     1      C      A      B    01
1     2      A      A      B    03

values = ["C", "D"]
columns = ["col_01", "col_02", "col_03"]
rep_times = 3

Given these two lists of values and columns, each row that contains either 'C' or 'D' in the columns named 'col_01', 'col_02' or 'col_03' must be repeated rep_times times, so the output table should look like this:

    User col_01 col_02 col_03 Block
0     1      C      A      B    01
1     1      C      A      B    01
2     1      C      A      B    01
3     2      A      A      B    03

I tried something like the following, but it doesn't work, and I don't know how to create this final table. Ideally, the solution would be a one-line operation.

df2 = pd.DataFrame((pd.concat([row] * rep_times, axis=0, ignore_index=True)
if any(x in values for x in list(row[columns])) else row for index, row in df.iterrows()), columns=df.columns)
Fsanna
  • Does this answer your question? [Replicating rows in a pandas data frame by a column value](https://stackoverflow.com/questions/26777832/replicating-rows-in-a-pandas-data-frame-by-a-column-value) – ThePyGuy May 16 '21 at 12:00
  • It partially answers it, but it doesn't show how to replicate rows based on a condition: that answer replicates all the rows – Fsanna May 16 '21 at 12:33

1 Answer

import pandas as pd

First, create a boolean mask for your condition using the isin() method:

mask = df[columns].isin(values).any(axis=1)
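
To make the mask step concrete, here is a minimal sketch that builds it in two stages and prints the intermediate result. The sample frame is an assumption: user 2 is given no "C" or "D" so that only user 1 matches, mirroring the expected output in the question. The name `hits` is introduced here purely for illustration.

```python
import pandas as pd

# Sample frame (assumed data): only user 1 holds a value from `values`.
df = pd.DataFrame({
    "User": [1, 2],
    "col_01": ["C", "A"],
    "col_02": ["A", "A"],
    "col_03": ["B", "B"],
    "Block": ["01", "03"],
})
values = ["C", "D"]
columns = ["col_01", "col_02", "col_03"]

# Element-wise membership test, restricted to the columns of interest.
hits = df[columns].isin(values)

# Collapse each row to one boolean: True if any of its columns matched.
mask = hits.any(axis=1)

print(hits)
print(mask.tolist())  # [True, False]
```

Passing `axis=1` by keyword matters on recent pandas versions, where the arguments of `any()` are keyword-only.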

Finally, use the reindex() method to repeat the matching rows rep_times times, and pd.concat() to add back the rows that do not satisfy the condition (DataFrame.append() was removed in pandas 2.0):

df = pd.concat([df.reindex(df[mask].index.repeat(rep_times)), df[~mask]])
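
Note that the approach above places all non-matching rows after the repeated ones. If the original row order should be kept, one possible variant (a sketch, not the method above) gives every row its own repeat count and selects rows by position. The helper name `replicate_matching_rows` is hypothetical, and the sample data assumes that user 2 holds no "C" or "D", as in the expected output in the question.

```python
import numpy as np
import pandas as pd


def replicate_matching_rows(df, columns, values, rep_times):
    """Repeat rows with any of `values` in `columns`; keep other rows once."""
    mask = df[columns].isin(values).any(axis=1)
    # Per-row repeat count: rep_times where the mask is True, otherwise 1.
    counts = np.where(mask, rep_times, 1)
    # Repeat row positions rather than labels, so a non-unique index is fine.
    positions = np.repeat(np.arange(len(df)), counts)
    return df.iloc[positions].reset_index(drop=True)


# Assumed sample data: only user 1 matches.
df = pd.DataFrame({
    "User": [1, 2],
    "col_01": ["C", "A"],
    "col_02": ["A", "A"],
    "col_03": ["B", "B"],
    "Block": ["01", "03"],
})
values = ["C", "D"]
columns = ["col_01", "col_02", "col_03"]
rep_times = 3

result = replicate_matching_rows(df, columns, values, rep_times)
print(result["User"].tolist())  # [1, 1, 1, 2]
```

Because each row carries its own count, a matching row in the middle of the frame is expanded in place instead of being moved to the top.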
Anurag Dabas