
I am trying to convert the following R code to Python:

```r
library(dplyr)

df <- df %>%
  group_by(column_1) %>%  # unquoted, so rows are grouped by the column's values
  mutate(new_col1 = length(which(column_x < 1)),
         new_col2 = new_col1 / counter)
```

where `df` is a data frame.

My attempt in Python is the following:

```python
df = df.groupby(['column_1']).apply(
    new_col1=len(df[df['column_x']] < 1)),
    new_col2= df['new_col1'] / num_samples)
```

But I am getting the following error:

```
raise KeyError(f"None of [{key}] are in the [{axis_name}]")
```

Note that `new_col2` depends on `new_col1`, so I couldn't find a way to create both columns with custom logic in a single step while grouping by one column of the data frame.
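The `KeyError` most likely comes from `df[df['column_x']]`: indexing a DataFrame with a non-boolean Series looks up the Series' *values* as column labels, and none of them exist. (The attempt as posted also has an unbalanced parenthesis, so the code that was actually run probably differed slightly.) A minimal reproduction on hypothetical toy data, next to the boolean mask that was intended:

```python
import pandas as pd

# Hypothetical toy data; column names follow the question
df = pd.DataFrame({
    "column_1": ["a", "a", "b", "b", "b"],
    "column_x": [0.5, 2.0, 0.1, 0.3, 5.0],
})

# df[df['column_x']] treats the *values* of column_x (0.5, 2.0, ...) as
# column labels; no such columns exist, so pandas raises a KeyError
raised_key_error = False
try:
    df[df["column_x"]]
except KeyError as err:
    raised_key_error = True
    print("KeyError:", err)

# The intended filter is a boolean mask over the rows
mask = df["column_x"] < 1
print(mask.tolist())
```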

How can I convert the above R code into working Python code using pandas?

Thanks a lot in advance,

aBiologist
  • Kindly add sample data with expected output. – sammywemmy Aug 26 '20 at 00:40
  • Please see [How to provide a reproducible copy of your DataFrame using `df.head(30).to_clipboard()`](https://stackoverflow.com/questions/52413246), then **edit your question** and paste the clipboard into a code block. Always provide a minimal reproducible example **with code, data, errors, current output, and expected output, as text**. Only plot images are okay. – Trenton McKinney Aug 26 '20 at 00:41

1 Answer


Use `groupby` with `transform`:

```python
# Count rows with column_x < 1 in each column_1 group, broadcast back to every row
df['new_col1'] = (df['column_x'] < 1).groupby(df['column_1']).transform('sum')
df['new_col2'] = df['new_col1'] / num_samples
```

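On grouped data, `dplyr::mutate` corresponds to `transform` here, but a single `transform` call cannot define several new named columns that build on each other the way `mutate` does. So `new_col1` is computed with `transform` first, and `new_col2`, which is a plain row-wise division, is computed in a second step.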
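A self-contained sketch of this approach on hypothetical toy data (`num_samples` is assumed to be a scalar, as in the question's attempt; the R code calls it `counter`). It also shows `DataFrame.assign` with callables as an alternative closer to `mutate`: in recent pandas versions keyword arguments are evaluated in order, so `new_col2` can use the `new_col1` created in the same call.

```python
import pandas as pd

# Hypothetical toy data; num_samples stands in for the question's scalar
df = pd.DataFrame({
    "column_1": ["a", "a", "b", "b", "b"],
    "column_x": [0.5, 2.0, 0.1, 0.3, 5.0],
})
num_samples = 10

# Approach from the answer: sum the boolean mask within each group
# (the counterpart of length(which(column_x < 1))), then broadcast
# the per-group count back to every row with transform
df["new_col1"] = (df["column_x"] < 1).groupby(df["column_1"]).transform("sum")
df["new_col2"] = df["new_col1"] / num_samples

# Alternative closer to mutate: assign evaluates its keyword arguments in
# order, so the new_col2 lambda sees the new_col1 column created just before
out = df[["column_1", "column_x"]].assign(
    new_col1=lambda d: (d["column_x"] < 1).groupby(d["column_1"]).transform("sum"),
    new_col2=lambda d: d["new_col1"] / num_samples,
)

print(df)
print(out)
```

With this data, group `a` has one value below 1 and group `b` has two, so `new_col1` is `[1, 1, 2, 2, 2]` and `new_col2` is `[0.1, 0.1, 0.2, 0.2, 0.2]` for both versions.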

BENY