I am very new to R; I have been using Python and MATLAB my whole life.

Here is what I would like to do. On each loop iteration, I compute a column that I would like to add to a data frame.

The problem is that I do not know the length of the column in advance, so I cannot create the data frame with a specific number of rows. I keep getting an error when I try to add the column to the original empty data frame.

# extract the data where the column 7 has no data. 
df_glm <- data.frame(matrix(ncol = 11, nrow = 0))
for (j in 1:ncol(data_cancer)){
  col_ele <- data_cancer[,j]
  col_filtered <- col_ele[col_bool7]
  # Build the new data frame by concatenating the filtered columns.
  df_glm[,j] <- col_filtered
}
data_cancer_filter <- data_cancer[,col_bool7]

How can I resolve this issue?

I am getting an error at df_glm[,j] because the column does not have the same length as the empty data frame (it is as long as col_bool7). I want to learn how to do this without creating a data frame of the exact size beforehand.

Inkyu Kim
  • It's practically impossible to help with the question as written unless you provide sample data and code. Please review how to create a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Conor Neilson Oct 29 '22 at 00:38
  • If you show a few rows of sample data and explain your goal, we can probably help you do this in a much better way. It's still hard to tell what your goal is. It seems like your last line, `data_cancer_filter <- data_cancer[,col_bool7]`, is a working version of what your loop is trying to do poorly... – Gregor Thomas Oct 29 '22 at 02:18
  • If you’re trying to `# extract the data where the column 7 has no data`, there are much simpler approaches with no need for a loop, e.g., `dplyr::filter(data_cancer, is.na(col7))`. In general, loops are used much less often in R than in other languages; it’s usually easier and more efficient to operate on an entire column or data frame in one go using vectorized functions and operations. – zephryl Oct 29 '22 at 03:15

1 Answer

If I understand correctly, you are looping over the columns, keeping the rows where col_bool7 is TRUE, and putting the result in another data frame. dplyr's filter() would be an efficient solution that needs no loop:

library(dplyr)

# Keep only the rows of data_cancer where col_bool7 is TRUE
df_glm <- data_cancer %>%
    filter(col_bool7)
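As for why the loop fails: df_glm is created with zero rows, and R will not assign a non-empty vector into a column of a zero-row data frame, so the row counts do not match. If you would rather stay in base R, here is a minimal sketch of two ways around that. It assumes col_bool7 is a logical vector with one entry per row of data_cancer; the toy data below is made up purely for illustration, and df_glm_loop and cols are hypothetical names.

```r
# Toy stand-ins for the question's objects (hypothetical data, for illustration only)
data_cancer <- data.frame(
  id    = 1:5,
  age   = c(50, 61, 47, 70, 58),
  score = c(NA, 2.1, NA, 3.4, 1.8)
)
# TRUE for the rows to keep; here, the rows where `score` is missing
col_bool7 <- is.na(data_cancer$score)

# Option 1: logical row indexing, no loop and no pre-sized data frame
df_glm <- data_cancer[col_bool7, , drop = FALSE]

# Option 2: keep the loop, but collect the filtered columns in a list
# and convert the list to a data frame once, after the loop
cols <- vector("list", ncol(data_cancer))
names(cols) <- names(data_cancer)
for (j in seq_len(ncol(data_cancer))) {
  cols[[j]] <- data_cancer[[j]][col_bool7]
}
df_glm_loop <- as.data.frame(cols)
```

Both options should give the same values; Option 1 keeps the original row names, while Option 2 renumbers the rows from 1. Collecting into a list works because a list does not care how long its elements are until you convert it, so you never need to know the row count up front.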
SpikyClip