I'd like some advice on a problem in R. I have a data frame "my_fruits_data" with many columns, including the index columns listed in name_cols below. I want to filter on those index columns one at a time in a for loop and store the filtered records in separate data frames, whose names are listed in df_fruits, for post-processing. It doesn't work, apparently because the elements of df_fruits are strings rather than actual data frame names. I've searched and found a few hints, but none of them helped.

# column names
name_cols <- c("Index_apple",  
             "Index_pear",
             "Index_orange",  
             "Index_watermelon",
             "Index_strawberry"
         )
# dataframe names for filtered result 
df_fruits <- c("df_apple",  
             "df_pear",
             "df_orange",  
             "df_watermelon",
             "df_strawberry")

for (i in name_cols) 
{  
    df_fruits[i] <- my_fruits_data %>% 
           filter (.data[[name_cols[i]]] ==1) 
    ......
}

Thanks chase77

  • It helps to have usable data for questions, making it a complete "minimal working example"; please include sample data (reprex) that we can use, preferably with `dput(x)`; see https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. Ultimately, I feel a `for` loop is unlikely to be the preferred method for this, can you show what you're intending to have at the end of all of this processing? It's likely R has a more-efficient way to approach what you need. – r2evans Dec 20 '21 at 06:24
  • This is simply data splitting/data grouping. You do not need to use for-loops. Give an example of your data and the expected output. Also, what do you mean by further processing? If you are going to do almost the same post-processing for each fruit dataset, you should group the whole dataset rather than keep it in separate fruit datasets. – Onyambu Dec 20 '21 at 06:29

1 Answer

As I understand it, you want to split your data by type of fruit, which is indicated by separate index columns. Here is how to do that with an example dataset.

library(tidyverse)
my_fruits_data = tribble(
  ~ index_apple, ~ index_pear, ~index_banana, ~ x1,
  1, 0, 0, 10,
  1, 0, 0, 11,
  0, 1, 0, 12,
  0, 0, 1, 13,
  0, 0, 1, 14, 
  0, 0, 1, 15
)

The example data:

> my_fruits_data
# A tibble: 6 x 4
  index_apple index_pear index_banana    x1
        <dbl>      <dbl>        <dbl> <dbl>
1           1          0            0    10
2           1          0            0    11
3           0          1            0    12
4           0          0            1    13
5           0          0            1    14
6           0          0            1    15

First you can transform the data to have a single fruit column that mentions the type of fruit:

fruit_data = my_fruits_data %>% 
  pivot_longer(
    cols = starts_with("index_"), 
    names_prefix = "index_", 
    names_to = "fruit",
    values_to = "fruit_ind"
  ) %>% 
  filter(fruit_ind == 1) %>% 
  select(-fruit_ind)

The result:

> fruit_data
# A tibble: 6 x 2
     x1 fruit 
  <dbl> <chr> 
1    10 apple 
2    11 apple 
3    12 pear  
4    13 banana
5    14 banana
6    15 banana

Finally, as @Onyambu mentioned, you could consider grouping this data by our new variable fruit. If you wanted to do different processing for different fruits, you could split() the data to get a list of separate data frames for each fruit:

> split(fruit_data, fruit_data$fruit)
$apple
# A tibble: 2 x 2
     x1 fruit
  <dbl> <chr>
1    10 apple
2    11 apple

$banana
# A tibble: 3 x 2
     x1 fruit 
  <dbl> <chr> 
1    13 banana
2    14 banana
3    15 banana

$pear
# A tibble: 1 x 2
     x1 fruit
  <dbl> <chr>
1    12 pear 
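
As a minimal sketch of working with that list: the result of split() is a named list, so each fruit's data frame can be reached by name, and the same step can be applied to every element without writing it out per fruit. The example below rebuilds the long-format data in base R so that it runs on its own; `fruit_list` and `fruit_means` are names chosen only for this illustration, and the mean of x1 stands in for whatever per-fruit processing is actually needed.

```r
# Rebuild the long-format example data (base R only, so no packages are needed)
fruit_data <- data.frame(
  x1    = c(10, 11, 12, 13, 14, 15),
  fruit = c("apple", "apple", "pear", "banana", "banana", "banana")
)

# One data frame per fruit, stored in a single named list
fruit_list <- split(fruit_data, fruit_data$fruit)

# Access one fruit's data frame by its name (a string)
banana_df <- fruit_list[["banana"]]

# Apply the same (placeholder) processing step to every fruit
fruit_means <- sapply(fruit_list, function(df) mean(df$x1))
print(fruit_means)
```

Because the list elements are looked up by string, this scales to many fruits without creating a separate variable for each one.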
kybazzi
  • Thank you so much Kybazzi for the detailed demo to get around the problem, and also to Onyambu and r2evans for the ideas. I'll try it; it should work. But this problem prompted me to search for a way to turn a string into a data frame name, and the only idea I found was the function assign(): assign(string, df_apple %>% filter(.data[[Index_fruits[1]]] ==1)). But this method isn't convenient for my case. I'd like some general ideas for using a string as a data frame name. – chase77 Dec 20 '21 at 18:39
  • I don't think it's a recommended approach to try using `assign()` in this way - why do you want to do that instead of something similar to the solution I've shown here? – kybazzi Dec 20 '21 at 20:45
  • Because there is follow-up analysis, e.g. using summarise(). I don't want to copy the same code multiple times for different fruits (over 50 types in my actual case). That's why I tried to use a loop. – chase77 Dec 20 '21 at 21:59
  • In my code, you can summarize results on `fruit_data`, such as `fruit_data %>% group_by(fruit) %>% summarise(x = mean(x1))`. I still don't understand why you want to create a large number of variables using `assign()`. – kybazzi Dec 21 '21 at 02:33
  • That was silly of me. I come from a Python app-programming background; R is new to me and I can't shake the habit of loops and control flow. Thanks. – chase77 Dec 21 '21 at 19:34
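
For readers who still want the loop from the question, a common alternative to assign() is to store the results in a named list, so the strings become list names rather than variable names. The sketch below is only an illustration under assumptions: the small `my_fruits_data` is made up, `df_names` and `keep` are hypothetical helper names, and base R subsetting is used in place of the dplyr filter() so the block runs without packages. Note also that in the question's loop `i` is already a column name, so `name_cols[i]` evaluates to NA; looping over positions with seq_along() avoids that.

```r
# Made-up wide-format data in the shape described in the question
my_fruits_data <- data.frame(
  Index_apple  = c(1, 1, 0, 0),
  Index_pear   = c(0, 0, 1, 0),
  Index_orange = c(0, 0, 0, 1),
  x1           = c(10, 11, 12, 13)
)

name_cols <- c("Index_apple", "Index_pear", "Index_orange")

# Derive the result names from the column names: "Index_apple" -> "df_apple"
df_names <- sub("^Index_", "df_", name_cols)

# Collect the filtered data frames in one named list instead of using assign()
df_fruits <- list()
for (k in seq_along(name_cols)) {
  # Row positions where the k-th index column equals 1 (which() drops NAs)
  keep <- which(my_fruits_data[[name_cols[k]]] == 1)
  df_fruits[[df_names[k]]] <- my_fruits_data[keep, , drop = FALSE]
}

# Each result is available by its string name
print(df_fruits[["df_apple"]])

# The same follow-up analysis can then run over all of them at once
print(sapply(df_fruits, function(df) mean(df$x1)))
```

With dplyr loaded, the subsetting line could instead use filter(.data[[name_cols[k]]] == 1), as in the question. The grouped approach from the answer remains the shorter route when the follow-up analysis is the same for every fruit.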