I'm an R beginner and would really appreciate your help with a piece of code I'm struggling with...
I have been working with a data set for a while now and after finishing a large chunk of new code I wanted to re-run the script. It all seemed to work fine until I noticed that R no longer recognised the variable names of the data sets I imported (even though none of the code changed and it used to work absolutely fine!).
Here is an overview of the data set I'm using, which I imported from an Excel file:
glimpse(ELFS2)
Rows: 227,727
Columns: 18
Groups: ID [5,208]
$ Cohort <chr> "Study 2 - Condition 0", "Study 2 - Condition 0", "Study 2 - Condition 0", "Study …
$ ID <chr> "ID0103", "ID0103", "ID0103", "ID0103", "ID0103", "ID0103", "ID0103", "ID0103", "I…
$ Action <chr> "AddToTrolley", "AddToTrolley", "AddToTrolley", "AddToTrolley", "AddToTrolley", "A…
$ Quantity <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, NA, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ Product <chr> "Strawberries 300G", "Organic British Semi Skimmed Milk 1.136L, 2 Pint", "Tilda Ba…
$ Price <dbl> 2.50, 0.89, 2.00, 3.30, 4.00, 0.70, 0.70, 0.85, 2.50, 1.60, 1.90, 1.00, 20.54, 2.5…
$ EnergyKCAL <dbl> 125.52, 209.20, 1491.00, 1111.00, 2558.00, 2400.00, 2400.00, 2140.00, 1075.00, 654…
$ EnergyKJ <dbl> 125.52, 209.20, 1491.00, 1111.00, 2558.00, 2400.00, 2400.00, 2140.00, 1075.00, 654…
$ Fat <dbl> 0.1, 1.8, 0.8, 20.0, 49.3, 36.0, 36.0, 26.1, 13.0, 4.4, 33.7, 7.1, NA, 0.1, 1.8, 0…
$ SaturatedFat <dbl> 0.1, 1.1, 0.2, 3.3, 9.8, 21.0, 21.0, 15.6, 4.5, 1.2, 22.2, 1.7, NA, 0.1, 1.1, 0.2,…
$ Carbohydrates <dbl> 6.0, 4.8, 77.7, 0.5, 20.5, 53.0, 53.0, 63.4, 25.0, 21.9, 2.9, 83.0, NA, 6.0, 4.8, …
$ Sugar <dbl> 6.0, 4.8, 0.5, 0.5, 5.4, 49.0, 49.0, 47.5, 2.1, 4.1, 0.4, 5.2, NA, 6.0, 4.8, 0.5, …
$ Fibre <dbl> 1.1, 0.0, 1.0, 0.5, 6.1, 0.0, 0.0, 0.0, 0.0, 1.1, 0.6, 2.5, NA, 1.1, 0.0, 1.0, 0.5…
$ Protein <dbl> 0.8, 3.6, 7.8, 21.5, 19.8, 8.2, 8.2, 6.6, 10.0, 6.5, 21.9, 6.5, NA, 0.8, 3.6, 7.8,…
$ Salt <dbl> 0.01, 0.10, 0.03, 0.33, 0.86, 0.19, 0.19, 0.59, 0.98, 0.49, 1.40, 2.00, NA, 0.01, …
$ ProductWeight <chr> "300g", "1136ml", "500g", "240g", "350g", "30g", "30g", "43g", "335g", "400g", "15…
$ Approval <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0…
$ Approval1 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
I've noticed that while I'm typing a variable name, R still suggests the variable as usual in a drop-down menu. However, when I select the variable from the suggestions, R inserts it in quotation marks, as if it were a character string:
ELFS2[, "Approval"]
The following piece of code doesn't return any error, but it no longer performs its task. It used to create a new variable called Approval1 that contained a '1' for every row of a participant whenever any of that participant's rows had a '1' in the variable Approval. Now it still creates the new variable Approval1, but this variable contains only NAs:
ELFS2 <- ELFS2 %>%
  group_by(ID) %>%
  mutate(Approval1 = ifelse(sum(Approval) > 0, 1, 0))
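For reference, here is a minimal sketch of how this pattern can produce a column of only NAs. It assumes (not confirmed for ELFS2) that Approval contains at least one missing value in every group; the `toy` tibble and its values are made up for illustration. With an NA in a group, `sum()` returns NA, and `ifelse()` then returns a logical NA, which would match the `<lgl>` type shown for Approval1 above.

```r
library(dplyr)

# Hypothetical toy data, not the real ELFS2
toy <- tibble(
  ID       = c("A", "A", "B", "B"),
  Approval = c(0, NA, 1, NA)
)

# sum() is NA as soon as one value in the group is NA,
# so the ifelse() condition is NA and the result is a logical NA
with_na <- toy %>%
  group_by(ID) %>%
  mutate(Approval1 = ifelse(sum(Approval) > 0, 1, 0)) %>%
  ungroup()

# na.rm = TRUE drops missing values before summing
without_na <- toy %>%
  group_by(ID) %>%
  mutate(Approval1 = ifelse(sum(Approval, na.rm = TRUE) > 0, 1, 0)) %>%
  ungroup()
```

Whether `na.rm = TRUE` is appropriate depends on what a missing Approval value is supposed to mean in the data.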
The following code should remove all rows for which the variable 'Fat' is not equal to 1. However, when I run it, it returns an error message telling me that the variable is not found at all:
ELFS2.1 <- ELFS2[Fat == 1]
Error in `[.tbl_df`(ELFS2, Fat == 1) : object 'Fat' not found
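As a point of comparison, the sketch below shows two ways of subsetting rows that do work on a tibble. Inside `[` on a tibble or data frame, a bare name such as `Fat` is looked up in the calling environment rather than in the data, which is why it is reported as not found (a bare column name inside `[` is data.table syntax). The `toy` tibble is hypothetical and stands in for ELFS2.

```r
library(dplyr)

# Hypothetical toy data, not the real ELFS2
toy <- tibble(
  ID  = c("A", "B", "C"),
  Fat = c(1, 0.5, 1)
)

# Base R: reference the column through the data frame and keep the comma
# so that rows are indexed; which() also drops rows where Fat is NA
base_rows <- toy[which(toy$Fat == 1), ]

# dplyr: filter() evaluates Fat inside the data frame
dplyr_rows <- toy %>% filter(Fat == 1)
```

Both calls keep only the rows where Fat equals 1.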
I thought the variables might not be correctly classified but it all seems correct to me?
The problem relates to all variables as far as I can see. Can anyone make sense of this? I would really appreciate some help! Many many thanks in advance!