
I'm very new to R. I have a data frame covering 100 fields, each with 65 plant species (6500 rows in total). I want to calculate one value for each of the 100 fields, defined as:

value (field_1) = (plant_cover1 * plant_trait1 + plant_cover2 * plant_trait2 + ......)/(plant_cover1 + plant_cover2 + .....)

plant_cover1: the Vertikal_densitet (vertical density) value for species 1
plant_trait1: the slamean value for species 1

I've tried the following, but I'm stuck. I also get the error "the condition has length > 1 and only the first element will be used".

for(i in levels(NPT$Feltnummer)) {
        for (i in levels(NPT$Artsnavn_dansk)) {
                if(NPT$Vertikal_densitet>0 & NPT$slamean>0) {
                       return((NPT$vertikal_densitet*NPT$slamean)/NPT$Vertikal_densitet)
                        }sum()}Return()}

How would I go about calculating the 100 values? I hope you can help.

Here's some of my data (2 fields):

 dput(head(NPT, 130))

structure(list(Feltnummer = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), Vertikal_densitet = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.64, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 6.2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.24, 
0.48, 0, 0, 0, 0, 0.36, 0.44, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0.64, 0.64, 0, 0, 0, 0, 0.04, 0.84, 0, 0, 0, 0.32, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), slamean = c(54.015, 
0, 29.3766666666667, 0, 0, 0, 29.5933333333333, 20.63, 0, 0, 
36.33, 19.8166666666667, 0, 0, 16.4233333333333, 5.95, 9.35, 
27, 12.82, 39.27, 31.6425, 15.3433333333333, 0, 20.4775, 11.37, 
22.8, 28.185, 0, 0, 12.41, 4.92, 18.99, 41.47, 32.05, 0, 19.1875, 
0, 7.61, 0, 0, 0, 15.0425, 0, 15.586, 0, 0, 8.425, 34.0825, 0, 
13.71, 13.55, 0, 24.87, 0, 17.97, 13.96, 18.85, 0, 0, 29.13, 
12.87, 10.11, 30.11, 0, 0, 54.015, 0, 29.3766666666667, 0, 0, 
0, 29.5933333333333, 20.63, 0, 0, 36.33, 19.8166666666667, 0, 
0, 16.4233333333333, 5.95, 9.35, 27, 12.82, 39.27, 31.6425, 15.3433333333333, 
0, 20.4775, 11.37, 22.8, 28.185, 0, 0, 12.41, 4.92, 18.99, 41.47, 
32.05, 0, 19.1875, 0, 7.61, 0, 0, 0, 15.0425, 0, 15.586, 0, 0, 
8.425, 34.0825, 0, 13.71, 13.55, 0, 24.87, 0, 17.97, 13.96, 18.85, 
0, 0, 29.13, 12.87, 10.11, 30.11, 0, 0)), row.names = c(NA, -130L
), class = c("tbl_df", "tbl", "data.frame"))

  

  str(NPT)

tibble [6,500 x 3] (S3: tbl_df/tbl/data.frame)
 $ Feltnummer       : num [1:6500] 1 1 1 1 1 1 1 1 1 1 ...
 $ Vertikal_densitet: num [1:6500] 0 0 0 0 0 0 0 0 0 0 ...
 $ slamean          : num [1:6500] 54 0 29.4 0 0 ...
  • Can you share a reproducible example of your data frame using dput()? – Karthik S Oct 11 '20 at 13:23
  • When I use dput(NPT) I just get a very long output of 0's, 1's and NA's, and it is too long to put here. Is that what you meant? When I use str(NPT) I get the following: > str(NPT) ... $ Artsnavn_dansk : Factor w/ 65 levels "aflangbladet vandaks",..: 1 1 ... $ Feltnummer : Factor w/ 100 levels "1","2","3","4",..: 1 2 3 ... $ Vertikal_densitet : num [1:6500] 0 ... – Anne Sofie Hasselgaard Skaanni Oct 11 '20 at 13:43
  • It's in Danish, so $Artsnavn_dansk is the species name and $Feltnummer is the field number. – Anne Sofie Hasselgaard Skaanni Oct 11 '20 at 13:49
  • It's still way too long to put here. Here's the top of the output: > dput(head(NPT)) structure(list(Artsnavn_dansk = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("aflangbladet vandaks", "almindelig engelsød", "almindelig hvene", "almindelig hønsetarm", "almindelig kohvede", "almindelig kongepen", "almindelig kællingetand", "almindelig s – Anne Sofie Hasselgaard Skaanni Oct 11 '20 at 13:52
  • Of course. Thanks. I just edited the question. – Anne Sofie Hasselgaard Skaanni Oct 11 '20 at 13:59
  • Please reduce the data size. Use these guidelines to share a shorter example: https://stackoverflow.com/help/minimal-reproducible-example – Lazarus Thurston Oct 11 '20 at 17:06
  • Also please note that `for/while loops` are seldom needed in R. If you are using them to loop through `data.frame` rows, there is almost certainly a better way. Please use a reprex to get quicker responses. – Lazarus Thurston Oct 11 '20 at 17:08

1 Answer

Thanks for posting the data. It looks like you have a lot of columns that are not used in your calculation. We like to work with a "minimal reproducible example":

How to make a great R reproducible example

Thank you for posting the reduced data set.

With R, you often find you can do a lot without loops. In this case, there are a couple of different approaches that work that way.
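As background on the error quoted in the question: `if` expects a single TRUE/FALSE, but a comparison on whole columns returns one logical value per row. Older versions of R warn and use only the first element; R 4.2.0 and later treat it as an error. Here is a minimal sketch of the vectorized alternative, using two short hypothetical vectors `cover` and `trait` rather than the real columns:

```r
# Hypothetical stand-ins for Vertikal_densitet and slamean
cover <- c(0, 0.64, 6.2, 0.24)
trait <- c(54.015, 16.42, 11.37, 0)

# The comparison is vectorized: one TRUE/FALSE per element,
# so it cannot serve as the condition of a single `if`
keep <- cover > 0 & trait > 0
length(keep)

# Use the logical vector as an index instead of looping with `if`
weighted_mean <- sum(cover[keep] * trait[keep]) / sum(cover[keep])
weighted_mean
```

The same idea, applied per field, is what the grouped approaches in this answer do.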

You can try the tidyverse (in this case, only the dplyr package is needed).

Starting with the NPT data, you would group_by the field, then use filter to keep only rows where both Vertikal_densitet and slamean are greater than zero (if that is what is intended). Rows where either one is zero are dropped.

library(dplyr)

NPT %>%
  group_by(Feltnummer) %>%
  filter(Vertikal_densitet > 0 & slamean > 0) %>%
  summarise(value = sum(Vertikal_densitet * slamean) / sum(Vertikal_densitet))

Given your new example data, I get:

  Feltnummer value
       <dbl> <dbl>
1          1 11.6 
2          2  9.84
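If you would rather not load dplyr, the same calculation can be sketched in base R with `subset()`, `split()` and `sapply()`. The small data frame below (`NPT_small`, a name chosen here for illustration) contains only the rows with non-zero `Vertikal_densitet` from the `dput()` output in the question, so the all-zero rows are left out:

```r
# Rows with non-zero cover, copied from the dput() output in the question
NPT_small <- data.frame(
  Feltnummer        = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  Vertikal_densitet = c(0.64, 6.2, 0.24, 0.48, 0.36, 0.44,
                        0.64, 0.64, 0.04, 0.84, 0.32),
  slamean           = c(16.4233333333333, 11.37, 0, 0, 0, 8.425,
                        0, 0, 0, 8.425, 13.55)
)

# Keep rows where both cover and trait are positive (mirrors the filter() step)
kept <- subset(NPT_small, Vertikal_densitet > 0 & slamean > 0)

# Cover-weighted mean of the trait within each field
value <- sapply(split(kept, kept$Feltnummer), function(d) {
  sum(d$Vertikal_densitet * d$slamean) / sum(d$Vertikal_densitet)
})

round(value, 2)
```

On this subset the two values should agree with the dplyr output above after rounding.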
Ben
  • This looks completely right! I used dplyr and tried to run it, but my output looks like this: value 1 NaN. I don't have values for all species, so I might have to handle the missing values? – Anne Sofie Hasselgaard Skaanni Oct 11 '20 at 21:37
  • When I try the first example, it works but I get a different output: value 1 56. When I try the second one I get this error: > NPT_N <- aggregate(cover_trait ~ Feltnummer, data = NPT2, sum) Error in aggregate.data.frame(mf[1L], mf[-1L], FUN = FUN, ...) : no rows to aggregate. Concerning the cover: is it not inherent in cover > 0 that the sum of cover should also be > 0? I tried your code on a data frame with only the necessary columns, but it returns the same error: value 1 NaN – Anne Sofie Hasselgaard Skaanni Oct 15 '20 at 12:15
  • Also, when trying the first example, NPT2 has 0 obs. of 5 variables. This seems wrong, but I have no idea why it happens. – Anne Sofie Hasselgaard Skaanni Oct 15 '20 at 12:19
  • I think I had a problem with loading multiple packages. Now I get the same output as you: 52.7 and 57.2. Happy days. I'm going to try fiddling with the data and report back later. Thank you so much for the help so far! – Anne Sofie Hasselgaard Skaanni Oct 15 '20 at 12:52
  • Now it works perfectly for your data example, but when I try to use my data frame I get this: `summarise()` ungrouping output (override with `.groups` argument) # A tibble: 0 x 2 # ... with 2 variables: Feltnummer , value. It seems to me that I get two columns, "Feltnummer" and "value", but no rows of actual values. Is it because I use "read_excel" to import my data frame? – Anne Sofie Hasselgaard Skaanni Oct 19 '20 at 10:19
  • I also notice that when I import my data frame my data is numeric, as opposed to the data example, which uses integers. I tried converting my data to integers with as.integer, but I get the same output. Could it have something to do with my data being decimal numbers? – Anne Sofie Hasselgaard Skaanni Oct 19 '20 at 10:31
  • I've created a new Excel sheet with only 3 columns (Feltnummer, Vertikal_densitet and slamean), and they're all numeric when I import them with read_excel. I've edited the question to include > dput(head(NPT, 130)) (should give 2 values as output) and > str(NPT). – Anne Sofie Hasselgaard Skaanni Oct 19 '20 at 19:51
  • Oh, and there are no NA's in the data: > apply(NPT, 2, function(x) any(is.na(x))) Feltnummer Vertikal_densitet slamean FALSE FALSE FALSE – Anne Sofie Hasselgaard Skaanni Oct 19 '20 at 20:04
  • This works perfectly! Thank you so much for all of your help and patience. I really appreciate it a lot. – Anne Sofie Hasselgaard Skaanni Oct 20 '20 at 11:01
  • I have a small follow-up question: the original data has a column for grazing (0/1) for each field ("Feltnummer"). Would it be possible to attach a grazing column to the output of this code? Right now I'm just attaching it afterwards, but it doesn't seem like a very good solution and it doesn't always work. Also, would it be possible to get NA values for the fields that don't have any species meeting the criteria (slamean > 0 and Vertikal_densitet > 0)? That way I would get 100 values every time I run it for the 100 fields. – Anne Sofie Hasselgaard Skaanni Oct 26 '20 at 17:51
  • For the first question, just try `group_by(Feltnummer, Grazing)` and you will get grazing in your summarized output. This assumes that each field has a single common "grazing" value. – Ben Oct 26 '20 at 19:15
  • As for `NA`, I'm not 100% sure what you expect here. I would try removing the `filter` line in your pipe. Instead change the final line to: `summarise(value = sum(Vertikal_densitet * slamean) / sum(Vertikal_densitet[slamean > 0]))` ... if either `Vertikal_densitet` or `slamean` is zero, the numerator will work the same way; only the denominator changes, adding up `Vertikal_densitet` just where `slamean` is greater than zero. At least, that's what it sounds like you want. – Ben Oct 26 '20 at 19:24
  • For the first question: that worked beautifully! Thank you! For the second question: this changes the values and I now get values for all fields, but the values were right before. I'm trying to get the same values as before, and instead of getting no value for the fields that don't meet the requirements, I would like to get NA. That way I would get the same values as before, and still have 50 fields with grazing and 50 fields without. Do you think that would be possible? Or should I just do my statistics without an equal number of fields? – Anne Sofie Hasselgaard Skaanni Oct 27 '20 at 11:54
  • I’m sorry, this isn’t clear to me what you have in mind. I would make a new question and make a simple reproducible example - just 4 or so columns of a data frame - that shows what you want. There’s likely a simpler solution to all of this, but it would help to have a clear example to work with. – Ben Oct 27 '20 at 12:02
  • After some thought, I don't think the NA's will be necessary anyway. It shouldn't be a problem to compare two samples of different sizes. I'm sorry for the inconvenience. – Anne Sofie Hasselgaard Skaanni Oct 27 '20 at 16:31
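
For anyone with the same two follow-up needs from the comments (keeping the grazing column in the output, and getting NA for fields where no species has both values above zero), here is one possible sketch, not a tested solution for the real data. It assumes a column named `Grazing` that is constant within each field, as in the comment above. The data frame `NPT_toy` and the helper `cover_weighted_trait()` are hypothetical names made up for this illustration. The idea is to drop the `filter()` step, so that no field disappears, and apply the condition inside the helper instead.

```r
library(dplyr)

# Hypothetical toy data: field 3 has no species with both values above zero
NPT_toy <- data.frame(
  Feltnummer        = c(1, 1, 1, 2, 2, 3, 3),
  Grazing           = c(1, 1, 1, 0, 0, 1, 1),
  Vertikal_densitet = c(0.64, 6.2, 0.44, 0.84, 0.32, 0.5, 0),
  slamean           = c(16.4233333333333, 11.37, 8.425, 8.425, 13.55, 0, 20)
)

# Hypothetical helper: cover-weighted trait mean over qualifying rows,
# or NA when no row in the group qualifies
cover_weighted_trait <- function(cover, trait) {
  keep <- cover > 0 & trait > 0
  if (!any(keep)) {
    return(NA_real_)
  }
  sum(cover[keep] * trait[keep]) / sum(cover[keep])
}

result <- NPT_toy %>%
  group_by(Feltnummer, Grazing) %>%
  summarise(value = cover_weighted_trait(Vertikal_densitet, slamean),
            .groups = "drop")

result
```

Because every field stays in the grouped data, the output has one row per field, and fields without qualifying species get NA rather than being dropped.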