1

I am trying to organise a dataset in a very specific way for my research, however I am new to R and I am really struggling, any assistance would be greatly appreciated.

I am attempting to take the value of the cell at every third column (starting from the first one) and multiply it by the column beside it, but only if there is a negative value in said cell. Following this, I would like to sum the results together and store it in a new column in an external spreadsheet.

so far the code I have written is as follows:

 NegTotal = NULL 
    p = NULL
    for (i in 1:nrow(Datafile)) 
      {for (j in 1:ncol(Datafile)) 
           {if ((j %% 3 == 0) && (Datafile [i,j] < 0)) {
               p <- (datafile[i,j] * datafile[i,j+1])
               NegTotal <- sum(p) }
           else { }
          }
       } 
   for (l in seq(along = NegTotal)) {
      dim(newColumn)
      AsNewData.DataColumn("datafile", GetType(System.String))
      NewColumn.DefaultValue = "NegTotal"
      table.Columns.Add(newColumn)
    }

I am aware that this code is probably completely wrong, this is the first time I've used R and I am not very proficient at computer programming in general.

The current data is arranged as follows:

df <- data.frame(F1 = c( 1, -2, -1), E1 = c(1, 1, 0), Y1 = c(0, 0, 1),
                 F2 = c(-1,  2, -1), E2 = c(1, 1, 1), Y2 = c(0, 0, 1), 
                 F3 = c(-2, -2, -1), E3 = c(1, 1, 1), Y3 = c(1, 1, 0))

#   F1 E1 Y1 F2 E2 Y2 F3 E3 Y3
# 1  1  1  0 -1  1  0 -2  1  1
# 2 -2  1  0  2  1  0 -2  1  1
# 3 -1  0  1 -1  1  1 -1  1  0

Desired Output:

#   F1 E1 Y1 F2 E2 Y2 F3 E3 Y3 NegTotal
# 1  1  1  0 -1  1  0 -2  1  1       -3
# 2 -2  1  0  2  1  0 -2  1  1       -4
# 3 -1  0  1 -1  1  1 -1  1  0       -2

So if x = Fy * Ey; NegTotal = x1 + x2 + x3, only when F$y < 0.

I hope that all makes sense!

JasonAizkalns
  • 20,243
  • 8
  • 57
  • 116
CrmC
  • 11
  • 2
  • Can you post a sample of your data and display the desired output? Also check out: [How to make a great reproducible example](http://stackoverflow.com/q/5963269/2572423) – JasonAizkalns Nov 09 '15 at 20:18
  • @JasonAizkalns Please observe the edited post. – CrmC Nov 10 '15 at 09:07

2 Answers2

1

Here's how I would approach this with dplyr and tidyr:

library(dplyr)
library(tidyr)

# Add a respondent column (i.e. row number)
df$respondent <- 1:nrow(df)

df %>%
  gather(key, value, -respondent) %>%
  separate(key, c("letter", "letter_sub"), sep = 1) %>%
  spread(letter, value) %>% 
  mutate(Neg = ifelse(F < 0, E * F, NA)) %>%
  group_by(respondent) %>%
  summarise(NegTotal = sum(Neg, na.rm = TRUE))

# Source: local data frame [3 x 2]
# 
#   respondent NegTotal
#        (int)    (dbl)
# 1          1       -3
# 2          2       -4
# 3          3       -2

To understand what's going on, I would run the pipeline in pieces. For example, look at the results of the first few functions:

df %>%
  gather(key, value, -respondent) %>%
  separate(key, c("letter", "letter_sub"), sep = 1) %>%
  spread(letter, value)

#   respondent letter_sub E  F Y
# 1          1          1 1  1 0
# 2          1          2 1 -1 0
# 3          1          3 1 -2 1
# 4          2          1 1 -2 0
# 5          2          2 1  2 0
# 6          2          3 1 -2 1
# 7          3          1 0 -1 1
# 8          3          2 1 -1 1
# 9          3          3 1 -1 0

Getting the data in this form, makes it easier to perform the summary tasks.

JasonAizkalns
  • 20,243
  • 8
  • 57
  • 116
0

This code will give you your desired output. However, if your actual dataset is more complex than the example you gave, you may need a more elegant solution.

df$NegTotal<- (pmin(0,df$F1) * df$E1) + (pmin(0,df$F2) * df$E2) + (pmin(0,df$F3) * df$E3)
Pash101
  • 631
  • 3
  • 14
  • Unfortunately the dataset is large (2147 x 117) so this won't work. Thanks though! @Pash101 – CrmC Nov 10 '15 at 14:33
  • No worries, do the 'Fx', 'Ex', 'Yx' style names exist within your actual set? – Pash101 Nov 10 '15 at 14:55
  • Yes, unfortunately, I'm aware it makes things more complicated. What I'm trying to do is organise data from a questionnaire, so "F" "E" and "Y" represent different types of data for each question the participants were asked. The numbers following represent the question number. @Pash101 – CrmC Nov 10 '15 at 15:04