
Thanks in advance for the help. I'm relatively new to R and still learning how to use it properly for data analysis. My data is currently set up like this:

A   B   C   D   E   NEVER
NOT APPLICABLE  NOT APPLICABLE  NOT APPLICABLE  NOT APPLICABLE  NOT APPLICABLE  yes
NOT APPLICABLE  NOT APPLICABLE  NOT APPLICABLE  NOT APPLICABLE  NOT APPLICABLE  yes
NOT APPLICABLE  NOT APPLICABLE  NOT APPLICABLE  NOT APPLICABLE  NOT APPLICABLE  yes
NA  NA  NA  NA  NA  NA
NA  NA  NA  NA  NA  NA
NA  NA  NA  NA  NA  NA
NA  NA  NA  NA  NA  NA
NOT APPLICABLE  NOT APPLICABLE  NOT APPLICABLE  NOT APPLICABLE  NOT APPLICABLE  yes
NA  NA  NA  NA  NA  NA
NA  NA  NA  NA  NA  NA
yes yes yes yes NA  NA
NA  NA  NA  NA  NA  NA
NA  no  no  NA  NA  NA
yes no  no  no  no  NA
yes NA  NA  NA  NA  NA
yes yes yes yes yes NA
NA  NA  NA  NA  NA  NA

I'm trying to find a way to go through multiple columns so that R looks at Row Y, Column A and checks for a "YES"; if there is one, it should create a new column and enter "YES" as the value for Row Y.

That would be one situation. In another, R would go to Row Y and do the same check, but if it can't find a "YES" it would then look for a "NO", and if there is a "NO" it would enter "NO" in the newly created column.

Finally, the last possibility is that there is neither a "YES" nor a "NO", in which case I want R to enter "NA" in the newly created column.

I want this to be applied to every row through Row Y+1300 (the end of the dataset).

IMPORTANT: What also threw me off is that the last column essentially asks the opposite question, so I want its "YES"s and "NO"s flipped (YES becomes NO and vice versa) before running the R loop.

EDIT: I was originally going to use a for-loop, but there are too many different combinations, so I was hoping to find a more efficient way to streamline the commands.

EDIT: Each row represents a different participant, so I want to see whether they answered "yes" for any of columns A-E and "no" for NEVER; if that's not the case, whether they answered "no" for any of columns A-E and "yes" for NEVER; and if that's not the case either, they must have "NA" for all 6 columns.

j681
    When you are reading, just specify the `na.strings` i.e. `read.csv('yourfile.csv', na.strings = "NOT APPLICABLE")` – akrun Jul 24 '17 at 02:48

2 Answers

We can handle the "NOT APPLICABLE" entries by specifying `na.strings` in `read.csv`/`read.table`:

df1 <- read.csv('yourfile.csv', na.strings = "NOT APPLICABLE")
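
If the file spells missing values in more than one way, `na.strings` also accepts a character vector. Passing only "NOT APPLICABLE" replaces the default "NA", so listing both keeps literal NA entries missing as well. A minimal sketch, using an inline string through the `text` argument in place of a real file (the sample rows and the `csv_text` name are made up for illustration):

```r
# Inline stand-in for 'yourfile.csv' (hypothetical sample rows)
csv_text <- "A,B,NEVER
NOT APPLICABLE,NOT APPLICABLE,yes
NA,NA,NA
yes,no,NA"

# Treat both spellings as missing; keep the answers as plain strings
df1 <- read.csv(text = csv_text,
                na.strings = c("NOT APPLICABLE", "NA"),
                stringsAsFactors = FALSE)

is.na(df1$A)   # TRUE TRUE FALSE
```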
akrun

Here's one solution to your problem. R differs from many other languages in that you don't always need loops to go through each element of your data, because it has the built-in "apply" family of functions (the thread "R Grouping functions: sapply vs. lapply vs. apply. vs. tapply vs. by vs. aggregate" on this forum explains them very well).

So, here's the code to recode "yes" to 1, "no" to 2, and "NOT APPLICABLE" to NA, and end up with numeric data.

df <- data.frame(c("NOT APPLICABLE", "yes", "NOT APPLICABLE"),
                 c("NA", "NA", "NA"), 
                 c("yes", "yes", "yes"),
                 c("no", "NOT APPLICABLE", "yes"), stringsAsFactors = FALSE)
# note stringsAsFactors = FALSE, so the columns are strings, not factors
colnames(df) <- c("A", "B", "C", "NEVER")
str(df)
df

# define the recode function
recode <- function(x) {
  x[x == "yes"] <- 1
  x[x == "no"] <- 2
  x[x == "NOT APPLICABLE"] <- NA
  x[x == "NA"] <- NA
  as.numeric(x)  # return the recoded column as numeric
}

# apply the function to desired data
data <- as.data.frame(lapply(df, recode))
data
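
An equivalent way to write the recoding is a named lookup vector, which keeps the mapping in one place. This is only a sketch; `codes`, `recode_lookup` and `answers` are names made up for this example:

```r
# Mapping from answer text to numeric code; anything not listed becomes NA
codes <- c(yes = 1, no = 2)

recode_lookup <- function(x) {
  # Indexing a named vector by a character vector looks each value up by name;
  # unmatched names ("NOT APPLICABLE", "NA") and real NAs come back as NA
  unname(codes[x])
}

answers <- c("yes", "no", "NOT APPLICABLE", "NA", NA)
recode_lookup(answers)   # 1 2 NA NA NA
```

It can be applied to every column the same way, with `as.data.frame(lapply(df, recode_lookup))`.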

Having numeric data makes it easy to recode the last column ("reverse coding"). For that, you just need to do:

new_variable <- max_value + 1 - old_variable 

For more details on reverse coding, see here http://www.theanalysisfactor.com/easy-reverse-code/ or just google reverse coding.
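
With the coding used here (1 = "yes", 2 = "no"), the maximum value is 2, so the formula swaps 1 and 2 and leaves NA untouched. A small worked example; the `never` values are made up for illustration:

```r
never <- c(1, NA, 2, 1)      # hypothetical NEVER column: 1 = "yes", 2 = "no"
max_value <- 2

# 2 + 1 - 1 = 2 and 2 + 1 - 2 = 1; NA stays NA
never_reversed <- max_value + 1 - never
never_reversed               # 2 NA 1 2
```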

Now, for the last part where you want to create a new column, I don't really understand your data structure. Do you have multiple respondents and multiple questions? Are questions nested within respondents? If you can clarify, I can help.

EDIT: For the second part, try this:

# I modified the data slightly to make it more like yours:
df <- data.frame(c("NOT APPLICABLE", "NA", "yes", "yes"),
                 c("NOT APPLICABLE", "NA", "yes", "no"),
                 c("NOT APPLICABLE", "NA", "NA", "no"),
                 c("yes", "NA", "NA", "NA"), stringsAsFactors = FALSE)
# note stringsAsFactors = FALSE, so the columns are strings, not factors
colnames(df) <- c("A", "B", "C", "NEVER")
str(df)
df

> data <- as.data.frame(lapply(df, recode))
> data
   A  B  C NEVER
1 NA NA NA     1
2 NA NA NA    NA
3  1  1 NA    NA
4  1  2  2    NA

# count the occurrences of each value by row, over the answer columns only
# (otherwise the new yes column would itself be counted when it equals 2)
> cols <- c("A", "B", "C", "NEVER")
> data$yes <- rowSums(data[cols] == 1, na.rm = TRUE)
> data$no <- rowSums(data[cols] == 2, na.rm = TRUE)

> data
   A  B  C NEVER yes no
1 NA NA NA     1   1  0
2 NA NA NA    NA   0  0
3  1  1 NA    NA   2  0
4  1  2  2    NA   1  2

# this last part creates the new column; "yes" is assigned last so it takes priority over "no"
> data$new <- NA_character_
> data$new[data$no != 0] <- "no"
> data$new[data$yes != 0] <- "yes"

> data
   A  B  C NEVER yes no  new
1 NA NA NA     1   1  0  yes
2 NA NA NA    NA   0  0 <NA>
3  1  1 NA    NA   2  0  yes
4  1  2  2    NA   1  2  yes
> data[ , -c(5:6)] # use this to remove columns you don't need
   A  B  C NEVER  new
1 NA NA NA     1  yes
2 NA NA NA    NA <NA>
3  1  1 NA    NA  yes
4  1  2  2    NA  yes
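
To match the rule in the question ("yes" first, then "no", otherwise NA, with NEVER flipped beforehand), the steps can be combined as below. This is a sketch under assumptions: the data is already coded 1 = "yes" and 2 = "no", the sample rows are hypothetical, and `items`, `n_yes` and `n_no` are names introduced only for this example.

```r
# Hypothetical sample in the question's layout, already coded 1 = yes, 2 = no
data <- data.frame(A     = c(NA, NA, 1, 1, NA),
                   B     = c(NA, NA, 1, 2, 2),
                   C     = c(NA, NA, NA, 2, 2),
                   NEVER = c(1, NA, NA, NA, NA))

# Reverse-code NEVER so that 1 means the same thing in every column
data$NEVER <- 3 - data$NEVER

# Count answers over the item columns only
items <- c("A", "B", "C", "NEVER")
n_yes <- rowSums(data[items] == 1, na.rm = TRUE)
n_no  <- rowSums(data[items] == 2, na.rm = TRUE)

# "yes" wins over "no"; rows with neither stay NA
data$new <- ifelse(n_yes > 0, "yes",
                   ifelse(n_no > 0, "no", NA_character_))
data$new   # "no" NA "yes" "yes" "no"
```

Keeping the counts in separate vectors rather than as columns of `data` means there are no helper columns to drop afterwards.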
cremorna
  • Hi, thank you for the help! Yeah, each row represents a different respondent – j681 Jul 24 '17 at 13:55
  • And do you have different questions/variables? Is the structure of data pers1 - var1, pers1 - var2, pers1 - var3 or is it var1 - pers1, var1 - pers2 and var1 - pers3? – cremorna Jul 24 '17 at 15:30
  • Yup, each column is its own variable and the row # corresponds to different respondents – j681 Jul 24 '17 at 17:36
  • If we're going across row 1... row1 col1 = variable 1 of participant 1, r1c2 = var2 of part1, r1c3 = var3 of part1 – j681 Jul 24 '17 at 17:37