I have three groups of variables. For example, the groups contain the following variables:

  1. compassion, relevance, time, examples
  2. work, credit, science
  3. action, response, efficient.

If at least one of the variables from the first group has the value 1, that should count as one. Likewise, if at least one of the variables from the second group has the value 1, that should also count as one, and the same applies to the third group.

I am confused by this code:

if(Compassion > 0 | relevance > 0 | Time > 0 | 
   Exemplification > 0 & credit > 0 | Science > 0 | 
   Work > 0 & Action > 0 | Response > 0 | efficient> 0)
Phil
Majid Ali
  • I don't understand what you want in return. From what I understand, if any value is 1, you trigger your if condition? Why do you bother with different groups? Could you provide a table representing some cases? – Gowachin Mar 26 '21 at 13:05
  • I have Twitter data. The first group is Internalization, the second is Explanation, and the third is Action. If any variable from a group occurs in a tweet, that group should count as 1. I want to know how many tweets contain at least one variable from each group. It is also fine if variables occur from only one or two groups. So: how many tweets have variables from one group, how many from two groups, and how many from all three groups? – Majid Ali Mar 26 '21 at 16:53
  • Could you provide an example of this data? I bet it's a data.frame, so calling `head(data)` could help us find how to test your code. Take a look here ;) https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610 – Gowachin Mar 26 '21 at 21:22
  • I have my data in the following link. https://docs.google.com/spreadsheets/d/18TdXdCOA4S8lTzTMVkJIf6v6dg9vFiBNjwXQbI4VJHA/edit?usp=sharing – Majid Ali Mar 26 '21 at 22:03

1 Answer

Hi, here is an example of code that does what you want. Note that you can use it as is, or adapt it to retrieve different aspects of your dataset.

# Reproduction of your dataset type (not a copy, sample is a random function). 
# This is the kind of example it is nice to have in your question
df <- data.frame(Compassion = sample(c(1,0), 5, replace = TRUE),
                 relevance = sample(c(1,0), 5, replace = TRUE),
                 Time = sample(c(1,0), 5, replace = TRUE), 
                 Exemplification = sample(c(1,0), 5, replace = TRUE), 
                 credit = sample(c(1,0), 5, replace = TRUE), 
                 Science = sample(c(1,0), 5, replace = TRUE), 
                 Work = sample(c(1,0), 5, replace = TRUE), 
                 Action = sample(c(1,0), 5, replace = TRUE), 
                 Response = sample(c(1,0), 5, replace = TRUE), 
                 efficient = sample(c(1,0), 5, replace = TRUE))

df

# The groups
g1 <- c("Compassion", "relevance", "Time", "Exemplification")
g2 <- c("credit", "Science", "Work")
g3 <- c("Action", "Response", "efficient")

# TRUE/FALSE on each group. As your data is coded in 0/1, a sum by row is efficient.
boolG1 <- rowSums(df[g1]) >= 1
boolG2 <- rowSums(df[g2]) >= 1
boolG3 <- rowSums(df[g3]) >= 1

# Extract the rows that hit at least one group (row sum > 0)
df[boolG1 | boolG2 | boolG3,]
# Printing the number of rows, and changing the conditions
# (| means "at least one group", & means "all groups at once")
sprintf("number of tweets in at least one group : %d", nrow(df[boolG1 | boolG2 | boolG3,]))
sprintf("number of tweets in all 3 groups : %d", nrow(df[boolG1 & boolG2 & boolG3,]))
sprintf("number of tweets from 1st group : %d", nrow(df[boolG1,]))
sprintf("number of tweets from 2nd group : %d", nrow(df[boolG2,]))
sprintf("number of tweets from 3rd group : %d", nrow(df[boolG3,]))

# You can also extract a percentage.
# %.1f is used because the ratio is generally not a whole number; %% prints a literal %
sprintf("percentage of tweets in at least one group : %.1f %%",
        nrow(df[boolG1 | boolG2 | boolG3,])/nrow(df)*100)
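
To address the follow-up from the comments (how many tweets hit one, two, or all three groups), here is a minimal sketch assuming the same 0/1 coding. The `toy` data frame and the names `inG1`, `inG2`, `inG3` and `nGroups` are made up for illustration and use fewer columns than the real data.

```r
# Hypothetical toy data: one row per tweet, 0/1 coded (not the real dataset)
toy <- data.frame(Compassion = c(1, 0, 0, 1, 0),
                  relevance  = c(0, 0, 0, 1, 0),
                  credit     = c(0, 1, 0, 1, 0),
                  Science    = c(0, 0, 0, 0, 0),
                  Action     = c(0, 1, 0, 1, 0),
                  Response   = c(0, 0, 0, 0, 0))

g1 <- c("Compassion", "relevance")
g2 <- c("credit", "Science")
g3 <- c("Action", "Response")

# One TRUE/FALSE vector per group: does the tweet hit the group at all?
inG1 <- rowSums(toy[g1]) >= 1
inG2 <- rowSums(toy[g2]) >= 1
inG3 <- rowSums(toy[g3]) >= 1

# TRUE counts as 1, so adding the vectors gives the number of groups per tweet
nGroups <- inG1 + inG2 + inG3

# How many tweets hit 0, 1, 2 or 3 groups
table(factor(nGroups, levels = 0:3))

# How many tweets hit all three groups at once
sum(inG1 & inG2 & inG3)
```

With this toy data, two tweets hit no group, and one tweet each hits one, two, and three groups.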

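You tried to do this with an if condition. That can work, but `if` only tests a single TRUE/FALSE value, so you would need to put it inside a for loop over the rows. Note also that `&` takes precedence over `|` in R, so your condition needs parentheses around each group to mean what you intend. R is more efficient with vectorised computation. There is more information in this article.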
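
A small sketch of the difference between looping with `if` and a vectorised test, on a made-up two-column data frame (`toy`, `a`, `b`, `hitLoop` and `hitVec` are hypothetical names). The last two lines illustrate the operator precedence of `&` over `|`.

```r
# Hypothetical toy data, 0/1 coded
toy <- data.frame(a = c(1, 0, 0), b = c(0, 0, 1))

# Loop version: `if` needs a single TRUE/FALSE, so go row by row
hitLoop <- logical(nrow(toy))
for (i in seq_len(nrow(toy))) {
  if (toy$a[i] > 0 || toy$b[i] > 0) {
    hitLoop[i] <- TRUE
  }
}

# Vectorised version: one expression tests every row at once
hitVec <- toy$a > 0 | toy$b > 0

# Both approaches give the same logical vector
identical(hitLoop, hitVec)

# Precedence: `&` is evaluated before `|`
p1 <- TRUE | FALSE & FALSE    # read as TRUE | (FALSE & FALSE)
p2 <- (TRUE | FALSE) & FALSE  # parentheses change the result
```

Here `p1` is TRUE and `p2` is FALSE, which is why grouping each set of variables in parentheses matters.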

EDIT

Here is a short snippet to represent your dataset with a Venn diagram:

library(VennDiagram) # you may need to install this package
# Each set holds the row numbers (tweets) that hit the group
venn.diagram(
  x = list(g1 = which(boolG1), 
           g2 = which(boolG2), 
           g3 = which(boolG3)),
  filename = 'venn_diagramm.tiff' # be aware that this creates a file!
)
Gowachin
  • Thanks very much, dear. It really helped me a lot to understand. – Majid Ali Mar 27 '21 at 20:28
  • Don't forget to accept the answer in case others have the same issue, and I hope you will have a great time with R ;) Thinking in terms of vectorisation is really difficult at first, but once you master it, a lot of problems will be quick to solve! – Gowachin Mar 27 '21 at 20:39
  • Yeah, I will do that. But when I run the code for the Venn diagram, it always gives me different values. Why is that? – Majid Ali Mar 27 '21 at 21:04
  • This is because I build `df` using the `sample` function. This function samples 5 times from the vector `c(0,1)`, with replacement. It's like flipping a coin: it's random. Using the rest of the code on your dataset shouldn't be a problem; you just need to use your data.frame and not the `df` object. I could have used your data, but this is easier to understand with a smaller data.frame example. – Gowachin Mar 27 '21 at 21:52
  • Understood, thanks again. I also have a country variable. How can I differentiate tweets based on it? – Majid Ali Mar 28 '21 at 12:01
  • It's always the same syntax as before: you select rows depending on columns. Your first question depended on multiple columns, so it was tricky, but now you can just use this kind of code: `data.frame[data.frame$column == value, ]`. Basically you compare the column against a certain value (equal, greater than, etc.), which gives you a TRUE/FALSE vector of the rows to select in the data.frame. You just need to replace the column name, the operator and the value. https://stackoverflow.com/questions/2854625/select-only-rows-if-its-value-in-a-particular-column-is-less-than-the-value-in-t – Gowachin Mar 28 '21 at 13:45
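
A minimal sketch of that row selection, assuming a column called `country`. The column name, the country codes and the `toy` data frame are hypothetical, so substitute the names from your own data.

```r
# Hypothetical toy data: a country column plus two 0/1 coded variables
toy <- data.frame(country    = c("PK", "FR", "PK", "FR", "PK"),
                  Compassion = c(1, 0, 0, 1, 0),
                  credit     = c(0, 1, 0, 1, 1))

# Keep only the rows (tweets) from one country
pk <- toy[toy$country == "PK", ]
nrow(pk)

# The same TRUE/FALSE idea as before, then counted per country:
# tweets where either variable is 1
hit <- toy$Compassion > 0 | toy$credit > 0
perCountry <- tapply(hit, toy$country, sum)
perCountry
```

`tapply` applies `sum` to the logical vector within each country, so it returns one count per country without subsetting each country by hand.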