I conducted a diary study in which participants had to answer a questionnaire twice a day for 5 days.

My criterion was that people had to answer on at least 3 full days out of the 5. In other words, out of the 10 times the questionnaire was administered, they had to answer at least 6 times. Every time they filled in the questionnaire they had to enter a personal code, which is why I can see who answered and how many times.

I counted the responses per participant like this (Morning_Afternoon_PT_EN is the name of the data frame):

respfreq <- calc.nomiss(Morning_Afternoon_PT_EN$day, tolower(Morning_Afternoon_PT_EN$code), data=Morning_Afternoon_PT_EN)
print(respfreq)

   952345172    alju12    amou79    amou91    baab81 
        0         5        10        10        10        10 
   base85    beju58    cade61    caju21    chno45    crju09 
       10        10        10        10         5         7 
   faap52    fuau48    fude38    fuma07    huju03    leja26 
       10         8         3        10         8        10 
   leju40    lema32    leno81    liab14    liab20    liab50 
       10         9         8         9        10         9 
  liabr14    liag30    liag60   liap520    liau35    lide50 
        1        10         9        10         9         9 
   life10    life74    lija05    lija45    lija78    liju65 
        9         1        10        10         9        10 
   liju94    lima40    lima82    limf96    lioc46    lioc84 
        9        10        10         4        10        10 
   lise50    lise88    maab31    moag91    moap58    pode04 
        9        10        10        10         9         8 
   sade61    saja28    saja79    saoc06    sema72    sema83 
        9        10        10         9        10        10 
   tose37    vima32 
        9         9 
length(respfreq)
[1] 56

So I see that "952345172", "chno45", "limf96", "liabr14", "life74", and "fude38" do not meet the requirement, and I want to remove them from the overall data set.

I tried to use subset, like:

NewDataFrame<-subset(Morning_Afternoon_PT_EN, respfreq>6)

But I get this error:

Error: Must subset rows with a valid subscript vector.
i Logical subscripts must match the size of the indexed input.
x Input has size 485 but subscript r has size 56.

I understand the error, but I don't know how to solve it.

Jose
    Please do not post photos of data or code! If you do, people who are willing to help you would have to type out all that text. Instead provide a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610) P.S. Here is [a good overview on how to ask a good question](https://stackoverflow.com/help/how-to-ask) – dario Dec 03 '21 at 11:06

1 Answer

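To use subset(), the condition needs one value per row of the data frame. respfreq has one entry per participant (56), while the data frame has 485 rows, so add the counts to the data frame as a column first: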

x <- c("952345172", "alju12", "amou79", "amou91", "baab81", NA)

code <- rep(x, c(5, 10, 10, 20, 2, 7))

df <- data.frame(id = 1:length(code), code)

head(df)

##   id      code
## 1  1 952345172
## 2  2 952345172
## 3  3 952345172
## 4  4 952345172
## 5  5 952345172
## 6  6    alju12
library(dplyr)

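# count the non-missing codes and attach the count to every row
df2 <- left_join(df, na.omit(df) |> count(code), by = "code")

# keep participants with at least 6 responses (rows with a missing code drop out)
df2 <- subset(df2, n >= 6)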

head(df2)

##    id   code  n
## 6   6 alju12 10
## 7   7 alju12 10
## 8   8 alju12 10
## 9   9 alju12 10
## 10 10 alju12 10
## 11 11 alju12 10
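
The same per-row count can also be sketched in base R with `ave()`, for readers who do not want to load dplyr. This is only an illustration on made-up data: the data frame `toy` and its codes are hypothetical, and the threshold of 6 is taken from the "at least 6 responses" rule in the question.

```r
# Hypothetical toy data: one row per submitted questionnaire
toy <- data.frame(
  code = rep(c("aaaa11", "bbbb22", "cccc33", NA), c(10, 5, 6, 2)),
  stringsAsFactors = FALSE
)

# Start with NA so rows without a code never pass the filter
toy$n <- NA_integer_
has_code <- !is.na(toy$code)

# ave() returns one value per row: the number of rows sharing that row's code
toy$n[has_code] <- ave(
  seq_along(toy$code[has_code]),
  toy$code[has_code],
  FUN = length
)

# The condition now has nrow(toy) elements, so subset() accepts it
kept <- subset(toy, n >= 6)
table(kept$code)
```

Because `n` lives in the data frame, the filter condition always has the same length as the data, which is exactly what the failed `subset()` call was missing.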

Another option is to use:

tabc <- table(df$code)

df3 <- df[df$code %in% names(tabc[tabc >= 6]), ]
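
Applied to the situation in the question, the `table()` approach might look like the sketch below. The data frame `diary` is a made-up stand-in for `Morning_Afternoon_PT_EN`; only the column name `code` and the lower-casing with `tolower()` are taken from the question. Note that `table()` counts rows per code, which equals the number of completed questionnaires only if each row is one completed questionnaire.

```r
# Hypothetical stand-in for Morning_Afternoon_PT_EN
diary <- data.frame(
  code = c(rep("ALJU12", 4), rep("alju12", 6),
           rep("fude38", 3), rep("crju09", 7)),
  stringsAsFactors = FALSE
)

# Normalise case first, as the question does with tolower()
diary$code <- tolower(diary$code)

# One count per participant (like respfreq: far fewer entries than rows)
counts <- table(diary$code)

# Codes meeting the "at least 6 responses" rule
keep_codes <- names(counts)[counts >= 6]

# %in% yields one TRUE/FALSE per row, so the lengths match
diary_kept <- diary[diary$code %in% keep_codes, , drop = FALSE]

# Which participants were removed?
dropped <- setdiff(names(counts), keep_codes)
dropped
```

Listing the dropped codes makes it easy to compare the result against the participants identified by eye from the frequency table.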
Jose