
I'm trying to make a list containing 25 different passwords to check against another list of 50, and come back with the matches. This is for a university project on passwords. The idea is that the list of 25 contains the most commonly used passwords, and I would like R to tell me which of my 50 passwords match the most common 25. However, I keep receiving the following error:

Error in `$<-.data.frame`(`*tmp*`, "Percent", value = character(0)) :
replacement has 0 rows, data has 25

I am using the following code

makeCounts <- function(x) {
  return(x=list("count"=sum(grepl(x, Final_DF$pswd, ignore.case=TRUE))))  
}

#creates a local variable named tmp which is removed afterwards
printCounts <- function(ct) {
  tmp <- data.frame(Term=names(ct), Count=as.numeric(unlist(ct)))
  tmp$Percent <- sprintf("%3.2f%%", ((tmp$Count / nrow(Final_DF$Pswd) * 100)))
  print(tmp[order(-tmp$Count),], row.names=FALSE)
}

# create top 25 mostly commonly used pswds

worst.pass <- c("password", "123456", "12345678", "qwerty", "abc123", 
                "monkey", "1234567", "Qwertyuiop", "123", "dragon", 
                "000000", "1111111", "iloveyou", "1234", "12345", 
                "1234567890", "1q2w3e4r5t", "ashely", "shadow", "123123", 
                "654321", "superman", "sunshine", "tinkle", "football")

worst.ct <- sapply(worst.pass, makeCounts, simplify=FALSE)
printCounts(worst.ct)

My 50 passwords are contained in the data frame column Final_DF$Pswd, which is as follows:

> Final_DF$Pswd
 [1] "monkey"       "iloveyou"     "dragon"       "jbI2pnK$xi"   "password"     "computer"     "!qessw"      
 [8] "tUNh&SSm6!"   "sunshine"     "wYrUeWV"      "superman"     "samsung"      "utoXGe6$"     "master"      
[15] "wjZC&OvXX"    "0R1cNTm9sGir" "Fbuu2bs89?"   "pokemon"      "secret"       "x&W1TjO59"    "buster"      
[22] "purple"       "shine"        "flower"       "marina"       "Tg%OQT$0"     "SbDUV&nOX"    "peanut"      
[29] "angel"        "?1LOEc4Zfk"   "computer"     "spiderman"    "nothing"      "$M6LgmQgv$"   "orange"      
[36] "knight"       "american"     "outback"      "TfuRpt3PiZ"   "air"          "surf"         "lEi2a$$eyz"  
[43] "date"         "V$683rx$p"    "newcastle"    "estate"       "foxy"         "ginger"       "coffee"      
[50] "legs" 

The traceback of the error when I run printCounts(worst.ct) reads:

 Error in `$<-.data.frame`(`*tmp*`, "Percent", value = character(0)) : 
  replacement has 0 rows, data has 25 
4.
stop(sprintf(ngettext(N, "replacement has %d row, data has %d", 
    "replacement has %d rows, data has %d"), N, nrows), domain = NA) 
3.
`$<-.data.frame`(`*tmp*`, "Percent", value = character(0)) 
2.
`$<-`(`*tmp*`, "Percent", value = character(0)) 
1.
printCounts(worst.ct) 

I have read a couple of forum posts, and I am not sure whether this has something to do with NA values. I am new to R and have been looking at this for some time, scratching my head.

Can anybody please tell me where I am going wrong?

> dput(Final_DF)
structure(list(gender = c("female", "male", "male", "female", 
"female", "male", "male", "male", "male", "female", "male", "male", 
"female", "female", "female", "female", "male", "female", "male", 
"male", "female", "female", "female", "female", "female", "female", 
"male", "female", "female", "female", "female", "female", "female", 
"female", "male", "male", "female", "female", "male", "female", 
"female", "male", "female", "female", "male", "male", "male", 
"male", "male", "male"), age = structure(c(47L, 43L, 65L, 24L, 
44L, 60L, 26L, 25L, 62L, 23L, 44L, 61L, 27L, 47L, 18L, 23L, 34L, 
77L, 71L, 19L, 64L, 61L, 22L, 55L, 45L, 29L, 21L, 64L, 43L, 20L, 
32L, 55L, 68L, 21L, 81L, 43L, 63L, 72L, 38L, 20L, 66L, 39L, 64L, 
20L, 73L, 21L, 53L, 75L, 69L, 82L), class = c("variable", "integer"
), varname = "Age"), web_browser = structure(c(1L, 1L, 4L, 1L, 
3L, 3L, 2L, 1L, 4L, 1L, 1L, 1L, 3L, 4L, 1L, 2L, 1L, 3L, 3L, 2L, 
1L, 1L, 1L, 3L, 4L, 3L, 4L, 4L, 1L, 2L, 1L, 1L, 3L, 1L, 1L, 2L, 
1L, 2L, 3L, 4L, 2L, 3L, 1L, 1L, 1L, 1L, 3L, 3L, 4L, 1L), .Label = c("Chrome", 
"Internet Explorer", "Firefox", "Netscape"), class = c("variable", 
"factor"), varname = "Browser"), Pswd = c("monkey", "iloveyou", 
"dragon", "jbI2pnK$xi", "password", "computer", "!qessw", "tUNh&SSm6!", 
"sunshine", "wYrUeWV", "superman", "samsung", "utoXGe6$", "master", 
"wjZC&OvXX", "0R1cNTm9sGir", "Fbuu2bs89?", "pokemon", "secret", 
"x&W1TjO59", "buster", "purple", "shine", "flower", "marina", 
"Tg%OQT$0", "SbDUV&nOX", "peanut", "angel", "?1LOEc4Zfk", "computer", 
"spiderman", "nothing", "$M6LgmQgv$", "orange", "knight", "american", 
"outback", "TfuRpt3PiZ", "air", "surf", "lEi2a$$eyz", "date", 
"V$683rx$p", "newcastle", "estate", "foxy", "ginger", "coffee", 
"legs"), pswd_length = c(6L, 8L, 6L, 10L, 8L, 8L, 6L, 10L, 8L, 
7L, 8L, 7L, 8L, 6L, 9L, 12L, 10L, 7L, 6L, 9L, 6L, 6L, 5L, 6L, 
6L, 8L, 9L, 6L, 5L, 10L, 8L, 9L, 7L, 10L, 6L, 6L, 8L, 7L, 10L, 
3L, 4L, 10L, 4L, 9L, 9L, 6L, 4L, 6L, 6L, 4L), last.num = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, 9, NA, NA, NA, NA, NA, 0, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA)), row.names = c(NA, -50L), class = "data.frame")
Mark Rotteveel
Marty
  • You should provide `Final_DF` using `dput`, read: https://stackoverflow.com/a/5963610/6574038 – jay.sf Dec 26 '20 at 15:45
  • Likely candidates of a duplicate, [`[r] replacement has 0 rows`](https://stackoverflow.com/search?q=%5Br%5D+replacement%20has%200%20rows). – r2evans Dec 26 '20 at 15:50
  • Should `nrow(Final_DF$Pswd)` instead be `nrow(Final_DF)`? If it is a simple column, then that is guaranteed to return `NULL`. – r2evans Dec 26 '20 at 15:51
  • BTW, having your function rely on an external variable neither defined within nor passed to the function is a bad idea: it breaks reproducibility, and it can make troubleshooting difficult. One quick option is to add an argument to `printCounts(ct, finaldf)` and call it with `printCounts(worst.ct, Final_DF)`. – r2evans Dec 26 '20 at 15:53
  • @r2evans that makes a difference in that it prints the term, count and percentage, but it is not recognising the matches; it just shows a 0 count for all – Marty Dec 26 '20 at 15:56
  • `Final_DF$pswd` should be `Final_DF$Pswd`? – pseudospin Dec 26 '20 at 16:03

2 Answers


There are several things that appear wrong with your functions.

  1. makeCounts references pswd, but Final_DF has Pswd and pswd_length. Because $ does partial matching on names, R silently picks a column for pswd, and I'm guessing it is not the one you want. Let's prove which one it is using, first by setting an option[1]:

    options(warnPartialMatchDollar = TRUE) # see ?options
    worst.ct <- sapply(worst.pass, makeCounts, simplify=FALSE)
    # Warning in Final_DF$pswd : partial match of 'pswd' to 'pswd_length'
    # Warning: partial match of 'pswd' to 'pswd_length'
    # Warning: partial match of 'pswd' to 'pswd_length'
    # Warning: partial match of 'pswd' to 'pswd_length'
    # Warning: partial match of 'pswd' to 'pswd_length'
    ### ...repeated...
    

    Worse, if you look at this variable (part of troubleshooting your problem is checking the variables you are making and using), you'll see that it is effectively empty/useless, since all of its values are 0:

    str(worst.ct)
    # List of 25
    #  $ password  :List of 1
    #   ..$ count: int 0
    #  $ 123456    :List of 1
    #   ..$ count: int 0
    #  $ 12345678  :List of 1
    #   ..$ count: int 0
    #  $ qwerty    :List of 1
    #   ..$ count: int 0
    ### ...truncated...
    

    If you change your function to use the correct column name, it provides no such warning, and it does contain some non-zero elements:

    makeCounts <- function(x) {
      list(count = sum(grepl(x, Final_DF$Pswd, ignore.case = TRUE)))
    }
    worst.ct <- sapply(worst.pass, makeCounts, simplify=FALSE)  # re-run with the fixed function
    table(unlist(worst.ct))
    #  0  1 
    # 19  6 
    
    str(worst.ct)
    # List of 25
    #  $ password  :List of 1
    #   ..$ count: int 1
    #  $ 123456    :List of 1
    #   ..$ count: int 0
    #  $ 12345678  :List of 1
    #   ..$ count: int 0
    #  $ qwerty    :List of 1
    #   ..$ count: int 0
    ### ...truncated...
    
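  2. Within your printCounts function, you are referencing nrow(Final_DF$Pswd), which always returns NULL because a single column is a plain vector with no dimensions. Dividing by NULL gives numeric(0), so sprintf() returns character(0), and assigning that zero-length vector to a 25-row data frame is exactly what raises "replacement has 0 rows, data has 25". Have you tried this?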

    nrow(Final_DF$Pswd)
    # NULL
    nrow(Final_DF)
    # [1] 50
    

    Instead, rewrite that line to be

      tmp$Percent <- sprintf("%3.2f%%", ((tmp$Count / nrow(Final_DF) * 100)))
    
  3. Not a syntax error, but your function relying on a variable that is neither defined within it nor passed to it is bad practice: it means the function can behave differently when the same parameters are passed to it, which breaks reproducibility (and it can make troubleshooting rather difficult).

    I suggest making Final_DF an argument for the function, and passing it every time.

    printCounts <- function(ct, Final_DF) {
      tmp <- data.frame(Term=names(ct), Count=as.numeric(unlist(ct)))
      tmp$Percent <- sprintf("%3.2f%%", ((tmp$Count / nrow(Final_DF) * 100)))
      print(tmp[order(-tmp$Count),], row.names=FALSE)
    }
    
    printCounts(worst.ct)
    # Error in nrow(Final_DF) : argument "Final_DF" is missing, with no default
    
    printCounts(worst.ct, Final_DF) # no error here
    

    For this case, I recommend that you do not provide a default value for it. This also enables you to use the same function with different "final" frames of passwords, in case you are testing (unit-testing) or testing (train/test sampling) or testing (troubleshooting).
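Point 2 is the change that removes the reported error. The following minimal sketch isolates that chain. It uses a hypothetical three-row data frame `df`, not your `Final_DF`, and goes from the `NULL` row count to the same `$<-.data.frame` error. It then applies the fix.

```r
# Hypothetical three-row data frame standing in for Final_DF
df <- data.frame(Pswd = c("monkey", "dragon", "legs"), Count = c(1, 1, 0))

# A single column is a plain vector with no dim attribute, so nrow() is NULL
print(nrow(df$Pswd))  # NULL

# Arithmetic with NULL yields numeric(0), so sprintf() yields character(0)
pct <- sprintf("%3.2f%%", df$Count / nrow(df$Pswd) * 100)
print(pct)  # character(0)

# Assigning a zero-length vector to a 3-row data frame raises the same error
err <- tryCatch({
  df$Percent <- pct
  NULL
}, error = function(e) conditionMessage(e))
print(err)  # "replacement has 0 rows, data has 3" (in an English locale)

# Dividing by nrow() of the data frame itself gives the intended percentages
df$Percent <- sprintf("%3.2f%%", df$Count / nrow(df) * 100)
print(df$Percent)  # "33.33%" "33.33%" "0.00%"
```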

After those changes, I get this:

printCounts(worst.ct, Final_DF)
#        Term Count Percent
#    password     1   2.00%
#      monkey     1   2.00%
#      dragon     1   2.00%
#    iloveyou     1   2.00%
#    superman     1   2.00%
#    sunshine     1   2.00%
#      123456     0   0.00%
#    12345678     0   0.00%
#      qwerty     0   0.00%
#      abc123     0   0.00%
#     1234567     0   0.00%
#  Qwertyuiop     0   0.00%
#         123     0   0.00%
#      000000     0   0.00%
#     1111111     0   0.00%
#        1234     0   0.00%
#       12345     0   0.00%
#  1234567890     0   0.00%
#  1q2w3e4r5t     0   0.00%
#      ashely     0   0.00%
#      shadow     0   0.00%
#      123123     0   0.00%
#      654321     0   0.00%
#      tinkle     0   0.00%
#    football     0   0.00%
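The following sketch puts the three fixes together in a self-contained script. It goes one step beyond the answer in two ways, both my own additions:

- It also passes the passwords into `makeCounts()` through a `pswds` argument, in the spirit of point 3.
- It returns the sorted table invisibly so that the result can be inspected.

The data are the question's 50 passwords and 25 common passwords.

```r
# The question's 50 passwords (Final_DF$Pswd)
pswds <- c("monkey", "iloveyou", "dragon", "jbI2pnK$xi", "password", "computer",
           "!qessw", "tUNh&SSm6!", "sunshine", "wYrUeWV", "superman", "samsung",
           "utoXGe6$", "master", "wjZC&OvXX", "0R1cNTm9sGir", "Fbuu2bs89?", "pokemon",
           "secret", "x&W1TjO59", "buster", "purple", "shine", "flower", "marina",
           "Tg%OQT$0", "SbDUV&nOX", "peanut", "angel", "?1LOEc4Zfk", "computer",
           "spiderman", "nothing", "$M6LgmQgv$", "orange", "knight", "american",
           "outback", "TfuRpt3PiZ", "air", "surf", "lEi2a$$eyz", "date",
           "V$683rx$p", "newcastle", "estate", "foxy", "ginger", "coffee", "legs")
Final_DF <- data.frame(Pswd = pswds)

# The question's 25 most common passwords
worst.pass <- c("password", "123456", "12345678", "qwerty", "abc123",
                "monkey", "1234567", "Qwertyuiop", "123", "dragon",
                "000000", "1111111", "iloveyou", "1234", "12345",
                "1234567890", "1q2w3e4r5t", "ashely", "shadow", "123123",
                "654321", "superman", "sunshine", "tinkle", "football")

# Count passwords containing the term x (case-insensitive regex, as in the question);
# the pswds argument is an addition so the function no longer reads a global
makeCounts <- function(x, pswds) {
  list(count = sum(grepl(x, pswds, ignore.case = TRUE)))
}

# Tabulate counts and percentages; the data frame is passed in (point 3)
printCounts <- function(ct, Final_DF) {
  tmp <- data.frame(Term = names(ct), Count = as.numeric(unlist(ct)))
  tmp$Percent <- sprintf("%3.2f%%", tmp$Count / nrow(Final_DF) * 100)  # point 2
  out <- tmp[order(-tmp$Count), ]
  print(out, row.names = FALSE)
  invisible(out)  # also returned, so the result can be inspected
}

worst.ct <- sapply(worst.pass, makeCounts, pswds = Final_DF$Pswd, simplify = FALSE)
res <- printCounts(worst.ct, Final_DF)
```

Running it should print the same 25-row table as above. `order()` leaves ties in their original order, so the six matches appear in the order they occur in `worst.pass`.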

Note:

  1. I have options(warnPartialMatchDollar=TRUE, warnPartialMatchAttr=TRUE) set in my ~/.Rprofile (and any project-specific .Rprofile init file) for just this reason: the $ silently does partial matching, and this can be very problematic. With the warning, at least you can see what R is inferring in the background. There is a third option, warnPartialMatchArgs, that has the same intent ... but waaaaaaaaaay too many package authors out there are inadvertently relying on this behavior, so lacking the time/ability to fix them all, I have chosen to muffle this noise-maker.

    Especially if this partial-matching behavior is a surprise to you, I strongly encourage you to set the first two options yourself. In the best case, it produces no warnings and you have the comfort of knowing that you are taking steps to produce more resilient code; at worst, it is noisy and you eventually get tired of the noise and fix the lazy code.

    See ?options for these three among many other available options. (Packages can set their own options as well; an option is similar in concept to Windows' registry, for better or worse, in that it is global to R, and can have arbitrary keys and values.)
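The following small sketch shows the partial matching from point 1 and this note in isolation. It uses a hypothetical two-column data frame whose column names mirror those of `Final_DF`. It also shows that `[[` matches names exactly by default. A typo such as `pswd` then returns `NULL` rather than quietly picking another column.

```r
# Hypothetical two-column frame mirroring Final_DF's column names
df <- data.frame(Pswd = c("monkey", "dragon"), pswd_length = c(6L, 6L))

# `$` partially matches: "pswd" is not a column name, but it is a unique
# prefix of "pswd_length", so that column is returned
print(df$pswd)  # 6 6, i.e. the lengths, not the passwords

# `[[` is exact by default, so the same typo gives NULL instead
print(df[["pswd"]])  # NULL

# With the option set, the silent partial match becomes a warning
old <- options(warnPartialMatchDollar = TRUE)
w <- tryCatch(df$pswd, warning = function(w) w)
options(old)  # restore the previous setting
print(inherits(w, "warning"))  # TRUE
```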

r2evans
  • Thanks, your changes worked. In summary, to check my understanding: this was all down to duplicate pswd names in the one DF? And `options(warnPartialMatchDollar = TRUE)`: what does this line of code do? Is it helping us to pinpoint the problem? – Marty Dec 26 '20 at 16:41
  • (1) You had two errors in your functions. (2) That option set helps identify where one of the problems is, the other error was evident when you step line-by-line through your code. To learn more about that option, see [`?options`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/options.html) and search for `warnPartialMatchDollar`; I could copy/paste here, but it's best to find the authoritative docs. – r2evans Dec 26 '20 at 16:43

If you only want to check whether a (set of) password(s) is in a set of bad passwords, you could use

Final_DF$Pswd %in% worst.pass

This will give you a vector of TRUE or FALSE. You could run sum(Final_DF$Pswd %in% worst.pass) to get the total number of bad password matches, or table(Final_DF$Pswd[Final_DF$Pswd %in% worst.pass]) for a quick overview of the matches.
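The two approaches match differently:

- `%in%` compares whole strings exactly and case-sensitively.
- The question's `grepl(x, ..., ignore.case = TRUE)` counts a password if the term appears anywhere inside it, ignoring case.

The small sketch below uses hypothetical passwords to show where the two disagree.

```r
worst.pass <- c("password", "monkey", "dragon")
pswds <- c("password", "Password", "password1", "monkey", "legs")  # hypothetical

# Exact, case-sensitive whole-string matching
exact <- pswds %in% worst.pass
print(exact)  # TRUE FALSE FALSE TRUE FALSE

# Case-insensitive substring matching, as in the question's makeCounts():
# one logical vector per term, OR-ed together across terms
any_hit <- Reduce(`|`, lapply(worst.pass, grepl, x = pswds, ignore.case = TRUE))
print(any_hit)  # TRUE TRUE TRUE TRUE FALSE
```

For the question's 50 passwords, both approaches find the same six matches. Which one you want depends on whether a variant such as `Password1` should count as a match.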

However, if your intention is to check a set where passwords are constantly added (which I'm guessing is the intention, since you made the functions), the following might be useful:

result <- rep(NA_integer_, length(Final_DF$Pswd))  # one slot per password; NA = no match
for (i in seq_along(Final_DF$Pswd)) {
    if (Final_DF$Pswd[i] %in% worst.pass) {
        result[i] <- which(worst.pass == Final_DF$Pswd[i])
    }
}
table(worst.pass[result[!is.na(result)]])
table(worst.pass[result[!is.na(result)]])

The result is a table with the count of each match. In your case:

  dragon iloveyou   monkey password sunshine superman 
       1        1        1        1        1        1 

Note that looping is not advisable for a large number of passwords. In that case, neat tidyverse approaches would be worth looking at.
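Base R's vectorised `match()` can also replace the explicit loop, without the tidyverse. For each password, it returns the position of the first exact match in `worst.pass`, or `NA` if there is none. That is the same `result` vector the loop builds. This is a swapped-in base-R alternative, not the tidyverse approach mentioned above. It is sketched here with a subset of the question's passwords:

```r
worst.pass <- c("password", "123456", "12345678", "qwerty", "abc123",
                "monkey", "1234567", "Qwertyuiop", "123", "dragon",
                "000000", "1111111", "iloveyou", "1234", "12345",
                "1234567890", "1q2w3e4r5t", "ashely", "shadow", "123123",
                "654321", "superman", "sunshine", "tinkle", "football")
# A subset of the question's Final_DF$Pswd
pswds <- c("monkey", "iloveyou", "dragon", "jbI2pnK$xi", "password",
           "computer", "sunshine", "superman", "samsung", "legs")

# Index of each password in worst.pass, NA where there is no exact match
result <- match(pswds, worst.pass)
print(result)  # 6 13 10 NA 1 NA 23 22 NA NA

# Same summary table as the loop version
tab <- table(worst.pass[result[!is.na(result)]])
print(tab)
```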

Donald Seinen