
I'm trying to find the first row in a data frame that has a cell containing only digits, commas, and dollar signs:

Assessment Area Offices Offices Deposits as of 6/30/16 Deposits as of 6/30/16 Assessment Area Reviews Assessment Area Reviews Assessment Area Reviews
2 Assessment Area       #       %               $ (000s)                      %              Full Scope           Limited Scope                   TOTAL
3     Ohio County       1   50.0%                $24,451                  52.7%                       1                       0                       1
4     Hart County       1   50.0%                $21,931                  47.3%                       1                       0                       1
5         OVERALL       2    100%                $46,382                 100.0%                       2                       0                       2

This code does find the row:

grepl("[0-9]",table_1)

But the code returns:

[1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

I only want the row number.

Wiktor Stribiżew
    Please provide a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610) – Alexlok Aug 27 '20 at 18:09

1 Answer


Your data could use some cleaning up, but it's not entirely necessary in order to solve your problem. You want to find the first row that contains a dollar sign and an appropriate value. My solution does the following:

  1. Iterates over rows
  2. In each row, asks if there's at least one cell that starts with a dollar sign followed by a specific combination of digits and commas (to be explained in greater detail below)
  3. Stops when we reach that row
  4. Prints the ID of the row

The solution involves a for loop, an if statement, and a regular expression.

First of all, here's my attempt to reproduce your data frame. Again, the details don't matter too much; I just wanted the "money row" to be the second row, which is roughly how it appears in your example.

    df <- data.frame(
      Assessment_Area = c(2, 3, 4, 5),
      Offices = c("#", 1, 1, 2),
      Dep_Percent_63016 = c("#", "50.0%", "50.0%", "100.0%"),
      Dep_Total_63016 = c("$ (000s)", "$24,451", "$21,931", "$46,382"),
      Assessment_Area_Rev = rep("Blah", 4)
    )
    df

      Assessment_Area Offices Dep_Percent_63016 Dep_Total_63016
    1               2       #                 #        $ (000s)
    2               3       1             50.0%         $24,451
    3               4       1             50.0%         $21,931
    4               5       2            100.0%         $46,382
      Assessment_Area_Rev
    1                Blah
    2                Blah
    3                Blah
    4                Blah

Here's the for loop:

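    library(stringr)

    # Walk the rows in order and stop at the first one with a "money" cell
    for (i in seq_len(nrow(df))) {
      # Flatten row i (df[i, ]) into a character vector, one element per cell
      cells <- vapply(df[i, ], as.character, character(1))
      if (any(str_detect(cells, "^\\$\\d{1,3}(,\\d{3})*"))) {
        print(i)
        break
      }
    }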

The key is the line with the if statement. any returns TRUE if at least one element of a logical vector is TRUE. Here that vector comes from stringr::str_detect, which takes a character vector and a pattern and returns TRUE or FALSE for each element. The elements in this case are the cells of row i, indexed as df[i, ]. So the crux is the regular expression: "^\\$\\d{1,3}(,\\d{3})*"
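
To make the any / per-cell logic concrete, here is a minimal sketch on a cut-down, two-row version of the example data. It uses base R's grepl in place of str_detect so that it runs without extra packages; that substitution and the names row_cells, cell_hits, and row_has_money are my own choices for illustration.

```r
# Two-row stand-in for the example data (illustrative subset only)
df <- data.frame(
  Offices = c("#", "1"),
  Dep_Total_63016 = c("$ (000s)", "$24,451"),
  stringsAsFactors = FALSE
)

pattern <- "^\\$\\d{1,3}(,\\d{3})*"

# Flatten row 2 into a named character vector, one element per cell
row_cells <- unlist(df[2, ])

# One TRUE/FALSE per cell; grepl stands in for stringr::str_detect here
cell_hits <- grepl(pattern, row_cells, perl = TRUE)
names(cell_hits) <- names(row_cells)
print(cell_hits)

# any() collapses the per-cell vector into a single answer for the row
row_has_money <- any(cell_hits)
print(row_has_money)
```

Only the Dep_Total_63016 cell matches in row 2, so any() reports TRUE for that row.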

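This is the pattern (the money cell) we search for in each row. ^\\$ means the string must start with a dollar sign. The $ has to be escaped because it is a metacharacter in regular expressions (the end anchor), and the backslash is doubled because it must itself be escaped inside an R string. \\d{1,3} then asks for 1-3 digits, which on its own covers any dollar value below $1,000. (,\\d{3})* allows any number (including 0) of groups made of a comma followed by three more digits, which covers larger amounts written with thousands separators. Note that the pattern has no end anchor, so it only checks how the cell begins; append $ to the pattern if the whole cell must be a dollar amount.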
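
A quick way to check the pattern is to run it against a handful of sample strings. The sketch below again uses base R's grepl (my substitution, to avoid a package dependency), and the sample cells and the strict_pattern variant with a trailing $ are assumptions added for illustration.

```r
pattern <- "^\\$\\d{1,3}(,\\d{3})*"

# Sample cell values: some from the example table, some made up
cells <- c("$24,451", "$ (000s)", "50.0%", "$999", "$1,234,567", "$24,451abc")

# The pattern only checks how each string begins
matches <- grepl(pattern, cells, perl = TRUE)
print(setNames(matches, cells))

# Appending an end anchor requires the whole string to be a dollar amount
strict_pattern <- paste0(pattern, "$")
strict_matches <- grepl(strict_pattern, cells, perl = TRUE)
print(setNames(strict_matches, cells))
```

The two results differ only for "$24,451abc", which starts like a dollar amount but has trailing text.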

Finally, when the loop reaches a row containing such a cell, it prints the row number and breaks, so the result is the lowest row number with at least one matching cell. In this example the output is 2. If no row matches, nothing is printed.
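
If you would rather get the row number back as a value, and an explicit NA when nothing matches, the same idea can be wrapped in a small function. This is only a sketch: first_money_row is a hypothetical helper name, and it uses base R's grepl column by column instead of a loop with str_detect.

```r
# Hypothetical helper: index of the first row with a matching cell, or NA
first_money_row <- function(data, pattern = "^\\$\\d{1,3}(,\\d{3})*") {
  # One logical vector per column; as.character() handles factors and numbers
  col_hits <- lapply(data, function(col) {
    grepl(pattern, as.character(col), perl = TRUE)
  })
  # A row counts as a hit if any of its columns matched
  row_hits <- Reduce(`|`, col_hits)
  idx <- which(row_hits)
  if (length(idx) == 0) NA_integer_ else idx[1]
}

df <- data.frame(
  Assessment_Area = c(2, 3, 4, 5),
  Offices = c("#", 1, 1, 2),
  Dep_Percent_63016 = c("#", "50.0%", "50.0%", "100.0%"),
  Dep_Total_63016 = c("$ (000s)", "$24,451", "$21,931", "$46,382"),
  Assessment_Area_Rev = rep("Blah", 4)
)

result <- first_money_row(df)
print(result)

# A data frame with no dollar amounts gives NA instead of silently doing nothing
no_match <- first_money_row(data.frame(a = c("x", "y")))
print(no_match)
```

Returning NA_integer_ makes the "no matching row" case something the calling code can test for with is.na().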

There may be more you want to do once you have that information, but if all you need is the lowest row number containing your money expression then this is sufficient.

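A less elegant regular expression, which only checks that a cell consists of dollar signs, commas, and digits, would be:

     "^[0-9$,]+$"

That is literally what you asked for, although I don't think it's what you really want: it also matches a plain number such as 1, or something like ,56$,,$$78. (Without the ^ and $ anchors, "[0-9$,]+" matches any cell that contains even one of those characters, such as 50.0%.)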
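
To see how a character-class pattern like this behaves with and without anchors, here is a small check in base R. The sample strings and the names loose and strict are made up for illustration.

```r
# Sample cell values, including the nonsense string mentioned above
cells <- c("$24,451", "50.0%", ",56$,,$$78", "Blah", "1")

# Unanchored: TRUE if the cell contains at least one digit, "$" or ","
loose <- grepl("[0-9$,]+", cells)

# Anchored: TRUE only if the whole cell is made of digits, "$" and ","
strict <- grepl("^[0-9$,]+$", cells)

print(data.frame(cell = cells, loose = loose, strict = strict))
```

Either way the nonsense string and the bare 1 are accepted, which is why the stricter dollar-amount pattern is the safer choice.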

RStudious