Your data could use some cleaning up, but it's not entirely necessary in order to solve your problem. You want to find the first row that contains a dollar sign and an appropriate value. My solution does the following:
- Iterates over rows
- In each row, asks if there's at least one cell that starts with a dollar sign followed by a specific combination of digits and commas (to be explained in greater detail below)
- Stops when we reach that row
- Prints the ID of the row
The solution involves a for
loop, an if
statement, and a regular expression.
First of all, here's my attempt to reproduce a data frame. Again the details don't matter too much. I just wanted to make the "money row" the second row which is kind of how it seems to appear in your example
df<- data.frame(
Assessment_Area = c(2,3,4,5),
Offices = c("#",1,1,2),
Dep_Percent_63016 = c("#","50.0%","50.0%","100.0%"),
Dep_Total_63016 = c("$ (000s)", "$24,451", "$21,931","$46,382"),
Assessment_Area_Rev = rep("Blah",4)
)
df
Assessment_Area Offices Dep_Percent_63016 Dep_Total_63016
1 2 # # $ (000s)
2 3 1 50.0% $24,451
3 4 1 50.0% $21,931
4 5 2 100.0% $46,382
Assessment_Area_Rev
1 Blah
2 Blah
3 Blah
4 Blah
Here's the for
loop:
library(stringr)
for (i in 1:nrow(df)) {
if (any(str_detect(df[i,],"^\\$\\d{1,3}(,\\d{3})*"))) {
print(i)
break
}
}
The key is the line with the if
statement. any
returns TRUE
if any element of a logical vector is true. In this case the vector is created by applying stringr::str_detect
to a row of the df which is indexed as df[i,]
. str_detect
returns a logical vector - you supply a character vector and an expression to match in the elements of that vector. It returns TRUE
or FALSE
for each element in the vector which in this case is each cell in a row. So the crux of this is the regular expression:
"^\\$\\d{1,3}(,\\d{3})*"
This is the pattern we're searching for (the money cell) in each row. ^\\$
indicates we want the string to start with the dollar sign. The two backslashes escape the $
character because it's a metacharacter in regular expressions (end anchor). We then want 1-3 digits. This will match any dollar value below $1,000. Then we specify that the expression can contain any number (including 0) of ,
followed by three more digits. This will cover any dollar value.
Finally, if we encounter a row which contains one of these expressions, the for
loop will print the number of the row and end the loop so it will return the lowest row number containing one desired cell. In this example the output is 2
. If no appropriate rows are encountered, nothing will happen.
There may be more you want to do once you have that information, but if all you need is the lowest row number containing your money expression then this is sufficient.
A less elegant regular expression which only looks for dollar signs, commas, and digits would be:
"[0-9$,]+"
which is what you asked for although I don't think that's what you really want because that will match something like ,56$,,$$78