1

I have a dataframe whereby if a condition is satisfied, it looks a couple rows above to find a specific row with another string and replaces itself with that entire cell which had the string.

Here is an Example:

enter image description here

If "test" is detected in the type column, it looks up to 3 rows above for a row which contains the string "Subject". If that condition is satisfied, it then changes it's cell name to the entire cell it found.

The data is :

Type    Score
Ignore  Ignore
Subject_English NA
test    58
test    6
test    87
Subject_science NA
test    23
test    47
test    89
test    45
test    23
ignore  ignore
ignore  ignore
ignore  ignore
test    54
Subject_Math    NA
Ignore  ignore
Ignore  ignore
test    88

Lastly, if multiple rows satisfy the initial condition it takes the latest. Secondly, once a cell has been changed, that cell can be used to satisfy the first condition for another cell. Thirdly, there might be a few rows in between the "test" row and the row Im looking to satisfy the condition I need to ignore. Further notes: The number of tests can vary, it could be 3 or 10. You can have ignores between the subjects. If possible, I'd like to add a threshold that if there wasn't a subject in the last 10 rows, then leave it as test

  • 1
    First convert "test" to NA, something like: `myData$Type <- ifelse(!grepl("^Subject_", myData$Type), NA, myData$Type)` then use the linked post to fill NAs with last nonNA value. – zx8754 Dec 05 '18 at 10:54
  • My only issue with the other post is that the "non_na" column I'd be looking to read isn't necessarily directly before the NA column. There could be unrelated columns in between that I would need to ignore – user8042238 Dec 05 '18 at 11:00
  • Then please provide example data that is representative of your real data. – zx8754 Dec 05 '18 at 11:02
  • 1
    Will do :) Is this good? – user8042238 Dec 05 '18 at 11:07
  • Is it always subject, then 3 rows of test? Or do you have "Ignore" rows between subjects? – zx8754 Dec 05 '18 at 11:52
  • What is the expected output if after row "test 6" we have another rows "xyz 222", then "test 87"? – zx8754 Dec 05 '18 at 11:59
  • The number of tests can vary, it could be 3 or 10. You can have ignores between the subjects. For test87 it would apply the latest subject seen. If possible, I'd liek to add a threshold that if there wasn't a subject in the last 10 rows, then leave it as test – user8042238 Dec 05 '18 at 12:38
  • Yet another condition, please expand your example data to include all conditions, and expected output. – zx8754 Dec 05 '18 at 12:45
  • I'm really sorry, I've updated it once again. Thanks for your help! – user8042238 Dec 05 '18 at 12:55

1 Answers1

1

How to look back a few rows higher to see if a condition is satisfied and then replace a cell accordingly

Trying to "look back" is certainly possible, but it adds a lot of complexity. Instead, iterate over the rows from the top down. Whenever you come across a cell whose value starts with "Subject", set a variable current_subject to that value. Whenever you come across a cell whose value is "test", set the value of that cell to the value in current_subject.

I'd like to add a threshold that if there wasn't a subject in the last 10 rows, then leave it as test

OK, so create another variable called nonsubject_rows. Set it to 0 whenever you come across a subject row, and perhaps also whenever you set a test row to the current subject; otherwise, increment it for each row you visit. Before you set the value of a test row, check to see whether nonsubject_rows is greater than the threshold; if it is, don't change the row.

In short, you can keep track of the information you need in order to process each row as you go. That will be a lot simpler and faster than searching for the information every time you need it.

Caleb
  • 124,013
  • 19
  • 183
  • 272