I have the following dataframe; call it A.

S.No    Action Taken
1   Advance Booking
2   Before Launch
3   After Launch
4   Re Launch
5   Customer care management

I also have the following dataframe; call it B.

Sl No   Action Name
1       Machine Re Launch
2       New Machine Re Launch
3       New Machine Relaunch
4       New Device Launch
5       New Device After Launch
6       Customer Care Management'
7       Machine After Launch
8       New   Machine After Launch
9       New   Machine Relaunch
10      New   Device After Launch

How can I create a column in dataframe B as follows?

 Sl No  Action Name                Action Type
1       Machine Re Launch           Re Launch
2       New Machine Re Launch       Re Launch
3       New Machine Relaunch        Re Launch
4       New Device Launch           Launch
5       New Device After Launch     After Launch
6       Customer Care Management'   Customer Care Management
7       Machine After Launch        After Launch 
8       New   Machine After Launch  After Launch
9       New   Machine Relaunch      Re Launch
10      New   Device After Launch   After Launch

How do I accomplish this? It is akin to a lookup in Excel.

Vishnu Raghavan
  • What is the algorithm responsible for pruning "Action name"? What have you tried? Consider posting a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Roman Luštrik Sep 26 '17 at 12:00
  • The action name was provided as part of the data dictionary; I haven't derived it with any code. I have tried strsplit, but that has yielded limited results because in some cases the keywords from the dictionary appear in the middle of the string. It looks like a lookup, but I have only found numerical examples so far, hence my query. – Vishnu Raghavan Sep 26 '17 at 12:10

1 Answer

Is the difference between "Relaunch" in dataframe B and "Re Launch" in dataframe A a spelling error in your example? I don't see how you expect those two to match without any further information.

Assuming it IS an error, you could do something like this:

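# Assumes the columns are named A$action_taken and B$action_name
B$action_type <- ""
for (i in as.character(A$action_taken)) {
  # Case-insensitive search for this action in every action name
  hit <- grepl(i, B$action_name, ignore.case = TRUE)
  # Fill only rows that have not been matched yet (first match wins)
  B$action_type <- ifelse(B$action_type == "" & hit, i, B$action_type)
}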

This iterates through the Action Taken values, checks whether each one occurs in Action Name, and if it does, writes it out as the Action Type (if not, it leaves the cell empty and moves on to the next string). Rows that already have a value are left alone, so the first match wins. It only finds exact spellings (ignoring case), so "Relaunch" and "Re Launch" don't match.
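To see this behaviour on actual data, here is a minimal reproducible sketch. It assumes the columns have been renamed to action_taken and action_name (the question's data uses "Action Taken" and "Action Name"), and it retypes only a few rows from the question by hand.

```r
# Assumed column names; the question's data uses "Action Taken" / "Action Name".
A <- data.frame(
  action_taken = c("Advance Booking", "Before Launch", "After Launch",
                   "Re Launch", "Customer care management"),
  stringsAsFactors = FALSE
)
B <- data.frame(
  action_name = c("Machine Re Launch", "New Machine Relaunch",
                  "New Device After Launch", "Customer Care Management"),
  stringsAsFactors = FALSE
)

B$action_type <- ""
for (i in A$action_taken) {
  # Case-insensitive search for this action in every action name
  hit <- grepl(i, B$action_name, ignore.case = TRUE)
  # Fill only rows that are still empty
  B$action_type <- ifelse(B$action_type == "" & hit, i, B$action_type)
}

print(B)
```

The "New Machine Relaunch" row stays empty because no entry in A is spelled that way, and the matched rows take the spelling and casing used in A.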

Edit

Adding new response to reflect the comment made below.

If you want to handle all versions of "relaunch"/"re launch" etc., I think you'd have to make a lookup table of all the variations you expect, with the corresponding correct "Action Taken" in a second column.

So the A dataframe now has two columns, action_text_variation and action_taken: action_text_variation holds every text to look for, and action_taken holds the corresponding text you want to fill action_type with.

Now we iterate over the rows of A.

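B$action_type <- ""
for (i in seq_len(nrow(A))) {
  # Search for this variation in every action name, ignoring case
  hit <- grepl(as.character(A$action_text_variation[i]), B$action_name,
               ignore.case = TRUE)
  # as.character() guards against factor columns yielding integer codes
  B$action_type <- ifelse(B$action_type == "" & hit,
                          as.character(A$action_taken[i]),
                          B$action_type)
}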
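As a rough illustration of what such a lookup table could look like, here is a self-contained sketch. The rows of A below are hypothetical (they are my guess at the variations, not taken from the data dictionary), and the column names are the assumed ones used above.

```r
# Hypothetical lookup table: one row per spelling to search for.
A <- data.frame(
  action_text_variation = c("Re Launch", "Relaunch", "After Launch",
                            "Customer Care Management"),
  action_taken          = c("Re Launch", "Re Launch", "After Launch",
                            "Customer Care Management"),
  stringsAsFactors = FALSE
)
B <- data.frame(
  action_name = c("Machine Re Launch", "New Machine Relaunch",
                  "New   Device After Launch", "Customer Care Management"),
  stringsAsFactors = FALSE
)

B$action_type <- ""
for (i in seq_len(nrow(A))) {
  # Search for this variation in every action name, ignoring case
  hit <- grepl(A$action_text_variation[i], B$action_name, ignore.case = TRUE)
  # Write the canonical label into rows that are still empty
  B$action_type <- ifelse(B$action_type == "" & hit,
                          A$action_taken[i], B$action_type)
}

print(B)
```

Since grepl treats the text as a regular expression, a single pattern such as "\\bre ?launch" could probably stand in for both spellings; the \\b is there so that it does not also match inside "Before Launch".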

P.S. It would be a lot easier to help you if you posted a reproducible example so we could run the code ourselves and suggest changes.

  • Thank you. It actually is not a spelling error, but then again I can substitute the correct versions. I have been asked to avoid loops in R as they can be problematic. – Vishnu Raghavan Sep 27 '17 at 03:36
  • For loops can certainly be problematic in R, but I think the problems can mostly be avoided if you understand why they arise (for example, by not appending output from a for loop to a data frame). They are not intrinsically bad and have their uses. That's not to say there isn't a better non-loop way to fix your problem :) but this is how I'd do it. I'll edit the original response to show how you can handle "Relaunch" as well as "Re Launch". – Kári Gunnarsson Sep 27 '17 at 13:26
  • Dear Sir, thank you. Is there a tutorial or resource for loops and control statements? I have learnt the basics but get foxed by larger loops. – Vishnu Raghavan Sep 28 '17 at 05:24
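
On the point about avoiding an explicit for loop: one possible sketch, under the same assumptions as the answer (a lookup table A with columns action_text_variation and action_taken, and B$action_name), is to build a match matrix with vapply and take the first hit in each row. The data below is hypothetical and only there to make the snippet runnable.

```r
# Hypothetical lookup table and data, using the assumed column names.
A <- data.frame(
  action_text_variation = c("Re Launch", "Relaunch", "After Launch",
                            "Customer Care Management"),
  action_taken          = c("Re Launch", "Re Launch", "After Launch",
                            "Customer Care Management"),
  stringsAsFactors = FALSE
)
B <- data.frame(
  action_name = c("Machine Re Launch", "New Machine Relaunch",
                  "New Device Launch", "Machine After Launch"),
  stringsAsFactors = FALSE
)

# hits[r, k] is TRUE when variation k occurs in action name r
hits <- vapply(A$action_text_variation,
               function(p) grepl(p, B$action_name, ignore.case = TRUE),
               logical(nrow(B)))
dim(hits) <- c(nrow(B), nrow(A))  # keep a matrix even if B has one row

# Index of the first matching variation per row, NA when nothing matches
first_hit <- apply(hits, 1, function(row) which(row)[1])

B$action_type <- A$action_taken[first_hit]
B$action_type[is.na(B$action_type)] <- ""  # unmatched rows stay empty

print(B)
```

vapply and apply still iterate internally, so this is mainly a change of style rather than a guaranteed speed-up; it gives the same first-match-wins result as the loop.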