I am a beginner data scientist working within an NCAA football program trying to learn R. We record repeated measures data during practice that needs to be organized then analyzed. I have this sample dataset that I am having trouble analyzing due to the variations/typos in the "Activity" column. I need a way to rename/organize the activities in the dataframe.
Raw Dataset looks like this:
Position | Activity | Max_Velocity (mph) |
---|---|---|
WR | Special Team 1 | 20 |
WR | special team 1 | 19 |
WR | Special team 1 | 18 |
WR | special Team 2 | 13 |
WR | ST 2 | 16 |
WR | St 2 | 19 |
WR | team 1 | 12 |
WR | team 1 | 20 |
WR | team 1 | 23 |
WR | team 2 | 21 |
WR | team 2 | 10 |
WR | team 2 | 3 |
WR | team 3 | 21 |
WR | team 3 | 11 |
WR | team 3 | 16 |
WR | Indy 1 | 20 |
WR | indy 2 | 21 |
WR | INDY 3 | 22 |
I need it to look like this:
Position | Activity | Max_Velocity (mph) | Activity_Catagory |
---|---|---|---|
WR | Special Team 1 | 20 | Special Team |
WR | special team 1 | 19 | Special Team |
WR | Special team 1 | 18 | Special Team |
WR | special Team 2 | 13 | Special Team |
WR | ST 2 | 16 | Special Team |
WR | St 2 | 19 | Special Team |
WR | team 1 | 12 | Team |
WR | team 1 | 20 | Team |
WR | team 1 | 23 | Team |
WR | team 2 | 21 | Team |
WR | team 2 | 10 | Team |
WR | team 2 | 3 | Team |
WR | team 3 | 21 | Team |
WR | team 3 | 11 | Team |
WR | team 3 | 16 | Team |
WR | Indy 1 | 20 | Indy |
WR | indy 2 | 21 | Indy |
WR | INDY 3 | 22 | Indy |
This way I can analyze the average of the dependent variable "Max_velocity" during each type of activity. I would rather not make individual lines of code correcting each typo in the "Activity" column. I think this can be done with loops but I am a beginner and don't know where to start. I would greatly appreciate any help to make this a simple fix.