
I'm trying to generate a column in my data frame, let us say it is called 'Status' and it should provide the status of the fish i.e. protected or unprotected.

What I am looking for:

  ID                   Species      Status
1  1 Epinephelus polyphekadion   Protected
2  2        Epinephelus tukula   Protected
3  3         Thunnus albacares   Protected
4  4       Sphyraena barracuda Unprotected
5  5        Lutjanus rivulatus Unprotected
6  6         Lethrinus lentjan Unprotected
7  7 Plectropomus pessuliferus   Protected

My Data:

fishydata <- structure(list(ID = 1:7,
                            Species = structure(c(1L, 2L, 7L, 6L, 4L, 3L, 5L),
                            .Label = c("Epinephelus polyphekadion", "Epinephelus tukula", "Lethrinus lentjan", "Lutjanus rivulatus", "Plectropomus pessuliferus", "Sphyraena barracuda", "Thunnus albacares"), class = "factor")
                        ),
                   .Names = c("ID", "Species"),
                   row.names = c(NA, 7L),
                   class = "data.frame")

The data set contains over 1,000 observations. Is there a line of code that can link specific species to a status in a new column?

I have over 40 species, of which 7 are protected. I'd like to give those 7 species a 'Protected' status and label everything else 'Unprotected', rather than typing out all the other species names and classing them as 'Unprotected'.

Any pointers or advice would be greatly appreciated. My skills are basic, attempting to get back into R. I've been dabbling in dplyr using mutate and filter but I've reached a brick wall.

Beaver
  • Can you clarify what your input is? Your data already appears to have the factor you're looking for. – De Novo Mar 05 '18 at 13:55
  • You can use join if you have a lookup table for protected species. – www Mar 05 '18 at 14:10
  • @DanHall sorry, I will update and remove the Protected column. The answer by Cédric Miachon is what I'm trying to achieve, ideally as a one-liner if possible. – Beaver Mar 06 '18 at 14:46
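
The lookup-table idea from the comments can be sketched in base R as follows. The `status_lookup` table is an assumption made for illustration (it lists only the protected species visible in the desired output), and `match()` is used here instead of a join so that the original row order is kept.

```r
# Example data from the question (Species kept as character for simplicity).
fishydata <- data.frame(
  ID = 1:7,
  Species = c("Epinephelus polyphekadion", "Epinephelus tukula",
              "Thunnus albacares", "Sphyraena barracuda",
              "Lutjanus rivulatus", "Lethrinus lentjan",
              "Plectropomus pessuliferus"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: one row per protected species.
status_lookup <- data.frame(
  Species = c("Epinephelus polyphekadion", "Epinephelus tukula",
              "Thunnus albacares", "Plectropomus pessuliferus"),
  Status = "Protected",
  stringsAsFactors = FALSE
)

# match() gives the lookup row for each species, or NA when it is absent.
idx <- match(fishydata$Species, status_lookup$Species)
fishydata$Status <- status_lookup$Status[idx]

# Species missing from the lookup table default to "Unprotected".
fishydata$Status[is.na(idx)] <- "Unprotected"
```

With dplyr loaded, `left_join(fishydata, status_lookup, by = "Species")` should attach the same column, leaving NA for species that are not in the lookup table.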

2 Answers


Your data without the Status column:

fishydata2 <- structure(list(ID = 1:7, 
                            Species = structure(c(1L, 2L, 7L, 6L,4L, 3L, 5L), 
                            .Label = c("Epinephelus polyphekadion", "Epinephelus tukula","Lethrinus lentjan", "Lutjanus rivulatus", "Plectropomus pessuliferus","Sphyraena barracuda", "Thunnus albacares"), class = "factor")
                        ),
                   .Names = c("ID", "Species"), 
                   row.names = c(NA, 7L), 
                   class = "data.frame")

#  ID                   Species
#1  1 Epinephelus polyphekadion
#2  2        Epinephelus tukula
#3  3         Thunnus albacares
#4  4       Sphyraena barracuda
#5  5        Lutjanus rivulatus
#6  6         Lethrinus lentjan
#7  7 Plectropomus pessuliferus

You just have to create a new column with an Unprotected status by default:

fishydata2$Status <- "Unprotected"

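Then overwrite it for the protected species (only four of them appear in this sample; list all seven for the full data):

# Index the Status column directly, so nothing breaks if no row matches.
fishydata2$Status[fishydata2$Species %in% c('Epinephelus polyphekadion',
                  'Epinephelus tukula','Thunnus albacares',
                  'Plectropomus pessuliferus')] <- "Protected"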

Results:

fishydata2
#  ID                   Species      Status
#1  1 Epinephelus polyphekadion   Protected
#2  2        Epinephelus tukula   Protected
#3  3         Thunnus albacares   Protected
#4  4       Sphyraena barracuda Unprotected
#5  5        Lutjanus rivulatus Unprotected
#6  6         Lethrinus lentjan Unprotected
#7  7 Plectropomus pessuliferus   Protected
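
The two steps above can also be collapsed into a single assignment. This is a minimal sketch rather than part of the original answer; the `protected` vector is a name introduced here for illustration and holds only the four protected species present in the sample.

```r
# Rebuild the example data (character Species; %in% also works on factors).
fishydata2 <- data.frame(
  ID = 1:7,
  Species = c("Epinephelus polyphekadion", "Epinephelus tukula",
              "Thunnus albacares", "Sphyraena barracuda",
              "Lutjanus rivulatus", "Lethrinus lentjan",
              "Plectropomus pessuliferus"),
  stringsAsFactors = FALSE
)

# Hypothetical helper vector; the real data would list all 7 protected species.
protected <- c("Epinephelus polyphekadion", "Epinephelus tukula",
               "Thunnus albacares", "Plectropomus pessuliferus")

# One step: "Protected" where the species is listed, "Unprotected" otherwise.
fishydata2$Status <- ifelse(fishydata2$Species %in% protected,
                            "Protected", "Unprotected")
```

The same expression fits inside dplyr: `mutate(fishydata2, Status = ifelse(Species %in% protected, "Protected", "Unprotected"))`.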
Cédric Miachon
  • Just like that. Is this a workaround or a standard way of achieving this? Thank you very much for the swift answer. – Beaver Mar 06 '18 at 14:44
  • Would there be a way to 'cheat' and class all 'Epinephelus' together instead of writing out each species name, i.e. E. tukula, E. polyphekadion, ...? Sorry if this constitutes another question; I will post it separately if so. B – Beaver Mar 06 '18 at 15:18
  • As there are only a few species, I would say it's a standard way to achieve this. – Cédric Miachon Mar 07 '18 at 15:55
  • @Beaver for your second question, the best way is to have a dedicated column on the Genus (i.e Epinephelus), and the other one on the Species. If not possible, you could have a look here: https://stackoverflow.com/questions/5823503/pattern-matching-using-a-wildcard – Cédric Miachon Mar 07 '18 at 16:01
  • Thanks @Cédric Miachon. I found an issue with setting 'Unprotected' as the default value for the Status column: if a row records no catch and its Species entry is empty, the Status column will still show 'Unprotected' for no fish. NA would be more correct. – Beaver Mar 08 '18 at 11:31
  • @Beaver, you can do it like this: fishydata2$Status <- NA, then fishydata2[fishydata2$Species != "",]$Status <- "Unprotected" – Cédric Miachon Mar 09 '18 at 11:42
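
The two follow-up points in these comments (matching a whole genus, and leaving rows without a fish as NA) can be sketched together in base R. The `catches` data frame and the premise that every Epinephelus species is protected are assumptions made purely for illustration.

```r
# Hypothetical example: row 3 records no catch, so Species is an empty string.
catches <- data.frame(
  ID = 1:4,
  Species = c("Epinephelus tukula", "Epinephelus polyphekadion",
              "", "Lethrinus lentjan"),
  stringsAsFactors = FALSE
)

# Assumption for illustration: every species in genus Epinephelus is protected.
# "^Epinephelus " matches names that start with the genus followed by a space.
is_protected <- grepl("^Epinephelus ", catches$Species)

# Rows that actually name a species (neither NA nor empty).
has_species <- !is.na(catches$Species) & catches$Species != ""

# Start from NA so rows without a fish get no status at all.
catches$Status <- NA_character_
catches$Status[has_species] <- "Unprotected"
catches$Status[is_protected] <- "Protected"
```

Filling in "Unprotected" before "Protected" means the more specific label is applied last and is never overwritten by the default.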

If you're just wondering how to subset the data frame so that you only have the rows that have a value of Protected, here are two options:

With dplyr

library(dplyr)
# Assumes fishydata already has a Status column.
filter(fishydata, Status == "Protected")
#   ID                   Species    Status
# 1  1 Epinephelus polyphekadion Protected
# 2  2        Epinephelus tukula Protected
# 3  3         Thunnus albacares Protected
# 4  7 Plectropomus pessuliferus Protected

Base

fishydata[fishydata$Status == "Protected",]
#   ID                   Species    Status
# 1  1 Epinephelus polyphekadion Protected
# 2  2        Epinephelus tukula Protected
# 3  3         Thunnus albacares Protected
# 7  7 Plectropomus pessuliferus Protected

Both of these options produce a data frame that contains only the rows corresponding to protected species. If you want to use it later, assign it to a name, e.g., protected_fish <- filter(fishydata, Status == "Protected"). I would advise against creating a new column in fishydata that contains only the species that have a protected status, because that information is already in your data frame. If you just want to see the species names, you can extract them as a vector with protected_fish$Species, or keep them as a one-column data frame using a pipe, e.g. filter(fishydata, Status == "Protected") %>% select(Species).
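
For completeness, here is a self-contained base R sketch of the same subsetting. It rebuilds the example data with the Status column filled in as in the desired output; `protected_species` is a name introduced here for illustration.

```r
# Example data with Status already filled in, as in the desired output.
fishydata <- data.frame(
  ID = 1:7,
  Species = c("Epinephelus polyphekadion", "Epinephelus tukula",
              "Thunnus albacares", "Sphyraena barracuda",
              "Lutjanus rivulatus", "Lethrinus lentjan",
              "Plectropomus pessuliferus"),
  Status = c("Protected", "Protected", "Protected", "Unprotected",
             "Unprotected", "Unprotected", "Protected"),
  stringsAsFactors = FALSE
)

# Keep only the protected rows; subset() is a base R alternative to filter().
protected_fish <- subset(fishydata, Status == "Protected")

# Unique species names as a plain character vector.
protected_species <- unique(as.character(protected_fish$Species))
```

On the full data set, `unique()` keeps each protected species once even when it was observed many times.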

De Novo