Some simple setup:
species <- c('Natica','Tellina','Mactra','Natica','Arca','Arca','Tellina',
'Nassarius','Cardium','Cardium')
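# Shuffle the species so the assignment to environments is random
rspecies <- sample(species)
envirs <- c('Restricted', 'Tidalflat', 'Beach', 'Estuary')
probs <- c(.2, .3, .4, .1)
# Number of species per environment: 2, 3, 4, 1 (sums to 10)
nrs <- round(length(species) * probs)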
A data.frame with a separate column per environment is not a good way to express this data, because the data is not rectangular: you don't have the same number of observations in each column.
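As a minimal sketch of the problem, here is what happens when two columns of unequal length are passed to data.frame(). The vectors `restricted` and `tidalflat` are made-up examples, not part of the setup above.

```r
# Hypothetical per-environment vectors of unequal length
restricted <- c("Tellina", "Natica")
tidalflat  <- c("Arca", "Mactra", "Tellina")

# data.frame() raises an error when column lengths differ and the
# shorter one cannot be recycled evenly; capture the message instead
res <- tryCatch(
  data.frame(Restricted = restricted, Tidalflat = tidalflat),
  error = function(e) conditionMessage(e)
)
print(res)
```

You could pad the shorter columns with NA, but the long form or a list avoids that bookkeeping.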
You can either present the data in long form:
df <- data.frame(species = rspecies, envir = rep(envirs, nrs), stringsAsFactors = FALSE)
species envir
1 Tellina Restricted
2 Natica Restricted
3 Arca Tidalflat
4 Mactra Tidalflat
5 Tellina Tidalflat
6 Arca Beach
7 Nassarius Beach
8 Cardium Beach
9 Cardium Beach
10 Natica Estuary
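Or as a list (the output below is apparently from a different random shuffle, so the species per environment do not match the table above):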
split(rspecies, df$envir)
$Beach
[1] "Mactra" "Natica" "Arca" "Arca"
$Estuary
[1] "Tellina"
$Restricted
[1] "Nassarius" "Cardium"
$Tidalflat
[1] "Cardium" "Natica" "Tellina"
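The two representations are interchangeable. The following sketch, assuming the same setup as above plus an arbitrary seed for repeatability, splits the long form into a list and turns it back into long form with stack(). The names `by_envir` and `long_again` are only illustrative.

```r
species <- c('Natica','Tellina','Mactra','Natica','Arca','Arca','Tellina',
             'Nassarius','Cardium','Cardium')
envirs <- c('Restricted', 'Tidalflat', 'Beach', 'Estuary')
probs <- c(.2, .3, .4, .1)

set.seed(1)  # arbitrary seed, only so the shuffle is repeatable
rspecies <- sample(species)
nrs <- round(length(species) * probs)
df <- data.frame(species = rspecies, envir = rep(envirs, nrs),
                 stringsAsFactors = FALSE)

# Long form -> list (one character vector per environment)
by_envir <- split(df$species, df$envir)
lengths(by_envir)  # Beach 4, Estuary 1, Restricted 2, Tidalflat 3

# List -> long form; stack() names its columns 'values' and 'ind'
long_again <- stack(by_envir)
names(long_again) <- c("species", "envir")
```

Note that stack() returns the environment column as a factor and orders the rows by list element, so the row order differs from df.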
Edit:
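One way to accommodate a different number of species is to make the assignment probabilistic, drawing the environment of each species according to probs. With 11 species, for example, round(11 * probs) gives counts of 2, 3, 4 and 1, which sum to only 10, so the fixed-count approach above no longer lines up. With random draws the group sizes only approximate the target proportions, and the approximation gets better the larger the actual dataset is.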
species2 <- c('Natica','Tellina','Mactra','Natica','Arca','Arca','Tellina',
'Nassarius','Cardium','Cardium', 'Cardium')
length(species2)
[1] 11
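# Draw one environment per species, with replacement, weighted by probs
grps <- sample(envirs, size = length(species2), prob = probs, replace = TRUE)
df2 <- data.frame(species = species2, envir = grps, stringsAsFactors = FALSE)
# Sort by environment for display
df2 <- df2[order(df2$envir), ]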
species envir
5 Arca Beach
10 Cardium Beach
1 Natica Estuary
11 Cardium Estuary
3 Mactra Restricted
7 Tellina Restricted
2 Tellina Tidalflat
4 Natica Tidalflat
6 Arca Tidalflat
8 Nassarius Tidalflat
9 Cardium Tidalflat
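To get a feel for how the sample size affects the match with probs, here is a rough sketch that compares the observed shares for a small and a large number of draws. The helper `observed_share`, the seed, and the sample sizes are assumptions chosen for illustration.

```r
set.seed(42)  # arbitrary seed, only so the sketch is repeatable
envirs <- c('Restricted', 'Tidalflat', 'Beach', 'Estuary')
probs <- c(.2, .3, .4, .1)

# Observed share of each environment for a given number of draws
observed_share <- function(n) {
  grps <- sample(envirs, size = n, prob = probs, replace = TRUE)
  # factor() with explicit levels keeps environments that were never drawn
  as.numeric(prop.table(table(factor(grps, levels = envirs))))
}

small <- observed_share(11)
large <- observed_share(10000)

# Largest absolute deviation from the target probabilities
max(abs(small - probs))
max(abs(large - probs))
```

The deviation for the large sample will typically be much smaller than for the small one, although any single small sample can land close to the targets by chance.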