1

I have a data frame that looks like this:

                  CEMETERY CONTEXT    SEX BONE MEASUREMENT VALUE
1 Medieval-St. Mary Graces    6225   MALE           HuE1 L  64.1
2 Medieval-St. Mary Graces    6225   MALE           HuE1 R  62.7
3 Medieval-St. Mary Graces    6225   MALE           HuHD L  50.1
4 Medieval-St. Mary Graces    6225   MALE           HuHD R  51.3
5 Medieval-St. Mary Graces    6225   MALE           HuL1 R 346.0
6 Medieval-St. Mary Graces    6272 FEMALE           HuHD L  41.3

I need to remove any specimens (CONTEXTs) where there is only a bone measurement for left (L) or (R), instead of having both (e.g. if a specimen has HuE1L but not HuE1R then I need to remove it). I'm not sure what the best way to do this is as the data frame is too large to individually remove certain rows. To create this data frame I used the merge() function so I also have data frames for each bone (left and right are in separate data frames), if that makes any difference to what I need to do?

EDIT: I tried using data.table:

library(data.table)
setDT(df)
setkey(df, CONTEXT, BONE)
df[df[, .N, key(df)][N == 2, .(CONTEXT, BONE)]]

but that returns this:

                     CEMETERY CONTEXT    SEX EXPANSION VALUE
  1:   Medieval-Spital Square      19 FEMALE    HuE1 L  57.9
  2:   Medieval-Spital Square      19 FEMALE    HuE1 R  58.8
  3:   Medieval-Spital Square      19 FEMALE    HuHD R  44.6
  4:   Medieval-Spital Square      19 FEMALE    HuL1 L 326.0
  5:   Medieval-Spital Square      19 FEMALE    HuL1 R 332.0

474: Medieval-St. Mary Graces   16332   MALE    RaHD L  25.4
475: Medieval-St. Mary Graces   16344   MALE    HuHD R  48.8
476: Medieval-St. Mary Graces   20001 FEMALE    HuHD L  40.2
477: Medieval-St. Mary Graces   20001 FEMALE    HuHD R  39.8
478: Medieval-St. Mary Graces   20001 FEMALE    RaHD R  20.8

so it hasn't actually removed bone measurements that only have left or right. To clarify - the Ls and Rs are part of the column 'EXPANSION', not a separate column - would I first need to make that a column on its own/how would I go about doing this?

1 Answers1

1

You can subset you dataset using data.table:

library(data.table)
setDT(df)
setkey(df, CONTEXT, BONE)
df[df[, .N, key(df)][N == 2, .(CONTEXT, BONE)]]

#                   CEMETERY CONTEXT  SEX BONE MEASUREMENT VALUE
# 1: Medieval-St. Mary Graces    6225 MALE HuE1           L  64.1
# 2: Medieval-St. Mary Graces    6225 MALE HuE1           R  62.7
# 3: Medieval-St. Mary Graces    6225 MALE HuHD           L  50.1
# 4: Medieval-St. Mary Graces    6225 MALE HuHD           R  51.3

Explanation:

  1. Turn your data into a data.table (setDT())
  2. Set key (index) in your data (setkey()). Using setkey(df, CONTEXT, BONE) as we want to count by CONTEXT and BONE
  3. Count number of rows by key (df[, .N, key(df)])
  4. Subset data with 2 occurrences (N == 2)
pogibas
  • 27,303
  • 19
  • 84
  • 117
  • thank you! Do I need to download the data.table package? It tells me there is no package called 'data.frame' when I put it in – Aisling Murray Dec 23 '17 at 10:12
  • @AislingMurray `install.packages("data.table")`. You should look into if you're planing to work with large datasets – pogibas Dec 23 '17 at 10:13
  • I would do it differently in `data.table` ;-) (see the linked duplicate target) OTOH: this is a nice approach too – Jaap Dec 23 '17 at 10:18
  • @Jaap Thanks! Next time :-) – pogibas Dec 23 '17 at 10:20
  • I did the steps above and it hasn't worked properly - there are still instances where specimens only have the left or right measurement - I'm not sure if this is because many of them have left and right for some bones but then only one side for other bones? – Aisling Murray Dec 23 '17 at 10:23
  • @PoGibas please see my edited question as I realise it wasn't clear what each column contains and it affects how I edit the data – Aisling Murray Dec 23 '17 at 13:46