
I've looked at other threads and tried to apply their answers to my code, but have had no luck.

# Placeholder column; every row is overwritten in the loop below
CDR3_post_challenge_unique_clonecount$participant_per_cdr3aa <- NA_integer_
participant_list <- unique(CDR3_post_challenge_unique_clonecount$cdr3aa)
# For each distinct CDR3 sequence, count the distinct participants (PartID) that have it
for (aa in participant_list)
{
  CDR3_post_challenge_unique_clonecount$participant_per_cdr3aa[CDR3_post_challenge_unique_clonecount$cdr3aa == aa] <- length(unique(CDR3_post_challenge_unique_clonecount$PartID[CDR3_post_challenge_unique_clonecount$cdr3aa == aa]))
}

Here is a bit of the dataframe:

cdr3aa              clonecount  PartID
CAAGRAARGGSVPHWFDPF 1           S-1
CAALADSGSQTDAFDIA   1           S-1
CAFHAAYGSQHGLDVW    1           S-1
CAGGLAWLVDDW        1           S-1
CAGRWFFPW           1           S-1
CAGVKNGRGMDVW       1           S-1
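
The sample above can be rebuilt as a data frame for experimenting. This is a sketch under assumptions: sample_df is a name introduced here, PartID is stored as a factor (the `[<-.factor` warning quoted in the comments points that way), and only the six rows shown are included. Because every cdr3aa in these rows is distinct and all of them belong to S-1, the question's loop gives 1 on every row.

```r
# The six rows shown above. Column types are assumptions: PartID is a factor
# because the warning quoted in the comments mentions `[<-.factor`
sample_df <- data.frame(
  cdr3aa = c("CAAGRAARGGSVPHWFDPF", "CAALADSGSQTDAFDIA", "CAFHAAYGSQHGLDVW",
             "CAGGLAWLVDDW", "CAGRWFFPW", "CAGVKNGRGMDVW"),
  clonecount = 1L,
  PartID = factor(rep("S-1", 6)),
  stringsAsFactors = FALSE
)

# The question's loop, run on the sample (placeholder column starts as NA)
sample_df$participant_per_cdr3aa <- NA_integer_
for (aa in unique(sample_df$cdr3aa)) {
  rows <- sample_df$cdr3aa == aa
  sample_df$participant_per_cdr3aa[rows] <- length(unique(sample_df$PartID[rows]))
}
sample_df$participant_per_cdr3aa
# [1] 1 1 1 1 1 1

# dput() prints a copy-pasteable definition of the data frame
dput(sample_df)
```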
Chinemerem
  • Hi! Can you provide a sample of your data frame and participant_list using `dput()`? Without that, it is hard to understand what we are dealing with. Read more about making reproducible examples here: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Shibaprasadb Sep 03 '21 at 16:12

1 Answer


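I think you can replace the for loop with:

# Count distinct participants per CDR3 sequence. as.character() guards against
# PartID being a factor: ave() writes the counts back into a copy of its first
# argument, and integers are not valid factor levels, so they would become NA.
CDR3_post_challenge_unique_clonecount$per3 <-
  as.integer(
    ave(as.character(CDR3_post_challenge_unique_clonecount$PartID),
        CDR3_post_challenge_unique_clonecount$cdr3aa,
        FUN = function(z) length(unique(z)))
  )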
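
Here is a sketch of why the class of PartID matters, on a small hypothetical data set (the sequences and participant IDs are invented, not taken from the question) in which one CDR3 sequence is shared by several participants. It assumes PartID is a factor, which is what the `[<-.factor` warning in the comments suggests; count_unique is just a helper name introduced here.

```r
# Hypothetical toy data: "CASS" is seen in three participants,
# "CAW" twice in the same participant, "CAR" once
toy <- data.frame(
  cdr3aa = c("CASS", "CASS", "CASS", "CAW", "CAW", "CAR"),
  PartID = factor(c("S-1", "S-2", "S-3", "S-1", "S-1", "S-2")),
  stringsAsFactors = FALSE
)
count_unique <- function(z) length(unique(z))

# Passing the factor directly: ave() writes the integer counts back into a
# factor, they are not valid levels, so every value becomes NA (with warnings)
bad <- suppressWarnings(ave(toy$PartID, toy$cdr3aa, FUN = count_unique))
bad   # all NA, still a factor

# Converting to character first gives counts as strings; as.integer() finishes the job
toy$per3 <- as.integer(ave(as.character(toy$PartID), toy$cdr3aa, FUN = count_unique))
toy$per3
# [1] 3 3 3 1 1 1
```

With the real data the same call applies; only the data frame and column names change.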

I'll demonstrate with mtcars, using the following analogs:

  • mtcars --> CDR3_post_challenge_unique_clonecount
  • cyl --> cdr3aa, the categorical variable within which we want to count PartID
  • drat --> PartID, the thing we want to count (uniquely) within each cdr3aa

mtcars$drat_per_cyl <- ave(mtcars$drat, mtcars$cyl, FUN = function(z) length(unique(z)))
mtcars
#                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb drat_per_cyl
# Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4            5
# Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4            5
# Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1           10
# Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1            5
# Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2           11
# Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1            5
# Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4           11
# Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2           10
# Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2           10
# Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4            5
# Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4            5
# Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3           11
# Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3           11
# Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3           11
# Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4           11
# Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4           11
# Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4           11
# Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1           10
# Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2           10
# Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1           10
# Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1           10
# Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2           11
# AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2           11
# Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4           11
# Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2           11
# Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1           10
# Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2           10
# Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2           10
# Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4           11
# Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6            5
# Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8           11
# Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2           10
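
As a cross-check on those counts, here is a sketch using tapply (swapped in for illustration, not part of the original answer). It returns one count per group instead of one per row; looking each row's count up by its cyl value reproduces the column that ave produced.

```r
# One distinct-drat count per cyl group
counts <- tapply(mtcars$drat, mtcars$cyl, function(z) length(unique(z)))
counts   # cyl 4: 10, cyl 6: 5, cyl 8: 11

# Look each row's group count up by name to rebuild the per-row column
by_row <- as.numeric(counts[as.character(mtcars$cyl)])

# Should match the ave() result
all.equal(by_row, ave(mtcars$drat, mtcars$cyl, FUN = function(z) length(unique(z))))
# [1] TRUE
```

tapply is handy for inspecting the per-group totals; ave saves the lookup step when you want the per-row column.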

Notes:

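  • ave is a little brain-dead in that it writes its results back into a copy of its first argument, so the return value generally keeps that argument's class. This means that if you count unique values of a "character" vector, the counts come back as strings rather than integers, which is why I wrap ave in as.integer(.). With a factor it is worse: the counts are not valid levels, so they become NA (with an "invalid factor level" warning); converting with as.character first avoids that.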

  • ave returns a vector the same length as the input, with values corresponding 1-for-1 (meaning the order is relevant and preserved). In my example of mtcars, this means that it is effectively doing something like this:

    ind4 <- which(mtcars$cyl == 4L)
    ind4
    #  [1]  3  8  9 18 19 20 21 26 27 28 32
    length(unique(mtcars$drat[ind4]))
    # [1] 10
    ind6 <- which(mtcars$cyl == 6L)
    ind6
    # [1]  1  2  4  6 10 11 30
    length(unique(mtcars$drat[ind6]))
    # [1] 5
    ### ...
    

    but it places the value 10 in the ind4 positions of the result. For example, because of ind6, the result starts with

    c(5, 5, .., 5, .., 5, .., .., .., 5, 5, .., .....)
    

    Because of ind4, it will contain

    c(.., .., 10, .., .., .., .., 10, 10, .....)
    

    (And same for cyl==8L.)
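
The position-by-position description above can be written out as an explicit loop. This is only an illustration of what the result looks like (it is close to the question's own loop), not ave's actual source; drat_per_cyl and cy are names introduced here.

```r
# Placeholder the same length as the input
drat_per_cyl <- numeric(nrow(mtcars))

# Compute each group's count once and write it into that group's positions
for (cy in unique(mtcars$cyl)) {
  ind <- which(mtcars$cyl == cy)
  drat_per_cyl[ind] <- length(unique(mtcars$drat[ind]))
}

head(drat_per_cyl, 4)
# [1]  5  5 10  5
identical(drat_per_cyl, ave(mtcars$drat, mtcars$cyl, FUN = function(z) length(unique(z))))
# [1] TRUE
```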

r2evans
  • Hi, I've tried this but end up getting an output of NA for each row. – Chinemerem Sep 03 '21 at 16:47
  • That's very interesting. – r2evans Sep 03 '21 at 16:51
  • I get this warning: ``50: In `[<-.factor`(`*tmp*`, i, value = 1L) : invalid factor level, NA generated`` – Chinemerem Sep 03 '21 at 16:51
  • Okay. While I completely understand *"I am not allowed to"* share sample data, do you understand how there is very little I can do without something reproducible? Either find a way to anonymize your data (no need to make it reversible, you just need to de-identify it perhaps) or find a way to produce similar-enough (i.e., *representative*) fake sample data that we can use. Good luck. – r2evans Sep 03 '21 at 16:54
  • I've added a sample to the post – Chinemerem Sep 03 '21 at 17:18
  • Good start, but ... it doesn't error, and it returns all `1`s. – r2evans Sep 03 '21 at 17:20