Error: Can't cast > to after using new pivot_wider() function in tidyr

Question

I have a large long dataset of pathology results. Each patient has a unique identifier (in this case row_id. For each patient they have had samples taken on a particular date (sample_date). The range of tests that they have had is very varied and with a heterogeneous output (some with character strings, and some numeric). Also, not every patient has had every test performed at each sample_date, so there should be quite a few NAs.

The name of the test performed is in the test_name column, and the result is in the result column. I would like to make this into a wide dataset, spreading the result column using the test_name as the column headers, but keeping the identifiers as both row_id and sample_date.

The new pivot_wider() function in tidyr seems to be perfect for my needs, and when I run it, it gives me the type of data frame I need (i.e., rows are still identified by row_id and sample_date but now there are columns for each test_name and results therein.

Here is a small sample of my dataset:

structure(list(row_id = 1:81, sample_date = structure(c(16444, 
16444, 16444, 16444, 16444, 16444, 16444, 16444, 16444, 16444, 
16444, 16444, 16444, 16444, 16444, 16444, 16444, 16444, 16444, 
16444, 16444, 16444, 16444, 16444, 16444, 16444, 16444, 16444, 
16444, 16447, 16447, 16447, 16447, 16447, 16447, 16447, 16447, 
16447, 16447, 16447, 16447, 16447, 16447, 16447, 16447, 16447, 
16447, 16447, 16447, 16447, 16447, 16447, 16447, 16447, 16448, 
16448, 16448, 16448, 16448, 16448, 16442, 16442, 16442, 16442, 
16442, 16442, 16442, 16442, 16442, 16442, 16442, 16442, 16442, 
16442, 16442, 16442, 16442, 16442, 16442, 16442, 16442), class = "Date"), 
    test_name = c("Epidemic Typhus Group IgG Abs", "Epidemic Typhus Group IgM Abs", 
    "Spotted Fever Group IgG Abs", "Spotted Fever Group IgM Abs", 
    "Albumin", "Alkaline phosphatase", "Alanine transaminase", 
    "Basophils", "Bilirubin (total)", "Creatinine", "C-reactive protein", 
    "Eosinophils", "Estimated GFR", "Haemoglobin (g/L)", "HCT", 
    "Potassium", "Lymphocytes", "MCHC (g/L)", "MCH", "MCV", "Monocytes", 
    "MPV", "Sodium", "Neutrophils", "Platelet count", "Red cell count", 
    "RDW", "Urea", "White cell count", "Albumin", "Alkaline phosphatase", 
    "Alanine transaminase", "Basophils", "Bilirubin (total)", 
    "Creatinine", "C-reactive protein", "Eosinophils", "Estimated GFR", 
    "Haemoglobin (g/L)", "HCT", "Potassium", "Lymphocytes", "MCHC (g/L)", 
    "MCH", "MCV", "Monocytes", "MPV", "Sodium", "Neutrophils", 
    "Platelet count", "Red cell count", "RDW", "Urea", "White cell count", 
    "Creatinine", "C-reactive protein", "Estimated GFR", "Potassium", 
    "Sodium", "Urea", "Albumin", "Alkaline phosphatase", "Alanine transaminase", 
    "APTT Ratio", "APTT", "Basophils", "Bilirubin (total)", "Creatinine", 
    "C-reactive protein", "Eosinophils", "Fibrinogen", "Estimated GFR", 
    "Haemoglobin (g/L)", "HCT", "INR", "Potassium", "Lymphocytes", 
    "MCHC (g/L)", "MCH", "MCV", "Monocytes"), result = c("Not detected", 
    "Not detected", "Not detected", "Not detected", "47", "84", 
    "29", "0.3%  0.03", "12", "98", "3.3", "1.7%  0.15", "77\r\nUnits: mL/min/1.73sqm\r\nMultiply eGFR by 1.21 for people of African\r\nCaribbean origin. Interpret with regard to UK CKD\r\nguidelines: www.renal.org/information-resources\r\nUse with caution for adjusting drug dosages -\r\ncontact clinical pharmacist for advice.", 
    "156", "0.435", "3.8", "25.7%  2.31", "359", "30.4", "84.6", 
    "7.1%  0.64", "10.1", "140", "65.2%  5.86", "240", "5.14", 
    "12.4", "3.9", "8.99", "45", "53", "41", "0.3%  0.03", "10", 
    "59", "2.0", "2.8%  0.32", ">90\r\nUnits: mL/min/1.73sqm\r\nMultiply eGFR by 1.21 for people of African\r\nCaribbean origin. Interpret with regard to UK CKD\r\nguidelines: www.renal.org/information-resources\r\nUse with caution for adjusting drug dosages -\r\ncontact clinical pharmacist for advice.", 
    "126", "0.398", "4.5", "28.7%  3.30", "317", "25.7", "81.2", 
    "5.7%  0.65", "10.8", "143", "62.5%  7.18", "411", "4.90", 
    "14.7", "3.5", "11.49", "59", "76.2", ">90\r\nUnits: mL/min/1.73sqm\r\nMultiply eGFR by 1.21 for people of African\r\nCaribbean origin. Interpret with regard to UK CKD\r\nguidelines: www.renal.org/information-resources\r\nUse with caution for adjusting drug dosages -\r\ncontact clinical pharmacist for advice.", 
    "4.2", "139", "3.4", "46", "47", "40", "1.3", "39", "0.4%  0.01", 
    "8", "74", "7.0", "0.4%  0.01", "2.50", ">90\r\nUnits: mL/min/1.73sqm\r\nMultiply eGFR by 1.21 for people of African\r\nCaribbean origin. Interpret with regard to UK CKD\r\nguidelines: www.renal.org/information-resources\r\nUse with caution for adjusting drug dosages -\r\ncontact clinical pharmacist for advice.", 
    "146", "0.441", "0.96", "4.3", "43.2%  1.14", "331", "29.1", 
    "87.8", "6.8%  0.18")), class = "data.frame", row.names = c(NA, 
-81L))

This is the pivot_wider() code I have used (having called the dataset above path_results:

library(tidyr)

path_results_wide <- path_results %>%
  select(row_id, sample_date, test_name, result)%>%
  pivot_wider(
    id_cols = c(row_id,
                sample_date), 
    names_from = test_name, 
    values_from = result
  )

Some columns should be numerical, and some should be character strings, but pivot_wider() has parsed them all as lists of characters and when I try and change them to numerical, I get the following error:

path_results_wide$Albumin <- as.numeric(path_results_wide$Albumin)

Error: Can't cast <list_of<character>> to <double>

Any suggestions for what I can do to fix this would be very welcome. Thanks.

pedrostrusso · Answer 1 · 2019-04-11T12:01:45.190

0

Old answer:

Not sure about pivot_wider, but if I undestood correctly, I think this might be what you want, using reshape2. There are several NA values since each row is a patient and a date, on which a specific test was performed.

library(reshape2)
res <- dcast(path_results, row_id + sample_date ~ test_name)

New answer:

Reading about the dcast function in another issue, I realized we needed to add another id column to uniquely identify each separate row. I then came across the following while reading about the spread function in dplyr:

path_results_wide <- path_results %>%
    rowid_to_column() %>%
    spread(test_name, result)

edited Apr 11 '19 at 12:01

answered Apr 10 '19 at 12:08

pedrostrusso

388
3
10

Thank you. This definitely looks like what I would need, and certainly when I try it on the sample of data that I posted above, it works fine. When I try it on my full dataset (~500 000 rows that result in approx. 1000 different columns from the test_names, the columns have all been parsed as integers (presumably relating to the number of times a result exists in that specific cell?). Any idea how to overcome this? – tommy Apr 10 '19 at 12:51
Do you mean the column names have been parsed as integers? Or the values in the columns? Maybe check if some of the columns in your full dataset are factor types? – pedrostrusso Apr 10 '19 at 13:34
Thanks. I don't think so. The columns in the full dataset are all character, apart from sample_date. The issue seems to be when cast can't variables do not identify a single observation for each output cell – tommy Apr 10 '19 at 21:02
ie. when there are multiple measurements (result) for a given test (test_name) for a patient (row_id) on the same date (sample_date). I think when it can't identify a single observation for each output cell, it invokes fun.aggregate and returns a summary statistic instead of the actual result. Ideally, I would have dcast create an additional row for the additional values. Not sure how best to overcome this. I've tried various permutations of fun.aggregate but without success. – tommy Apr 10 '19 at 21:14

Error: Can't cast > to after using new pivot_wider() function in tidyr

1 Answers1