I have a large long dataset of pathology results. Each patient has a unique identifier (in this case row_id
. For each patient they have had samples taken on a particular date (sample_date
). The range of tests that they have had is very varied and with a heterogeneous output (some with character strings, and some numeric). Also, not every patient has had every test performed at each sample_date
, so there should be quite a few NAs.
The name of the test performed is in the test_name
column, and the result is in the result
column.
I would like to make this into a wide dataset, spreading the result
column using the test_name
as the column headers, but keeping the identifiers as both row_id
and sample_date
.
The new pivot_wider()
function in tidyr seems to be perfect for my needs, and when I run it, it gives me the type of data frame I need (i.e., rows are still identified by row_id
and sample_date but now there are columns for each test_name and results therein.
Here is a small sample of my dataset:
structure(list(row_id = 1:81, sample_date = structure(c(16444,
16444, 16444, 16444, 16444, 16444, 16444, 16444, 16444, 16444,
16444, 16444, 16444, 16444, 16444, 16444, 16444, 16444, 16444,
16444, 16444, 16444, 16444, 16444, 16444, 16444, 16444, 16444,
16444, 16447, 16447, 16447, 16447, 16447, 16447, 16447, 16447,
16447, 16447, 16447, 16447, 16447, 16447, 16447, 16447, 16447,
16447, 16447, 16447, 16447, 16447, 16447, 16447, 16447, 16448,
16448, 16448, 16448, 16448, 16448, 16442, 16442, 16442, 16442,
16442, 16442, 16442, 16442, 16442, 16442, 16442, 16442, 16442,
16442, 16442, 16442, 16442, 16442, 16442, 16442, 16442), class = "Date"),
test_name = c("Epidemic Typhus Group IgG Abs", "Epidemic Typhus Group IgM Abs",
"Spotted Fever Group IgG Abs", "Spotted Fever Group IgM Abs",
"Albumin", "Alkaline phosphatase", "Alanine transaminase",
"Basophils", "Bilirubin (total)", "Creatinine", "C-reactive protein",
"Eosinophils", "Estimated GFR", "Haemoglobin (g/L)", "HCT",
"Potassium", "Lymphocytes", "MCHC (g/L)", "MCH", "MCV", "Monocytes",
"MPV", "Sodium", "Neutrophils", "Platelet count", "Red cell count",
"RDW", "Urea", "White cell count", "Albumin", "Alkaline phosphatase",
"Alanine transaminase", "Basophils", "Bilirubin (total)",
"Creatinine", "C-reactive protein", "Eosinophils", "Estimated GFR",
"Haemoglobin (g/L)", "HCT", "Potassium", "Lymphocytes", "MCHC (g/L)",
"MCH", "MCV", "Monocytes", "MPV", "Sodium", "Neutrophils",
"Platelet count", "Red cell count", "RDW", "Urea", "White cell count",
"Creatinine", "C-reactive protein", "Estimated GFR", "Potassium",
"Sodium", "Urea", "Albumin", "Alkaline phosphatase", "Alanine transaminase",
"APTT Ratio", "APTT", "Basophils", "Bilirubin (total)", "Creatinine",
"C-reactive protein", "Eosinophils", "Fibrinogen", "Estimated GFR",
"Haemoglobin (g/L)", "HCT", "INR", "Potassium", "Lymphocytes",
"MCHC (g/L)", "MCH", "MCV", "Monocytes"), result = c("Not detected",
"Not detected", "Not detected", "Not detected", "47", "84",
"29", "0.3% 0.03", "12", "98", "3.3", "1.7% 0.15", "77\r\nUnits: mL/min/1.73sqm\r\nMultiply eGFR by 1.21 for people of African\r\nCaribbean origin. Interpret with regard to UK CKD\r\nguidelines: www.renal.org/information-resources\r\nUse with caution for adjusting drug dosages -\r\ncontact clinical pharmacist for advice.",
"156", "0.435", "3.8", "25.7% 2.31", "359", "30.4", "84.6",
"7.1% 0.64", "10.1", "140", "65.2% 5.86", "240", "5.14",
"12.4", "3.9", "8.99", "45", "53", "41", "0.3% 0.03", "10",
"59", "2.0", "2.8% 0.32", ">90\r\nUnits: mL/min/1.73sqm\r\nMultiply eGFR by 1.21 for people of African\r\nCaribbean origin. Interpret with regard to UK CKD\r\nguidelines: www.renal.org/information-resources\r\nUse with caution for adjusting drug dosages -\r\ncontact clinical pharmacist for advice.",
"126", "0.398", "4.5", "28.7% 3.30", "317", "25.7", "81.2",
"5.7% 0.65", "10.8", "143", "62.5% 7.18", "411", "4.90",
"14.7", "3.5", "11.49", "59", "76.2", ">90\r\nUnits: mL/min/1.73sqm\r\nMultiply eGFR by 1.21 for people of African\r\nCaribbean origin. Interpret with regard to UK CKD\r\nguidelines: www.renal.org/information-resources\r\nUse with caution for adjusting drug dosages -\r\ncontact clinical pharmacist for advice.",
"4.2", "139", "3.4", "46", "47", "40", "1.3", "39", "0.4% 0.01",
"8", "74", "7.0", "0.4% 0.01", "2.50", ">90\r\nUnits: mL/min/1.73sqm\r\nMultiply eGFR by 1.21 for people of African\r\nCaribbean origin. Interpret with regard to UK CKD\r\nguidelines: www.renal.org/information-resources\r\nUse with caution for adjusting drug dosages -\r\ncontact clinical pharmacist for advice.",
"146", "0.441", "0.96", "4.3", "43.2% 1.14", "331", "29.1",
"87.8", "6.8% 0.18")), class = "data.frame", row.names = c(NA,
-81L))
This is the pivot_wider()
code I have used (having called the dataset above path_results
:
library(tidyr)
path_results_wide <- path_results %>%
select(row_id, sample_date, test_name, result)%>%
pivot_wider(
id_cols = c(row_id,
sample_date),
names_from = test_name,
values_from = result
)
Some columns should be numerical, and some should be character strings, but pivot_wider()
has parsed them all as lists of characters and when I try and change them to numerical, I get the following error:
path_results_wide$Albumin <- as.numeric(path_results_wide$Albumin)
Error: Can't cast <list_of<character>> to <double>
Any suggestions for what I can do to fix this would be very welcome. Thanks.