I'm working in R and have a data frame of test proficiency results by school and subject. The data has a separate row for each test result, with a column indicating which content area was tested. I want a single row per school, with one column per content area holding its proficiency percentage. Here is a sample of my data:

structure(list(pct_proficient = c(NA, 55.36, 34.98, NA, NA, 50, 
34.72, NA, NA, NA), school_year = c(2021, 2021, 2021, 2021, 2021, 
2021, 2021, 2021, 2021, 2021), school_code = c(610, 610, 610, 
611, 612, 612, 612, 615, 615, 615), content_area = c("CMP", "ELA", 
"MATH", "CMP", "CMP", "ELA", "MATH", "ELA", "MATH", "ELA"), organization = c("Allen Frear Elementary School", 
"Allen Frear Elementary School", "Allen Frear Elementary School", 
"J. Ralph McIlvaine Early Childhood Center", "Major George S. Welch Elementary School", 
"Major George S. Welch Elementary School", "Major George S. Welch Elementary School", 
"Kent Elementary Intensive Learning Center", "Kent Elementary Intensive Learning Center", 
"Kent Elementary Intensive Learning Center")), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

I used:

perf_wide <- Performance_by_school %>% 
  pivot_wider(names_from = content_area, values_from = pct_proficient)

This resulted in a data frame with the structure I wanted, but each entry is now a list including the proficiency and the value NA. For example, one entry in the column MATH reads c(42.1, NA). How can I get rid of the NAs and get a single value for each entry?

M--
Dee
  • Welcome! If you add data to your question to [make your question reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), you're more likely to get an answer! – alistaire Apr 29 '23 at 23:19
  • Thank you, I would be glad to add my data, but I don't know how. – Dee Apr 29 '23 at 23:24
  • The link has lots of suggestions about that! A quick way if it's reasonably sized is to call `dput(Performance_by_school)` and edit with the results of that, which if you run them will reproduce your data.frame. – alistaire Apr 29 '23 at 23:25

1 Answer

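The data contains duplicated entries: school_code 615 has two "ELA" rows. When the remaining columns plus content_area do not uniquely identify a row, pivot_wider() cannot fit the values into a single cell and returns list-columns instead. Assuming the duplicate is an error, if you change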

Performance_by_school[8,4] <- "CMP"

and then run pivot_wider()

library(dplyr)
library(tidyr)

Performance_by_school %>% 
  pivot_wider(names_from = content_area, values_from = pct_proficient)
# A tibble: 4 × 6
  school_year school_code organization                           CMP   ELA  MATH
        <dbl>       <dbl> <chr>                                <dbl> <dbl> <dbl>
1        2021         610 Allen Frear Elementary School           NA  55.4  35.0
2        2021         611 J. Ralph McIlvaine Early Childhood …    NA  NA    NA  
3        2021         612 Major George S. Welch Elementary Sc…    NA  50    34.7
4        2021         615 Kent Elementary Intensive Learning …    NA  NA    NA

it may be your expected result.
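
To check where such duplicates come from before editing any values, one option is to count rows per combination of the identifying columns and content_area. This is only a sketch: the tibble below is a compact stand-in for the sample data in the question (the organization column is left out for brevity), and `dupes` is a name chosen here for illustration.

```r
library(dplyr)

# Compact stand-in for the sample data in the question (organization omitted).
Performance_by_school <- tibble(
  pct_proficient = c(NA, 55.36, 34.98, NA, NA, 50, 34.72, NA, NA, NA),
  school_year    = 2021,
  school_code    = c(610, 610, 610, 611, 612, 612, 612, 615, 615, 615),
  content_area   = c("CMP", "ELA", "MATH", "CMP", "CMP", "ELA", "MATH",
                     "ELA", "MATH", "ELA")
)

# Count rows per id/name combination. Any combination with n > 1 would end up
# as a list-column cell after pivot_wider().
dupes <- Performance_by_school %>%
  count(school_year, school_code, content_area) %>%
  filter(n > 1)

dupes
```

On the sample data this should return a single row, school_code 615 with content_area "ELA" and n = 2, which is the pair of rows discussed above.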

Andre Wildberg
  • I wouldn't assume that data is erroneous. – M-- Apr 30 '23 at 02:16
  • This was the problem. I didn't realize that there were still multiple tests for each content type, so I had to go back and use names_from = the actual test name. Thanks! – Dee Apr 30 '23 at 04:20
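
If the duplicated rows are legitimate and a single summary value per cell is acceptable, another possibility is the `values_fn` argument of `pivot_wider()`, which applies a function to the values in each output cell. The sketch below assumes that averaging the non-missing values is meaningful for this data, which may not hold when the rows come from different tests. `mean_or_na` is a hypothetical helper defined here, and the tibble is a compact stand-in for the question's sample (organization omitted).

```r
library(dplyr)
library(tidyr)

# Compact stand-in for the sample data in the question (organization omitted).
Performance_by_school <- tibble(
  pct_proficient = c(NA, 55.36, 34.98, NA, NA, 50, 34.72, NA, NA, NA),
  school_year    = 2021,
  school_code    = c(610, 610, 610, 611, 612, 612, 612, 615, 615, 615),
  content_area   = c("CMP", "ELA", "MATH", "CMP", "CMP", "ELA", "MATH",
                     "ELA", "MATH", "ELA")
)

# Hypothetical collapsing rule (an assumption, not part of the original answer):
# average the non-missing values, or return NA when every value is missing.
mean_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else mean(x, na.rm = TRUE)
}

perf_wide <- Performance_by_school %>%
  pivot_wider(
    names_from  = content_area,
    values_from = pct_proficient,
    values_fn   = mean_or_na  # collapses each cell to a single number
  )

perf_wide
```

With this rule a cell that would otherwise hold something like c(42.1, NA) becomes 42.1, and the content area columns stay plain numeric columns instead of list-columns.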