
I am starting with 6 different lists/csvs that each contain one character column. This column shows the Home Owners' Loan Corporation (HOLC) neighborhood grades of census block groups, so the columns look something like those shown below. I am new to using RStudio and I am wondering if the first step would be to combine the lists. Another option could be to add a new binary column to each list that marks a row as 1 if it is not NA and 0 if it is. Then each of the lists can be condensed into the categories A, B, C, D, and NA, and the new binary column can be summed.

Ideally I am interested in using ggplot but I am open to other options. Thanks for your help! I appreciate it.

Example image of how I would like the results to look. In this example, each line represents a different list/csv table.

houston_grade2020
NA
NA
B
A
NA
C
D
minneapolis_grade2020
A
NA
NA
B
C
C
D
houston_grade1990
B
B
B
A
A
C
D
minneapolis_grade1990
B
A
NA
A
NA
NA
D

etc.

(I started by working with one csv to try and visualize it but alas it did not work. In this example, I did not add the binary column.)

# Group by Grade
Houston_2020_group <- 
  data.frame(
    values = c(Houston_2020_sub$houston_grade2020),
    group = c(rep("Houston 2020", nrow(Houston_2020_sub)))
  )

ggplot(data = Houston_2020_group, aes(x = values, y = group, fill = group)) +
  geom_line() +
  labs(title = "HOLC Grades")

Results: [image]

In this example, I failed to sum the count of the appearances of each grade. For the final result I would like all lists/csvs to be represented in the graph.

wibeasley
Kimberly
  • Welcome to Stack Overflow! Can you please read and incorporate elements from [How to make a great R reproducible example?](https://stackoverflow.com/q/5963269/1082435), especially the aspects of using `dput()` for the input and then an explicit example of your expected dataset? I've formatted the code so it is more easily understood. See if there is anything you'd like to further improve. The big problem is that `Houston_2020_sub` is not defined, so the problem isn't reproducible. – wibeasley Jan 06 '23 at 03:46
  • It’s also not clear what you want the y axis to represent. In your code, you specify `y = group`, but this doesn’t really make sense, especially with a line graph; it should presumably be a numeric variable. – zephryl Jan 06 '23 at 04:08

1 Answer

Your biggest challenge here is rearranging your data into an appropriate format for plotting. Essentially, you should get all your data in a single data frame, with all the grades in a single column, and have a second column indicating which data set the grades came from. Then you can group the data according to this second column and count the number of each grade. This then allows easy plotting:

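library(tidyverse)

# Collect the data frames in a named list; the names become the group labels
list(Houston_2020 = Houston_2020_sub, 
     Minneapolis_2020 = Minneapolis_2020_sub,
     Houston_1990 = Houston_1990_sub, 
     Minneapolis_1990 = Minneapolis_1990_sub) %>%
  # Give every data frame the same column name so the rows stack cleanly
  lapply(function(x) setNames(x, 'grade')) %>%
  # Stack into one data frame; `group` records which data set each row came from
  bind_rows(.id = 'group') %>%
  mutate(grade = factor(grade)) %>%
  group_by(group) %>%
  # Count each grade within each group, keeping grades that never occur
  count(grade, .drop = FALSE) %>%
  ggplot(aes(grade, n, colour = group, group = group)) +
  geom_line() +
  geom_point(color = 'black') +
  facet_grid(group~.)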


If you want all the lines on the same panel, just remove that final facet_grid line (along with the trailing + on the line before it). The single-panel version looks messy at present because your numbers are so small.
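As a minimal sketch of the single-panel version, the block below repeats the sample data from the question so it runs on its own, stores the counts in an intermediate data frame, and plots without faceting. The name `grade_counts` is only an illustrative choice and does not come from the original code.

```r
library(dplyr)
library(ggplot2)

# Sample data copied from the question
Houston_2020_sub     <- data.frame(houston_grade2020     = c(NA, NA, 'B', 'A', NA, 'C', 'D'))
Minneapolis_2020_sub <- data.frame(minneapolis_grade2020 = c('A', NA, NA, 'B', 'C', 'C', 'D'))
Houston_1990_sub     <- data.frame(houston_grade1990     = c('B', 'B', 'B', 'A', 'A', 'C', 'D'))
Minneapolis_1990_sub <- data.frame(minneapolis_grade1990 = c('B', 'A', NA, 'A', NA, NA, 'D'))

# `grade_counts` is a hypothetical name used here for illustration
grade_counts <- list(Houston_2020 = Houston_2020_sub,
                     Minneapolis_2020 = Minneapolis_2020_sub,
                     Houston_1990 = Houston_1990_sub,
                     Minneapolis_1990 = Minneapolis_1990_sub) %>%
  lapply(function(x) setNames(x, 'grade')) %>%  # same column name everywhere
  bind_rows(.id = 'group') %>%                  # one long data frame
  mutate(grade = factor(grade)) %>%
  group_by(group) %>%
  count(grade, .drop = FALSE) %>%               # one row per group and grade
  ungroup()

# No facet_grid(), so all four lines share one panel
p <- ggplot(grade_counts, aes(grade, n, colour = group, group = group)) +
  geom_line() +
  geom_point(color = 'black')

p
```

Keeping the counts in their own data frame also makes it easy to inspect the numbers before plotting them.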


Data in reproducible format, taken from question

Houston_2020_sub <- data.frame(houston_grade2020 = c(NA, NA, 'B', 'A', 
                                                     NA, 'C', 'D'))

Minneapolis_2020_sub <- data.frame(minneapolis_grade2020 = c('A', NA, NA, "B", 
                                                             "C", "C", "D"))

Houston_1990_sub <- data.frame(houston_grade1990 = c('B', 'B', 'B', 'A', 'A', 
                                                     'C', 'D'))

Minneapolis_1990_sub <- data.frame(minneapolis_grade1990 = c('B', 'A', NA, 'A',
                                                             NA, NA, 'D'))
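
On the binary-column idea raised in the question: a rough base R sketch, using only the houston_grade2020 values, suggests it is not needed for the plot, because tabulating the column already gives a count for each grade, including NA. The names `grades`, `is_graded`, and `grade_table` are illustrative assumptions.

```r
# houston_grade2020 values copied from the question
grades <- c(NA, NA, 'B', 'A', NA, 'C', 'D')

# The binary column described in the question: 1 if graded, 0 if NA
is_graded <- as.integer(!is.na(grades))

# Summing it gives only the number of graded block groups
sum(is_graded)

# table() counts every grade separately and can report NA as its own category
grade_table <- table(grades, useNA = 'ifany')
grade_table
```

The binary column answers "how many block groups have a grade at all", while the table (or `count()` in the code above) answers "how many block groups have each grade", which is what the line graph needs.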
Allan Cameron
  • Thanks for your help with this, Allan. Sorry for my delayed reply, I was in the process of moving across the country but I appreciate your patience in working through my question. I've taken notes on how to make reproducible data formats and will use this in the future. Cheers~ – Kimberly Jan 26 '23 at 02:08