0

This is hard to explain but basically I have a very simple dataframe with Counties and Cases

dat <- "County   Cases
1       Borden   5
2       Bosque   3
3       Bowue    1"

and I have a large dataframe from TEX <- map_data('county', 'texas').

> head(TEX)
       long      lat group order region subregion
1 -95.75271 31.53560     1     1  texas  anderson
2 -95.76989 31.55852     1     2  texas  anderson
3 -95.76416 31.58143     1     3  texas  anderson
4 -95.72979 31.58143     1     4  texas  anderson
5 -95.74698 31.61008     1     5  texas  anderson
6 -95.72405 31.63873     1     6  texas  anderson

What I want to do is check every row and if the subregion is in the dataframe dat then add the corresponding number of cases to a new column in TEX called "cases" or add 0 if not.

For example

> head(TEX)
       long      lat group order region subregion cases
1 -95.75271 31.53560     1     1  texas  anderson 0
2 -95.76989 31.55852     1     2  texas  anderson 0
3 -95.76416 31.58143     1     3  texas  anderson 0
4 -95.72979 31.58143     1     4  texas  anderson 0
5 -95.74698 31.61008     1     5  texas  Borden   5
6 -95.72405 31.63873     1     6  texas  Bosque   3

I tried doing it with this bit of code

for (val in counties$counties) {
     for (vall in TEX$subregion) {
         if (val == vall) TEX$cases = counties$cases
     }
}

but I get this error

Error in `$<-.data.frame`(`*tmp*`, "cases", value = c(5L, 3L, 2L, 1L,  : 
  replacement has 10 rows, data has 4488

My end goal here is to be able to create a choropleth of texas counties that have COVID cases based on my growing list of Counties and Cases. If you have a better method of doing this than I would be open to that!

Regards!

UPDATE: Ian's solution worked great but it is causing a problem with ggplot and mapping. If I take a section of the dataframe TEX before merge it looks like this

6   -96.81268   28.28693    4   76  texas   aransas
77  -96.80695   28.25828    4   77  texas   aransas
78  -96.82414   28.21817    4   78  texas   aransas
79  -96.87570   28.19525    4   79  texas   aransas
80  -96.91009   28.16660    4   80  texas   aransas
81  -96.94446   28.14942    4   81  texas   aransas
82  -96.94446   28.18379    4   82  texas   aransas
83  -96.92727   28.24109    4   83  texas   aransas
84  -96.92154   28.26974    4   84  texas   aransas
85  -96.94446   28.27547    4   85  texas   aransas
86  -96.99030   28.25255    4   86  texas   aransas
87  -96.98457   28.23536    4   87  texas   aransas
88  -96.97311   28.21817    4   88  texas   aransas
89  -96.96165   28.19525    4   89  texas   aransas
90  -96.97311   28.17233    4   90  texas   aransas
91  -97.00175   28.15515    4   91  texas   aransas
92  -97.03613   28.15515    4   92  texas   aransas
93  -97.04186   28.17233    4   93  texas   aransas
94  -97.03613   28.20098    4   94  texas   aransas
95  -97.05905   28.21817    4   95  texas   aransas
96  -97.07624   28.20671    4   96  texas   aransas
97  -97.11062   28.21817    4   97  texas   aransas
98  -97.12780   28.23536    4   98  texas   aransas
99  -97.12780   28.25255    4   99  texas   aransas
100 -97.11062   28.26401    4   100 texas   aransas
101 -97.01894   28.27547    4   101 texas   aransas
102 -96.80122   28.31557    4   102 texas   aransas

and after plotting

ggplot(TEX, aes(long,lat, group = group)) + geom_polygon(aes(fill = subregion),color = "black") + theme(legend.position = "none") + coord_quickmap()

what

Looks great! Now when I execute the merge function TEX gets rearranged

72  aransas -97.00175   28.15515    4   91  texas   1
73  aransas -97.04186   28.17233    4   93  texas   1
74  aransas -96.80695   28.25828    4   77  texas   1
75  aransas -96.80122   28.31557    4   102 texas   1
76  aransas -97.03613   28.15515    4   92  texas   1
77  aransas -96.81268   28.28693    4   76  texas   1
78  aransas -97.12780   28.25255    4   99  texas   1
79  aransas -97.11062   28.26401    4   100 texas   1
80  aransas -96.97311   28.17233    4   90  texas   1
81  aransas -97.12780   28.23536    4   98  texas   1
82  aransas -97.07624   28.20671    4   96  texas   1
83  aransas -96.94446   28.27547    4   85  texas   1
84  aransas -97.01894   28.27547    4   101 texas   1
85  aransas -96.96165   28.19525    4   89  texas   1
86  aransas -97.11062   28.21817    4   97  texas   1
87  aransas -96.87570   28.19525    4   79  texas   1
88  aransas -97.03613   28.20098    4   94  texas   1
89  aransas -97.05905   28.21817    4   95  texas   1
90  aransas -96.97311   28.21817    4   88  texas   1
91  aransas -96.92154   28.26974    4   84  texas   1
92  aransas -96.99030   28.25255    4   86  texas   1
93  aransas -96.98457   28.23536    4   87  texas   1
94  aransas -96.82414   28.21817    4   78  texas   1
95  aransas -96.80122   28.31557    4   75  texas   1
96  aransas -96.94446   28.14942    4   81  texas   1
97  aransas -96.91009   28.16660    4   80  texas   1
98  aransas -96.92727   28.24109    4   83  texas   1
99  aransas -96.94446   28.18379    4   82  texas   1

and now the map looks like this... enter image description here

What can I do to save the original order of TEX? or wait maybe I just need to sort by order....

UPDATE#2

TEX <- TEX[order(TEX$order),]

solved the problem. I am curious why merge changed the order like that

1 Answers1

0

We can use merge from base R.

result <- merge(TEX,dat,by.x="subregion",by.y="County",all.x=TRUE)
result
  subregion      long      lat group order region Cases
1  anderson -95.75271 31.53560     1     1  texas    NA
2  anderson -95.76989 31.55852     1     2  texas    NA
3  anderson -95.76416 31.58143     1     3  texas    NA
4  anderson -95.72979 31.58143     1     4  texas    NA
5  anderson -95.74698 31.61008     1     5  texas    NA
6  anderson -95.72405 31.63873     1     6  texas    NA
7    Borden -95.74698 31.61008     1     5  texas     5
8    Bosque -95.72405 31.63873     1     6  texas     3

Then we can replace the NAs with 0.

result$Cases[is.na(result$Cases)] <- 0
result
  subregion      long      lat group order region Cases
1  anderson -95.75271 31.53560     1     1  texas     0
2  anderson -95.76989 31.55852     1     2  texas     0
3  anderson -95.76416 31.58143     1     3  texas     0
4  anderson -95.72979 31.58143     1     4  texas     0
5  anderson -95.74698 31.61008     1     5  texas     0
6  anderson -95.72405 31.63873     1     6  texas     0
7    Borden -95.74698 31.61008     1     5  texas     5
8    Bosque -95.72405 31.63873     1     6  texas     3

Data

TEX <- structure(list(long = c(-95.75271, -95.76989, -95.76416, -95.72979, 
-95.74698, -95.72405, -95.74698, -95.72405), lat = c(31.5356, 
31.55852, 31.58143, 31.58143, 31.61008, 31.63873, 31.61008, 31.63873
), group = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), order = c(1L, 2L, 
3L, 4L, 5L, 6L, 5L, 6L), region = structure(c(1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L), .Label = "texas", class = "factor"), subregion = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 2L, 3L), .Label = c("anderson", "Borden", 
"Bosque"), class = "factor")), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8"))

dat <- structure(list(County = structure(1:3, .Label = c("Borden", "Bosque", 
"Bowue"), class = "factor"), Cases = c(5L, 3L, 1L)), class = "data.frame", row.names = c("1", 
"2", "3"))
Ian Campbell
  • 23,484
  • 14
  • 36
  • 57
  • I can't thank you enough, I know the comments are for more than just "thank you" but seriously, this website is so much help!!!! I kept looking for a function that would do what I wanted, who would've known it would be so simple. UPDATE - see my update above – Cannon Woodbury Mar 27 '20 at 01:16