How do I change the way my data is organized?

Question

My data is currently organized in the following method (see first table below for actual data). I am only showing a portion of the overall data as the full image is quite large (over 100 rows).

Row   September        October         November        December      January        February       March        April       May            June          July       
1     Chino Hills      Huntington Bea~ Fountain Valley Anaheim       Fountain Vall~ Arcadia        Anaheim      Newport Be~ Santa Ana      NA            NA         
2     Irvine           Cerritos        Long Beach      Chino Hills   Cerritos       Anaheim        NA           Banning     Newport Beach  Anaheim       NA         
3     Glendale         NA              West Covina     Monterey Park Encino         NA             Monterey Pa~ NA          Los Angeles    Cerritos      Beverly Hi~
4     Norco            Fountain Valley NA              Monterey Park NA             Long Beach     NA           Santa Ana   Huntington Be~ Fountain Val~ NA         
5     Los Angeles      Inglewood       West Covina     Glendale      NA             Glendale       NA           Granada Hi~ Chino          West Covina   Tarzana

I want to change the way it is organized so that it shows the following. I want to emphasize that it would show all of the cities, not just the ones I have chosen to list. This is an incomplete diagram, but it gets the idea across:

+-------------+------------------+--------+----------+
| Chino Hills | Huntington Beach | Irvine | Glendale |
+-------------+------------------+--------+----------+
| Row 1       | Row 1            | Row 2  | Row 3    |
| Row 2       |                  |        | Row 5    |
|             |                  |        | Row 5    |
+-------------+------------------+--------+----------+

I have tried tidyr::separate_rows(dfl, col), but this only works if the cities are in one cell; however, they are in multiple cells in multiple rows. This is what happens when I try the tidyr::separate_rows(dfl, col):

Row   September        October         November        December      January        February       March        April       May            June          July       
   <chr> <chr>            <chr>           <chr>           <chr>         <chr>          <chr>          <chr>        <chr>       <chr>          <chr>         <chr>      
 1 1     Chino Hills      Huntington Bea~ Fountain Valley Anaheim       Fountain Vall~ Arcadia        Anaheim      Newport Be~ Santa Ana      NA            NA         
 2 2     Irvine           Cerritos        Long Beach      Chino Hills   Cerritos       Anaheim        NA           Banning     Newport Beach  Anaheim       NA         
 3 3     Glendale         NA              West Covina     Monterey Park Encino         NA             Monterey Pa~ NA          Los Angeles    Cerritos      Beverly Hi~
 4 4     Norco            Fountain Valley NA              Monterey Park NA             Long Beach     NA           Santa Ana   Huntington Be~ Fountain Val~ NA         
 5 5     Los Angeles      Inglewood       West Covina     Glendale      NA             Glendale       NA           Granada Hi~ Chino          West Covina   Tarzana

As you can see, the only thing it does is add in another row of numbers which I do not need.

In summary, I need the Program R to find all of the cities and tell me what row they are in. The row may appear more than once if the city is in that row more than once. It will organize more than one column, not just the standard one column as used in tidyr. The number of columns will depend on the number of different cities.

It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. (Images of data aren't helpful because we can't copy/paste the data). — MrFlick, Jul 09 '20 at 22:14
I have edited the original post with the data that I am currently using (not in image form). I have also added an example of when I use the separate_rows function. Is there any more data which I can provide which would be useful? I'm not sure what `melt` from `reshape` means. — Brittney, Jul 09 '20 at 22:21

score 1 · Answer 1 · answered Jul 10 '20 at 02:54

We can get the data in long format, keep only unique values for each Row and value and get data in wide format. Assuming df is the dataframe name.

library(dplyr)
library(tidyr)

df %>%
   pivot_longer(cols = -Row, values_drop_na = TRUE) %>%
   distinct(Row, value) %>%
   group_by(value) %>%
   mutate(row = row_number()) %>%
   pivot_wider(names_from = value, values_from = Row)

How do I change the way my data is organized?

1 Answers1