1

I am trying to merge 2 data frames.

The main dataset, df1, contains numerical data in wide format - each row represents a date, each column contains the value for that date in a given city.

df2 contains metadata for each city: latitude, longitude, and elevation.

What I wish to do is add the metadata for each city to df1, but I was unsuccessful in doing so as the data frames don't match up in structure.

df1

         Date Machrihanish High_Wycombe Camborne Dun_Fell Plymouth
        <dbl>        <dbl>        <dbl>    <dbl>    <dbl>    <dbl>
   1 20200101          8.5          6.9      9.6      3.3      9.9
   2 20200102         11.7          9.1     11.2      5       10.9
   3 20200103          9.1          9.9     11.2      5.1     11.1
   4 20200104          9.2          8.1      9.4      2.2      9.4
   5 20200105         11.7          7.6      9        4.3      9.3
   6 20200106         10.8          8       11.6      3.7     10.6
   7 20200107         14.7         11.7     12        6.7     11.5
   8 20200108         11.2         11.8     11.6      6.2     11.3
   9 20200109          7           12       11.6     -0.2     11.5
  10 20200110          9.3          7.4     10        0       10.1

df2

   Location     Longitude Latitude Elevation
   <chr>            <dbl>    <dbl>     <dbl>
 1 Machrihanish    -5.70      55.4        10
 2 High_Wycombe    -0.807     51.7       204
 3 Camborne        -5.33      50.2        87
 4 Dun_Fell        -2.45      54.7       847
 5 Plymouth        -4.12      50.4        50
Andrea M
  • 2,314
  • 1
  • 9
  • 27
Joe
  • 795
  • 1
  • 11
  • Can you edit your question to add a sample of what the table you want would look like? – Andrea M Aug 07 '22 at 21:15
  • 2
    It's certainly possible, you will need to reshape your df1 though so that you have a column that matches `Location`. I.e., make df1 a 'long' file with three columns - `Date`, `Location` and `Temp` (assuming it is a temperature measure - see here: https://stackoverflow.com/a/57939670/496803 for how to do that step. – thelatemail Aug 07 '22 at 21:16
  • @thelatemail Just had a go at this following the link but still unsuccessful, I'm struggling with how to convert it because I have multiple cities – Joe Aug 07 '22 at 21:36
  • @AndreaM I'm looking for a table of date, longitude, latitude, elevation, city and temperature – Joe Aug 07 '22 at 21:38

1 Answers1

3

Here is a solution that tidies the data to long format by location and day, and merges the lat / long information.

Using data provided in the original post, we read it into two data frames.

tempText <- "rowId Date Machrihanish High_Wycombe Camborne Dun_Fell Plymouth
     1 20200101          8.5          6.9      9.6      3.3      9.9
2 20200102         11.7          9.1     11.2      5       10.9
3 20200103          9.1          9.9     11.2      5.1     11.1
4 20200104          9.2          8.1      9.4      2.2      9.4
5 20200105         11.7          7.6      9        4.3      9.3
6 20200106         10.8          8       11.6      3.7     10.6
7 20200107         14.7         11.7     12        6.7     11.5
8 20200108         11.2         11.8     11.6      6.2     11.3
9 20200109          7           12       11.6     -0.2     11.5
10 20200110          9.3          7.4     10        0       10.1"

library(tidyr)
library(dplyr)

temps <- read.table(text = tempText,header = TRUE)

latLongs <-"rowId   Location     Longitude Latitude Elevation
 1 Machrihanish    -5.70      55.4        10
 2 High_Wycombe    -0.807     51.7       204
 3 Camborne        -5.33      50.2        87
 4 Dun_Fell        -2.45      54.7       847
 5 Plymouth        -4.12      50.4        50"

latLongs <- read.table(text = latLongs,header = TRUE)

Next, we use tidyr::pivot_longer() to generate long format data, and then merge it with the lat long data via dplyr::full_join(). Note that we set the name of the column where the wide format column names are stored with names_to = "Location" so that full_join() uses Location to join the two data frames.

temps %>%
     select(-rowId) %>%
     pivot_longer(.,Machrihanish:Plymouth,names_to = "Location",  values_to="MaxTemp") %>%
     full_join(.,latLongs) %>% select(-rowId) -> joinedData

head(joinedData)

...and the first few rows of joined output looks like this:

> head(joinedData)
# A tibble: 6 × 6
      Date Location     MaxTemp Longitude Latitude Elevation
     <int> <chr>          <dbl>     <dbl>    <dbl>     <int>
1 20200101 Machrihanish     8.5    -5.7       55.4        10
2 20200101 High_Wycombe     6.9    -0.807     51.7       204
3 20200101 Camborne         9.6    -5.33      50.2        87
4 20200101 Dun_Fell         3.3    -2.45      54.7       847
5 20200101 Plymouth         9.9    -4.12      50.4        50
6 20200102 Machrihanish    11.7    -5.7       55.4        10
>
Len Greski
  • 10,505
  • 2
  • 22
  • 33