-3

I have GDP values listed by country (rows) and list of years (column headings) in one dataset. I'm trying to combine it with another dataset where the values represent GINI. How do I merge these two massive datasets by country and year, when "year" is not a variable? (How do I manipulate each dataset so that I introduce "year" as a column and have repeating countries to represent each year?

i.e. from the top dataframe to the bottom dataframe in the image?

  • 2
    Please do not provide your data as an image. No one wants to type it in again and we do not see the exact data structure that you have. Instead use `dput` to make a text version of your data that we can cut and paste into R. If your data is to long, try `dput(head(MyData,15))`. – G5W Dec 20 '17 at 18:14
  • `?dplyr::gather` – hrbrmstr Dec 20 '17 at 18:17
  • Sorry about that! First time posting on this forum, will do that from now on! – user9123976 Dec 20 '17 at 18:36
  • Are you looking for a SAS and R solution? Not sure if the SAS tag is actually applicable here when you're clearly talking about data frames and R terminology. – Reeza Dec 20 '17 at 20:59

3 Answers3

0

Reshape the top dataset from wide to long and then merge with your other dataset. There are many, many, examples of reshaping data on this site with different approaches. A common one is to use the tidyr package, which has a function called gather that does just what you need.

long_table <- tidyr::gather(wide_table, key = year, value = GDP, 1960:1962)

or whatever the last year in your dataset is. You can install the tidyr package with install.packages('tidyr') if you don't have it yet.

Next time, please avoid putting pictures of your data and provide reproducible data so this is easier for others to answer exactly. You can use dput(..) to do so.

dshkol
  • 1,208
  • 7
  • 23
0

Hope this helps!

#sample data (added 'X' before numeric year columns as R doesn't allow column name to start with digit)
df <- data.frame(Country_Name=c('Belgium','Benin'),
           X1960=c(123,234),
           X1961=c(567,890))

library(dplyr)
library(tidyr)

df_new <- df %>%
  gather(Year, GDP, -Country_Name)
df_new$Year <- gsub('X','',df_new$Year )
df_new

Output is:

  Country_Name Year GDP
1      Belgium 1960 123
2        Benin 1960 234
3      Belgium 1961 567
4        Benin 1961 890

(PS: As already suggested by others you should always share sample data using dput(df))

Prem
  • 11,775
  • 1
  • 19
  • 33
0

With the data in Excel, if you have Excel 2010 or later, you can use Power Query or Get & Transform to unpivot the "year" columns.

This is the code but you can do this through the GUI

enter image description here

And this is the result, although I had to format the GDP column to get your mix of Scientific and Number formatting, and I had a typo on Belgium 1962

enter image description here

Ron Rosenfeld
  • 53,870
  • 7
  • 28
  • 60