0

I am trying to graph OECD productivity data. I am having difficulty converting the format of the data in excel to long format.

The data is at OECD Productivity Statistics.

Now I can get the data into R alright using the Tidyverse function read_excel(). The code (to the relevant directory on my computer is:

read_excel("data/Labour Productivity.xlsx", skip=4) -> df

And just to make this question reproducible, I am running dput(df), and copying and pasting the output at the bottom. It is quite a big file, so I have made a shortened version with only a few rows. And in summary format, df returns:

> df
# A tibble: 51 × 104
   Unit      ...2  Perce…¹ ...4  ...5  ...6  ...7  ...8  ...9  ...10 ...11 ...12
   <chr>     <chr> <chr>   <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
 1 Time      NA    1971    NA    1972  NA    1973  NA    1974  NA    1975  NA   
 2 Country   NA    NA      NA    NA    NA    NA    NA    NA    NA    NA    NA   
 3 Australia NA    NA      -0.3… NA    1.00… NA    2.53… NA    -0.2… NA    1.58…
 4 Austria   NA    NA      4.63… NA    5.60… NA    4.30… NA    3.76… NA    -9.9…
 5 Belgium   NA    NA      3.37… NA    4.86… NA    5.80… NA    3.87… NA    -1.5…
 6 Canada    NA    NA      2.63… NA    4.29… NA    5.53… NA    1.84… NA    -8.7…
 7 Chile     NA    NA      ..    NA    ..    NA    ..    NA    ..    NA    ..   
 8 Colombia  NA    NA      ..    NA    ..    NA    ..    NA    ..    NA    ..   
 9 Costa Ri… NA    NA      ..    NA    ..    NA    ..    NA    ..    NA    ..   
10 Czech Re… NA    NA      ..    NA    ..    NA    ..    NA    ..    NA    ..   
# … with 41 more rows, 92 more variables: ...13 <chr>, ...14 <chr>,
#   ...15 <chr>, ...16 <chr>, ...17 <chr>, ...18 <chr>, ...19 <chr>,
#   ...20 <chr>, ...21 <chr>, ...22 <chr>, ...23 <chr>, ...24 <chr>,
#   ...25 <chr>, ...26 <chr>, ...27 <chr>, ...28 <chr>, ...29 <chr>,
#   ...30 <chr>, ...31 <chr>, ...32 <chr>, ...33 <chr>, ...34 <chr>,
#   ...35 <chr>, ...36 <chr>, ...37 <chr>, ...38 <chr>, ...39 <chr>,
#   ...40 <chr>, ...41 <chr>, ...42 <chr>, ...43 <chr>, ...44 <chr>, …
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

Now I know that there are a number of discrete problems here, but part of my confusion is that I also don't know the order to tackle them in. But I see the problems as follow:

(1) The years of the observations are each a separate column in the data frame. I think I need to do something with pivot_longer(), but not sure what. There is a relevant prior question, but I think my problem is a bit more complicated.

(2) The first two rows of the data frame are Time and Country but I need to get R to recognise these as variable names.

(3) I think there is a problem in that the original Excel file had a blank column after country, just to make the Excel file look more readable. Edit to add: I can also see that after every year column, there is a blank column, again presumably to make the Excel spreadsheet more readable to humans.

structure(list(Time = c("Country", "Australia", "Austria", "Belgium", 
"Canada", "Chile", "Colombia", "Costa Rica", "Czech Republic", 
"Denmark", "Estonia"), ...2 = c(NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA), `1971` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA), ...4 = c(NA, "-0.30104700000000001", "4.6366800000000001", 
"3.3776099999999998", "2.6359849999999998", "..", "..", "..", 
"..", "2.4861309999999999", ".."), `1972` = c(NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA), ...6 = c(NA, "1.0053270000000001", 
"5.602487", "4.8653690000000003", "4.2919600000000004", "..", 
"..", "..", "..", "3.3256290000000002", ".."), `1973` = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), ...8 = c(NA, "2.5389390000000001", 
"4.3065910000000001", "5.8062930000000001", "5.5375030000000001", 
"..", "..", "..", "..", "3.4711240000000001", ".."), `1974` = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), ...10 = c(NA, "-0.25812800000000002", 
"3.7659280000000002", "3.8796110000000001", "1.848525", "..", 
"..", "..", "..", "-1.5731740000000001", ".."), `1975` = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), ...12 = c(NA, "1.583342", 
"-9.9372000000000002E-2", "-1.5989930000000001", "-8.7609999999999997E-3", 
"..", "..", "..", "..", "-1.7487699999999999", ".."), `1976` = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), ...14 = c(NA, "2.5579670000000001", 
"4.7566879999999996", "5.4804500000000003", "4.4945979999999999", 
"..", "2.3030330000000001", "..", "..", "5.6531650000000004", 
".."), `1977` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
    ...16 = c(NA, "-0.31790400000000002", "5.0396479999999997", 
    "0.51344100000000004", "2.234715", "..", "1.851397", "..", 
    "..", "1.5500069999999999", ".."), `1978` = c(NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA), ...18 = c(NA, "2.973557", 
    "-0.12983", "2.758216", "2.656504", "..", "6.0626879999999996", 
    "..", "..", "1.905913", ".."), `1979` = c(NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA), ...20 = c(NA, "1.824414", "5.536448", 
    "2.2682319999999998", "2.6898029999999999", "..", "3.0537489999999998", 
    "..", "..", "3.6052780000000002", ".."), `1980` = c(NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA), ...22 = c(NA, "1.9095899999999999", 
    "1.7313780000000001", "4.3733979999999999", "0.84660299999999999", 
    "..", "1.8115380000000001", "..", "..", "-0.59951699999999997", 
    ".."), `1981` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA), ...24 = c(NA, "2.5221849999999999", "-0.398621", "-0.35012199999999999", 
    "2.2057180000000001", "..", "0.166439", "..", "..", "-0.62733700000000003", 
    ".."), `1982` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA), ...26 = c(NA, "-3.9045529999999999", "1.9381219999999999", 
    "0.51338499999999998", "-4.332147", "..", "-1.141303", "..", 
    "..", "3.7656079999999998", ".."), `1983` = c(NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA), ...28 = c(NA, "3.1696420000000001", 
    "3.1396280000000001", "0.26101099999999999", "1.5921339999999999", 
    "..", "-0.54990099999999997", "..", "..", "2.6763189999999999", 
    ".."), `1984` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA), ...30 = c(NA, "4.0220190000000002", "5.7572999999999999E-2", 
    "2.4559850000000001", "4.9133490000000002", "..", "0.76458700000000002", 
    "..", "..", "4.2272800000000004", ".."), `1985` = c(NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA), ...32 = c(NA, "2.5906639999999999", 
    "2.4505789999999998", "1.631168", "3.7846959999999998", "..", 
    "4.4291020000000003", "..", "..", "3.9630700000000001", ".."
    ), `1986` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
    ...34 = c(NA, "1.0860209999999999", "2.236389", "1.802106", 
    "1.1346400000000001", "..", "3.6237189999999999", "..", "..", 
    "4.7607549999999996", ".."), `1987` = c(NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA), ...36 = c(NA, "4.1609569999999998", 
    "1.293045", "2.1926399999999999", "2.710105", "4.7629229999999998", 
    "3.4426580000000002", "..", "..", "0.11738700000000001", 
    ".."), `1988` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA), ...38 = c(NA, "2.189638", "3.1497470000000001", "4.3636990000000004", 
    "3.0650430000000002", "5.4987630000000003", "2.3777940000000002", 
    "..", "..", "-7.2072999999999998E-2", ".."), `1989` = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), ...40 = c(NA, "1.8381369999999999", 
    "3.420115", "3.09443", "0.497585", "8.7238369999999996", 
    "1.4461360000000001", "..", "..", "0.60596499999999998", 
    ".."), `1990` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA), ...42 = c(NA, "-1.8465240000000001", "3.553534", "2.8476910000000002", 
    "-1.334144", "2.0032890000000001", "2.294238", "..", "..", 
    "1.3173060000000001", ".."), `1991` = c(NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA), ...44 = c(NA, "-0.84329799999999999", 
    "2.413986", "1.4463410000000001", "-3.2953920000000001", 
    "6.0133159999999997", "0.72181899999999999", "..", "-11.151457000000001", 
    "1.118215", ".."), `1992` = c(NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA), ...46 = c(NA, "2.8890009999999999", "0.97609199999999996", 
    "1.1163270000000001", "-0.28703099999999998", "10.279559000000001", 
    "2.2960250000000002", "6.8137210000000001", "-0.59453400000000001", 
    "1.6218159999999999", ".."), `1993` = c(NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA), ...48 = c(NA, "3.059914", "-0.29874200000000001", 
    "-1.3448290000000001", "1.538861", "5.1152829999999998", 
    "3.723141", "4.782921", "-6.2075999999999999E-2", "-0.33623700000000001", 
    ".."), `1994` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA), ...50 = c(NA, 2.891148, 2.008749, 2.920842, 3.356227, 
    3.891242, 3.754363, 1.536693, 2.854003, 4.988503, 0.641706
    ), `1995` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
    ...52 = c(NA, 2.706651, 2.510913, 2.172655, 1.636809, 8.758298, 
    3.587604, 1.209846, 6.556514, 2.554803, 6.612745), `1996` = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), ...54 = c(NA, 2.659026, 
    2.211448, 1.12194, 0.562468, 5.937484, -0.002071, -1.159916, 
    4.421137, 2.274329, 4.943014), `1997` = c(NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA), ...56 = c(NA, 3.487896, 1.977963, 
    3.548981, 3.248849, 5.935959, 1.678369, 2.788743, -0.405261, 
    2.811505, 13.050015), `1998` = c(NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA), ...58 = c(NA, 3.915087, 3.46783, 1.741955, 
    3.037476, 2.787537, -0.804672, 4.618787, -0.27237, 1.871199, 
    4.339826), `1999` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA), ...60 = c(NA, 2.770604, 3.355058, 3.309858, 4.311569, 
    -1.589784, -6.494719, 1.745047, 1.504032, 2.599767, 1.174938
    ), `2000` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
    ...62 = c(NA, 0.880442, 3.127424, 3.463737, 4.202683, 3.604273, 
    1.131633, 1.647388, 4.105116, 3.416459, 10.087634), `2001` = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), ...64 = c(NA, 2.666875, 
    0.880259, 0.745814, 0.690161, 2.002076, 0.3875, 1.312721, 
    3.529872, 0.465557, 6.003674), `2002` = c(NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA), ...66 = c(NA, 1.945159, 1.150624, 
    1.25411, 1.903852, 2.062954, 1.216469, 1.607849, 1.80303, 
    0.111275, 6.771368), `2003` = c(NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA), ...68 = c(NA, 3.024949, 0.492309, 0.619263, 
    0.88803, 3.579306, 2.633097, 2.823142, 3.574784, 0.129305, 
    9.069629), `2004` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA), ...70 = c(NA, 2.056631, 2.091303, 3.123963, 2.129631, 
    5.543374, 4.047228, 2.768226, 4.760123, 2.421192, 6.804008
    ), `2005` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
    ...72 = c(NA, 1.497674, 1.549985, 1.755399, 2.234059, 4.581031, 
    3.443112, 2.33423, 6.318616, 2.034485, 9.526427), `2006` = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), ...74 = c(NA, 2.386498, 
    2.920127, 1.881501, 1.602575, 5.12749, 5.451545, 5.65388, 
    6.428391, 3.568988, 9.765583), `2007` = c(NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA), ...76 = c(NA, 1.695572, 3.386779, 
    2.915842, 1.085367, 4.077356, 5.479383, 6.727566, 4.997186, 
    0.484163, 8.575315), `2008` = c(NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA), ...78 = c(NA, -0.1507, 1.139127, -0.34089, 
    -0.083592, 2.766311, 2.072618, 3.259895, 1.632905, -1.109705, 
    -5.132001), `2009` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA), ...80 = c(NA, 0.121603, -3.994648, -2.801237, -4.028003, 
    -2.077141, -0.039579, -2.367118, -5.218946, -5.423079, -14.629055
    ), `2010` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
    ...82 = c(NA, 0.810792, 1.598538, 1.920237, 1.950048, 5.007133, 
    3.283161, 3.586631, 2.184056, 1.43023, 2.444265), `2011` = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), ...84 = c(NA, 2.468588, 
    2.585817, 0.38625, 2.141536, 5.05275, 5.712903, 3.42581, 
    1.959825, 0.918331, 8.344115), `2012` = c(NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA), ...86 = c(NA, 0.803192, 0.229073, 
    0.113395, 0.662001, 5.003127, 2.721417, 3.509618, -0.904089, 
    -0.149955, 3.2282), `2013` = c(NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA), ...88 = c(NA, 0.828866, -0.575305, -0.008889, 
    1.253614, 2.320638, 3.938339, 1.192745, -0.059531, 0.537736, 
    1.458429), `2014` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA), ...90 = c(NA, 0.640181, -0.124584, 1.125422, 1.841026, 
    0.787088, 3.321298, 2.233275, 2.125453, 1.079152, 3.011367
    ), `2015` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
    ...92 = c(NA, 1.262661, 0.012646, 1.453141, -0.089286, 1.107164, 
    1.8066, 2.385933, 5.206862, 1.640134, 0.246689), `2016` = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), ...94 = c(NA, 0.697059, 
    0.70244, 0.75727, -0.135843, 0.656801, 0.955747, 2.960986, 
    2.320453, 2.39894, 3.155565), `2017` = c(NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA), ...96 = c(NA, 1.164278, 1.615994, 
    1.226502, 1.811279, -0.029263, 0.249102, 2.96067, 4.927968, 
    2.179658, 5.792045), `2018` = c(NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA), ...98 = c(NA, 0.614044, 1.931275, 1.329724, 
    0.992945, 2.147668, 1.456398, 1.464275, 2.86162, 1.479062, 
    3.784201), `2019` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA), ...100 = c(NA, -1.56003, 1.060782, 1.689119, 0.409, 
    -1.105991, 2.088999, 1.317209, 2.615667, 1.092189, 4.380305
    ), `2020` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
    ...102 = c(NA, 0.932659, -6.865297, -5.812281, -5.984636, 
    -7.6747, -8.01915, -5.052013, -5.775248, -2.213144, -0.551001
    ), `2021` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
    ...104 = c(NA, 3.437231, 4.151835, 5.712627, 3.969875, 10.4195, 
    9.537678, 6.667459, 3.571723, 4.425824, 8.013463)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -11L))
user438383
  • 5,716
  • 8
  • 28
  • 43
Lindsay
  • 131
  • 1
  • 10
  • Hi Lindsay. There are several files relating to productivity on the site you linked. Could you post a link to the exact file you are trying to parse? – Allan Cameron Dec 01 '22 at 12:03
  • Yes of course. It is https://www.oecd-ilibrary.org/employment/data/oecd-productivity-statistics/gdp-per-capita-and-productivity-growth_data-00685-en?parentId=http%3A%2F%2Finstance.metastore.ingenta.com%2Fcontent%2Fcollection%2Fpdtvy-data-en. I am editing my question to make that clear. – Lindsay Dec 01 '22 at 12:09
  • 1
    This seems like a problem with exporting it from the website - when I visit the site and click the "click to access data" link then go to "export > Text File (CSV)" it exports fine in long format and can be imported with `read.csv` – jpsmith Dec 01 '22 at 12:09
  • You can also load data directly into R using the [`OECD` package](https://github.com/expersso/OECD). E.g., `mydata <- get_dataset("PDB_GR", start_time = 2000, end_time = 2021, preformatted = TRUE)` (AFAIK, `PDB_GR` refers to the dataset you linked.) – jrcalabrese Dec 08 '22 at 03:15

0 Answers0