I am trying to reshape data but keep having errors. There are very helpful threads here on reshaping but I have not seen my particular issue. I am trying to go from long to wide - but have both country name and date as the id - typically I have seen solutions for making date the new columns.
#Data from:
library(wbstats)
WorldBank_long <- wb(indicator = c("NY.GDP.PCAP.KD", "SI.POV.GINI", "UNEMPSA_","CPTOTSAXMZGY" ),
startdate = 2005, enddate = 2019)
WorldBank <- WorldBank_long[,c("iso3c", "date", "value", "indicatorID")]
Columns in "WorldBank" are "iso3c" & "date" - which are both ID variables. "indicatorID" is the value I want to make new column names and "value" is the value I want for each new column.
I have tried:
library(tidyr)
WorldBank_data_wide<-WorldBank %>%
pivot_wider(names_from = indicatorID, values_from = value)
Error: Each row of output must be identified by a unique combination of keys. Keys are shared for 403 rows:
- 6086, 6101, 6116, 6131, 6146, 6176, 6191, 6206, 6221, 6236, 6251, 6266, 6296, 6431
- 6085, 6100, 6115, 6130, 6145, 6175, 6190, 6205, 6220, 6235, 6250, 6265, 6295, 6430
- 6084, 6099, 6114, 6129, 6144, 6174, 6189, 6204, 6219, 6234, 6249, 6264, 6294, 6429
- 6083, 6098, 6113, 6128, 6143, 6173, 6188, 6203, 6218, 6233, 6248, 6263, 6293, 6428
- 6082, 6097, 6112, 6127, 6142, 6172, 6187, 6202, 6217, 6232, 6247, 6262, 6292, 6427
- 6081, 6096, 6111, 6126, 6141, 6171, 6186, 6201, 6216, 6231, 6246, 6261, 6291, 6426
- 6080, 6095, 6110, 6125, 6140, 6170, 6185, 6200, 6215, 6230, 6245, 6260, 6290, 6425
- 6079, 6094, 6109, 6124, 6139, 6169, 6184, 6199, 6214, 6229, 6244, 6259, 6289, 6424
- 6078, 6093, 6108, 6123, 6138, 6168, 6183, 6198, 6213, 6228, 6243, 6258, 6288, 6423
- 6077, 6092, 6107, 6122, 6137, 6167, 6182, 6197, 6212, 6227, 6242, 6257, 6287, 6422
- 6076, 6091, 6106, 6121, 6136, 6166, 618
library(data.table)
WorldBank_data_wide = dcast(WorldBank_long, date + iso3c ~indicator, value.var = 'value')
Error: Aggregation function missing: defaulting to length --- The columns were what I wanted BUT instead of the values, the values were 0 or 1.
EDITS: Current Table
iso3c date value indicatorID
1 ARB 2019 6437.167 NY.GDP.PCAP.KD
2 ARB 2018 6465.474 NY.GDP.PCAP.KD
3 ARB 2017 6454.460 NY.GDP.PCAP.KD
4 ARB 2016 6506.271 NY.GDP.PCAP.KD
5 ARB 2015 6418.029 NY.GDP.PCAP.KD
6 ARB 2014 6350.838 NY.GDP.PCAP.KD
7 ARB 2013 6333.027 NY.GDP.PCAP.KD
8 ARB 2012 6274.111 NY.GDP.PCAP.KD
9 ARB 2011 6020.487 NY.GDP.PCAP.KD
10 ARB 2010 5934.291 NY.GDP.PCAP.KD
11 ARB 2009 5806.185 NY.GDP.PCAP.KD
What I want to do -- only instead of 0 and 1 to have the values from above (this is my output from (data.table):
date iso3c CPTOTSAXMZGY NY.GDP.PCAP.KD SI.POV.GINI UNEMPSA_
1 2005 ABW 0 1 0 0
2 2005 AFG 0 1 0 0
3 2005 AGO 0 1 0 0
4 2005 ALB 0 1 1 0
5 2005 AND 0 1 0 0
6 2005 ARB 0 1 0 0
7 2005 ARE 0 1 0 0
8 2005 ARG 0 1 1 1
9 2005 ARM 0 1 1 1
10 2005 ASM 0 1 0 0
11 2005 ATG 0 1 0 0
EDIT 2
The new command looks like it is better than all other output - but all my values (i.e. GDP per capita or Gini) now are repeated so that all other columns are NA except for the one that is being defined. But scrolling down then GDP becomes NA and Gini has its values.
#using new command:
WorldBank_data_wide<-WorldBank %>%
mutate(row = row_number()) %>% pivot_wider(names_from = indicatorID, values_from = value)
#output
iso3c date row. NY.GDP.PCAP.KDSI.POV.GINIUNEMPSA_CPTOTSAXMZGY
2138 BDI 2019 1126 208.0747 NA NA NA
2139 BDI 2018 1127 210.8042 NA NA NA
2140 BDI 2017 1128 214.1392 NA NA NA
2141 BDI 2016 1129 219.9615 NA NA NA
2152 BDI 2005 1140 221.0964 NA NA NA
2151 BDI 2006 1139 225.6222 NA NA NA
2150 BDI 2007 1138 225.8591 NA NA NA
2142 BDI 2015 1130 228.4325 NA NA NA
2149 BDI 2008 1137 229.1485 NA NA NA
2148 BDI 2009 1136 230.1926 NA NA NA
2592 ETH 2005 1580 233.9442 NA NA NA
Edit:
Another unforeseen issue - Some commands are working but instead of each row being a unique country and year - some are repeated so that there are multiple rows for Argentina in 2013 (for example)
World_bank_wide <- WorldBank %>%
group_by(iso3c,indicatorID) %>%
mutate(row_id=1:n()) %>% ungroup() %>%
spread(indicatorID, value) %>%
select(-row_id)
row iso3c date CPTOTSAXMZGY NY.GDP.PCAP.KD SI.POV.GINI UNEMPSA_
1 ABW 2005 NA 26979.8854 NA NA
104 ARE 2011 NA 34634.862 NA NA
105 ARE 2012 NA 35416.892 NA NA
106 ARE 2013 NA 36978.833 NA NA
107 ARE 2014 NA NA 32.5 NA
108 ARE 2014 NA 38495.046 NA NA
109 ARE 2015 NA 40247.747 NA NA
110 ARE 2016 NA 41045.111 NA NA
111 ARE 2017 NA 41460.283 NA NA