I just merged two datasets horizontally in R, but when I then try to plot two variables (one from the first dataset and one from the second), I get the error 'Aesthetics must be either length 1 or the same as the data (207): x, y'.

library(haven)    # provides read_dta()
library(ggplot2)
TreismanData2 <- read_dta("TreismanData.dta")
TreismanData3 <- merge(TreismanData2, QT_Row, by = "wbcode")
attach(TreismanData3)
scatterGCBPI <- ggplot(TreismanData2, aes(cpi2007, GCB2013))
scatterGCBPI + geom_point(position = "jitter") + geom_smooth(method = lm) +
  labs(x = "Perception", y = "Experience") + theme_bw()

Does anyone know how to solve this?

Thanks a lot in advance!

TreismanData2:

structure(list(country = structure(c("United States", "Canada", 
"Bahamas, The", "Aruba", "Cuba", "Haiti", "Dominican Republic", 
"Jamaica", "Trinidad and Tobago", "Barbados", "Dominica", "Grenada", 
"St. Lucia", "St. Vincent and the Grenadines", "Antigua and Barbuda", 
"St. Kitts and Nevis", "Mexico", "Belize", "Guatemala", "Honduras", 
"El Salvador", "Nicaragua", "Costa Rica", "Panama", "Colombia", 
"Venezuela, RB", "Guyana", "Suriname", "Ecuador", "Peru", "Brazil", 
"Bolivia", "Paraguay", "Chile", "Argentina", "Uruguay", "United Kingdom", 
"Ireland", "Netherlands", "Belgium", "Luxembourg", "France", 
"Monaco", "Liechtenstein", "Switzerland", "Spain", "Andorra", 
"Portugal", "Germany", "Germany, East", "Poland", "Austria", 
"Hungary", "Czechoslovakia", "Czech Republic", "Slovak Republic", 
"Italy", "San Marino", "Malta", "Albania", "Montenegro", "Serbia", 
"Macedonia, FYR", "Croatia", "Yugoslavia", "Bosnia and Herzegovina", 
"Kosovo", "Serbia and Montenegro", "Slovenia", "Greece", "Cyprus", 
"Bulgaria", "Moldova", "Romania", "Russian Federation", "Estonia", 
"Latvia", "Lithuania", "Ukraine", "Belarus", "Armenia", "Georgia", 
"Azerbaijan", "Finland", "Sweden", "Norway", "Denmark", "Iceland", 
"Cape Verde", "Sao Tome and Principe", "Guinea-Bissau", "Equatorial Guinea", 
"Gambia, The", "Mali", "Senegal", "Benin", "Mauritania", "Niger", 
"Cote d'Ivoire", "Guinea", "Burkina Faso", "Liberia", "Sierra Leone", 
"Ghana", "Togo", "Cameroon", "Nigeria", "Gabon", "Central African Republic", 
"Chad", "Congo, Rep.", "Congo, Dem. Rep.", "Uganda", "Kenya", 
"Tanzania", "Burundi", "Rwanda", "Somalia", "Djibouti", "Ethiopia", 
"Eritrea", "Angola", "Mozambique", "Zambia", "Zimbabwe", "Malawi", 
"South Africa", "Namibia", "Lesotho", "Botswana", "Swaziland", 
"Madagascar", "Comoros", "Mauritius", "Seychelles", "Morocco", 
"Algeria", "Tunisia", "Libya", "Sudan", "Iran, Islamic Rep.", 
"Turkey", "Iraq", "Egypt, Arab Rep.", "Syrian Arab Republic", 
"Lebanon", "Jordan", "Israel", "Saudi Arabia", "Yemen (PDR)", 
"Yemen, Rep.", "Yemen, South", "Kuwait", "Bahrain", "Qatar", 
"United Arab Emirates", "Oman", "Afghanistan", "Turkmenistan", 
"Tajikistan", "Kyrgyz Republic", "Uzbekistan", "Kazakhstan", 
"China", "Macao SAR, China", "Mongolia", "Taiwan", "Korea, Dem. Rep.", 
"Korea, Rep.", "Japan", "India", "Bhutan", "Pakistan", "Bangladesh", 
"Myanmar", "Sri Lanka", "Maldives", "Nepal", "Thailand", "Cambodia", 
"Lao PDR", "Vietnam, North", "Vietnam, South", "Vietnam", "Malaysia", 
"Singapore", "Brunei Darussalam", "Philippines", "Indonesia", 
"Timor-Leste", "Australia", "Papua New Guinea", "New Zealand", 
"Vanuatu", "Solomon Islands", "Kiribati", "Tuvalu", "Fiji", "Tonga", 
"Nauru", "Marshall Islands", "Palau", "Micronesia, Fed. Sts.", 
"Samoa", "Hong Kong SAR, China", "Puerto Rico", "West Bank and Gaza"
), format.stata = "%46s"), wbcode = structure(c("USA", "CAN", 
"BHS", "ABW", "CUB", "HTI", "DOM", "JAM", "TTO", "BRB", "DMA", 
"GRD", "LCA", "VCT", "ATG", "K", "MEX", "BLZ", "GTM", "HND", 
"SLV", "NIC", "CRI", "PAN", "COL", "VEN", "GUY", "", "ECU", "PER", 
"BRA", "BOL", "PRY", "CHL", "ARG", "URY", "GBR", "IRL", "NLD", 
"BEL", "LUX", "FRA", "MCO", "LIE", "CHE", "ESP", "ADO", "PRT", 
"DEU", "", "POL", "AUT", "HUN", "", "CZE", "SVK", "ITA", "SMR", 
"MLT", "ALB", "", "YUG", "MKD", "HRV", "", "BIH", "LWI", "", 
"SVN", "GRC", "CYP", "BGR", "MDA", "ROM", "RUS", "EST", "LVA", 
"LTU", "UKR", "BLR", "ARM", "GEO", "AZE", "FIN", "SWE", "NOR", 
"DNK", "ISL", "CPV", "STP", "GNB", "GNQ", "GMB", "MLI", "SEN", 
"BEN", "MRT", "NER", "CIV", "GIN", "BFA", "LBR", "SLE", "GHA", 
"TGO", "CMR", "NGA", "GAB", "CAF", "TCD", "COG", "ZAR", "UGA", 
"KEN", "TZA", "BDI", "RWA", "SOM", "DJI", "ETH", "ERI", "AGO", 
"MOZ", "ZMB", "ZWE", "MWI", "ZAF", "M", "LSO", "BWA", "SWZ", 
"MDG", "COM", "MUS", "SYC", "MAR", "DZA", "TUN", "LBY", "SUR", 
"IRN", "TUR", "IRQ", "EGY", "SYR", "LBN", "JOR", "ISR", "SAU", 
"", "YEM", "", "KWT", "BHR", "QAT", "ARE", "OMN", "AFG", "TKM", 
"TJK", "KGZ", "UZB", "KAZ", "CHN", "MAC", "MNG", "TWN", "PRK", 
"KOR", "JPN", "IND", "BTN", "PAK", "BGD", "MMR", "LKA", "MDV", 
"NPL", "THA", "KHM", "LAO", "", "", "VNM", "MYS", "SGP", "BRN", 
"PHL", "IDN", "TMP", "AUS", "PNG", "NZL", "VUT", "SLB", "KIR", 
"TUV", "FJI", "TON", "NRU", "", "PCI", "FSM", "SAM", "HKG", "PRI", 
"WBG"), format.stata = "%9s"), cpi2007 = structure(c(7.2, 8.7, 
NaN, NaN, 4.2, 1.6, 3, 3.3, 3.4, 6.9, 5.6, 3.4, 6.8, 6.1, NaN, 
NaN, 3.5, 3, 2.8, 2.5, 4, 2.6, 5, NaN, 3.8, 2, 2.6, 3.5, 2.1, 
3.5, 3.5, 2.9, 2.4, 7, 2.9, 6.7, 8.4, 7.5, 9, 7.1, 8.4, 7.3, 
NaN, NaN, 9, 6.7, NaN, 6.5, 7.8, NaN, 4.2, 8.1, 5.3, NaN, 5.2, 
4.9, 5.2, NaN, 5.8, 2.9, NaN, NaN, 3.3, 4.1, NaN, 3.3, NaN, NaN, 
6.6, 4.6, 5.3, 4.1, 2.8, 3.7, 2.3, 6.5, 4.8, 4.8, 2.7, 2.1, 3, 
3.4, 2.1, 9.4, 9.3, 8.7, 9.4, 9.2, 4.9, 2.7, 2.2, 1.9, 2.3, 2.7, 
3.6, 2.7, 2.6, 2.6, 2.1, 1.9, 2.9, 2.1, 2.1, 3.7, 2.3, 2.4, 2.2, 
3.3, 2, 1.8, 2.1, 1.9, 2.8, 2.1, 3.2, 2.5, 2.8, 1.4, 2.9, 2.4, 
2.8, 2.2, 2.8, 2.6, 2.1, 2.7, 5.1, 4.5, 3.3, 5.4, 3.3, 3.2, 2.6, 
4.7, 4.5, 3.5, 3, 4.2, 2.5, NaN, 2.5, 4.1, 1.5, 2.9, 2.4, 3, 
4.7, 6.1, 3.4, NaN, 2.5, NaN, 4.3, 5, 6, 5.7, 4.7, 1.8, 2, 2.1, 
2.1, 1.7, 2.1, 3.5, NaN, 3, 5.7, NaN, 5.1, 7.5, 3.5, 5, 2.4, 
2, 1.4, 3.2, 3.3, 2.5, 3.3, 2, 1.9, NaN, NaN, 2.6, 5.1, 9.3, 
NaN, 2.5, 2.3, NaN, 8.6, 2, 9.4, 3.1, 2.8, 3.3, NaN, NaN, 1.7, 
NaN, NaN, NaN, NaN, 4.5, NaN, NaN, NaN), label = "TI corruption perception index 2007", format.stata = "%10.0g")), .Names = c("country", 
"wbcode", "cpi2007"), row.names = c(NA, -207L), class = c("tbl_df", 
"tbl", "data.frame"))

QT_Row:

structure(list(Country = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, "Afghanistan", NA, "Albania", NA, "Argentina", 
"Armenia", NA, "Australia", NA, "Azerbaijan", "Burundi", "Belgium", 
NA, NA, "Bangladesh", "Bulgaria", NA, NA, "Bosnia and Herzegovina", 
NA, NA, "Bolivia", "Brazil", NA, NA, NA, NA, NA, "Canada", "Switzerland", 
"Chile", NA, NA, "Cameroon", NA, "Colombia", NA, NA, NA, NA, 
"Cyprus", "Czech Republic", "Germany", NA, NA, "Denmark", NA, 
"Algeria", NA, "Egypt", NA, "Spain", "Estonia", "Ethiopia", "Finland", 
"Fiji", "France", NA, NA, "United Kingdom", "Georgia", "Ghana", 
NA, NA, NA, NA, "Greece", NA, NA, NA, NA, NA, "Croatia", NA, 
"Hungary", "Indonesia", "India", NA, NA, "Iraq", NA, "Israel", 
"Italy", "Jamaica", "Jordan", "Japan", NA, "Kazakhstan", "Kenya", 
"Kyrgyzstan", "Cambodia", NA, "Korea (South)", NA, NA, "Lebanon", 
"Liberia", "Libya", NA, NA, "Sri Lanka", NA, "Lithuania", "Luxembourg", 
"Latvia", "Kosovo", NA, NA, "Morocco", NA, "Moldova", "Madagascar", 
"Maldives", "Mexico", "FYR Macedonia", NA, NA, NA, "Mongolia", 
"Mozambique", NA, NA, "Malawi", "Malaysia", NA, "Nigeria", NA, 
NA, "Norway", "Nepal", NA, "New Zealand", NA, "Pakistan", NA, 
NA, "Peru", "Philippines", "Papua New Guinea", NA, NA, NA, "Portugal", 
"Paraguay", NA, "Romania", "Russia", "Rwanda", NA, NA, "Senegal", 
NA, "Solomon Islands", "Sierra Leone", "El Salvador", NA, NA, 
NA, "Sudan", "Slovakia", "Slovenia", NA, NA, NA, NA, NA, NA, 
"Thailand", NA, NA, NA, NA, NA, "Tunisia", "Turkey", NA, "Taiwan", 
"Tanzania", "Uganda", "Ukraine", "Uruguay", "United States", 
NA, NA, "Venezuela", "Vietnam", "Vanuatu", NA, "Yemen", "Serbia", 
"South Africa", "Democratic Republic of the Congo", "Zambia", 
"Zimbabwe"), wbcode = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, "ABW", "ADO", "AFG", "AGO", "ALB", "ARE", "ARG", "ARM", "ATG", 
"AUS", "AUT", "AZE", "BDI", "BEL", "BEN", "BFA", "BGD", "BGR", 
"BHR", "BHS", "BIH", "BLR", "BLZ", "BOL", "BRA", "BRB", "BRN", 
"BTN", "BWA", "CAF", "CAN", "CHE", "CHL", "CHN", "CIV", "CMR", 
"COG", "COL", "COM", "CPV", "CRI", "CUB", "CYP", "CZE", "DEU", 
"DJI", "DMA", "DNK", "DOM", "DZA", "ECU", "EGY", "ERI", "ESP", 
"EST", "ETH", "FIN", "FJI", "FRA", "FSM", "GAB", "GBR", "GEO", 
"GHA", "GIN", "GMB", "GNB", "GNQ", "GRC", "GRD", "GTM", "GUY", 
"HKG", "HND", "HRV", "HTI", "HUN", "IDN", "IND", "IRL", "IRN", 
"IRQ", "ISL", "ISR", "ITA", "JAM", "JOR", "JPN", "K", "KAZ", 
"KEN", "KGZ", "KHM", "KIR", "KOR", "KWT", "LAO", "LBN", "LBR", 
"LBY", "LCA", "LIE", "LKA", "LSO", "LTU", "LUX", "LVA", "LWI", 
"M", "MAC", "MAR", "MCO", "MDA", "MDG", "MDV", "MEX", "MKD", 
"MLI", "MLT", "MMR", "MNG", "MOZ", "MRT", "MUS", "MWI", "MYS", 
"NER", "NGA", "NIC", "NLD", "NOR", "NPL", "NRU", "NZL", "OMN", 
"PAK", "PAN", "PCI", "PER", "PHL", "PNG", "POL", "PRI", "PRK", 
"PRT", "PRY", "QAT", "ROM", "RUS", "RWA", "SAM", "SAU", "SEN", 
"SGP", "SLB", "SLE", "SLV", "SMR", "SOM", "STP", "SUR", "SVK", 
"SVN", "SWE", "SWZ", "SYC", "SYR", "TCD", "TGO", "THA", "TJK", 
"TKM", "TMP", "TON", "TTO", "TUN", "TUR", "TUV", "TWN", "TZA", 
"UGA", "UKR", "URY", "USA", "UZB", "VCT", "VEN", "VNM", "VUT", 
"WBG", "YEM", "YUG", "ZAF", "ZAR", "ZMB", "ZWE"), GCB2013 = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 46, NA, NA, NA, 
13, 18, NA, 1, NA, NA, NA, 4, NA, NA, 39, 8, NA, NA, 28, NA, 
NA, 36, NA, NA, NA, NA, NA, NA, 3, 7, 10, NA, NA, 62, NA, 22, 
NA, NA, NA, NA, 19, 15, 1, NA, NA, 1, NA, 41, NA, 36, NA, 2, 
6, 44, 1, NA, NA, NA, NA, 5, 4, 54, NA, NA, NA, NA, 22, NA, NA, 
NA, NA, NA, 4, NA, 12, 36, 54, NA, NA, 29, NA, 12, 5, 12, 37, 
1, NA, 34, 70, 45, 57, NA, 3, NA, NA, NA, 75, 62, NA, NA, 19, 
NA, 26, NA, 19, 16, NA, NA, 49, NA, 29, 28, 3, 33, 17, NA, NA, 
NA, 45, 62, NA, NA, NA, 3, NA, 44, NA, NA, 3, 31, NA, 3, NA, 
34, NA, NA, 20, 12, 27, NA, NA, NA, 3, 25, NA, 17, NA, 13, NA, 
NA, 57, NA, 34, 84, 12, NA, NA, NA, 17, 21, 6, NA, NA, NA, NA, 
NA, NA, 18, NA, NA, NA, NA, NA, 18, 21, NA, 36, 56, 61, 37, 3, 
7, NA, NA, 27, 30, 13, NA, 74, 26, 47, 46, NA, 62)), .Names = c("Country", 
"wbcode", "GCB2013"), row.names = c(NA, -207L), class = c("tbl_df", 
"tbl", "data.frame"))
Sietse
  • See: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example for how to share your data – Ian Wesley May 26 '17 at 17:29
  • Post samples of the two data frames you're merging by pasting into your question the output of, for example, `dput(TreismanData2[1:10,])` and `dput(QT_Row[1:10,])`. – eipi10 May 26 '17 at 17:35
  • I have tried to do this, but I have more than 1600 variables for 207 cases in TreismanData2. Is there a way to select only (e.g.) 4 of the variables? – Sietse May 26 '17 at 17:42
  • `TreismanData2[1:10, c("Var1", "Var2")]` would select the first 10 rows and the two columns named `Var1` and `Var2`. – eipi10 May 26 '17 at 17:48
  • I just added the data, hope this helps – Sietse May 26 '17 at 18:15
  • Thanks! As you say, I used TreismanData2 instead of 3 in my plot code - really stupid, but I am happy you helped me out! I ran the plot with the TreismanData3 and it worked, as you already pointed out. – Sietse May 26 '17 at 18:45

1 Answer

Your code is an example of why it's generally best to avoid using attach.

This works:

TreismanData3 <- merge(TreismanData2, QT_Row, by="wbcode")

ggplot(TreismanData3, aes(cpi2007, GCB2013)) +
   geom_point(position = "jitter") + 
   geom_smooth(method = lm) + 
   labs(x="Perception", y="Experience")+theme_bw()

However, this fails with the error Aesthetics must be either length 1 or the same as the data (207): x, y:

TreismanData3 <- merge(TreismanData2, QT_Row, by="wbcode")

attach(TreismanData3)

ggplot(TreismanData2, aes(cpi2007, GCB2013)) +
   geom_point(position = "jitter") + 
   geom_smooth(method = lm) + 
   labs(x="Perception", y="Experience")+theme_bw()

There are two interacting issues with the code above. First, you used TreismanData2 in ggplot when I think you meant to use TreismanData3. If you hadn't used attach(TreismanData3), the ggplot code would have failed with the error Error in FUN(X[[i]], ...) : object 'GCB2013' not found, because you used the unmerged data frame, which doesn't contain the column GCB2013. The source of the error would have been obvious.

In fact, you attached TreismanData3 before running ggplot with TreismanData2. attach places a data frame on R's search path, which makes its columns available by name as if they were independent objects. Thus, ggplot was able to find GCB2013 from TreismanData3 even though that column doesn't appear in the data frame that was passed to ggplot.
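
To see that lookup behaviour in isolation, here is a minimal sketch using two small made-up data frames. The names `df_small` and `df_merged` and their contents are hypothetical stand-ins, not the Treisman data.

```r
# Hypothetical toy data standing in for the unmerged and merged data frames
df_small  <- data.frame(id = 1:3, x = c(1, 2, 3))
df_merged <- data.frame(id = 1:2, x = c(1, 2), y = c(10, 20))

attach(df_merged)

# The attached data frame now sits on the search path ...
on_search_path <- "df_merged" %in% search()

# ... so a bare `y` resolves by name, although no object called `y`
# was created in the global environment itself
y_found_by_name <- exists("y")
y_in_global_env <- exists("y", envir = globalenv(), inherits = FALSE)

detach(df_merged)  # undo the attach so later code is not affected
```

This assumes no other object named `y` already exists in your session; if one does, `y_in_global_env` would reflect that object instead.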

TreismanData3 has 196 rows and TreismanData2 has 207 rows. Because ggplot was called with TreismanData2 as the data frame, ggplot expects all the variables inside aes to have 207 values. Since GCB2013 has only 196 values (because it came from TreismanData3), it caused the error you received.
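
The same mismatch can be reproduced on a small scale. The sketch below uses hypothetical toy data frames (3 rows unmerged, 2 rows merged) rather than your data, and it only checks that building the plot fails; the exact wording of the error may differ between ggplot2 versions.

```r
library(ggplot2)

# Hypothetical toy data: 3 rows before the merge, 2 rows after it
df_small  <- data.frame(id = 1:3, x = c(1, 2, 3))
df_merged <- data.frame(id = 1:2, x = c(1, 2), y = c(10, 20))

attach(df_merged)

# `y` is not a column of df_small, so it is picked up from the attached df_merged
p <- ggplot(df_small, aes(x, y)) + geom_point()

# The aesthetics are evaluated when the plot is built, so capture the error there
build_error <- tryCatch(ggplot_build(p), error = function(e) e)

detach(df_merged)

# TRUE when the build failed: the 2-value y does not fit the 3-row data
build_failed <- inherits(build_error, "error")
```

Running `conditionMessage(build_error)` afterwards shows the message your installed ggplot2 version produces.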

If you're working with multiple data frames, attach can cause unintended consequences and confusing errors such as what you experienced.
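
One possible way to work without attach, sketched here on a hypothetical toy data frame (`df_merged` and its columns are made up for illustration): check that the data frame you are about to use really has the columns you need, and use with() when you want to refer to columns by bare name for a single expression.

```r
# Hypothetical toy data standing in for a merged data frame
df_merged <- data.frame(id = 1:2, x = c(1, 2), y = c(10, 20))

# Check up front that the data frame has the columns the plot will need
needed       <- c("x", "y")
missing_cols <- setdiff(needed, names(df_merged))
stopifnot(length(missing_cols) == 0)

# with() evaluates one expression using the data frame's columns
# and leaves nothing on the search path afterwards
y_mean <- with(df_merged, mean(y))
```

Had the check been run against the unmerged data frame, `missing_cols` would have named the absent column immediately.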

Moral: Don't get attached to attach.

eipi10