1

I am trying to learn R, and use the corrplot library to draw Y:City and X: Population graph. I wrote the below code:

enter image description here

When you look at the picture above, there are 2 columns City and population. When I run the code I get this error message:

Error in cor(Illere_Gore_Nufus) : 'x' must be numeric.

My excel data:

enter image description here

vestland
  • 55,229
  • 37
  • 187
  • 305
loki
  • 2,926
  • 8
  • 62
  • 115
  • 1
    could you give us an example of the `Illere_Gore_Nufus` dataframe you get? with `head(Illere_Gore_Nufus)`, it seems your read_excel doesnt return numeric values – mischva11 Aug 01 '18 at 12:41
  • 4
    I don't understand what you're trying to do. You want to calculate/visualise the correlation between which two variables? I can see only one numeric variable `Population`. – Maurits Evers Aug 01 '18 at 12:42
  • Your data frame has 2 columns. To access only one of the columns you need to use the $ to access the column by name like `cor(Illere_gore_Nufus$Population)`. See https://cran.r-project.org/ for some guides for using R. – Dave2e Aug 01 '18 at 12:51
  • @Dave2e; your solution is creating an error : Error in cor(Illere_Gore_Nufus$Population) : supply both 'x' and 'y' or a matrix-like 'x' – loki Aug 01 '18 at 14:38
  • I updated my question – loki Aug 01 '18 at 14:40
  • I removed the `rlang` tag since that tag is for a specific R package which isn't relevant for your question. In any event --- please give a [mcve], one which doesn't depend on screenshots. Reading [How to make a great r reproducible example](https://stackoverflow.com/q/5963269/4996248) might help. – John Coleman Aug 01 '18 at 15:28
  • This is all a bit confusing. I'm assuming you would like to make a correlation plot of the population between the cities in the table. In that you'd need more information than only one year for each city. And where do you want to make that plot? In Excel, like the title says? Or in Power BI like your tag indicates? – vestland Aug 08 '18 at 09:08

2 Answers2

2

In general, correlation plot (Scattered plot) can be plotted only when you have two continuous variable. Correlation is a value that tells you how two continuous variables are linearly related. The Correlation value will always fall between -1 and 1, where correlation value of -1 depicts weak linear relationship and correlation value of 1 depicts strong linear relationship between the two variables. Correlation value of 0 says that there is no linear relationship between the two variables, however, there could be curvi-linear relationship between the two variables

For example Area of the land Vs Price of the land Here is the Data

The correlation value for this data is 0.896, which means that there is a strong linear correlation between Area of the land and Price of the land (Obviously!).

Scatter plot in R would look like this Scatter plot

The R code would be

area<-c(650,785,880,990,1100,1250,1350,1800,2200,2800)
price<-c(250,275,280,290,350,340,400,335,420,460)
cor(area,price)
plot(area,price)

In Excel, for the same example, you can select the two columns, go to Insert > Scatter plot (under charts section) Scatter plot

In your case, the information can be plotted in bar graph with city in y axis and population in x axis or vice versa!

Hope I have answered you query!

Manoraj
  • 21
  • 3
  • If you by scattered plot mean scatter plot, it's not the same thing as a correlation plot that you will get if you use the corrplot() function in R. And that seems to be what OP is looking for here. – vestland Aug 08 '18 at 12:02
1

Some assumptions

You are asking how to do this in Excel, but your question is tagged R and Power BI (also RStudio, but that has been edited away), so I'm going to show you how to do this with R and Power BI. I'm also going to show you why you got that error message, and also why you would get an error message either way because your dataset is just not sufficient to make a correlation plot.


My answer

I'm assuming you would like to make a correlation plot of the population between the cities in your table. In that table you'd need more information than only one year for each city. I would check your data sources and see if you could come up with population numbers for, let's say, the last 10 years. In lack of the exact numbers for the cities in your table, I'm going to use some semi-made up numbers for the population in the 10 most populous countries (following your datastrutcture):

Country         2017        2016        2015        2014        2013
China           1415045928  1412626453  1414944844  1411445597  1409517397
India           1354051854  1340371473  1339431384  1343418009  1339180127
United States   326766748   324472802   325279622   324521777   324459463
Indonesia       266794980   266244787   266591965   265394107   263991379
Brazil          210867954   210335253   209297939   209860881   209288278
Pakistan        200813818   199761249   200253292   197655630   197015955
Nigeria         195875237   192568158   195757661   191728478   190886311
Bangladesh      166368149   165630262   165936711   166124290   164669751
Russia          143964709   143658415   143146914   143341653   142989754
Mexcio          137590740   137486490   136768870   137177870   136590740

Writing and debugging R code in Power BI is a real pain, so I would recommend installing R studio, write your little R snippets there, and then paste it into Power B.

The reason for your error message is that the function cor() onlyt takes numerical data as arguments. In your code sample the city names are given as arguments. And there are more potential traps in your code sample. You have to make sure that your dataset is numeric. And you have to make sure that your dataset has a shape that the cor() will accept.

Below is an R script that will do just that. Copy the data above, and store it in a file called data.xlsx on your C drive.

The Code

library(corrplot)
library(readxl)

# Read data
setwd("C:/")
data <- read_excel("data.xlsx")

# Set Country names as row index
rownames(data) <- data$Country

# Remove Country from dataframe
data$Country <- NULL

# Transpose data into a readable format for cor()
data <- data.frame(t(data))

# Plot data
corrplot(cor(data))

The plot

CorrPlot

Power BI

In Power BI, you need to import the data before you use it in an R visual:

Copy this:

Country,2017,2016,2015,2014,2013
China,1415045928,1412626453,1414944844,1411445597,1409517397
India,1354051854,1340371473,1339431384,1343418009,1339180127
United States,326766748,324472802,325279622,324521777,324459463
Indonesia,266794980,266244787,266591965,265394107,263991379
Brazil,210867954,210335253,209297939,209860881,209288278
Pakistan,200813818,199761249,200253292,197655630,197015955
Nigeria,195875237,192568158,195757661,191728478,190886311
Bangladesh,166368149,165630262,165936711,166124290,164669751
Russia,143964709,143658415,143146914,143341653,142989754
Mexcio,137590740,137486490,136768870,137177870,136590740

Save it as countries.csv in a folder of your choosing, and pick it up in Power BI using Get Data | Text/CSV, click Edit in the dialog box, and in the Power Query Editor, click Use First Row as headers so that you have this table in your Power Query Editor:

enter image description here

Click Close & Apply and make sure that you've got the data available under VISUALIZATIONS | FIELDS:

enter image description here

Click R under VISUALIZATIONS:

enter image description here

Select all columns under FIELDS | countries so that you get this setup:

enter image description here

Take parts of your R snippet that we prepared above

library(corrplot)

# Set Country names as row index
data <- dataset

rownames(data) <- data$Country

# Remove Country from dataframe
data$Country <- NULL

# Transpose data into a readable format for cor()
data <- data.frame(t(data))

# Plot data
corrplot(cor(data))

And paste it into the Power BI R script Editor:

enter image description here

Click Run R Script:

enter image description here

And you're gonna get this:

enter image description here

That's it!

If you change the procedure to importing data from an Excel file instead of a textfile (using Get Data | Excel , you've successfully combined the powers of Excel, Power BI and R to produce a scatterplot!

I hope this is what you were looking for!

vestland
  • 55,229
  • 37
  • 187
  • 305