
I am trying to run a sentiment analysis on a dataset. Specifically, I want to see whether there are any interesting relationships between message volume and buzz, message volume and scores, and so on.

Here is what my dataset looks like:

> str(data)
'data.frame':   40 obs. of  11 variables:
 $ Date Time   : POSIXct, format: "2015-07-08 09:10:00" "2015-07-08 09:10:00" ...
 $ Subject     : chr  "MMM" "ACE" "AES" "AFL" ...
 $ Sscore      : chr  "-0.2280" "-0.4415" "1.9821" "-2.9335" ...
 $ Smean       : chr  "0.2593" "0.3521" "0.0233" "0.0035" ...
 $ Svscore     : chr  "-0.2795" "-0.0374" "1.1743" "-0.2975" ...
 $ Sdispersion : chr  "0.375" "0.500" "1.000" "1.000" ...
 $ Svolume     : num  8 4 1 1 5 3 2 1 1 2 ...
 $ Sbuzz       : chr  "0.6026" "0.7200" "1.9445" "0.8321" ...
 $ Last close  : chr  "155.430000000" "104.460000000" "13.200000000" "61.960000000" ...
 $ Company name: chr  "3M Company" "ACE Limited" "The AES Corporation" "AFLAC Inc." ...
 $ Date        : Date, format: "2015-07-08" "2015-07-08" ...

I thought of a linear regression, so I wanted to use ggplot. However, with the code below the regression lines do not appear, so I think I got something wrong. Is it because the regression is too weak? I based my code on: code of topchef

Mine is:

library(ggplot2)
library(reshape2)

data.2 = melt(data[3:9], id.vars='Svolume')
ggplot(data.2) +
  geom_jitter(aes(value, Svolume, colour=variable)) +
  geom_smooth(aes(value, Svolume, colour=variable), method=lm, se=FALSE) +
  facet_wrap(~variable, scales="free_x") +
  labs(x = "Variables", y = "Svolumes")

But I probably misunderstood something, as I don't get what I want. I am very new to R, so I would appreciate any help.

My results

I have this error:

    geom_smooth: Only one unique x value each group.Maybe you want aes(group = 1)?
geom_smooth: Only one unique x value each group.Maybe you want aes(group = 1)?
geom_smooth: Only one unique x value each group.Maybe you want aes(group = 1)?
geom_smooth: Only one unique x value each group.Maybe you want aes(group = 1)?
geom_smooth: Only one unique x value each group.Maybe you want aes(group = 1)?
geom_smooth: Only one unique x value each group.Maybe you want aes(group = 1)?

Finally, would it be possible to have a different colour for each Subject instead of one colour per variable? And can I add the regression line to every graph?

Thank you for your help.

Sample data:

       Date Time Subject  Sscore  Smean Svscore Sdispersion Svolume  Sbuzz    Last close        Company name       Date
1  2015-07-08 09:10:00     MMM -0.2280 0.2593 -0.2795       0.375       8 0.6026 155.430000000          3M Company 2015-07-08
2  2015-07-08 09:10:00     ACE -0.4415 0.3521 -0.0374       0.500       4 0.7200 104.460000000         ACE Limited 2015-07-08
3  2015-07-07 09:10:00     AES  1.9821 0.0233  1.1743       1.000       1 1.9445  13.200000000 The AES Corporation 2015-07-07
4  2015-07-04 09:10:00     AFL -2.9335 0.0035 -0.2975       1.000       1 0.8321  61.960000000          AFLAC Inc. 2015-07-04
5  2015-07-07 09:10:00     MMM  0.2977 0.2713 -0.7436       0.400       5 0.4895 155.080000000          3M Company 2015-07-07
6  2015-07-07 09:10:00     ACE -0.2331 0.3519 -0.1118       1.000       3 0.7196 103.330000000         ACE Limited 2015-07-07
7  2015-06-28 09:10:00     AES  1.8721 0.0609  1.9100       0.500       2 2.4319  13.460000000 The AES Corporation 2015-06-28
8  2015-07-03 09:10:00     AFL  0.6024 0.0330 -0.2663       1.000       1 0.6822  61.960000000          AFLAC Inc. 2015-07-03
9  2015-07-06 09:10:00     MMM -1.0057 0.2579 -1.3796       1.000       1 0.4531 155.380000000          3M Company 2015-07-06
10 2015-07-06 09:10:00     ACE -0.0263 0.3435 -0.1904       1.000       2 1.3536 103.740000000         ACE Limited 2015-07-06
11 2015-06-19 09:10:00     AES -1.1981 0.1517  1.2063       1.000       2 1.9427  13.850000000 The AES Corporation 2015-06-19
12 2015-07-02 09:10:00     AFL -0.8247 0.0269  1.8635       1.000       5 2.2454  62.430000000          AFLAC Inc. 2015-07-02
13 2015-07-05 09:10:00     MMM -0.4272 0.3107 -0.7970       0.167       6 0.6003 155.380000000          3M Company 2015-07-05
14 2015-07-04 09:10:00     ACE  0.0642 0.3274 -0.0975       0.667       3 1.2932 103.740000000         ACE Limited 2015-07-04
15 2015-06-17 09:10:00     AES  0.1627 0.1839  1.3141       0.500       2 1.9578  13.580000000 The AES Corporation 2015-06-17
16 2015-07-01 09:10:00     AFL -0.7419 0.0316  1.5699       0.250       4 2.0988  62.200000000          AFLAC Inc. 2015-07-01
17 2015-07-04 09:10:00     MMM -0.5962 0.3484 -1.2481       0.667       3 0.4496 155.380000000          3M Company 2015-07-04
18 2015-07-03 09:10:00     ACE  0.8527 0.3085  0.1944       0.833       6 1.3656 103.740000000         ACE Limited 2015-07-03
19 2015-06-15 09:10:00     AES  0.8145 0.1725  0.2939       1.000       1 1.6121  13.350000000 The AES Corporation 2015-06-15
20 2015-06-30 09:10:00     AFL  0.3076 0.0538 -0.0938       1.000       1 0.7071  61.440000000          AFLAC Inc. 2015-06-30

dput

data <- structure(list(`Date Time` = structure(c(1436361000, 1436361000, 
1436274600, 1436015400, 1436274600, 1436274600, 1435497000, 1435929000, 
1436188200, 1436188200, 1434719400, 1435842600, 1436101800, 1436015400, 
1434546600, 1435756200, 1436015400, 1435929000, 1434373800, 1435669800
), class = c("POSIXct", "POSIXt"), tzone = ""), Subject = c("MMM", 
"ACE", "AES", "AFL", "MMM", "ACE", "AES", "AFL", "MMM", "ACE", 
"AES", "AFL", "MMM", "ACE", "AES", "AFL", "MMM", "ACE", "AES", 
"AFL"), Sscore = c(-0.228, -0.4415, 1.9821, -2.9335, 0.2977, 
-0.2331, 1.8721, 0.6024, -1.0057, -0.0263, -1.1981, -0.8247, 
-0.4272, 0.0642, 0.1627, -0.7419, -0.5962, 0.8527, 0.8145, 0.3076
), Smean = c(0.2593, 0.3521, 0.0233, 0.0035, 0.2713, 0.3519, 
0.0609, 0.033, 0.2579, 0.3435, 0.1517, 0.0269, 0.3107, 0.3274, 
0.1839, 0.0316, 0.3484, 0.3085, 0.1725, 0.0538), Svscore = c(-0.2795, 
-0.0374, 1.1743, -0.2975, -0.7436, -0.1118, 1.91, -0.2663, -1.3796, 
-0.1904, 1.2063, 1.8635, -0.797, -0.0975, 1.3141, 1.5699, -1.2481, 
0.1944, 0.2939, -0.0938), Sdispersion = c(0.375, 0.5, 1, 1, 0.4, 
1, 0.5, 1, 1, 1, 1, 1, 0.167, 0.667, 0.5, 0.25, 0.667, 0.833, 
1, 1), Svolume = c(8L, 4L, 1L, 1L, 5L, 3L, 2L, 1L, 1L, 2L, 2L, 
5L, 6L, 3L, 2L, 4L, 3L, 6L, 1L, 1L), Sbuzz = c(0.6026, 0.72, 
1.9445, 0.8321, 0.4895, 0.7196, 2.4319, 0.6822, 0.4531, 1.3536, 
1.9427, 2.2454, 0.6003, 1.2932, 1.9578, 2.0988, 0.4496, 1.3656, 
1.6121, 0.7071), `Last close` = c(155.43, 104.46, 13.2, 61.96, 
155.08, 103.33, 13.46, 61.96, 155.38, 103.74, 13.85, 62.43, 155.38, 
103.74, 13.58, 62.2, 155.38, 103.74, 13.35, 61.44), `Company name` = c("3M Company", 
"ACE Limited", "The AES Corporation", "AFLAC Inc.", "3M Company", 
"ACE Limited", "The AES Corporation", "AFLAC Inc.", "3M Company", 
"ACE Limited", "The AES Corporation", "AFLAC Inc.", "3M Company", 
"ACE Limited", "The AES Corporation", "AFLAC Inc.", "3M Company", 
"ACE Limited", "The AES Corporation", "AFLAC Inc."), Date = structure(c(16624, 
16624, 16623, 16620, 16623, 16623, 16614, 16619, 16622, 16622, 
16605, 16618, 16621, 16620, 16603, 16617, 16620, 16619, 16601, 
16616), class = "Date")), .Names = c("Date Time", "Subject", 
"Sscore", "Smean", "Svscore", "Sdispersion", "Svolume", "Sbuzz", 
"Last close", "Company name", "Date"), row.names = c("1", "2", 
"3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", 
"15", "16", "17", "18", "19", "20"), class = "data.frame")
Cec Jurczyk
  • Can you add some sample data? – CMichael Jul 14 '15 at 15:11
  • Done. I figured out how to take only the variables I was interested in (edited). I just need now to know why I don't have the regression lines on my graph & how to add the regression expressions on the graph, or under? with lm( otherVariables ~ data$Svolume). Thank you – Cec Jurczyk Jul 14 '15 at 16:02
  • I don't get that error, your code works fine. for your second question, you need to add subject to the melting `data.3 = melt(data[, 2:9], id.vars = c('Subject','Svolume'))` and then change the `colour=variable` to subject – rawr Jul 14 '15 at 16:31
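The suggestion in the last comment can be sketched as follows. This is only an illustration under stated assumptions: `d` is a hypothetical eight-row subset of the sample data with numeric columns, and `d_long` and `p` are names invented here. `Subject` is kept as an id variable in `melt` so it is still available for the `colour` aesthetic, while `group = 1` asks `geom_smooth` for a single fitted line per facet.

```r
library(ggplot2)
library(reshape2)

# Hypothetical subset of the question's sample data (numeric columns only).
d <- data.frame(
  Subject = c("MMM", "ACE", "AES", "AFL", "MMM", "ACE", "AES", "AFL"),
  Sscore  = c(-0.2280, -0.4415, 1.9821, -2.9335, 0.2977, -0.2331, 1.8721, 0.6024),
  Sbuzz   = c(0.6026, 0.7200, 1.9445, 0.8321, 0.4895, 0.7196, 2.4319, 0.6822),
  Svolume = c(8, 4, 1, 1, 5, 3, 2, 1)
)

# Keep Subject as an id variable so it survives the melt.
d_long <- melt(d, id.vars = c("Subject", "Svolume"))

p <- ggplot(d_long, aes(x = value, y = Svolume)) +
  # Points are coloured by Subject instead of by variable.
  geom_jitter(aes(colour = Subject)) +
  # group = 1 fits one line per facet rather than one per Subject.
  geom_smooth(aes(group = 1), method = "lm", se = FALSE, colour = "black") +
  facet_wrap(~variable, scales = "free_x") +
  labs(x = "Value", y = "Svolume")

print(p)
```

Dropping `aes(group = 1)` and mapping `colour = Subject` in the top-level `aes` would instead give one regression line per Subject in each facet, which may or may not be what is wanted with so few points per company.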

1 Answer


Note the hint in the message: `Maybe you want aes(group = 1)?`. All I've done is add `group = 1` to `aes` for `geom_smooth`.

ggplot(data.2) +
  geom_jitter(aes(value, Svolume, colour=variable)) +
  geom_smooth(aes(value, Svolume, colour=variable, group = 1), method=lm, se=FALSE) +
  facet_wrap(~variable, scales="free_x") +
  labs(x = "Variables", y = "Svolumes")
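One possible cause worth checking, offered as an assumption rather than a confirmed diagnosis: the `str(data)` output in the question shows `Sscore`, `Sbuzz` and several other columns stored as `chr`, whereas the `dput` version holds them as numbers. If the real data has character columns, `melt` would produce a character `value` column, and ggplot2 would likely treat it as discrete and form one group per distinct value, which would match the "Only one unique x value each group" message. A minimal sketch of converting such columns before melting, using a hypothetical data frame `data_chr` and a hypothetical column list `num_cols`:

```r
# Hypothetical data frame mimicking the str() output in the question,
# where numeric-looking columns are stored as character.
data_chr <- data.frame(
  Sscore  = c("-0.2280", "-0.4415", "1.9821"),
  Sbuzz   = c("0.6026", "0.7200", "1.9445"),
  Svolume = c(8, 4, 1),
  stringsAsFactors = FALSE
)

# Convert every character column that should hold numbers.
num_cols <- c("Sscore", "Sbuzz")
data_chr[num_cols] <- lapply(data_chr[num_cols], as.numeric)

# The converted columns should now be reported as num.
str(data_chr)
```

With numeric columns, `value` stays continuous after `melt`, so `geom_smooth` can fit a line per variable even without `group = 1`.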

Some unsolicited advice

Here's how I would write the ggplot code:

library(ggplot2)
library(reshape2)

data.2 = melt(data[3:9], id.vars='Svolume')

ggplot(data.2) +
  aes(x = value, y = Svolume, colour = variable) +
  geom_jitter() +
  geom_smooth(method=lm, se=FALSE, aes(group = 1)) +
  facet_wrap(~variable, scales="free_x") +
  labs(x = "Variables", y = "Svolumes")
Mhairi McNeill
  • Thank you very much for the solution and the tips. Sorry, I am very new to this forum. I will read what you sent me. Also, do you know how to add the regression expression to each of the graphs, please? – Cec Jurczyk Jul 15 '15 at 10:04
  • I think you'd need to find the regression coefficients manually, using `lm` and then use a `geom_text` layer. That would be the only way I'm aware of. Also, welcome to SO! – Mhairi McNeill Jul 15 '15 at 10:34
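The approach described in the last comment (fit with `lm`, then annotate with `geom_text`) might look roughly like this. It is a sketch under assumptions, not the answerer's code: `d` is a hypothetical subset of the sample data, and `d_long`, `labels` and `p` are names invented here. One model `Svolume ~ value` is fitted per variable, and the resulting equation is placed in the top-right corner of the matching facet.

```r
library(ggplot2)
library(reshape2)

# Hypothetical subset of the question's sample data (numeric columns only).
d <- data.frame(
  Sscore  = c(-0.2280, -0.4415, 1.9821, -2.9335, 0.2977, -0.2331, 1.8721, 0.6024),
  Sbuzz   = c(0.6026, 0.7200, 1.9445, 0.8321, 0.4895, 0.7196, 2.4319, 0.6822),
  Svolume = c(8, 4, 1, 1, 5, 3, 2, 1)
)
d_long <- melt(d, id.vars = "Svolume")

# Fit Svolume ~ value separately for each variable and build one label per facet.
labels <- do.call(rbind, lapply(split(d_long, d_long$variable), function(df) {
  b <- coef(lm(Svolume ~ value, data = df))
  data.frame(
    variable = df$variable[1],
    label    = sprintf("y = %.2f %+.2f x", b[1], b[2])
  )
}))

p <- ggplot(d_long, aes(x = value, y = Svolume)) +
  geom_point(aes(colour = variable)) +
  geom_smooth(method = "lm", se = FALSE, colour = "black") +
  facet_wrap(~variable, scales = "free_x") +
  # Inf places the text at the panel edge; hjust/vjust pull it back inside.
  geom_text(data = labels, aes(x = Inf, y = Inf, label = label),
            hjust = 1.1, vjust = 1.5, inherit.aes = FALSE)

print(p)
```

Because `labels` carries the same `variable` column used by `facet_wrap`, each equation is drawn only in its own panel.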