
I have a dataset with a total of 500 rows: 250 pop and 250 heavy-metal. How can I produce this plot in R?

example

My actual dataset is not separated like the example I gave, though. Should I just separate it?

What I have done so far:

ggplot(data = playlist, aes(x = genre, y = danceability)) +
  geom_jitter(aes(col = genre))


Pandu A
  • You can create an ID column to use on the x axis instead of genre and then use `geom_point`. Afterwards, you can suppress the ID label in the plot, if you'd like. – infinitefactors Jun 01 '20 at 14:50

2 Answers


Please see this post about creating a simple self-contained example.

Given the screenshots you've shared, I don't see how you can produce the plot you're after, because you have only one numeric variable: danceability. A scatter plot needs two. In your example, the second one is Data on the x-axis.

That said, once you've got your second numeric variable, you've almost answered your own question. Something like

ggplot(data = playlist, aes(x = Data, y = danceability)) +
  geom_jitter(aes(col = genre))

will give you what you want.
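If no second numeric variable exists, one option is to build one yourself. The sketch below is only an illustration and rests on an assumption: it uses the row index as a stand-in for Data, and it uses six rows copied from the sample data posted in the other answer instead of the full playlist.

```r
library(ggplot2)

# Six rows copied from the sample data in the other answer (not the full dataset)
playlist <- data.frame(
  danceability = c(0.683, 0.768, 0.693, 0.483, 0.491, 0.224),
  genre = c("pop", "pop", "pop", "heavy-metal", "heavy-metal", "heavy-metal")
)

# Assumption: the row index stands in for the missing numeric x variable
playlist$Data <- seq_len(nrow(playlist))

p <- ggplot(data = playlist, aes(x = Data, y = danceability)) +
  geom_point(aes(col = genre))
print(p)
```

Because the rows are ordered by genre here, the two genres end up in separate halves of the x-axis. Shuffle the index if you would rather have them mixed.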

Limey

The problem you're having is that you've mapped x = genre. Thus, genre gets converted to a factor and then plotted at either 1 or 2.

Instead, what you want to do is have the data plotted at a random point on the x-axis. A simple way to do this is to randomly sample 1:nrow(playlist) like this:

ggplot(data = playlist, aes(x = sample(1:nrow(playlist)),
                            y = danceability, color = genre)) +
  geom_point() + labs(x = "data")

Note that you no longer need geom_jitter.
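Calling sample() inside aes() produces a different layout every time the plot is drawn. If that matters, a possible variant is to fix the seed and store the shuffled positions in a column first. This is a sketch under a few assumptions: the column name position and the seed value are arbitrary choices of mine, the theme() call for hiding the meaningless x-axis labels follows the suggestion in the comment on the question, and only six rows of the sample data are used.

```r
library(ggplot2)

# Six rows copied from the sample data in this answer (not the full dataset)
playlist <- data.frame(
  danceability = c(0.683, 0.768, 0.693, 0.483, 0.491, 0.224),
  genre = c("pop", "pop", "pop", "heavy-metal", "heavy-metal", "heavy-metal")
)

set.seed(42)  # arbitrary seed, only so the layout is repeatable

# Hypothetical helper column: a random permutation of 1..n
playlist$position <- sample(nrow(playlist))

p <- ggplot(playlist, aes(x = position, y = danceability, color = genre)) +
  geom_point() +
  labs(x = "data") +
  # The x values carry no meaning, so hide their tick labels and marks
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())
print(p)
```

Storing the positions in the data frame also lets you reuse the same layout across several plots.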

Data:

playlist <- structure(list(danceability = c(0.683, 0.768, 0.693, 0.765, 0.506, 
0.809, 0.628, 0.556, 0.72, 0.706, 0.414, 0.448, 0.687, 0.747, 
0.532, 0.483, 0.491, 0.224, 0.666, 0.416, 0.44, 0.362, 0.28, 
0.42, 0.115, 0.35, 0.519, 0.538, 0.507, 0.261), genre = c("pop", 
"pop", "pop", "pop", "pop", "pop", "pop", "pop", "pop", "pop", 
"pop", "pop", "pop", "pop", "pop", "heavy-metal", "heavy-metal", 
"heavy-metal", "heavy-metal", "heavy-metal", "heavy-metal", "heavy-metal", 
"heavy-metal", "heavy-metal", "heavy-metal", "heavy-metal", "heavy-metal", 
"heavy-metal", "heavy-metal", "heavy-metal")), row.names = c(NA, 
-30L), class = "data.frame")

Note: This data was derived by optical character recognition from your screenshot, so please excuse any errors.

Ian Campbell