
I have a data frame with IDs and values. I would like to generate a density plot for every unique ID and check whether each distribution is normal or skewed. There are also NA values, and I am not sure how to treat them. Should I just remove them before creating the density plots? Also, the range of the values differs between IDs.

| ID       |  Values |
| -------- | ------- |
| F1       | 45      |
| F1       | 56      |
| F1       | NA      |
| F1       | 68      |
| F1       | 55      |
| F2       | 23      |
| F2       | 44      |
| F2       | 34      |
| F2       | NA      |
| F2       | NA      |
| F2       | 34      |
| F3       | 5055    | 
| F3       | 4567    |
| F3       | NA      | 
| F3       | 4789    |
| F3       | 5567    |
| F3       | 6002    |
| F4       | 9045    |
| F4       | 9500    | 
| F4       | 9760    |
| F4       | NA      |
| F4       | 9150    |

Please help, as I am a beginner at visualization.

pipts
  • have you checked this https://stackoverflow.com/questions/26075181/multiple-groups-in-geom-density-plot – StupidWolf Aug 04 '21 at 05:53
  • I saw that, but I have 30 different IDs. When I use that code, it generates all 30 density plots on one page, so I cannot make out any of them; I just see a few lines. Is there a way to split the density plots into sets of 5? There is also a problem with the x-axis: every density plot uses the same range of x values, but as you can see, one ID has values like 45, 56, 68 while another has values like 4789, 5567, 6002. @StupidWolf – pipts Aug 04 '21 at 08:26
  • How about splitting your data frame by 5 IDs and plotting? – StupidWolf Aug 04 '21 at 08:32
  • But that will not solve the problem of the very large difference in values between the IDs. For some reason, the x-axis range doesn't change and stays the same for all the plots. Imagine the x-axis running from 0 to 10, or from 0 to 5000, for every plot. @StupidWolf – pipts Aug 04 '21 at 08:54
  • Hey, it's hard to know what's going on. See https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. Try to provide a concrete example; help us to help you. – StupidWolf Aug 04 '21 at 08:58
  • `dput(df)` gives `structure(list(ID = c("F1", "F1", "F1", "F1", "F1", "F1", "F1", "F2", "F2", "F2", "F2", "F2", "F2", "F2", "F2", "F3", "F3", "F3", "F3", "F3", "F3", "F3", "F3", "F3", "F4", "F4", "F4", "F4", "F4", "F4", "F4", "F4"), Values = c(9.6, NA, 10.2, 9.8, 9.9, 9.9, 9.9, 1.2, 1.2, 1.8, 1.5, 1.5, 1.6, 1.4, NA, 3266, 3256, 7044, 6868, NA, 3405, 3410, NA, 5567, 59.4, 56, 52.8, 52.4, 55.5, NA, NA, 53.6)), class = "data.frame", row.names = c(NA, -32L))` – pipts Aug 04 '21 at 10:41
  • Would that help? @StupidWolf – pipts Aug 04 '21 at 10:42

1 Answer


You don't need to remove the NAs; ggplot2 drops them when plotting (with a warning). You have at most 5 values per ID in your dataset, so a density plot is not very useful. Still, for your example above, we can put the x-axis on a log10 scale and try a density:

ggplot(df,aes(x = Values,fill=ID)) + geom_density(alpha=0.5) + scale_x_log10()


A stripchart might be more useful:

ggplot(df,aes(x = Values,y=ID)) + geom_jitter(width=0.1) + scale_x_log10()

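For many IDs with very different ranges (the situation described in the comments), one option is to facet by ID with free scales and plot only a few IDs per page. The following is a minimal sketch, assuming ggplot2 is installed and using the data from the `dput()` output in the comments. The names `df_complete`, `ids_per_page`, `pages`, and `plots` are illustrative and not from the original post.

```r
library(ggplot2)

# Data rebuilt from the dput() output posted in the comments
df <- data.frame(
  ID = rep(c("F1", "F2", "F3", "F4"), times = c(7, 8, 9, 8)),
  Values = c(9.6, NA, 10.2, 9.8, 9.9, 9.9, 9.9,
             1.2, 1.2, 1.8, 1.5, 1.5, 1.6, 1.4, NA,
             3266, 3256, 7044, 6868, NA, 3405, 3410, NA, 5567,
             59.4, 56, 52.8, 52.4, 55.5, NA, NA, 53.6)
)

# Drop missing values explicitly so ggplot2 does not warn about them
df_complete <- df[!is.na(df$Values), ]

# Assign the IDs to pages of at most `ids_per_page` IDs each
# (5 is an assumption, taken from the "sets of 5" idea in the comments)
ids_per_page <- 5
ids <- unique(df_complete$ID)
pages <- split(ids, ceiling(seq_along(ids) / ids_per_page))

# One plot per page; scales = "free" gives every panel its own axis ranges
plots <- lapply(pages, function(page_ids) {
  ggplot(df_complete[df_complete$ID %in% page_ids, ], aes(x = Values)) +
    geom_density() +
    facet_wrap(~ ID, scales = "free")
})

plots[[1]]  # show the first page
```

With 30 IDs this would give 6 pages of 5 panels each, and the free scales mean an ID with values near 10 and an ID with values near 5000 no longer share one x-axis. Swapping `geom_density()` for a jittered point layer would give the stripchart version of the same layout.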

StupidWolf