How do I create a histogram in R for two-column data?

Question

user_a - 3
user_b - 4
user_c - 1
user_d - 4

I want to show the distribution of the number of tweets per author in R using a histogram. The original file has 1048575 such rows. I tried `hist(df$twitter_count, nrow(df))`, but I don't think it's correct.

Please include your data as editable text instead of a link to an image. — Imran Ali, Oct 22 '17 at 04:37
Hi Mehru, welcome to SO. It would help me help you if I knew a little more about your data; see https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. Your nrow(df) is specifying the breaks in the histogram. If you are looking at doing some conditional histograms (e.g. number of tweets per day/week/month/year per author), you might consider using lattice or ggplot2. — James Thomas Durant, Oct 22 '17 at 04:40
If you want the histogram of twitter counts, just use `hist(df$twitter_count)` — kangaroo_cliff, Oct 22 '17 at 04:44
see [here](https://stackoverflow.com/questions/46860454/constructing-histogram-from-2-variables-in-1-column-in-r/46860693#46860693) — vaettchen, Oct 22 '17 at 05:19
Possible duplicate of [Constructing histogram from 2 variables in 1 column in R](https://stackoverflow.com/questions/46860454/constructing-histogram-from-2-variables-in-1-column-in-r) — vaettchen, Oct 22 '17 at 05:20

kangaroo_cliff · Answer 1 · 2017-10-22T06:05:55.923

It seems I have misunderstood the question. I think the following could be what the OP is looking for.

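library(ggplot2)

# Example data: one row per user
df <- data.frame(user = letters,
                 twitter_count = sample.int(200, 26))

# One bar per user, bar height = that user's tweet count
ggplot(df, aes(user, twitter_count)) +
  geom_col()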

Alternatively, assuming you are looking for multiple histograms, one per user:

Replace `user` with the corresponding variable name in your data.frame.

# Example data
df <- data.frame(user = iris$Species, 
                 twitter_count= round(iris[, 1]*10))

# Histograms using ggplot2 package
library(ggplot2)
ggplot(df, aes(x = twitter_count)) +
  geom_histogram() + facet_grid(.~user)

If your data contain many Twitter users, it is best to use an alternative method to view the distribution of Twitter counts, since one panel per user quickly becomes unreadable.
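One possible alternative for many users, sketched in base R rather than ggplot2: tabulate how many users have each tweet count and draw a single bar chart of those frequencies. The simulated data, the `user_` names, and the `count_table` variable are assumptions for illustration, not the OP's data.

```r
set.seed(42)

# Hypothetical data: 10,000 users with right-skewed tweet counts
df <- data.frame(user = paste0("user_", seq_len(10000)),
                 twitter_count = rgeom(10000, prob = 0.2) + 1)

# Number of users observed at each distinct tweet count
count_table <- table(df$twitter_count)

# One bar per distinct tweet count; bar height = number of users
barplot(count_table, xlab = "Tweets per user", ylab = "Number of users")
```

This keeps the plot to one panel regardless of how many users there are, at the cost of no longer showing individual users.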

score 1 · Accepted Answer · answered Oct 22 '17 at 04:47

If each row of the data.frame represents a user:

set.seed(1)
df <- data.frame(user = letters, twitter_count = rpois(26, lambda = 4) + 1)
hist(df$twitter_count)
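On the `hist(df$twitter_count, nrow(df))` call from the question: the second positional argument of `hist()` is `breaks`, and a single number there is only a suggested number of cells, which R rounds to "pretty" breakpoints. A minimal sketch, assuming integer counts as in the simulated data above, that instead sets the bin edges explicitly so each count value gets its own bar (the `edges` and `h` names are illustrative):

```r
set.seed(1)
df <- data.frame(user = letters, twitter_count = rpois(26, lambda = 4) + 1)

# One bin per integer count: edges at 0.5, 1.5, ..., max + 0.5
edges <- seq(0.5, max(df$twitter_count) + 0.5, by = 1)

# hist() draws the plot and invisibly returns the bin counts
h <- hist(df$twitter_count, breaks = edges,
          xlab = "Tweets per user", main = "Distribution of tweet counts")

# Every user falls into exactly one bin
sum(h$counts) == nrow(df)
```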

James Thomas Durant


score 0 · Answer 3 · answered Oct 22 '17 at 11:43

Since you said distribution for 'each user', I think it should be a bar plot:

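library(data.table)

# Parse the sample rows: V1 holds the user, V3 holds the count
dat <- fread("
  user_a - 3
  user_b - 4
  user_c - 1
  user_d - 4"
)

# One bar per user, bar height = tweet count
barplot(as.numeric(dat$V3), names.arg = dat$V1)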

or if you are looking for histograms, then:

hist(as.numeric(dat$V3), xlab = "", main="Histogram")
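If data.table is not available, the same parsing can be sketched in base R with `read.table(text = ...)`, assuming the "name - count" layout from the question. The `raw` and `dat2` names are illustrative only.

```r
# Sample rows in the "user - count" layout from the question
raw <- "user_a - 3
user_b - 4
user_c - 1
user_d - 4"

# Whitespace-separated fields: V1 = user, V2 = "-", V3 = count
dat2 <- read.table(text = raw, header = FALSE, stringsAsFactors = FALSE)

# Histogram of the counts across users
hist(dat2$V3, xlab = "Tweets per user", main = "Histogram")
```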
