
user_a - 3
user_b - 4
user_c - 1
user_d - 4

I want to show the distribution of the number of tweets per author in R using a histogram. The original file has 1048575 such rows. I tried hist(df$twitter_count, nrow(df)), but I don't think it's correct.

Mehru
  • Please include your data as editable text instead of a link to an image. – Imran Ali Oct 22 '17 at 04:37
  • Hi Mehru, welcome to SO. It would help me help you if I knew a little more about your data; see https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. Your nrow(df) is specifying the breaks in the histogram. If you want conditional histograms (e.g. number of tweets per day/week/month/year per author), you might consider using lattice or ggplot2. – James Thomas Durant Oct 22 '17 at 04:40
  • If you want the histogram of twitter counts, just use `hist(df$twitter_count)` – kangaroo_cliff Oct 22 '17 at 04:44
  • See [here](https://stackoverflow.com/questions/46860454/constructing-histogram-from-2-variables-in-1-column-in-r/46860693#46860693) – vaettchen Oct 22 '17 at 05:19
  • Possible duplicate of [Constructing histogram from 2 variables in 1 column in R](https://stackoverflow.com/questions/46860454/constructing-histogram-from-2-variables-in-1-column-in-r) – vaettchen Oct 22 '17 at 05:20

3 Answers


It seems I misunderstood the question. I think the following could be what the OP is looking for.

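library(ggplot2)

# Example data: one row per user
df <- data.frame(user = letters, 
                 twitter_count = sample.int(200, 26))

# One bar per user, with bar height equal to that user's tweet count
ggplot(df, aes(user, twitter_count)) +
  geom_col()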


Assuming you are looking for multiple histograms (one per user), replace user with the respective variable name in your data.frame.

# Example data
df <- data.frame(user = iris$Species, 
                 twitter_count= round(iris[, 1]*10))

# Histograms using ggplot2 package
library(ggplot2)
ggplot(df, aes(x = twitter_count)) +
  geom_histogram() + facet_grid(.~user)

If your data contain many Twitter users, one panel per user quickly becomes unreadable, so it is best to use an alternative method to see the distribution of tweet counts.
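One possible alternative, sketched here under assumptions: pool all users into a single histogram of tweet counts, so each bar shows how many authors have a given count. The data below are simulated for illustration (the user names, the sample size and the Poisson draw are not from the question); only the twitter_count column name follows the question.

```r
library(ggplot2)

# Simulated stand-in for the real data: one row per author (assumption)
set.seed(42)
n_users <- 10000
df <- data.frame(user = paste0("user_", seq_len(n_users)),
                 twitter_count = rpois(n_users, lambda = 4) + 1)

# One pooled histogram: x = tweets per author, y = number of authors.
# binwidth = 1 with boundary = 0.5 gives one bin per integer count.
p <- ggplot(df, aes(x = twitter_count)) +
  geom_histogram(binwidth = 1, boundary = 0.5) +
  labs(x = "Tweets per author", y = "Number of authors")
print(p)
```

If the real counts are heavily skewed, a wider binwidth or a log-scaled axis may be worth trying.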

kangaroo_cliff

If each row of the data.frame represents a user:

set.seed(1)
df <- data.frame(user = letters, twitter_count = rpois(26, lambda = 4) + 1)
hist(df$twitter_count)
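On why hist(df$twitter_count, nrow(df)) looks wrong: the second argument of hist() is breaks, and a single number there is treated as a suggested number of cells, so passing nrow(df) asks for roughly a million bins. A minimal sketch of one alternative, assuming the counts are positive integers, is to set the bin edges explicitly so that each integer gets its own bin (the axis labels are illustrative):

```r
set.seed(1)
df <- data.frame(user = letters, twitter_count = rpois(26, lambda = 4) + 1)

# One bin per integer count: edges at 0.5, 1.5, ..., max + 0.5
breaks <- seq(0.5, max(df$twitter_count) + 0.5, by = 1)
h <- hist(df$twitter_count, breaks = breaks,
          xlab = "Tweets per author", ylab = "Number of authors",
          main = "Distribution of tweets per author")

# Cross-check: a plain frequency table of the same counts
table(df$twitter_count)
```

The nonzero values in h$counts should agree with the frequency table, which is a quick way to confirm that the bins line up with the integer counts.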


Since you said distribution for 'each user', I think it should be a bar plot:

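library(data.table)

# Read the "user - count" rows; fread names the columns V1 (user), V2 ("-"), V3 (count)
dat <- fread("
  user_a - 3
  user_b - 4
  user_c - 1
  user_d - 4"
)

# One bar per user, labelled with the user name
barplot(height = as.numeric(dat$V3), names.arg = dat$V1)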

Or, if you are looking for a histogram of the counts:

hist(as.numeric(dat$V3), xlab = "", main="Histogram")


LeMarque