
I want to build, from a dataset, a list that contains each word and its frequency. I did that and saved it in a variable named 'mylist'. Now I want to sort the list by word frequency and create a bar plot of the 10 most frequent words.

But I have not managed to sort it. I tried many ways of converting 'mylist' to a data.frame or data.table, but the frequency column always stays a list. To sum up: the variable DT holds a table with 2 columns. Column 'x' contains the words and is of type character; column 'v' contains the frequencies and is a list. I cannot sort it by frequency. Please help me.

library(ggplot2)
library(MASS)
#get the data
data.uri = "http://www.crowdflower.com/wp-content/uploads/2016/03/gender-classifier-DFE-791531.csv"
pwd = getwd()
data.file.name = "gender.csv"
data.file = file.path(pwd, data.file.name)
download.file(data.uri, data.file)
data = read.csv(data.file.name)

#manipulate the data 
data <- data[data$X_unit_id < 815719694,] 
print(data$X_unit_id)

#get all female fav_numbers
female_colors <- subset(data, data$gender=="female")
female_colors$fav_number
#get all male fav_numbers
male_colors <- subset(data, data$gender=="male")
male_colors$fav_number


text_male = subset(data, data$gender=="male")
text_male = text_male$text
print(text_male[1])
print(length(text_male))
v <- text_male[1:length(text_male)]
print(v)
print (v[1])
count_of_list = 0;
x = list()
for ( i in v) {
  # Merge the two lists.
  x <- c(x,unlist(strsplit(i," ")))
}
count = 0;
mylist = list()
for (word in x){
  for (xWord in x){
    if (word == xWord)
      count =  count + 1;
  }
  key <- word
  value <- count
  mylist[[ key ]] <- value
  count = 0;
}
library(data.table)
DT = data.table(x=c(names(mylist)),v=c(mylist))
DT
  • Could you provide a minimal example, i.e. just the dataset (e.g. with dput()) and the code that you tried to sort it? This is way too much to work through. Edit: In case you need some help, [here is a good guide on making a minimal, reproducible example](http://stackoverflow.com/a/5963610/5805670). – slamballais Apr 23 '16 at 11:23

2 Answers


As suggested in the comments, a reproducible example would make it easier to answer. Here is a suggestion anyway; try to adapt this procedure to your data.

Convert your list to a dataframe and use order:

df <- as.data.frame(your.data)

df <- data.frame(id = c("B", "A", "D", "C"), y = c(6, 8, 1, 5))
df

  id y
1  B 6
2  A 8
3  D 1
4  C 5

df2 <- df[order(df$id), ]
df2

  id y
2  A 8
1  B 6
4  C 5
3  D 1
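
Applied to the situation in the question, a minimal sketch could look like the following. It assumes `mylist` is a named list with one count per word, as built in the question; the small `mylist` here is a made-up stand-in, and the column names `word` and `freq` are only illustrative. The key step is `unlist()`, which should give an ordinary numeric column instead of a list column, so that `order()` can sort it.

```r
# Hypothetical stand-in for the asker's `mylist`: a named list of counts.
mylist <- list(the = 5, cat = 2, sat = 1, on = 3, mat = 1)

# unlist() flattens the list into a named numeric vector, so the
# frequency column is plain numeric rather than a list column.
counts <- unlist(mylist)
df <- data.frame(word = names(counts), freq = as.numeric(counts),
                 stringsAsFactors = FALSE)

# Sort by frequency, highest first, and keep at most the first 10 rows.
df_sorted <- df[order(df$freq, decreasing = TRUE), ]
top10 <- head(df_sorted, 10)

# Bar plot of the most frequent words (labels rotated with las = 2).
barplot(top10$freq, names.arg = top10$word, las = 2,
        main = "Most frequent words", ylab = "Frequency")
```

Sorting on `df$freq` rather than on the word column is what puts the most frequent words first; `head(..., 10)` simply returns fewer rows when there are fewer than 10 distinct words.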
Worice

It looks like you're calculating the word counts in a cumbersome way. Something like this is faster and simpler:

library(dplyr)
foo <- c("ant", "ant", "bat", "dog","egg","ant","bat")
bar <- rnorm(7, 5, 2)
df <- data.frame(foo, bar)
group_by(df, foo)  %>% summarise(n = n()) %>% arrange(desc(n))


 foo     n
  (fctr) (int)
1    ant     3
2    bat     2
3    dog     1
4    egg     1
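
If dplyr is not available, the same counting idea can probably be done in base R with `table()`; note that this swaps in a different technique from the dplyr pipeline above. The sketch below assumes `text_male` is a vector of strings as in the question (the three sentences here are invented sample data) and that words are separated by single spaces.

```r
# Hypothetical stand-in for the asker's `text_male` vector of tweets.
text_male <- c("the cat sat on the mat", "the dog sat", "a cat and a dog")

# Split every string on single spaces and flatten the result into one
# character vector; as.character() guards against a factor column.
words <- unlist(strsplit(as.character(text_male), " ", fixed = TRUE))

# table() counts each distinct word; sort() puts the largest counts first.
word_counts <- sort(table(words), decreasing = TRUE)
top10 <- head(word_counts, 10)

# Bar plot of the most frequent words (labels rotated with las = 2).
barplot(top10, las = 2, main = "Most frequent words", ylab = "Frequency")
```

This replaces the nested loops from the question with a single vectorised count, which avoids comparing every word against every other word.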
abind-off