0

I have a data set, what looks approximately like the one below.

      column1  column2  column3  column4
1     2        3        string   4
2     3        56       string   6
3     86       23       string   4

All of the columns have numeric values except column3.
I am trying to figure out the way how to count how many times the value in column3 was mentioned with a particular value from column4. In other words, in this example, there are 2 values of string with a value 4.

For now, I've come up with a following code:

data$new_column = ifelse(data$column3=="string" && data$column4==4, "", "")
table(data$new_column)

But it gives me a total number of all the values, which is in example case 3

I have tried this solution: Counting the number of elements with the values of x in a vector, but it didn't help much as it gives me an error:

[ reached getOption("max.print") -- omitted x rows ]

What, as far as I understand, that this solution doesn't work on large datasets.
What would be the best way to realize this solution?

Thanks for the ideas in advance!

Community
  • 1
  • 1
Alex Ljamin
  • 737
  • 8
  • 31
  • you could try this `library(data.table); setDT(data)[, no.string := .N, by = c('column3','column4')]` Where, data is your dataset – Veerendra Gadekar Feb 24 '16 at 15:05
  • @VeerendraGadekar thanks for your idea, but it is also not working well on large datasets, giving me the similar to the post error `[ reached getOption("max.print") -- omitted x rows]`. I need to have an access to all the rows without row omission. – Alex Ljamin Feb 25 '16 at 15:54
  • Did you try the option in first comment?? – Veerendra Gadekar Feb 25 '16 at 15:57
  • @VeerendraGadekar Yes, indeed. I've bumped into `Error: could not find function "setDT"` or `Error in library(data.table) : there is no package called ‘data.table’` – Alex Ljamin Feb 25 '16 at 16:00
  • Yes, so you need to install data.table package for this. If you are looking for efficiency, data.table would be the best approach. You can install it using install.packages("data.table") and then try again – Veerendra Gadekar Feb 25 '16 at 16:03

2 Answers2

0

Try this instead:

 data$new_column = ifelse(data$column3=="string" & data$column4==4, T, F)

 > table(data$new_column)
    FALSE  TRUE 
        1     2 

Here you can count how often the statement was true. Note that in your previous version, you have had the empty string in both cases, when it was true and false. You would thus expect the output to be the same as the length of the dataframe.

Having said that, a quick and dirty hack might be:

 data$new_id <- paste0(data$column3, data$column4)
 table(data$new_id)

This gives you the count of all combinations occuring, in case you are not only interested in these two particular values.

A more advanced solution might be the following:

 library(dplyr)
 data %>% group_by(column4) %>% summarise(out = sum(column3=="string") )

which gives you, for all distinct values in column 4, how often column 3 was "string".

coffeinjunky
  • 11,254
  • 39
  • 57
  • your solution works great. Thanks! Could you explain what exactly `paste0` do? – Alex Ljamin Feb 24 '16 at 21:38
  • `paste` combines two strings. For instance, `paste0("one ", "coffee")` returns as outcome a string called `"one coffee"`. The zero in the function name is just a shortcut for the more elaborate version `paste("one ", "coffee", sep="")`. See `?paste`. – coffeinjunky Feb 25 '16 at 12:00
0
df1<- data.frame(column3=c("string","string","string"),column4=c(4,6,4))
df2 <- as.data.frame(table(df1))
df2

which gives a result:

  column3 column4 Freq
1  string       4    2
2  string       6    1

Don't bother

[ reached getOption("max.print") -- omitted x rows ]

because it means that your result is too long to be printed on screen, but it does not mean that it was no computed.

bartoszukm
  • 693
  • 3
  • 10