-2

I'm working on a Stack Overflow data dump .csv file and I need to find the distribution of scores for questions.

I opened the file in R and extracted the two columns that I need which are the PostTypeID and Score.

Example: [image]

I need to find how many rows have each score, for example:

3 rows in the Score column have a score of 11.

2 rows in the Score column have a score of 3, and so on.

The problem is that the data is large: it has 3 million rows, and I don't know how to get the distribution.

Note: I'm a beginner in R, so I need the simplest way to do this.

user8863554
    You mention *"filter"* and *"get the distribution"*, the two are not the same. Please read about how to ask good questions (refs https://stackoverflow.com/help/mcve and https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), and then edit your question. Some pointers: consumable data (e.g., `dput`) and desired output. – r2evans Mar 02 '18 at 04:15

2 Answers

2

You are looking for the table function.

If d is your data structure, then you want

table(d$Score)
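
As a minimal sketch of how this could look for questions only: the small data frame below is made-up sample data standing in for the real CSV, and it assumes that questions are the rows with PostTypeID equal to 1 (check this against your own file). The column names PostTypeID and Score are taken from the question.

```r
# Hypothetical sample data; the real d would be read from the CSV file.
d <- data.frame(
  PostTypeID = c(1, 1, 2, 1, 1, 2, 1),
  Score      = c(11, 3, 11, 11, 3, 5, 11)
)

# Keep only the question rows (assumption: PostTypeID == 1 marks questions).
questions <- d[d$PostTypeID == 1, ]

# Count how many rows have each distinct score.
score_counts <- table(questions$Score)
print(score_counts)
```

table works on the whole column at once, so the same call should apply unchanged to a column with millions of rows.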

Daniel V
1

x <- data[data$Score == 3, ] to get rows with score 3 (the condition goes before the comma, because that position selects rows)
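
A small sketch of filtering rows by score, using made-up sample data in place of the real file. The data frame name data and the column name Score are assumptions based on the question; the values are hypothetical.

```r
# Hypothetical sample data; the real data frame would come from the CSV.
data <- data.frame(
  PostTypeID = c(1, 1, 2, 1),
  Score      = c(3, 11, 3, 3)
)

# The condition sits before the comma, so it selects rows, not columns.
x <- data[data$Score == 3, ]

# Number of rows with score 3; sum() counts the TRUE values directly.
n_three <- sum(data$Score == 3)
print(n_three)
```

This gives the count for one score at a time; for the counts of every score in one step, table is the simpler route.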

Ronak Bokaria