
I am trying to collect the questions asked by each user in a list of users.

So I wrote this code:

library(stackr)
dft <- data.frame()
for (j in 1:nrow(df)) {
    questions <- stack_users(df$userid[j], "questions", num_pages=1000000, pagesize=100, filter="withbody")
    for (s in 1:nrow(questions)){
      dft <- rbind(dft, data.frame(
        tags               = ifelse(is.null(questions$tags[s])               , NA, questions$tags[s]),
        is_answered        = ifelse(is.null(questions$is_answered[s])        , NA, questions$is_answered[s]),
        view_count         = ifelse(is.null(questions$view_count[s])         , NA, questions$view_count[s]),
        accepted_answer_id = ifelse(is.null(questions$accepted_answer_id[s]) , NA, questions$accepted_answer_id[s]),
        answer_count       = ifelse(is.null(questions$answer_count[s])       , NA, questions$answer_count[s]),
        score              = ifelse(is.null(questions$score[s])              , NA, questions$score[s]),
        last_activity_date = ifelse(is.null(questions$last_activity_date[s]) , NA, questions$last_activity_date[s]),
        creation_date      = ifelse(is.null(questions$creation_date[s])      , NA, questions$creation_date[s]),
        last_edit_date     = ifelse(is.null(questions$last_edit_date[s])     , NA, questions$last_edit_date[s]),
        question_id        = ifelse(is.null(questions$question_id[s])        , NA, questions$question_id[s]),
        link               = ifelse(is.null(questions$link[s])               , NA, questions$link[s]),
        title              = ifelse(is.null(questions$title[s])              , NA, questions$title[s]),
        body               = ifelse(is.null(questions$body[s])               , NA, questions$body[s]),
        owner_reputation   = ifelse(is.null(questions$owner_reputation[s])   , NA, questions$owner_reputation[s]),
        owner_user_id      = ifelse(is.null(questions$owner_user_id[s])      , NA, questions$owner_user_id[s]),
        owner_user_type    = ifelse(is.null(questions$owner_user_type[s])    , NA, questions$owner_user_type[s]),
        owner_accept_rate  = ifelse(is.null(questions$owner_accept_rate[s])  , NA, questions$owner_accept_rate[s]),
        owner_link         = ifelse(is.null(questions$owner_link[s])         , NA, questions$owner_link[s])
      ))
    }
}

However, it takes too much time to collect the questions for a list of different user ids. Is there any way to reduce the execution time, or any change I could make to the code?

r2evans
Stiar
  • Just my opinion, but that huge single heavily-nested `rbind`-from-heck is waaaaaaaay too big and hard to read. But some pointers: if (1) `questions` is a `data.frame` with 1 or more rows; and (2) each column is "simple" meaning a vector and not some `tidy`-ish list of compound elements; then you will never have a `NULL` in there, so you should be able to remove the vast majority of conditional code. – r2evans Sep 18 '18 at 23:27
  • Second, don't iteratively add to a frame, it performs *horribly*. Read https://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames/24376207#24376207, then try something like `qs <- lapply(df$userid, stack_users, "questions", num_pages=1000000, pagesize=100, filter="withbody")` then `do.call(rbind.data.frame, qs)`. – r2evans Sep 18 '18 at 23:30
  • @r2evans for (1) yes it is a dataframe with one or more rows – Stiar Sep 19 '18 at 09:27
  • Also, you need to confirm what packages you are using. There are multiple `stackr`s on github, but I found one that has `stack_users`, is it https://github.com/dgrtwo/stackr? Please make this question reproducible by listing all non-base R packages and including a sample of the data used, perhaps `dput(head(df))`. Refs: https://stackoverflow.com/questions/5963269/, https://stackoverflow.com/help/mcve, and https://stackoverflow.com/tags/r/info. – r2evans Sep 19 '18 at 15:59
  • Have you tried replacing all of this code with `stack_users(df$userid, "questions", ...)`? Or even `do.call(rbind.data.frame, lapply(split(df$userid, floor((seq_along(df$userid)-1)/3)), stack_users, "questions", ...))`. – r2evans Sep 19 '18 at 16:08
  • @r2evans there is a possibility that a user id does not have any info. In that case I receive an error, and that's why I use NA – Stiar Sep 19 '18 at 22:50
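
The comments above recommend collecting results in a list and binding them once, rather than growing a data frame row by row. A minimal sketch of that pattern follows. `fetch_one` is a hypothetical stand-in for the per-user API call, so the example runs without any package or network access.

```r
# Hypothetical stand-in for a per-user API call; returns a small data frame.
fetch_one <- function(id) {
  data.frame(owner_user_id = id, score = id * 2L)
}

ids <- 1:5

# Collect every result in a list first ...
pieces <- lapply(ids, fetch_one)

# ... then bind all rows in a single call instead of calling rbind() in a loop.
combined <- do.call(rbind, pieces)
```

Each `rbind()` inside a loop copies everything accumulated so far, so the single bind at the end avoids repeated copying as the result grows.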

1 Answer


Partial answer since I'm not fluent in R:

Are you trying to get the list of questions for a given set of users?

If so,

for (j in 1:nrow(df)) {
    questions <- stack_users(df$userid[j]... 

is a poor way to do it.

Refer to the API's /users/{ids}/questions doc:

(The) {ids} (parameter) can contain up to 100 semicolon delimited ids. To find ids programmatically look for user_id on user or shallow_user objects.

(Emphasis added)

So instead of something that evaluates to stack_users(1,... (one id), group the ids in batches of 100 for that function. Something like:

stack_users(c(1,2,3,4,5,...),...

(But remember I'm not an R coder.)
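
A rough sketch of the batching step in base R, under stated assumptions: `batch_ids` and `fetch_batches` are hypothetical helper names, and `fetch` is whatever function performs the request for one batch. In practice that might be a wrapper around `stack_users`, but the call is not verified here, so a stub is used in the usage example.

```r
# Split a vector of ids into batches of at most `size` elements.
batch_ids <- function(ids, size = 100) {
  unname(split(ids, ceiling(seq_along(ids) / size)))
}

# Apply `fetch` to each batch and bind the results once at the end.
# `fetch` is an assumption: in practice it could be something like
#   function(b) stack_users(b, "questions", pagesize = 100, filter = "withbody")
# with the arguments copied from the question.
fetch_batches <- function(batches, fetch) {
  do.call(rbind, lapply(batches, fetch))
}

ids <- 1:250
batches <- batch_ids(ids)
batch_sizes <- lengths(batches)  # 100, 100, 50

# Usage with a stub that returns one row per id in the batch.
stub_fetch <- function(b) data.frame(owner_user_id = b)
all_rows <- fetch_batches(batches, stub_fetch)
```

With 250 ids this makes 3 calls instead of 250, which is where most of the time saving would come from.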

Brock Adams
  • The approach is correct, and the [source code for the function](https://github.com/dgrtwo/stackr/blob/master/R/stack_users.R#L6) confirms that it accepts one or more, but that does not resemble valid R code. Your code should read something like `stack_users(c(1,2,3,4,5,...),...)`, where the `c(...)` encapsulates its contents into a vector that is collectively passed as the first argument. And R uses commas, not semi-colons. – r2evans Sep 19 '18 at 16:02
  • Thanks, @r2evans. incorporated into answer. I don't yet know the correct syntax for `c()`, but the API requires semicolons. I guess the library handles that? – Brock Adams Sep 19 '18 at 17:48
  • @BrockAdams thank you. I tried this, but it is possible that some users don't have questions. In that case there is no content, and I may receive this error: `Error in stack_parse(req) : Error 404: no method found with this name`. Could I prevent it with a try/catch? – Stiar Sep 19 '18 at 22:49
  • @Stiar, open a new question for that if you need to, but it doesn't make sense. The API doesn't 404 if the user has no questions. ... Here's [a user with no questions](https://stackoverflow.com/users/52/). ... Here [the API gives valid, but empty results](https://api.stackexchange.com/2.2/users/52/questions?order=desc&sort=activity&site=stackoverflow), just like it should. There is no 404. – Brock Adams Sep 19 '18 at 23:05
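
Whatever the actual cause of the 404, the try/catch idea raised in the comments can be sketched in base R with `tryCatch`. This is only an illustration: `safely` and `flaky_fetch` are hypothetical names, and `flaky_fetch` simulates a failing request rather than calling the real API.

```r
# Wrap any fetch function so that an error yields NULL instead of stopping the run.
safely <- function(fetch) {
  function(...) {
    tryCatch(
      fetch(...),
      error = function(e) {
        message("Skipping call: ", conditionMessage(e))
        NULL
      }
    )
  }
}

# Hypothetical fetch that fails for one input, standing in for an API call.
flaky_fetch <- function(id) {
  if (id == 2) stop("Error 404: no method found with this name")
  data.frame(owner_user_id = id)
}

results <- lapply(1:3, safely(flaky_fetch))
results <- Filter(Negate(is.null), results)  # drop the failed calls
combined <- do.call(rbind, results)          # rows for ids 1 and 3 only
```

Logging the message instead of silently substituting `NA` keeps the failing ids visible, which helps when tracking down why a request failed in the first place.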