A follow-up to "Extracting data from an API using R"

Question

The code I have (which comes from my earlier question, "Extracting data from an API using R") gives a very complicated output. I can extract almost everything I need except for a data.frame that's nested within the list.

Without doing anything, it gives me this error:

Error in `.rowNamesDF<-`(x, value = value) :
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘1’, ‘10’, ‘11’, ‘12’, ‘13’, ‘14’, ‘15’, ‘16’, ‘17’, ‘18’, ‘19’, ‘2’, ‘20’, ‘3’, ‘4’, ‘5’, ‘6’, ‘7’, ‘8’, ‘9’

If I try to flatten or unlist, it comes up NULL.

In the example code, I've selected some variables that are easy to get, plus number 42, "dokintressent", from which I need "intressent", a list of names for each case. I have to query the Swedish legislature's API about half a dozen times, but this is the trickiest one.

When I remove 42, it makes the data.frame perfectly.

library(httr)      # GET()
library(jsonlite)  # fromJSON()
library(magrittr)  # %>%

my_dfs1 <- lapply(1:207, function(i){
  my_url <- paste0("http://data.riksdagen.se/dokumentlista/?sok=&doktyp=mot&rm=&from=2017-01-01&tom=2017-12-31&ts=&bet=&tempbet=&nr=&org=&iid=&webbtv=&talare=&exakt=&planering=&sort=rel&sortorder=desc&rapport=&utformat=json&a=s&p=", i)
  r1 <- GET(my_url)
  r2 <- rawToChar(r1$content)
  r3 <- fromJSON(r2)
  r4 <- r3$dokumentlista$dokument
  return(r4)
})

df <- my_dfs1 %>% lapply(function(df_0){
  df_0[c(12:14, 18, 42)]
}) %>% do.call(rbind, .)

EDIT: I've noticed that the data I want is actually several data.frames per case. From "intressent", I need "namn". Basically, I need the final database to look like this:

                     V12     V13    V14    V18    Namn
    Motion 1                                     c(name1, name2)

The thing is, only you can decide how the data should be represented, or rather how you want it to be. `intressent` has a data frame per row; how does this data frame fit the original data frame (a.k.a. `df_0` in your code above)? How do you imagine the end result you want to get? Do you want, for example, to repeat each row from `df_0[c(12:14, 18)]` for the rows in `intressent`? – DS_UNI Feb 12 '19 at 19:42
I'm gonna edit the question, since I've realized the data.frame per row issue as well. What I **really** need is "namn", and it would be great if it could come as a column with each cell holding a list of names. I need all of it to be one single database that I can work with later. Thank you so much for doing this. I definitely bit off a bit more than I can chew. – Larissa Feb 12 '19 at 21:07

score 0 · Accepted Answer · answered Feb 12 '19 at 21:46

You need to work on `intressent` on its own: extract what you need from it and assign the result to a new column. Just make sure you end up with a simple data structure per row.

You can also, if it works better for you, paste the names together, separated by '-' for example, so that the new column is a simple character vector.

df <- my_dfs1 %>% lapply(function(df_0){
  # choose the columns you want
  return_df <- df_0[c(12:14, 18)]
  # work on intressent: keep only the names, wrapped in a list per row
  return_df$namn <- df_0$dokintressent$intressent %>%
    lapply(function(x) list(x$namn)) %>%
    do.call(rbind, .)   # careful here: a simple unlist won't work
  return(return_df)
}) %>%
  do.call(rbind, .)
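
The alternative mentioned above, pasting the names together, is not shown in the answer's code. A minimal sketch of it follows, using small made-up data in place of the API result. The objects `intressent` and `return_df` are hypothetical stand-ins for `df_0$dokintressent$intressent` and `df_0[c(12:14, 18)]`, and the column name `titel` is an assumption.

```r
# Hypothetical stand-in for df_0$dokintressent$intressent on one page:
# a list with one small data.frame per motion.
intressent <- list(
  data.frame(namn = c("name1", "name2"), stringsAsFactors = FALSE),
  data.frame(namn = "name3", stringsAsFactors = FALSE)
)

# Hypothetical stand-in for the plain columns df_0[c(12:14, 18)].
return_df <- data.frame(titel = c("Motion 1", "Motion 2"),
                        stringsAsFactors = FALSE)

# Collapse each motion's names into one string, so the new column is an
# ordinary character vector with exactly one value per row.
return_df$namn <- vapply(
  intressent,
  function(x) paste(x$namn, collapse = "-"),
  character(1)
)

# The individual names can be recovered later by splitting on the separator.
namn_split <- strsplit(return_df$namn, "-", fixed = TRUE)
```

Inside the `lapply()` over `my_dfs1`, the same `vapply()` call would presumably take the place of the `lapply()` / `do.call(rbind, .)` step. Hyphenated names would be split as well, so a separator that cannot occur in a name may be a safer choice than '-'.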

DS_UNI

I thought about that - that it should be worked on its own - but I didn't know where to begin. Sorry to ask, as I'm sure you've considered this, but will they be assigned back to the correct cases in the end? Who works with whom is one of my main hypotheses. – Larissa Feb 12 '19 at 21:55
I ran it a couple of times and it comes back as a list, using all the would-be columns - maybe I'm missing something – Larissa Feb 12 '19 at 22:51
I'm not sure what you mean by that, for example what do you get if you run `df$namn[1] %>% str()`? – DS_UNI Feb 13 '19 at 08:25
I don't know what I did wrong before, but it's working now. It wasn't coming back as a data.frame but as a large matrix. I'm so sorry to have kept bothering you and I would love to give you proper credit if you'll let me (beyond thanking you by using your handle in my acknowledgments). – Larissa Feb 13 '19 at 10:02
That's perfectly fine! And it wasn't a bother at all, I'm glad to help. – DS_UNI Feb 13 '19 at 10:15
And before I forget, please accept the answer if it solved your problem :) – DS_UNI Feb 13 '19 at 10:18
Oh, yeah, of course! Done and done. If you wanna send me your name in private, please do. – Larissa Feb 13 '19 at 10:58
If you're still up for it, @DS_UNI... One last question (I promise!): for this API (data.riksdagen.se/dokumentlista/…), I haven't been able to do the same and get "talare" in the place of "intressent", although the format seems to be the same. The error message is this: `Error in df_0$debatt$anforande : $ operator is invalid for atomic vectors Called from: eval(lhs, parent, parent)`. Again, thank you! – Larissa Feb 14 '19 at 11:16
I'm not sure where this `talare` is, but the error message states that you're trying to get `df_0$debatt$anforande`. However, `df_0$debatt` is a vector, which means you cannot use `$` with it. – DS_UNI Feb 14 '19 at 18:00
This particular variable for this API comes after `dokintressent` does (since in this case the latter is null). So, it would be dokumentlista>dokument>debatt>anforande>talare. `talare` is a variable within a data.frame; `anforande` is a list of data.frames. That's why it struck me as odd that it didn't work as it did before. In fact, it's also a list of names, it just means "speaker". Thanks for coming back! – Larissa Feb 14 '19 at 21:15
Unfortunately I couldn't find `dokumentlista` at all; try to make the problem more specific. To understand better what is happening, choose one page and work with it, taking it one step at a time. For example, take a look at the column you want to work with from one element of `my_dfs1`; if it's a list, take a look at the first element of this list, and so on. And of course it's much easier to help you if I can see what you tried. – DS_UNI Feb 15 '19 at 08:19
Right, sorry. `dokumentlista` is flattened in the first piece of code: `my_dfs3 <- lapply(1:302, function(i){ my_url <- paste0("http://data.riksdagen.se/dokumentlista/?sok=&doktyp=bet&rm=&from=2000-01-01&tom=2017-12-31&ts=&bet=&tempbet=&nr=&org=&iid=&webbtv=&talare=&exakt=&planering=&sort=rel&sortorder=desc&rapport=&utformat=json&a=s&p=", i); r1 <- GET(my_url); r2 <- rawToChar(r1$content); r3 <- fromJSON(r2); r4 <- r3$dokumentlista$dokument; return(r4) })` After that, I added the second piece of code (TBC) – Larissa Feb 15 '19 at 09:51
I simply changed what I was looking for, since `debatt` here seems to be in the same format as `dokintressent`: `df <- my_dfs3 %>% lapply(function(df_0){ return_df <- df_0[c(12:14, 18)]; return_df$talare <- df_0$debatt$anforande %>% lapply(function(x) list(x$talare)) %>% do.call(rbind, .); return(return_df) }) %>% do.call(rbind, .)` It then gave the atomic vector error. I tried a bunch of things, but I'm still too new at this. I managed to clunkily get the names for 2017, but that was it. – Larissa Feb 15 '19 at 09:59
But it's not the same format, at least not when I'm getting the data. Have you looked at `my_dfs1[[1]]$debatt`? – DS_UNI Feb 15 '19 at 10:04
Just did it. It prints out 20 data chunks, but I know that there are more. – Larissa Feb 15 '19 at 11:48
I don't know what happened, but I decided to retry the second piece of code (search for `talare`) and it worked this time! Is R alive? I'm so so sorry to have bothered you so much and for something that I can't even explain. I don't even know how to thank you. – Larissa Feb 15 '19 at 11:50
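
The `$ operator is invalid for atomic vectors` error discussed in the comments above is consistent with `debatt` arriving as a plain vector on some pages rather than as a nested data.frame. A minimal sketch of a guard follows, assuming that this is what happens. The helper `get_talare` and the toy objects `debatt_nested` and `debatt_flat` are hypothetical and not part of the original code.

```r
# Hypothetical page where `debatt` is a data.frame with a list column
# `anforande`, holding one data.frame of speeches per document.
debatt_nested <- data.frame(id = 1:2)
debatt_nested$anforande <- list(
  data.frame(talare = c("A", "B"), stringsAsFactors = FALSE),
  data.frame(talare = "C", stringsAsFactors = FALSE)
)

# Hypothetical page where no document has debate data (assumption):
# `debatt` is then just an atomic vector of NA, and `$` fails on it.
debatt_flat <- c(NA, NA)

# Return one character vector of speaker names per document,
# or an empty vector where there is nothing to extract.
get_talare <- function(debatt) {
  if (!is.data.frame(debatt)) {
    return(rep(list(character(0)), length(debatt)))
  }
  lapply(debatt$anforande, function(x) {
    if (is.data.frame(x)) x$talare else character(0)
  })
}

talare_nested <- get_talare(debatt_nested)
talare_flat   <- get_talare(debatt_flat)
```

Running `str(my_dfs3[[i]]$debatt)` for a few values of `i` is a quick way to see which of the two shapes a given page has before applying anything like this.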
