
As part of my thesis, I am analyzing the polarity of political parties. After receiving a data dump of Facebook messages in JSON, I parsed it into R. Unfortunately, one list variable is nested:

I need to extract the $sentiment$polarity$score out of the list within list within list.

Observations: 63,465
Variables: 5
$ description <chr> "'TEXT'" ...
$ parties     <list> ["X", "X", "Y", ...
$ date        <date> 2018-03-05, 2018-03-05...
$ title       <chr> NA, NA...
$ sentiment   <list> [[[0.2998967, "Positief"], ...

Running glimpse(df$sentiment) shows:

 $ :List of 2
  ..$ polarity    :List of 2
  .. ..$ score      : num 0.15
  .. ..$ description: chr "Neutraal"
  ..$ subjectivity:List of 2
  .. ..$ score      : num 0.65
  .. ..$ description: chr "Erg subjectief"
  [list output truncated]

EDIT: head(df$sentiment, n=1) gives:

[[1]]
[[1]]$`polarity`
[[1]]$`polarity`$`score`
[1] 0.2998967

[[1]]$`polarity`$description
[1] "Positief"

[[1]]$subjectivity
[[1]]$subjectivity$`score`
[1] 0.5458678

[[1]]$subjectivity$description
[1] "Subjectief"

However, the problematic part of df$sentiment (visible when running head(df$sentiment, n=10)) is as follows:

[[5]]
named list()

Thus, this observation contains an empty list instead of the usual structure of two nested lists.

I have tried the following:

df %>% unnest(sentiment, .drop = FALSE, .sep = '"')

Unfortunately, this doubled the number of rows in my df, thereby losing the distinction between polarity$score and subjectivity$score.

Also, I tried

matrix(unlist(df$sentiment),ncol=4,byrow=TRUE)

Unfortunately, this cannot cope with the empty entries (i.e. observations where $sentiment is an empty list), so it creates a flawed matrix.
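To illustrate why the matrix comes out wrong, here is a minimal sketch on made-up data. The list `sentiment` below is a hypothetical stand-in for df$sentiment, not the real data. It suggests that unlist() silently drops an empty list, so the matrix ends up with fewer rows than there are observations, and the scores are coerced to character along the way.

```r
# Hypothetical stand-in for df$sentiment: three observations, the second is empty
sentiment <- list(
  list(polarity     = list(score = 0.30, description = "Positief"),
       subjectivity = list(score = 0.55, description = "Subjectief")),
  list(),
  list(polarity     = list(score = 0.15, description = "Neutraal"),
       subjectivity = list(score = 0.65, description = "Erg subjectief"))
)

flat <- unlist(sentiment)                  # the empty list contributes nothing
m <- matrix(flat, ncol = 4, byrow = TRUE)  # numbers and text end up as character

length(sentiment)    # number of observations
nrow(m)              # number of matrix rows; one short of the observations
lengths(sentiment)   # a length of 0 marks the empty entries
```

Comparing length(sentiment) with nrow(m) is a quick way to check whether any entries were lost before binding the matrix back onto the data frame.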

I have also played around with the flatten, unlist and transpose functions, but that did not get me anywhere. I am not that experienced in R, so I hope someone can help me extract the right score and add it as a column to my data frame. I hope I have provided all the needed information.

Zaletio
  • Could you supply a longer example of `sentiment`? Does every `sentiment$polarity$score` have length two? Your code `matrix(unlist(df$sentiment), ncol = 4, byrow = TRUE)` seems like a good start; maybe try adding a check such as `if(length(df$sentiment) == 0){ rep(NA, 4)}`. This could catch the entries that are `NULL` and fill them with `NA` so that no errors are produced. – Rex Feb 09 '19 at 17:22
  • (1) Every observation of `sentiment$polarity$score` contains a numeric amount between +1 and -1. (2) Longer example of sentiment through `head(df$sentiment)` is provided in the post above. (3) I'll run your try and update in a minute! Thanks for the suggestion. – Zaletio Feb 09 '19 at 17:35
  • I tried your suggestion @Rex, thanks! Unfortunately, this did not work (as the entries are not "NULL" but merely an empty list). I could try to use `flatten`, but this abandons the differentiation between `$polarity$score` and `$subjectivity$score`. – Zaletio Feb 09 '19 at 17:46

2 Answers


The first bit of code creates an example. I made the value NULL by setting score = c() to see whether that reproduces your issue. It uses a for loop, but it should work. The second bit shows how you would code it using your data frame and list values. It basically does an interim check to test for NULL scores.

##construction of example data
a <- list(polarity = list(score = c(), description = "positief"))  # score is NULL
b <- list(subjectivity = list(score = 2, description = "subjectief"))
e <- list(empty_list = list())
d <- list(c(a, b, e))

##my d is equivalent to your df$sentiment
d[[1]][[1]][[1]]  # NULL: the missing polarity score
length(d)
sent.pol.score <- double(length(d))
for (i in seq_along(d)) {
    score <- d[[i]]$polarity$score  # index with i, not a fixed 1
    if (length(score) == 1) {
        sent.pol.score[i] <- score
    }
}


##this should work with your data frame
sent.pol.score <- double(length(df$sentiment))
for (i in seq_along(df$sentiment)) {
    ##look at the i-th observation; the result is NULL for an empty list
    score <- df$sentiment[[i]]$polarity$score
    if (length(score) == 1) {
        sent.pol.score[i] <- score
    }
}

Note that sent.pol.score has the same length as the data set and holds 0 wherever the score is NULL. I don't know what values the scores can take, but you may want to initialize it with sent.pol.score <- rep(NA, length(df$sentiment)) instead, so that a missing score is not confused with a genuine score of 0.
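The same check can also be written without an explicit loop. This is only a sketch, assuming every non-empty entry has the polarity$score structure shown in the question. The helper name get_polarity_score and the example list are made up for illustration.

```r
# Hypothetical helper: return the polarity score of one entry, or NA if absent
get_polarity_score <- function(entry) {
  score <- entry$polarity$score   # NULL for an empty list or a missing score
  if (length(score) == 1) score else NA_real_
}

# Made-up example: one complete entry, one empty list, one NULL score
sentiment <- list(
  list(polarity = list(score = 0.2998967, description = "Positief")),
  list(),
  list(polarity = list(score = NULL, description = "Neutraal"))
)

# vapply() enforces that every result is a single number
sent.pol.score <- vapply(sentiment, get_polarity_score, numeric(1))
sent.pol.score
```

With the real data this would presumably become df$polarity_score <- vapply(df$sentiment, get_polarity_score, numeric(1)), which keeps every row and marks the empty ones as NA.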

Rex
  • Thank you @Rex for your help and suggestions. Alas, it did not work, which forced me to explore the data even more. There, I discovered the existence of empty lists within `$sentiment`. Thus, I managed to achieve my goal in the following manner (see larger comment). However, I appreciate all your help! – Zaletio Feb 09 '19 at 20:18

After Rex's help, I discovered the existence of some empty lists (in the form of list() ) within $sentiment. This, in combination with Rex's suggestions, led me to the following solution:

#Remove empty lists from $sentiment
df.1 <- df %>% filter(sentiment != "list()")

#Unnest $sentiment list
df.2 <- df.1 %>% unnest(sentiment, .drop = FALSE, .sep = '"')

#Create function to remove every n-th row; with n = 2 it removes the even rows in df.2, which contain $sentiment$subjectivity
Nth.delete <- function(dataframe, n) dataframe[-(seq(n, to = nrow(dataframe), by = n)), ]

See: https://stackoverflow.com/questions/7942519/deleting-every-n-th-row-in-a-dataframe

#Execute Nth.delete function on the even rows of df.2, which contain $sentiment$subjectivity
df.3 <- Nth.delete(df.2, 2)

#Unnest list $sentiment again to distinguish between $polarity$score and $polarity$description
df.4 <- df.3 %>% unnest(sentiment, .drop = FALSE, .sep = '"')

#Execute Nth.delete function again to remove the even rows containing $sentiment$polarity$description
df.5 <- Nth.delete(df.4, 2)

This created the df in which $sentiment$polarity$score forms a coherent column.
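For comparison, here is a sketch of a variant that selects the values by name rather than by row position, so it does not depend on the order in which unnest() emits rows. It uses only base R on a small made-up data frame. The column names polarity_score and polarity_description are hypothetical, and it assumes every non-empty entry contains a polarity list.

```r
# Made-up miniature of df: a list column with one empty entry
df <- data.frame(title = c("a", "b", "c"), stringsAsFactors = FALSE)
df$sentiment <- list(
  list(polarity     = list(score = 0.30, description = "Positief"),
       subjectivity = list(score = 0.55, description = "Subjectief")),
  list(),
  list(polarity     = list(score = 0.15, description = "Neutraal"),
       subjectivity = list(score = 0.65, description = "Erg subjectief"))
)

# Drop observations whose sentiment entry is an empty list
df.1 <- df[lengths(df$sentiment) > 0, ]

# Pull the polarity values out by name instead of deleting every second row
df.1$polarity_score <- vapply(df.1$sentiment,
                              function(s) s$polarity$score, numeric(1))
df.1$polarity_description <- vapply(df.1$sentiment,
                                    function(s) s$polarity$description, character(1))

df.1[, c("title", "polarity_score", "polarity_description")]
```

Because vapply() checks that each result is exactly one number (or one string), an entry with an unexpected shape should raise an error instead of silently shifting rows.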

Zaletio