
So I am analysing tweets from different accounts using get_timeline from rtweet. It returns a data frame with 90 variables, which is great. However, one of them, the variable hashtags, gives me either NA (no hashtags used in the tweet), a single hashtag, or a list of all the hashtags. I want to create a separate variable for each hashtag so that I can save the tweets to a CSV, load them into Power BI and build some graphs. Therefore, my question is: can you split all the elements of the list into different variables containing a single word each?

  • Hello Pablo, welcome to SO. Could you please elaborate on the desired outcome? From your description I think there could be two possible solutions, and we do not know which one of them it is. Could you also add a small reproducible sample of your data that we can use to show you how it is done? You get that by using `dput(head(df[, select a subset of columns]))`. – Jan Jan 06 '21 at 11:32
  • Are you looking for [this](https://stackoverflow.com/questions/50881440/split-a-list-column-into-multiple-columns-in-r/50881721)? – Rui Barradas Jan 06 '21 at 12:55
  • I was looking for that, thanks @RuiBarradas and everyone else that commented!! – Pablo García Naveira Jan 08 '21 at 10:07

1 Answer


As I understand your problem, you do not need to split the list to get all the individual (or unique) entries; a combination of unlist and unique is enough.

Let's assume you have a list of hashtags, l_hashtags, whose elements have different lengths (just letters in this example). Some hashtags are repeated.

Unlisting the list gives you a vector with all hashtags, including all repetitions.

Applying unique to the unlisted l_hashtags gives you the unique members of the original list. Note that NA counts as a value here and is kept in the result.

l_hashtags <- list(c(LETTERS[1:2]), rep(NA,5), LETTERS[5:15], c('A', 'N', 'N', 'J', 'K'))
l_hashtags
#> [[1]]
#> [1] "A" "B"
#> 
#> [[2]]
#> [1] NA NA NA NA NA
#> 
#> [[3]]
#>  [1] "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O"
#> 
#> [[4]]
#> [1] "A" "N" "N" "J" "K"

table(unlist(l_hashtags))
#> 
#> A B E F G H I J K L M N O 
#> 2 1 1 1 1 1 1 2 2 1 1 3 1

l_hashtags_unlisted <- unlist(l_hashtags)

unique(l_hashtags_unlisted)
#>  [1] "A" "B" NA  "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O"

You can of course put all this into one single line:

unique(unlist(l_hashtags))
#>  [1] "A" "B" NA  "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O"
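If what you need is instead one column per hashtag, as the question describes, here is a minimal base R sketch. It assumes the hashtags column is a list column of character vectors. The tweets data frame, the status_id values and the hashtag_1, hashtag_2, ... column names are made up for illustration and are not taken from the real get_timeline output.

```r
# Hypothetical stand-in for the data frame returned by get_timeline();
# only the list column `hashtags` matters here.
tweets <- data.frame(status_id = c("1", "2", "3"), stringsAsFactors = FALSE)
tweets$hashtags <- list(c("rstats", "dataviz"), NA_character_, "powerbi")

# The longest entry decides how many hashtag columns are needed.
n_max <- max(lengths(tweets$hashtags))

# Pad every entry with NA up to n_max, then stack the entries as rows.
padded <- lapply(
  tweets$hashtags,
  function(x) c(x, rep(NA_character_, n_max - length(x)))
)
hashtag_cols <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(hashtag_cols) <- paste0("hashtag_", seq_len(n_max))

# Combine the id with the flat hashtag columns; no list column is left.
tweets_flat <- cbind(tweets["status_id"], hashtag_cols)
tweets_flat
```

Because tweets_flat no longer contains a list column, it can be written out with something like write.csv(tweets_flat, "tweets.csv", row.names = FALSE), where the file name is just a placeholder.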
MarBlo