I am not very experienced in R. I have data in JSON format for a series of tweets. When I print them they look like this:

print(result)
#$in_reply_to_status_id
#[1] 1.002615e+18
#
#$possibly_sensitive
#[1] FALSE
#
#$created_at
#[1] "Thu Jun 20 10:54:04 CEST 2019"
#
#$truncated
#[1] TRUE
#
#$source
#[1] "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>"
#
#$retweet_count
#[1] "0"

etc etc.

Now I would like to extract only the dates of these tweets and export them in CSV format, but when I try:

tweetdate=lapply(result, function(x) x$getCreated())

I get:

Error in x$getCreated : $ operator is invalid for atomic vectors

How can I solve this?

Rui Barradas
SOF
  • Possible [duplicate](https://stackoverflow.com/questions/2617600/importing-data-from-a-json-file-into-r/2617823). See also CRAN package [rjson](https://cran.r-project.org/web/packages/rjson/). – Rui Barradas Nov 23 '19 at 12:57

1 Answer

Without the .json file and the code to reproduce the error, I can't be sure this will work, but it may be worth a try.

I suppose that result is a list of lists, with one item per tweet and, for each tweet, several items holding the fields you listed above (in_reply_to_status_id, created_at, etc.). However, when you do:

tweetdate=lapply(result, function(x) x$getCreated())

You are calling a function named getCreated() as if each item in result had its own method that returns the date on which the tweet was created (as if each item were an instance of a hypothetical tweet class and created_at were one of its attributes).

However, your result object is just a list with several named items, so I think you only have to access its created_at item:

tweetdate <- result$created_at
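To see where the error comes from, here is a minimal sketch, assuming result is a flat list like the one printed in the question (only three of its fields are reproduced here). lapply hands each element of the list to the function, and those elements are atomic vectors, so `$` has nothing to look up:

```r
# A flat list describing a single tweet (values copied from the question).
result <- list(
  created_at    = "Thu Jun 20 10:54:04 CEST 2019",
  truncated     = TRUE,
  retweet_count = "0"
)

# lapply() passes each element (an atomic vector) to the function,
# so `$` fails; tryCatch() captures the error message instead of stopping.
msg <- tryCatch(
  lapply(result, function(x) x$created_at),
  error = function(e) conditionMessage(e)
)
print(msg)

# Indexing the list itself works.
tweetdate <- result$created_at
print(tweetdate)
```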

If result is a list of lists, you can obtain a vector of dates as:

tweetdates <- sapply(result, function(x){x$created_at})

Hope it helps!

Read more: I suppose you are used to object-oriented programming languages; R, however, is not designed primarily around that style (it has S3 and S4 classes if you want to try object-oriented programming in R).
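For illustration only, this is roughly what an accessor in the style of getCreated() could look like with S3. The class name tweet, the constructor new_tweet and the generic get_created are hypothetical names made up for this sketch; they are not part of any package:

```r
# Hypothetical S3 class "tweet": a plain list tagged with a class attribute.
new_tweet <- function(created_at, retweet_count = 0) {
  structure(
    list(created_at = created_at, retweet_count = retweet_count),
    class = "tweet"
  )
}

# A generic and its method for class "tweet"; together they play the role
# of a getCreated() accessor.
get_created <- function(x, ...) UseMethod("get_created")
get_created.tweet <- function(x, ...) x$created_at

tw <- new_tweet("Thu Jun 20 10:54:04 CEST 2019")
created <- get_created(tw)
print(created)
```

Note that the call is get_created(tw), not tw$get_created(): in S3 the function dispatches on the class of its argument rather than living inside the object.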

EDIT 1: This edit is just to clarify the answer. Your current result object has the following format:

> result <- list("in_reply_to_status_id" = 1.00256e+18, "possible_sensitive" = FALSE, 
"created_at" = "Thu Jun 20 10:54:04 CEST 2019", "truncated" = TRUE, 
"source" = "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>", 
"retweet_count" = 0)
> result
$in_reply_to_status_id
[1] 1.00256e+18

$possible_sensitive
[1] FALSE

$created_at
[1] "Thu Jun 20 10:54:04 CEST 2019"

$truncated
[1] TRUE

$source
[1] "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>"

$retweet_count
[1] 0

To obtain the date of that tweet, you can use the first solution I posted:

> tweetdate <- result$created_at
> tweetdate
[1] "Thu Jun 20 10:54:04 CEST 2019"

Now, suppose you change how you read the tweets from the .json file and end up with a result2 object that is a list of lists, where each sublist is analogous to your current result object:

> result2 <- list("tweet1" = result, "tweet2" = result)
> result2
$tweet1
$tweet1$in_reply_to_status_id
[1] 1.00256e+18

$tweet1$possible_sensitive
[1] FALSE

$tweet1$created_at
[1] "Thu Jun 20 10:54:04 CEST 2019"

$tweet1$truncated
[1] TRUE

$tweet1$source
[1] "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>"

$tweet1$retweet_count
[1] 0


$tweet2
$tweet2$in_reply_to_status_id
[1] 1.00256e+18

$tweet2$possible_sensitive
[1] FALSE

$tweet2$created_at
[1] "Thu Jun 20 10:54:04 CEST 2019"

$tweet2$truncated
[1] TRUE

$tweet2$source
[1] "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>"

$tweet2$retweet_count
[1] 0

In this case, result2$created_at does not exist, because that information is stored in each sublist. To obtain that field, you have to iterate over the sublists, extracting created_at from each one. The second piece of code I posted does exactly that using the sapply function (see more about the xapply functions in this link):

> tweetdate2 <- sapply(result2, function(x){x$created_at})
> tweetdate2
                         tweet1                          tweet2 
"Thu Jun 20 10:54:04 CEST 2019" "Thu Jun 20 10:54:04 CEST 2019" 
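As a possible variation, vapply does the same iteration as sapply but also checks that every sublist yields exactly one string, which catches malformed tweets early. A minimal sketch, assuming a result2 shaped like the one above (only two fields are reproduced):

```r
tweet <- list(created_at = "Thu Jun 20 10:54:04 CEST 2019", truncated = TRUE)
result2 <- list(tweet1 = tweet, tweet2 = tweet)

# vapply() is like sapply(), but it errors unless each result is a
# character vector of length 1.
tweetdate2 <- vapply(result2, function(x) x$created_at, character(1))
print(tweetdate2)

# Equivalent shorthand: `[[` extracts the element named "created_at".
same <- vapply(result2, `[[`, character(1), "created_at")
```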

There is a third case that I didn't consider in my answer: several tweets stored in a single flat list, where each field holds a vector with one value per tweet:

result3 <- list("in_reply_to_status_id" = c(1.00256e+18,1.00256e+18), 
"possible_sensitive" = c(FALSE, FALSE), 
"created_at" = c("Thu Jun 20 10:54:04 CEST 2019","Thu Jun 20 10:54:04 CEST 2019"), 
"truncated" = c(TRUE, TRUE), 
"source" = c("<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>","<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>"), 
"retweet_count" = c(0,0))

This case is equivalent to the first one, where you only have a simple list, so you can access the created_at field directly using:

> tweetdate3 <- result3$created_at
> tweetdate3
[1] "Thu Jun 20 10:54:04 CEST 2019" "Thu Jun 20 10:54:04 CEST 2019"
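Since the goal in the question is a CSV export, here is a minimal sketch of that last step, assuming the dates are already in a character vector such as tweetdate3. The output path is hypothetical; a temporary file is used so the example does not overwrite anything:

```r
tweetdate3 <- c("Thu Jun 20 10:54:04 CEST 2019",
                "Thu Jun 20 10:54:04 CEST 2019")

# One column of dates; stringsAsFactors = FALSE keeps them as plain strings.
dates_df <- data.frame(created_at = tweetdate3, stringsAsFactors = FALSE)

# Hypothetical output path: replace tempfile() with your own file name.
out_file <- tempfile(fileext = ".csv")
write.csv(dates_df, out_file, row.names = FALSE)

# Read the file back to check what was written.
check <- read.csv(out_file, stringsAsFactors = FALSE)
print(check)
```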

I didn't post this explanation before because it is perhaps too long, but it should help to understand how R works.

EDIT 2: The link with the xapply explanation was in Spanish; I have replaced it with an English source.

JaiPizGon
  • Thank you very much! The: tweetdate <- result$created_at worked but when I tried: tweetdates <- sapply(result, function(x){x$created_at}) I still got the same error: Error in x$created_at : $ operator is invalid for atomic vectors – SOF Nov 24 '19 at 17:48
  • Yes, that is because your `result` object is a simple `list` that describes a **single tweet**, so you must use `tweetdate <- result$created_at` (i.e. access the `created_at` field of your `list`). The `sapply` code is for the case where you have a `list` of `lists` in which each sublist describes one tweet and has the same fields as your current `result` object. The `sapply` function would iterate over the sublists and return the `created_at` field of each one, hence returning the creation date of each tweet. That code only applies if you change the way you store the tweets. – JaiPizGon Nov 25 '19 at 10:43
  • See the edit in the answer for more information, but to sum up in your current case you should use `tweetdate <- result$created_at` – JaiPizGon Nov 25 '19 at 11:12
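If it is unclear which shape an object has, a small helper can cover both cases discussed in this thread. This is only a sketch: get_dates is a hypothetical name, and it assumes that anything without a top-level created_at field is a list of tweets that each have one:

```r
# Hypothetical helper: returns created_at whether `x` is one tweet
# (a flat list) or a list of tweets (a list of lists).
get_dates <- function(x) {
  if (!is.null(x$created_at)) {
    # Flat list: the field is available directly.
    return(x$created_at)
  }
  # Otherwise assume every element is itself a list with a created_at field.
  vapply(x, function(tw) tw$created_at, character(1), USE.NAMES = FALSE)
}

single <- list(created_at = "Thu Jun 20 10:54:04 CEST 2019", truncated = TRUE)
many   <- list(tweet1 = single, tweet2 = single)

# str() shows the nesting, which tells you which case you are in.
str(many, max.level = 1)

dates_single <- get_dates(single)
dates_many   <- get_dates(many)
print(dates_single)
print(dates_many)
```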