
I have multiple JSON files containing Tweets from Twitter. I want to import and edit them in R one by one.

For a single file my code looks like this:

data <- fromJSON("filename.json")
data <- data[c(1:3,13,14)]
data$lang <- ifelse(data$lang!="de",NA,data$lang)
data <- na.omit(data)
write_as_csv(data,"filename.csv") 

Now I want to apply this code to multiple files. I found this `for` loop approach here:

Loop in R to read many files

Applied to my problem it should look something like this:

setwd("~/Documents/Elections")
ldf <- list()
listjson <- dir(pattern = "*.json")
for (k in 1:length(listjson)){
  data[k] <- fromJSON(listjson[k])
  data[k] <- data[k][c(1:3,13,14)]
  data[k]$lang <- ifelse(data[k]$lang!="de",NA,data[k]$lang)
  data[k] <- na.omit(data[k])
  filename <- paste(k, ".csv")
  write_as_csv(listjson[k],filename) 
}

But even the first line in the loop doesn't work:

> data[k] <- fromJSON(listjson[k])
Warning message:
In `[<-.data.frame`(`*tmp*`, k, value = list(createdAt =  c(1505935036000,  :
  provided 35 variables to replace 1 variables

I can't figure out why. Also, I wonder if there is a nicer way to solve this without a `for` loop. I have read about the apply family, but I don't know how to apply it to my problem. Thanks in advance!

This is an example of what my data looks like: https://drive.google.com/file/d/19cRS6p_mHbO6XXprfvc6NPZWuf_zG7jr/view?usp=sharing

Fabian
  • @jogo I posted an outdated version of my code. I changed the variable, see update above. It wasn't the problem. – Fabian Feb 21 '18 at 15:02
  • In your second block you didn't define `data`, so it's not clear what `data[k]` is. The way it looks, you could define `data <- list()` outside of the loop and then add the data frame at position `k`. Inside the loop it should be like `tmp <- fromJSON(listjson[k])`, do your manipulation on `tmp` and then `data[[k]] <- tmp`. You end up with a list of data frames. – Duccio A Feb 21 '18 at 15:16

1 Answer


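It should work like this. The problem in your loop is `data[k] <- ...`: at that point `data` is still a data frame (presumably left over from your single-file run), so `data[k]` refers to its `k`-th column, and R warns because you are trying to put all 35 columns of the new file into that one column. Instead, simply overwrite `data` with each new file: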

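setwd("~/Documents/Elections")
library(jsonlite)  # assumed source of fromJSON()
library(rtweet)    # provides write_as_csv()
listjson <- dir(pattern = "\\.json$")  # regex: file names ending in .json
for (k in seq_along(listjson)){
   # Load the JSON file that corresponds to the k-th element of your list of files
   data <- fromJSON(listjson[k]) 
   # Select relevant columns from the data frame
   data <- data[,c(1:3,13,14)]
   # Keep German tweets only: mark other languages as NA, then drop rows
   # with NA (note: na.omit also drops rows with NA in any other kept column)
   data$lang <- ifelse(data$lang!="de",NA,data$lang)
   data <- na.omit(data)

   # Build the output name from the input name, e.g. "tweets.json" -> "tweets.csv"
   # (paste() would insert a space and give "tweets.json .csv")
   filename <- sub("\\.json$", ".csv", listjson[k])
   write_as_csv(data,filename) 
}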
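To see where the original warning comes from, here is a minimal reproduction with two made-up toy data frames (the column names are hypothetical, not taken from your files). Single-bracket indexing such as `data[1]` on a data frame selects a column, so assigning a three-column data frame to it triggers the same kind of warning as in the question:

```r
# Toy stand-ins (hypothetical): `data` plays the data frame left over from a
# previous run, `new_file` plays the result of fromJSON() on the next file.
data <- data.frame(a = 1:2, b = c("x", "y"))
new_file <- data.frame(id = 1:2, text = c("t1", "t2"), lang = c("de", "en"))

# data[1] means "column 1 of data", so R is asked to fit 3 columns into
# 1 column slot and warns. tryCatch() captures the warning message instead
# of printing it, and the assignment is abandoned, so `data` is unchanged.
msg <- tryCatch(
  {
    data[1] <- new_file
    "no warning"
  },
  warning = function(w) conditionMessage(w)
)
print(msg)
```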

For the second part of the question: `apply` applies a function over the rows or columns of a matrix or data frame. That is not your case, as you are looping over a character vector of filenames that are then used elsewhere.
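That said, the apply family does have a member for this situation: `lapply()` calls a function on every element of a vector or list and returns a list. Below is a minimal sketch, assuming `fromJSON()` comes from jsonlite and that your files end in `.json`. `clean_tweets()` is a hypothetical helper name, and base R's `write.csv()` is swapped in for rtweet's `write_as_csv()` to keep the sketch dependency-light. It is probably not faster than the `for` loop, but it keeps every cleaned data frame around as a named list:

```r
library(jsonlite)  # assumption: fromJSON() in the question comes from jsonlite

# Hypothetical helper: read one JSON file of tweets, keep the selected
# columns, and drop every row that is not a German ("de") tweet.
clean_tweets <- function(path, cols = c(1:3, 13, 14)) {
  data <- fromJSON(path)
  data <- data[, cols]  # column positions from the question; must include `lang`
  data$lang <- ifelse(data$lang != "de", NA, data$lang)
  na.omit(data)         # also drops rows with NA in any other kept column
}

# Apply the helper to every .json file in the working directory;
# lapply() returns a list with one cleaned data frame per file.
json_files <- dir(pattern = "\\.json$")
cleaned <- lapply(json_files, clean_tweets)
names(cleaned) <- json_files

# Write each data frame next to its source file, e.g. "a.json" -> "a.csv".
# write.csv() is a stand-in for rtweet's write_as_csv(); unlike the latter,
# it does not handle list columns, so nested tweet fields may need rtweet.
for (f in json_files) {
  write.csv(cleaned[[f]], sub("\\.json$", ".csv", f), row.names = FALSE)
}
```

Returning a list rather than writing inside the helper also makes it easy to combine all files later, for example with `do.call(rbind, cleaned)` if the columns match.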

Duccio A
  • Thank you! Your comment above with the tmp variable led me to the same result. It works :) Okay, so there is no alternative to make this more efficient? – Fabian Feb 21 '18 at 15:52
  • @Fabian What is slowing you down here is not the loop, I suppose. It's the fact that it takes a while to access and read the files. I don't think you can improve much on that side. – Duccio A Feb 21 '18 at 15:58
  • @Fabian No problem. If you find the answer useful and it answers your question, you might want to accept it (the button just below the up/down vote arrows). It helps others to know that it is the right answer (and, let's face it, helps me, since I am trying to build a reputation :) ) – Duccio A Feb 21 '18 at 19:42