I have a sample extract in JSON Lines format: each row contains a single JSON object, there are around 100 rows, and each object has about 800 fields.
Here is a sample of the data:
Row 1 - {"Id":"User1","OwnerId":"OwnerID1","IsDeleted":false,"Name":"SampleName1", etc...}
Row 2 - {"Id":"User2","OwnerId":"OwnerID2","IsDeleted":true,"Name":"SampleName2", etc...}
I want to turn this into a dataframe looking like this:
Id | OwnerId | IsDeleted | Name | etc..
User1 | OwnerID1 | false | SampleName1 | etc..
User2 | OwnerID2 | true | SampleName2 | etc..
I did some experimenting with dplyr and tidyr but nothing worked out.
Any suggestions on what would be the best way to handle this?
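One option that may work when the rows are already valid JSON objects is jsonlite's stream_in(), which parses newline-delimited JSON directly into a data frame. A minimal sketch (the file name here is hypothetical):
library(jsonlite)
# stream_in() reads newline-delimited JSON (one object per line) into a data frame
sample_df <- stream_in(file("sample_extract.json"))
str(sample_df)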
I was able to resolve this by first fixing the formatting of the data by running it through a JSON validator. Once I got the data into "proper" JSON format, it was quite straightforward to consume in R as a data frame.
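The same reformatting step could likely be done in R itself instead of an external validator, roughly along these lines (a sketch only; the input file name is hypothetical): wrap the one-object-per-line input into a single JSON array before parsing.
library(jsonlite)
# Read the raw JSON Lines file and join the objects into one JSON array
raw_lines <- readLines("Kafka_Raw.JSON")
proper_json <- paste0("[", paste(raw_lines, collapse = ","), "]")
kafka_df <- fromJSON(proper_json)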
I've used jsonlite, as suggested by other users, and all went well.
install.packages("jsonlite")
library(jsonlite)
KafkaDF <- fromJSON("Kafka_Formatted_Full.JSON")
Due to the data structure, a transformation to a matrix was needed.
KafkaDFM <- as.matrix(KafkaDF)
Then another transformation was needed so the data could be exported to a CSV with proper column and row alignment; apply() coerces every column to character first.
KDF2 <- apply(KafkaDFM, 2, as.character)
write.csv(KDF2,"C:\\Data\\KafkaCompleteClean.csv", row.names = TRUE)
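To sanity-check the export, the CSV can be read back in and compared against the expected layout:
# Quick check that columns and rows line up as expected
check_df <- read.csv("C:\\Data\\KafkaCompleteClean.csv")
head(check_df)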