
I have a sample extract in JSON Lines format that contains a single object and around 100 rows. Each row has about 800 items.

Here is a sample of the data:

Row 1 - {"Id":"User1","OwnerId":"OwnerID1","IsDeleted":false,"Name":"SampleName1", etc...}

Row 2 - {"Id":"User2","OwnerId":"OwnerID2","IsDeleted":true,"Name":"SampleName2", etc...}

I want to turn this into a dataframe looking like this:

Id     | OwnerId     |  IsDeleted | Name         |  etc..
User1  | OwnerID1    |  false     | SampleName1  |  etc..
User2  | OwnerID2    |  true      | SampleName2  |  etc..

I did some experimenting with dplyr and tidyr but nothing worked out.

Any suggestions on the best way to handle this?

I was able to resolve this by first fixing the formatting of the data by running it through a JSON validator. Once I got the data into a "proper" JSON format, it was quite straightforward to read it into R as a data frame.

I used jsonlite, as suggested by other users, and all went well.

install.packages("jsonlite")
library(jsonlite)
KafkaDF <- fromJSON("Kafka_Formatted_Full.JSON")
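
As an aside, if the extract is valid JSON Lines (one complete object per line), jsonlite's stream_in() may be able to read it without first converting it to a single JSON document. The following is a minimal sketch under that assumption; the two sample rows and the temporary file are hypothetical stand-ins for the real extract, which has around 800 fields per row.

```r
library(jsonlite)

# Hypothetical sample rows modelled on the question's example data.
json_lines <- c(
  '{"Id":"User1","OwnerId":"OwnerID1","IsDeleted":false,"Name":"SampleName1"}',
  '{"Id":"User2","OwnerId":"OwnerID2","IsDeleted":true,"Name":"SampleName2"}'
)

# Write the rows to a temporary file so the example is self-contained.
tmp <- tempfile(fileext = ".jsonl")
writeLines(json_lines, tmp)

# stream_in() parses one JSON object per line and returns a data frame.
# It opens and closes the unopened file connection itself.
sample_df <- stream_in(file(tmp), verbose = FALSE)

str(sample_df)
```

If the file fails to parse this way, as the "unexpected character" error in the comments suggests it might, the data still needs to be repaired or re-exported first.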

Due to the data structure, there was a need for a transformation to a matrix.

KafkaDFM = as.matrix(KafkaDF)

And then another transformation so that it could be exported to a csv with proper column and row alignment.

KDF2 <- apply(KafkaDFM, 2, as.character)
write.csv(KDF2,"C:\\Data\\KafkaCompleteClean.csv", row.names = TRUE)
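
The matrix detour above may not be needed in every case. Assuming the problem was that some columns were list columns (which write.csv() cannot write directly), one possible alternative is to collapse each list column to a single string per row and leave the other columns untouched. This is only a sketch: example_df, the Tags column, and the ";" separator are hypothetical and stand in for KafkaDF and its real fields.

```r
# Hypothetical data frame with one list column, standing in for KafkaDF.
example_df <- data.frame(Id = c("User1", "User2"), stringsAsFactors = FALSE)
example_df$Tags <- list(c("a", "b"), character(0))

# Find the list columns and collapse each cell to one ";"-separated string.
is_list_col <- vapply(example_df, is.list, logical(1))
example_df[is_list_col] <- lapply(
  example_df[is_list_col],
  function(col) {
    vapply(col, function(x) paste(unlist(x), collapse = ";"), character(1))
  }
)

# Every column is now atomic, so write.csv() can handle the data frame.
out_file <- tempfile(fileext = ".csv")
write.csv(example_df, out_file, row.names = FALSE)
```

Unlike apply(..., as.character) on a matrix, this keeps logical and numeric columns in their original types until they are written out.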

  • @cricket_007 Unfortunately I don't have access to Kafka. – Metodi Simeonov Feb 27 '20 at 14:35
  • Oh I misunderstood the question then. I'm removing the tag – OneCricketeer Feb 27 '20 at 14:39
  • Why are you under the impression that you have CSV? http://jsonlines.org/examples/ – OneCricketeer Feb 27 '20 at 14:41
  • @cricket_007 You are right, this is JSON format stored into a csv. – Metodi Simeonov Feb 27 '20 at 14:51
  • Does this answer your question? [Importing data from a JSON file into R](https://stackoverflow.com/questions/2617600/importing-data-from-a-json-file-into-r). I highly recommend `jsonlite` over other libraries. – asachet Feb 27 '20 at 14:54
  • @cricket_007 I wasn't sure from the example whether this was actually JSON or just bore a superficial resemblance. I have deleted my answer. – Allan Cameron Feb 27 '20 at 14:54
  • Extensions are irrelevant to the file reader, by the way – OneCricketeer Feb 27 '20 at 15:32
  • @asac Thank you for this reference. I am exploring this right now but it appears that my data is encoded in the wrong format. Importing it with jsonlite throws an error - "unexpected character ''S". I've used jsonlint to check the data and there are parsing errors. – Metodi Simeonov Feb 28 '20 at 10:23
  • @Allan your approach was really interesting and I wonder whether spending more time with strsplit to define proper delimiters would yield a better result. I am running out of options and my next move will be to ask for access to the Kafka environment or try to get an export in JSON with correct format and encoding. – Metodi Simeonov Feb 28 '20 at 10:23
