
I have a CSV file with ~30 columns; one of the columns is a JSON string. What I want to do is read the CSV and break the JSON down into rows (explode).

For example, the CSV:

"data1,date1,{"USERS-1":"ff", "name1":"Joe1", "age":"1"},1" 
"data2,date2,{"USERS-2":"ff", "name2":"Joe2", "age":"2"},2" 
"data3,date3,{"USERS-3":"ff", "name3":"Joe3", "age":"3"},3" 

Result after exploding:

"data1,date1,"USERS-1","ff",1"
"data1,date1,"name1","Joe1",1"
"data1,date1,"age","1",1"
"data2,date2,"USERS-2","ff",2"
"data2,date2,"name2","Joe1",2"
"data2,date2,"age","2",2"
"data3,date3,"USERS-3","ff",3"
"data3,date3,"name3","Joe1",3"
"data3,date3,"age","3",3"

I'm not writing in Scala.

The JSON is unstructured!

Joe
  • Possible duplicate of [How to query JSON data column using Spark DataFrames?](https://stackoverflow.com/questions/34069282/how-to-query-json-data-column-using-spark-dataframes) (a sketch of that approach follows these comments) – 10465355 Nov 28 '18 at 18:08
  • Attach code that you tried with the question. – Sailesh Kotha Nov 28 '18 at 21:05
  • I had answered a similar question: Check this one: https://stackoverflow.com/a/46738678/3389828 – Nikhil Dec 01 '18 at 08:41
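For reference, the approach from the linked duplicate boils down to parsing the JSON column as a string-to-string map with from_json and exploding it into key/value rows. A minimal sketch in Java, assuming the CSV parses cleanly (i.e. the JSON column is properly quoted in the real file) and an illustrative input path:

import static org.apache.spark.sql.functions.*;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;

public class JsonColumnExplode {

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("json-explode").getOrCreate();
        // Without headers, Spark names the columns _c0.._c3.
        Dataset<Row> df = spark.read().csv("input.csv");
        Dataset<Row> result = df.select(
                col("_c0"), col("_c1"),
                // Parse the unstructured JSON as a string-to-string map and
                // explode it into one (key, value) row per entry.
                explode(from_json(col("_c2"),
                        DataTypes.createMapType(DataTypes.StringType, DataTypes.StringType))),
                col("_c3"));
        result.show();
    }
}

Because the schema is a MapType rather than a fixed StructType, this tolerates the varying keys ("USERS-1", "name1", ...) across rows.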

1 Answer


Joe! I wrote a class to show you how I would approach your problem. After the code I will give you extra details so you can better understand what it does.

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class MMM {

    public static void main(String[] args) {
        String s = "data1,date1,{\"USERS-1\":\"ff\", \"name1\":\"Joe1\", \"age\":\"1\"},1";
        processLine(s);
    }

    public static void processLine(String s) {
        // Everything before the opening brace: the leading columns ("data1,date1,").
        final String dates = s.split("[{]")[0];
        // Everything after the opening brace: the JSON body plus the trailing column.
        final String content = s.split("[{]")[1];
        // Split on commas and the closing brace, trim, and drop empty fragments.
        // The last element left over is the trailing column ("1").
        final List<String> elements = Arrays.stream(content.split("[,}]"))
                .map(String::trim)
                .filter(x -> !x.isEmpty())
                .collect(Collectors.toList());
        final String trailing = elements.get(elements.size() - 1);
        // Emit one line per key/value pair, turning "key":"value" into "key","value"
        // and repeating the leading and trailing columns around it.
        for (int i = 0; i < elements.size() - 1; i++) {
            System.out.println(dates + elements.get(i).replace(":", ",") + "," + trailing);
        }
    }
}

Basically, the code splits a line read from the CSV into two parts: the leading columns and the contents found between the braces. The contents are split again, trimmed to remove the whitespace at the ends of the strings, and the empty strings are filtered out. We now have a list of the elements that concern us. For a better visualisation of what the method does I decided to print the result; you can easily modify the code to return the rows in a list or whatever you might like. I hope my answer was helpful, have a nice day!

  • Thanks for your response, but the question was related to a Spark application rather than a Java question. – Joe Nov 30 '18 at 12:30
  • The solution is to create a class implementing the Function<String, List<String>> interface. Add the same functionality as the method I wrote to its call() method, which you must override. Once you have your class, you can use it in a map function called on the data structure you use. After applying the class/function to your data, all that remains is to explode or flatMap it and you will obtain the desired output. If you are using a JavaRDD: yourRDD.map(new YourFunction()).flatMap(x -> x.iterator()); see the sketch below. – Bîrsan Octav Dec 04 '18 at 17:35
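Fleshed out, that suggestion might look like the sketch below. The class name LineExploder and the input path are illustrative, and the splitting logic is reused from the answer above:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class LineExploder implements Function<String, List<String>> {

    // Turns one CSV line into one output row per JSON key/value pair.
    @Override
    public List<String> call(String s) {
        String dates = s.split("[{]")[0];
        String content = s.split("[{]")[1];
        List<String> elements = Arrays.stream(content.split("[,}]"))
                .map(String::trim)
                .filter(x -> !x.isEmpty())
                .collect(Collectors.toList());
        String trailing = elements.get(elements.size() - 1);
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < elements.size() - 1; i++) {
            rows.add(dates + elements.get(i).replace(":", ",") + "," + trailing);
        }
        return rows;
    }

    // Usage, given a JavaSparkContext named sc:
    public static JavaRDD<String> explodeFile(JavaSparkContext sc, String path) {
        return sc.textFile(path)
                .map(new LineExploder())     // JavaRDD<List<String>>
                .flatMap(x -> x.iterator()); // JavaRDD<String>, one row per pair
    }
}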