
I have a JSON file stored in S3 that I need to parse and extract information from. How do I do that in Java?

I have looked at some solutions, mostly in Python, but I haven't been able to do the same in Java.

I can read the content using

S3Object object = s3Client.getObject(new GetObjectRequest(bucketName, key));
InputStream objectData = object.getObjectContent();

but I do not want to download the file and keep it. I just need to be able to parse this JSON file using Gson.

How do I achieve this?

roger_that

4 Answers


A bit late, but I'll leave this answer here in case someone else runs into this problem.

If you're not restricted to using Gson, then I'd recommend using Jackson's ObjectMapper instead.

Step 1: Add the Jackson dependency to your project.

// https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-databind
// ('compile' is deprecated and was removed in Gradle 7; use 'implementation')
implementation group: 'com.fasterxml.jackson.core', name: 'jackson-databind', version: '2.11.3'

Step 2: Create a Plain Old Java Object (POJO) that mirrors the structure of the JSON you want to parse. For example:

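class Item {

  private Integer id;
  private String name;
  // ... other fields matching the JSON keys

  // Jackson needs a no-argument constructor
  public Item() { }

  // getters and setters, e.g.:
  public Integer getId() { return id; }
  public void setId(Integer id) { this.id = id; }
  public String getName() { return name; }
  public void setName(String name) { this.name = name; }
}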

Step 3: Create an ObjectMapper instance and read the value from the JSON into an instance of your POJO class.

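// ObjectMapper is com.fasterxml.jackson.databind.ObjectMapper
ObjectMapper objectMapper = new ObjectMapper();
// S3Object is Closeable; closing it releases the underlying HTTP connection
try (S3Object s3Object = amazonS3.getObject(new GetObjectRequest(bucketName, key))) {
    Item item = objectMapper.readValue(s3Object.getObjectContent(), Item.class);
}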
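A note on why the stream should be closed: per the v1 SDK Javadoc for `getObject`, the returned object streams directly from the HTTP connection, and that connection cannot be reused until the stream has been read and closed. Try-with-resources closes it even when parsing fails on malformed JSON. The JDK-only sketch below illustrates this; `TrackingStream` is a hypothetical stand-in for `S3ObjectInputStream`, and `parseOrThrow` is a hypothetical stand-in for `objectMapper.readValue(...)`, not Jackson itself.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class CloseOnFailure {

    // Hypothetical stand-in for S3ObjectInputStream that records whether close() ran.
    static final class TrackingStream extends FilterInputStream {
        boolean closed = false;

        TrackingStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Hypothetical stand-in for objectMapper.readValue(...): rejects non-object input.
    static String parseOrThrow(InputStream in) throws IOException {
        String text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        if (!text.startsWith("{")) {
            throw new IOException("not a JSON object: " + text);
        }
        return text;
    }

    // The stream is opened in try-with-resources, so it is closed on success and on failure.
    static boolean closedAfterParsing(String body) {
        TrackingStream stream = new TrackingStream(
                new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8)));
        try (TrackingStream s = stream) {
            parseOrThrow(s);
        } catch (IOException e) {
            System.out.println("parse failed: " + e.getMessage());
        }
        return stream.closed;
    }

    public static void main(String[] args) {
        System.out.println(closedAfterParsing("{\"id\":1}")); // true
        System.out.println(closedAfterParsing("oops"));       // true, even though parsing threw
    }
}
```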
Naz

(Just expanding the comments given above.)

Following the approach in S3ObjectWrapper, we can have a method like this:

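import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Reads the whole stream into memory as a String; nothing is written to disk.
private static String getAsString(InputStream is) throws IOException {
    if (is == null)
        return "";
    StringBuilder sb = new StringBuilder();
    // try-with-resources closes the reader and, with it, the underlying stream
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(is, StandardCharsets.UTF_8))) {
        String line;
        while ((line = reader.readLine()) != null) {
            sb.append(line);
        }
    }
    return sb.toString();
}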

Then call the method like this:

S3Object o = s3.getObject(bucketName, key);
S3ObjectInputStream s3is = o.getObjectContent();
String str = getAsString(s3is);
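One detail of `getAsString`: `BufferedReader.readLine()` strips line terminators, so the lines are concatenated with nothing in between. For a single JSON document that only removes insignificant whitespace, since valid JSON cannot contain a raw line break inside a string. If you want the line structure kept, `reader.lines()` with `Collectors.joining("\n")` puts one `\n` between lines (a trailing newline is still dropped). A JDK-only comparison; `LineJoining` and its methods are hypothetical names, and a `ByteArrayInputStream` stands in for the S3 content stream:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

final class LineJoining {

    // Same loop as getAsString: readLine() drops the line terminators.
    static String concatLines(InputStream is) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(is, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Keeps a '\n' between lines; a trailing newline is not preserved.
    static String joinLines(InputStream is) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(is, StandardCharsets.UTF_8))) {
            return reader.lines().collect(Collectors.joining("\n"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stand-in for s3Object.getObjectContent().
    static InputStream fakeS3Content(String s) {
        return new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String json = "{\n  \"id\": 1,\n  \"name\": \"a\"\n}\n";
        System.out.println(concatLines(fakeS3Content(json))); // {  "id": 1,  "name": "a"}
        System.out.println(joinLines(fakeS3Content(json)));
    }
}
```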
arun

S3 is a blob store; it can't parse the file for you. If you want the data parsed on the AWS side, you might be better off storing it in DynamoDB, which understands JSON documents.

If that's not an option, you are on the right track. Just read that input stream into a JSON string and parse it in memory. There is no need to write the file to disk at any point. Unless it's a huge file, you should be able to do it all in memory without problems.
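To make the "unless it's a huge file" caveat concrete, here is a minimal sketch that reads the stream into memory but refuses to buffer more than a chosen number of bytes. With the v1 SDK you could also check `s3Object.getObjectMetadata().getContentLength()` before reading. `BoundedRead`, `readUtf8`, and the 1024-byte limit are hypothetical, and a `ByteArrayInputStream` stands in for `getObjectContent()`:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class BoundedRead {

    // Reads the whole stream into a UTF-8 String, failing if it exceeds maxBytes.
    static String readUtf8(InputStream in, int maxBytes) {
        try (InputStream is = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = is.read(buffer)) != -1) {
                if (out.size() + n > maxBytes) {
                    throw new IllegalStateException(
                            "object larger than " + maxBytes + " bytes; stream it instead");
                }
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String json = "{\"id\":1,\"name\":\"item\"}";
        InputStream fakeS3Content =
                new ByteArrayInputStream(json.getBytes(StandardCharsets.UTF_8));
        System.out.println(readUtf8(fakeS3Content, 1024)); // {"id":1,"name":"item"}
    }
}
```

The resulting string can then be handed to Gson or Jackson for parsing; for objects too large to buffer, both libraries can also read from a `Reader`/`InputStream` directly.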

F_SO_K
  • If it's not a very large file, you could use an InputStreamReader to read the data into memory and then parse it as JSON. `try (BufferedReader reader = new BufferedReader(new InputStreamReader(s3Object.getObjectContent(), StandardCharsets.UTF_8))) { s3Data = reader.lines().collect(Collectors.joining("\n")); }` – Usman Azhar Jan 12 '18 at 12:58
  • Is it a text file? What type of file is it? Are you able to print the lines using BufferedReader? – Usman Azhar Jan 16 '18 at 03:02
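When wrapping the S3 stream in an `InputStreamReader`, it is safer to pass the charset explicitly than to rely on the platform default: RFC 8259 requires JSON exchanged between systems to be UTF-8, and decoding those bytes with another charset garbles non-ASCII text. A small JDK-only check decoding the same UTF-8 bytes as UTF-8 and as ISO-8859-1; `ExplicitCharset` is a hypothetical name and the byte array stands in for the S3 content:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

final class ExplicitCharset {

    // Decodes bytes (standing in for the S3 content stream) with the given charset.
    static String decode(byte[] objectBytes, Charset charset) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(objectBytes), charset))) {
            return reader.lines().collect(Collectors.joining("\n"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "café"; written as an escape so the source file encoding doesn't matter
        String json = "{\"name\":\"caf\u00e9\"}";
        byte[] utf8Bytes = json.getBytes(StandardCharsets.UTF_8);
        // Decoding with the right charset round-trips the text.
        System.out.println(decode(utf8Bytes, StandardCharsets.UTF_8).equals(json));
        // Decoding with the wrong one turns the two UTF-8 bytes of 'é' into two characters.
        System.out.println(decode(utf8Bytes, StandardCharsets.ISO_8859_1).equals(json));
    }
}
```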
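    // Imports: com.amazonaws.regions.Regions, com.amazonaws.services.s3.*,
    // com.amazonaws.services.s3.model.*, com.google.gson.Gson, com.google.gson.GsonBuilder,
    // org.apache.commons.io.FileUtils, java.io.File, java.nio.charset.StandardCharsets
    AmazonS3 client = AmazonS3ClientBuilder.standard()
            .withRegion(Regions.US_EAST_1.getName())
            .build();
    Gson gson = new GsonBuilder().create();
    S3Object data = client.getObject("bucket_name", "file_path");
    try (S3ObjectInputStream s3is = data.getObjectContent()) {
        // Note: this writes the object to a local file before parsing it,
        // and the file is left in the working directory afterwards.
        File temporaryFile = new File("temporary_file.json");
        FileUtils.copyInputStreamToFile(s3is, temporaryFile);
        String jsonAsString = FileUtils.readFileToString(temporaryFile, StandardCharsets.UTF_8);
        YourClass obj = gson.fromJson(jsonAsString, YourClass.class);
    } catch (Exception e) {
        System.err.println(e.getMessage());
        System.exit(1);
    }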

build.gradle

implementation group: 'com.amazonaws', name: 'aws-java-sdk-s3', version: '1.11.705'
implementation group: 'com.google.code.gson', name: 'gson', version: '2.8.6'
implementation group: 'commons-io', name: 'commons-io', version: '2.6'
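If a local file really is needed (for example, because a downstream API only accepts a path), a safer variant of the temporary-file step uses `Files.createTempFile`, which picks a unique name in the system temp directory, and deletes the file once it has been read, so nothing is left behind. A JDK-only sketch; `TempFileRoundTrip` and `viaTempFile` are hypothetical names, and a `ByteArrayInputStream` stands in for the `S3ObjectInputStream`:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class TempFileRoundTrip {

    // Copies the stream to a uniquely named temp file, reads it back, then deletes the file.
    static String viaTempFile(InputStream s3Content) {
        Path temp = null;
        try (InputStream in = s3Content) {
            temp = Files.createTempFile("s3-object-", ".json");
            // createTempFile already created the file, so the copy must replace it
            Files.copy(in, temp, StandardCopyOption.REPLACE_EXISTING);
            return new String(Files.readAllBytes(temp), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (temp != null) {
                try {
                    Files.deleteIfExists(temp);
                } catch (IOException ignored) {
                    // best-effort cleanup; a failed delete is not fatal here
                }
            }
        }
    }

    public static void main(String[] args) {
        InputStream fakeS3Content = new ByteArrayInputStream(
                "{\"id\":1}".getBytes(StandardCharsets.UTF_8));
        String json = viaTempFile(fakeS3Content);
        System.out.println(json); // {"id":1}
    }
}
```

Since Gson's `fromJson` also accepts a `Reader`, the file can be skipped entirely by passing `new InputStreamReader(s3is, StandardCharsets.UTF_8)` straight to `gson.fromJson(..., YourClass.class)`.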