
I am trying to use AWS DataBrew to clean and normalize incoming data that is dropped into an S3 bucket. However, most of the data that I receive from clients comes in .txt format, which DataBrew does not accept as an input format.

Hence, I am looking for a way to automatically convert the incoming .txt files into .csv files so that DataBrew can work with these files. My initial thought is that AWS Lambda might be best suited for this job.

I can imagine that I am not the first person to come across this issue and hence would greatly appreciate any assistance that can be given. If someone has developed a Lambda function that can achieve this, I would greatly appreciate you sharing your code.

Alternatively, if there is a simpler way to work with .txt files in AWS DataBrew, I would welcome any insight that can be provided.

learner
  • 33
  • 7
  • Write a Java-based Lambda function that takes the TXT files and converts them to CSV. For an example, see https://stackoverflow.com/questions/22526679/parse-txt-to-csv. This should provide the solution that you are seeking. – smac2020 Jan 06 '21 at 20:51

1 Answer


First of all, there is no workaround for using .txt files in AWS DataBrew: the input formats that DataBrew recognizes are listed in the AWS documentation, and .txt is not among them.

Yes, you can convert the .txt file to CSV with a Lambda function, as follows.

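const fs = require('fs');
// Assumption: the legacy 0.x release of the `csv` npm package, which exposed this chained API.
const csv = require('csv');

// Stream the .txt file through the CSV transformer and write the result out as .csv.
csv()
  .from(fs.createReadStream('./test.txt'))
  .to(fs.createWriteStream('./file.csv'));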

For details on how this works, you can check out the export-to-csv documentation, which should help you handle whatever kind of data is present in the .txt file.
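If pulling in a CSV library is not an option, the conversion step itself can be sketched in plain Node.js. This is only a minimal sketch under assumptions the question does not confirm: it assumes the .txt files are tab-delimited with one record per line, and the helper names `escapeCsvField` and `txtToCsv` are hypothetical, not part of any AWS or npm API.

```javascript
// Quote a field only when it contains a comma, a double quote, or a line break,
// doubling any embedded double quotes (the usual CSV convention).
function escapeCsvField(field) {
  if (/[",\r\n]/.test(field)) {
    return '"' + field.replace(/"/g, '""') + '"';
  }
  return field;
}

// Convert delimited text to CSV.
// Assumption: the input is tab-delimited; pass another delimiter if it is not.
function txtToCsv(text, delimiter = '\t') {
  return text
    .split(/\r?\n/)                       // handle both Unix and Windows line endings
    .filter((line) => line.length > 0)    // drop empty lines, e.g. a trailing newline
    .map((line) => line.split(delimiter).map(escapeCsvField).join(','))
    .join('\n');
}
```

Inside a Lambda function triggered by the S3 upload, the handler would read the uploaded object's contents as a string, pass it through `txtToCsv`, and write the result back to a location that DataBrew reads from. Note that if the handler uses local files, as the snippet above does, only the /tmp directory is writable in Lambda.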

Abdul Moeez
  • 1,331
  • 2
  • 13
  • 31