Handling data containing comma in a CSV file

Question

I have a CSV file that I am trying to read from Amazon S3 in Mosaic Decisions. The file has an address column whose values themselves contain commas.

Example data in the file is shown below:

Address
sl,name,address
1,Ratan Kumar,FlatNo 122,Mumbai,Maharashtra

In this case, the address field is split into three columns (address, Missing_header_0 and Missing_header_1), and the data is read as:

sl,name,address,Missing_header_0, Missing_header_1
1,Ratan Kumar,FlatNo 122,Mumbai,Maharashtra

This corrupts the data and overwrites the values of the following columns. How can we avoid this scenario?

Check whether you can change the delimiter to some special character. Alternatively, before reading the file in Mosaic Decisions, pre-process it with a script and replace `,` with another character. – anuragal, Oct 05 '20 at 05:15
@anuragal There is no need to pre-process the file or replace the commas in the data. In Mosaic Decisions this is handled while the data is read, just by configuring the reader node. Please refer to my answer for more detail. – codeogeek, Oct 05 '20 at 05:23

score 4 · Accepted Answer · answered Oct 05 '20 at 05:24

To avoid this scenario:

1. Open the Reader node configuration.
2. Enter a single quote (') or double quote (") in the Quote text box available in the configuration tab.

This Mosaic Decisions setting specifies the quote character that wraps the data in each field, so commas inside a quoted field are kept as part of the value.

This would give the desired outcome.

score 2 · Answer 2 · answered Oct 05 '20 at 12:44


Fields containing a separator should be enclosed in double quotes:

sl,name,address
1,Ratan Kumar,"FlatNo 122,Mumbai,Maharashtra"
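As a quick check outside Mosaic Decisions, here is a minimal sketch using Python's standard csv module. It only illustrates how a quoted field is read as a single value; the sample text mirrors the rows above and the variable names are illustrative.

```python
import csv
import io

# Sample text mirroring the quoted example above.
sample = 'sl,name,address\n1,Ratan Kumar,"FlatNo 122,Mumbai,Maharashtra"\n'

# csv.reader uses the double quote as its default quote character,
# so commas inside the quoted address are not treated as delimiters.
rows = list(csv.reader(io.StringIO(sample)))
header, record = rows[0], rows[1]

print(header)
print(record)
```

The address comes back as one field, so the record has three values to match the three headers.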

If you have no control over the creation of this file, you could either contact the creator and ask them to fix the malformed CSV file, or write a custom script that parses the first 2 fields and treats the remainder of each line as the third field (this only works if the address field is indeed the last field).
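The custom-script option could look roughly like the sketch below. It assumes the address is the last column and that the first two fields never contain commas. The helper name `repair_lines` and the in-memory sample are hypothetical; in practice the lines would come from the downloaded file.

```python
import csv
import io


def repair_lines(lines, n_leading=2):
    """Split each line into n_leading fields plus one trailing field."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        # maxsplit=n_leading keeps every later comma inside the last field.
        yield line.split(",", n_leading)


# Hypothetical sample standing in for the malformed file.
raw = [
    "sl,name,address\n",
    "1,Ratan Kumar,FlatNo 122,Mumbai,Maharashtra\n",
]

out = io.StringIO()
# csv.writer quotes any field that contains the delimiter (QUOTE_MINIMAL default).
writer = csv.writer(out, lineterminator="\n")
writer.writerows(repair_lines(raw))
fixed = out.getvalue()

print(fixed)
```

The rewritten output has the address wrapped in double quotes, so a reader configured with `"` as the quote character should then see three columns.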

Danny_ds


your answer is right, but I wanted to know how I could do this in Mosaic Decisions as I am using this for designing my data pipelines. – V Ruvesh Oct 05 '20 at 13:30
@VRuvesh Ah ok - I thought you were trying to _read_ the (pre-existing and malformed) data in Mosaic Decisions (or maybe you do, and the file already has those quoted fields :). – Danny_ds Oct 05 '20 at 13:51
