0

Example line:

"1","Point 1","value1,value2,value3","value1: funtion1**\n**value2: function2","UUID=12345","description"

I want to extract column 1, column 2 and column 5.

  • I cant use comma, since col3 has multiple comma.
  • Column 4 contains multiple new line\n hence can't reach column 5(UUID).

We can achieve this by opening a csv file with excel, just need to select field delimiter and text delimiter. Is it possible with scripting?

You can download file with this link.

JChetan
  • 13
  • 3
  • Can you post your CSV? And the Example line is that a list or a string object? – Rakesh Mar 31 '18 at 13:10
  • See here: https://stackoverflow.com/questions/7804673/escaping-separator-within-double-quotes-in-awk – Zhiya Mar 31 '18 at 13:13
  • Cant you do it with python [csv](https://docs.python.org/2/library/csv.html) module? – urban Mar 31 '18 at 13:21
  • I checked previous and mentioned articles. Few `uniq` things about this problem: multiple comma, multiple new lines and text delimiter as well which are not present in any of the article. To understand the problem clearly, please check mentioned link and open a csv file in excel. Its a one line. – JChetan Mar 31 '18 at 13:27

3 Answers3

0

If the example line is a string. the below snippet should help.

s = """"1","Point 1","value1,value2,value3","value1: funtion1**\n**value2: function2","UUID=12345","description"
I """

val = [i.strip('"') for i in s.split('","')]
print(val)
print(val[4])     #use index to get element

Output:

['1', 'Point 1', 'value1,value2,value3', 'value1: funtion1**\n**value2: function2', 'UUID=12345', 'description"\nI ']
UUID=12345
Rakesh
  • 81,458
  • 17
  • 76
  • 113
0

NOTE: this solution is simple, but highly dependent on the format being consistent. Assuming there is no spacing in between the quotation marks and commas, you could separate them by ",":

#input string
s = '"1","Point 1","value1,value2,value3","value1: funtion1**\n**value2: function2","UUID=12345","description"' 

# remove quotation marks and whitespace from edges, then split string into tuple
cols = s.strip('" ').split('","') 

Now you can extract your columns, eg 5th column is cols[4]:

>> cols[4]
'UUID=12345'     
M.T
  • 4,917
  • 4
  • 33
  • 52
0

If I understand correctly what you wanna do, using the following python:

import csv
with open('/tmp/test.csv', 'r') as csvfile:
     reader = csv.reader(csvfile, delimiter=',', quotechar='"')
     for row in reader:
         for i, cell in enumerate(row):
            print("Cell %d: %s" % (i, cell))

you can get each column regardless on if they contain the delimiter or new lines. Output:

Cell 0: 1
Cell 1: Point 1
Cell 2: value1,value2,value3
Cell 3: value1: funtion1**\n**value2: function2
Cell 4: UUID=12345
Cell 5: description

Now, I opened your linked xlsx and saved as csv with LibreOffice, the code handles the new lines:

Cell 0: 1
Cell 1: point1
Cell 2: value1, value2,value3
Cell 3: Line1

Line2.

Line3.
Cell 4: UUID=123545
time=123seconds

Start time: x

End time: y
Cell 5: 1234

The raw csv contents were (note that although they represent a single row they do actually span multiple lines in the file):

1,point1,"value1, value2,value3","Line1

Line2.

Line3.","UUID=123545
time=123seconds

Start time: x

End time: y",1234

Let me know if the above snippet can handle your data but if they are indeed CSV format, it should be able to read the cells properly

urban
  • 5,392
  • 3
  • 19
  • 45