I have a file containing multiple entries. Each entry is of the following form:
"field1","field2","field3","field4","field5"
All of the fields are guaranteed not to contain any quotes; however, they can contain commas (,). The problem is that field4 can be split across multiple lines. So an example file can look like:
"john","male US","done","Some sample text
across multiple lines. There
can be many lines of this","foo bar baz"
"jane","female UK","done","fields can have , in them","abc xyz"
I want to extract the fields using Python. If the field had not been split across multiple lines, this would have been simple: extract the strings from between the quotation marks. But I can't seem to find a simple way to do this in the presence of multiline fields.
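To make the desired result concrete, here is a minimal sketch of one possible approach, assuming the file follows ordinary CSV quoting rules so that Python's standard `csv` module can be used. The `SAMPLE` string and the `parse_entries` helper are hypothetical names introduced only for illustration; the sample data is the example from above. I have not checked this against every edge case in my real file.

```python
import csv
import io

# Hypothetical sample data: the two example entries shown above.
SAMPLE = (
    '"john","male US","done","Some sample text\n'
    'across multiple lines. There\n'
    'can be many lines of this","foo bar baz"\n'
    '"jane","female UK","done","fields can have , in them","abc xyz"\n'
)


def parse_entries(stream):
    """Return a list of entries, each a list of field strings.

    csv.reader keeps reading physical lines until a quoted field is
    closed, so a field spanning several lines stays in one entry.
    """
    return list(csv.reader(stream))


# newline="" leaves line endings untouched so the csv module can
# handle newlines embedded inside quoted fields itself.
rows = parse_entries(io.StringIO(SAMPLE, newline=""))

for row in rows:
    print(len(row), row)
```

For a real file, the same helper would presumably be called with `open(path, newline="")` in place of the `io.StringIO` object, where `path` is whatever the file is named.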
EDIT: There are actually five fields. Sorry for any confusion. The question has been edited to reflect this.