-1

I have some questions about pandas dataframe reading csv file, and split column by comma. I have a csv file with only 1 column(with multiple rows, and I have to split column by comma.

assume the the dataframe is like "a", "b", 2021-05-01 00:00:00, "Tim, hortons" with 4 comma splited, and 1 commna in string

I wanted it to be splited like a, b, 2021-05-01 00:00:00, Tim hortons in 4 columns

but when I use df[0].str.split(',', expand=True), it be comes

a, b, 2021-05-01 00:00:00, Tim, hortons in 5 columns.

It did not ignore the comma between the double quote " ". Not all field are string type like the date, but all string are surround with ""

my code is like

In [1]: df = pd.DataFrame(["a", "b", 2021-05-01 00:00:00, "Tim, hortons"])

with sftp.open(a) as f:

df = pd.DataFrame(f)

df = df[0].str.split(',', expand=True)

someone please help!!! its been bugging me for a long time. Thank you !!!!

user3429999
  • 17
  • 1
  • 7
  • 2
    Hi user3429999, please read how to make a [good and reproducible pandas example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). It would help a lot if you could write the code that generates an example DataFrame with your issues, so that we can test quickly. – jfaccioni Aug 31 '21 at 14:06

3 Answers3

1

You could use regex to split the string by matching only comma not found between double quotes:

df[0].str.split(r',(?=(?:[^"]*"[^"]*")*[^"]*\Z)', expand=True)

Explanation of this regular expression can be found here: How to match something with regex that is not between two special characters?

0

try:

import re
str_ = '"a", "b", 2021-05-01 00:00:00, "Tim, hortons"'
re.findall(r'(\d+|".*")', str_)

Output:

['"a", "b", 2021-05-01 00:00:00, "Tim, hortons"']

Then use .replace(',',' ') for replacing the comma in the last column(name) with space.

Sonia Samipillai
  • 590
  • 5
  • 15
0
myStr = '"aaaa,a","bb,bb","ccc,dd","test,str"'
commaSplitedValue = re.split(r',(?=")', myStr)

print(commaSplitedValue)

Output

['"aaaa,a"', '"bb,bb"', '"ccc,dd"', '"test,str"']