The following code locates unwanted quotation marks (' and ") inside each tab-separated field, except the double quotes that enclose the field, and removes them. It then replaces each tab (\t) with a comma (,). The script uses a regular expression to locate the unwanted quotation marks.
import re

# Use regex to locate unwanted quotation marks: any ' or " that is neither
# the first character of a field nor a closing " at the end of the field
pattern = re.compile(r"(?!^|\"$)[\"\']")

# "a" appends to the output file; use "w" to overwrite it on each run
with open("C:\\Data\\log1.csv", "a") as new_file, open("path\\logs.txt", "r") as f:
    # Read the file line by line
    for line in f:
        new_fields = []
        for field in line.split('\t'):
            # Remove the unwanted quotation marks
            new_fields.append(pattern.sub("", field))
        # Join the fields with commas (this also keeps empty fields)
        # and write the line to the new file
        new_file.write(",".join(new_fields))
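If a field could still contain commas or stray quotes after cleaning, one alternative (a different technique from the script above, not necessarily what you need) is to strip every quotation mark and let Python's standard `csv` module re-quote each field. This is a minimal sketch, assuming tab-separated input; the function name `tsv_to_csv` is hypothetical. Like the regex above, it also removes apostrophes, so `don't` becomes `dont`.

```python
import csv
import io


def tsv_to_csv(lines, out):
    """Hypothetical helper: convert tab-separated lines to CSV.

    Every ' and " is dropped from the fields; csv.writer then wraps each
    field in double quotes, so commas inside a field stay safe.
    """
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for line in lines:
        # Drop the trailing newline before splitting on tabs
        fields = line.rstrip("\r\n").split("\t")
        # Remove all quotation marks; the writer adds its own
        writer.writerow([f.replace('"', "").replace("'", "") for f in fields])


# Example with an in-memory buffer instead of real files
buf = io.StringIO()
tsv_to_csv(['"The"\t"quick brown"" fox "jumps over the"\t"lazy dog"\n'], buf)
print(buf.getvalue(), end="")

# With real files (paths as in the script above), something like:
# with open("path\\logs.txt", "r") as src, \
#         open("C:\\Data\\log1.csv", "w", newline="") as dst:
#     tsv_to_csv(src, dst)
```

The `csv` documentation recommends opening the output file with `newline=""` so that the writer controls line endings.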
You are seeing this issue because a record contains an unwanted quotation mark. For example:
"The"\t"quick brown"" fox "jumps over the"\t"lazy dog"
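To see what the pattern does to that record, the sketch below applies the same regex to each tab-separated field and joins the results with commas. The helper name `clean_line` is hypothetical. It keeps only the quotes that open and close each field.

```python
import re

# Same pattern as the script: match a ' or " unless it is the first
# character of the field or a " at the very end of the field
pattern = re.compile(r"(?!^|\"$)[\"\']")


def clean_line(line):
    # Hypothetical helper: strip inner quotes from each tab-separated
    # field, then join the fields with commas
    return ",".join(pattern.sub("", field) for field in line.split("\t"))


record = '"The"\t"quick brown"" fox "jumps over the"\t"lazy dog"'
print(clean_line(record))
# → "The","quick brown fox jumps over the","lazy dog"
```

Because Python's `$` also matches just before a trailing newline, a closing `"` is kept even when the line still ends in `\n`, as it does when reading from a file.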