My CSV file uses the two-character sequence \x02\n as its line terminator. However, pandas accepts only a single character for lineterminator, so I cannot pass both. For example:

>>> data = pd.read_csv(file, sep="\x01", lineterminator="\x02")
>>> data.loc[100].tolist()
['\n1475226000146', '1464606', 'Juvenile', '1', 'http://itunes.apple.com/artist/juvenile/id1464606?uo=5', '1']

Or:

>>> data = pd.read_csv(file, sep="\x01", lineterminator="\n")
>>> data.loc[100].tolist()
['1475226000146', '1464606', 'Juvenile', '1', 'http://itunes.apple.com/artist/juvenile/id1464606?uo=5', '1\x02']

In both cases part of the terminator is left in the data: a leading \n in the first example and a trailing \x02 in the second. What would be the best way to read this CSV file in pandas with the above line terminator?

David542

1 Answer

As of v0.23, pandas does not support multi-character line-terminators. Your code currently returns:

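import io
import pandas as pd

s = "this\x01is\x01test\x02\nthis\x01is\x01test2\x02"
# io.StringIO is used here because pd.compat.StringIO was removed in later pandas releases
df = pd.read_csv(
    io.StringIO(s), sep="\x01", lineterminator="\x02", header=None)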

df
        0   1      2
0    this  is   test
1  \nthis  is  test2

The workaround (as of now) is to parse with the single-character terminator "\x02" and then remove the leftover leading whitespace from the first column. You can do this with str.lstrip.

df.iloc[:, 0] = df.iloc[:, 0].str.lstrip()
# Alternatively,
# df.iloc[:, 0] = [s.lstrip() for s in df.iloc[:, 0]]

df

      0   1      2
0  this  is   test
1  this  is  test2
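
The two steps above can be wrapped in one function. This is only a minimal sketch: the helper name read_x02_newline_csv is hypothetical (it is not a pandas API), and dtype=str is an assumption added so that the .str accessor is always available on the first column, even when its values look numeric.

```python
import io

import pandas as pd


def read_x02_newline_csv(buf, sep="\x01"):
    """Hypothetical helper: parse rows terminated by '\\x02\\n'."""
    # pandas only accepts one terminator character, so split rows on '\x02'.
    # dtype=str is an assumption: it keeps every column as text.
    df = pd.read_csv(buf, sep=sep, lineterminator="\x02", header=None, dtype=str)
    # Every row after the first starts with the leftover '\n'; strip it.
    df.iloc[:, 0] = df.iloc[:, 0].str.lstrip("\n")
    return df


s = "this\x01is\x01test\x02\nthis\x01is\x01test2\x02"
df = read_x02_newline_csv(io.StringIO(s))
```

Passing "\n" to lstrip explicitly, rather than calling it with no argument, leaves any other leading whitespace in the data untouched.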

If you have to handle stripping of multiple other kinds of line-terminators (besides just the newline), you can pass a string of them:

line_terminators = ['\n']  # extend with any other terminator characters to strip
df.iloc[:, 0] = df.iloc[:, 0].str.lstrip(''.join(line_terminators))
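
A different route, not part of the approach above, is to normalize the terminator in the raw text before pandas sees it. The sketch below assumes the data fits in memory, that every row (including the last) ends with the full "\x02\n" sequence, and that no field contains a bare "\n" of its own. The function name normalize_terminator is hypothetical.

```python
import io

import pandas as pd


def normalize_terminator(text, terminator="\x02\n"):
    """Hypothetical helper: replace a multi-character terminator with '\\n'."""
    return text.replace(terminator, "\n")


# Sample data in which every row ends with the two-character terminator.
raw = "this\x01is\x01test\x02\nthis\x01is\x01test2\x02\n"

# After normalization the default line handling of read_csv applies.
df = pd.read_csv(io.StringIO(normalize_terminator(raw)), sep="\x01", header=None)
```

For a file on disk, the text could be read with open(path, newline="") so that Python does not translate newlines before the replacement runs.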
cs95
  • thanks man. Any interest in a job or internship (our company is in the LA area)? – David542 Dec 19 '18 at 05:20
  • instead of the blank `lstrip`, I used the characters in the line_terminator I had, so something like: `df.iloc[:, 0] = df.iloc[:, 0].str.lstrip(''.join(self.line_terminator[1:]))` – David542 Dec 19 '18 at 05:24
  • @David542 Are you David from Premier Digital? :-) I really appreciate your offer, but I already have an on-campus job for the coming spring, and will be joining Google once I graduate. I am not currently looking to interview for any positions, but if that changes I'll be sure to give you a ping! – cs95 Dec 19 '18 at 05:25
  • @David542 Okay, so you have a list of line-terminators? I can modify my solution. – cs95 Dec 19 '18 at 05:26