
I have an Excel file that looks like the following:

First_Name  Initials    Last_Name   Places  Email   Tel Fax Joint   Corresponding   Experimental design Data generation Data processing Data analysis   Statistical analysis    Manuscript preparation
Anna    A   Karenina    BioInform_Harvard   anna.Karenina@ucsf.edu  8885006000  8885006001  1       Y   Y   Y   Y   Y   Y
Konstantin  D   Levin   Neuro_Harvard   Konstantin.levin@childrens.harvard.edu  8887006000  8887006001  1               Y   Y   Y   
Alexei  K   Vronsky IGM_Columbia    alexei.vronsky@cumc.columbia.edu    8889006000  8889006001  2           Y               
Stepan  A   Oblonsky    NIMH    steoblon@mail.nih.gov   8891006000  8891006001  2       Y                   Y

In my Python code, I open and parse the file as follows:

with open(filename, 'r') as f:
    for i in f:
        i = i.rstrip().split("\t")
        print(i)

The output is a single list, shown below. How do I get rid of the '\r'? I've tried various methods, such as replacing '\r' with '', but that turns elements like 'Y\rKonstantin' into 'YKonstantin' instead of separating them.

['First_Name', 'Initials', 'Last_Name', 'Places', 'Email', 'Tel', 'Fax', 'Joint', 'Corresponding', 'Experimental design', 'Data generation', 'Data processing', 'Data analysis', 'Statistical analysis', 'Manuscript preparation\rAnna', 'A', 'Karenina', 'BioInform_Harvard', 'anna.Karenina@ucsf.edu', '8885006000', '8885006001', '1', '', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y\rKonstantin', 'D', 'Levin', 'Neuro_Harvard', 'Konstantin.levin@childrens.harvard.edu', '8887006000', '8887006001', '1', '', '', '', 'Y', 'Y', 'Y', '\rAlexei', 'K', 'Vronsky', 'IGM_Columbia', 'alexei.vronsky@cumc.columbia.edu', '8889006000', '8889006001', '2', '', '', 'Y', '', '', '', '\rStepan']

I'm able to get rid of newline characters fine, but it's the '\r' I can't get rid of.
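A quick way to confirm which line endings the file really uses is to look at its raw bytes. The following is a minimal sketch: the helper name `count_line_endings` and the sample bytes are hypothetical, and with a real file you would pass it `open(filename, 'rb').read()`.

```python
def count_line_endings(data):
    """Count CRLF, bare CR and bare LF line endings in a bytes object."""
    crlf = data.count(b"\r\n")
    # every CRLF contains one '\r' and one '\n', so subtract those
    cr = data.count(b"\r") - crlf
    lf = data.count(b"\n") - crlf
    return {"CRLF": crlf, "CR": cr, "LF": lf}


# hypothetical sample saved with classic Mac '\r' line endings
sample = b"First_Name\tInitials\rAnna\tA\rKonstantin\tD\r"
print(count_line_endings(sample))
```

A non-zero "CR" count with zero "LF" means the file uses old Mac-style endings, which matches the single long list above.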

Sagar P. Ghagare
claudiadast
  • Why `rstrip` and not `strip`? Try executing `'\rAlexei'.strip()`. On the other hand, you can use `map` to convert all the strings in the list – mad_ Jan 17 '19 at 18:14
  • You should use `'\n\r'` – Xion Jan 17 '19 at 18:17
  • `i.rstrip("\n\r")`. – CristiFati Jan 17 '19 at 18:19
  • Also `f.read().splitlines()` can give you what you need – mad_ Jan 17 '19 at 18:20
  • Possible duplicate of [How to strip newlines from each line during a file read?](https://stackoverflow.com/questions/18865210/how-to-strip-newlines-from-each-line-during-a-file-read) – mad_ Jan 17 '19 at 18:30
  • https://stackoverflow.com/questions/24946640/removing-r-n-from-a-python-list-after-importing-with-readlines – mad_ Jan 17 '19 at 18:30
  • I've tried the above solutions and those in similar questions, but nothing is working. Things look fine when I do some variation of "strip" or "rstrip" first, but then when I split by tab, the "\r" is introduced again. I tried first splitting by tab and then doing strip too but that doesn't work either. – claudiadast Jan 17 '19 at 18:59
  • This looks like a TSV so why don't you just use the built-in [`csv`](https://docs.python.org/3/library/csv.html) module and let it do the proper parsing for you? – zwer Jan 17 '19 at 19:07
  • What version of python are you using? – glibdud Jan 17 '19 at 19:29
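The `f.read().splitlines()` suggestion can be sketched on an in-memory string. `str.splitlines()` treats '\r', '\n' and '\r\n' all as line boundaries, so the rows come apart correctly even if the file was not opened in universal newlines mode. The helper name `parse_tsv` and the sample text are made up for illustration; with a real file you would pass it `f.read()`.

```python
def parse_tsv(text):
    # splitlines() breaks on '\r', '\n' and '\r\n' alike, so a file saved
    # with classic Mac '\r' endings still yields one line per row; a
    # trailing line break does not produce an extra empty line
    return [line.split("\t") for line in text.splitlines()]


# hypothetical sample with '\r' line endings
raw = "First_Name\tInitials\tLast_Name\rAnna\tA\tKarenina\rKonstantin\tD\tLevin\r"
rows = parse_tsv(raw)
for row in rows:
    print(row)
```

Splitting each line on "\t" without calling a bare `rstrip()` first also keeps trailing empty fields, since `rstrip()` with no argument removes trailing tabs too.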

2 Answers


As suggested, the csv module is good for dealing with this sort of data. I'd do something like:

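import csv

# Python 3: newline='' is what the csv docs recommend; universal newlines
# still split lines at '\r', '\n' or '\r\n'
with open(filename, newline='') as fd:
    inp = csv.reader(fd, delimiter='\t')

    header = next(inp)  # first row: the column names
    print(header)

    for row in inp:  # remaining rows, one list of fields each
        print(row)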

Python 3 supports universal newlines, which means it does something sensible with "old-style" Mac line endings (a bare \r) by default. You can then use the csv module with a custom delimiter to parse the tab-delimited file.
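To check this on Python 3 without the original file, the sketch below writes a small hypothetical sample with '\r' line endings to a temporary file and reads it back with `csv.reader`. The helper name `read_tsv` and the sample contents are assumptions for illustration only.

```python
import csv
import os
import tempfile

# hypothetical sample: a few columns from the question, saved with
# classic Mac '\r' line endings
SAMPLE = b"First_Name\tInitials\tLast_Name\rAnna\tA\tKarenina\rKonstantin\tD\tLevin\r"


def read_tsv(path):
    """Return (header, rows) for a tab-delimited file."""
    # newline='' keeps universal newlines on (lines split at '\r', '\n'
    # or '\r\n') and passes the endings to csv untranslated
    with open(path, newline='') as fd:
        reader = csv.reader(fd, delimiter='\t')
        header = next(reader)
        rows = list(reader)
    return header, rows


# write the sample to a temporary file so the sketch is self-contained
handle, path = tempfile.mkstemp(suffix='.tsv')
with os.fdopen(handle, 'wb') as out:
    out.write(SAMPLE)

header, rows = read_tsv(path)
os.remove(path)
print(header)
for row in rows:
    print(row)
```

Each '\r'-terminated line becomes its own row, so no '\r' ends up glued to a field.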

Sam Mason

The key thing to notice is that Python reads only one big line, with all the \r characters embedded in it. Based on that, I'm guessing you're using Python 2.x, which didn't enable universal newlines mode by default. Changing your mode to 'rU' should do what you're expecting:

with open(filename, 'rU') as f:
    for i in f:
        i = i.rstrip().split("\t")
        print(i)

For more information, see the open() documentation.
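The 'U' flag only matters on Python 2: Python 3 uses universal newlines by default in text mode, and the 'U' flag was deprecated there and removed in Python 3.11. A version-agnostic sketch uses `io.open`, which is the built-in `open()` on Python 3 and is also available on Python 2.6+. The helper name `read_tab_rows` and the sample data are hypothetical.

```python
import io
import os
import tempfile


def read_tab_rows(path):
    """Return one list of fields per line of a tab-delimited file."""
    rows = []
    # in text mode with the default newline=None, io.open enables
    # universal newlines: '\r', '\n' and '\r\n' all end a line and are
    # handed back as '\n'
    with io.open(path, 'r') as f:
        for line in f:
            # strip only the newline; a bare rstrip() would also remove
            # trailing tabs and therefore trailing empty fields
            rows.append(line.rstrip('\n').split('\t'))
    return rows


# hypothetical sample with '\r' endings and a trailing empty field
sample = b"First_Name\tInitials\tLast_Name\rAnna\tA\tKarenina\rKonstantin\tD\tLevin\t\r"
handle, path = tempfile.mkstemp(suffix='.tsv')
with os.fdopen(handle, 'wb') as out:
    out.write(sample)

rows = read_tab_rows(path)
os.remove(path)
for row in rows:
    print(row)
```

The last row keeps its empty final field, which the original `rstrip()` call would have dropped.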

glibdud