4

I have a set of numbers (NDC drug numbers) that contain hyphens. I am trying to read the file, remove the hyphens, and write the numbers to a new file. Any help with this would be appreciated. I am using Python 2.7.

1. 68817-0134-50
2. 68817-0134-50
3. 68817-0134-50

The issue is that the hyphen is not always in the same position.

1. 8290-033010

It changes and can be in any position.

with open('c:\NDCHypen.txt', 'r') as infile,
     open('c:\NDCOnly.txt', 'w') as outfile:
    replace("-", "")
Shaji

2 Answers

17
with open(r'c:\NDCHypen.txt', 'r') as infile, \
     open(r'c:\NDCOnly.txt', 'w') as outfile:
    data = infile.read()
    data = data.replace("-", "")
    outfile.write(data)

To prevent the conversion of line endings (e.g. between '\r\n' and '\n'), open both files in binary mode: pass 'rb' or 'wb' as the second argument of open.
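A minimal sketch of the binary-mode variant described above. The function name `strip_hyphens_binary` is hypothetical and used only for illustration; the bytes literals (`b"-"`) are there so the same code should behave the same on Python 2.7 and Python 3.

```python
def strip_hyphens_binary(src_path, dst_path):
    """Copy src_path to dst_path with every '-' byte removed.

    Hypothetical helper for illustration. Both files are opened in
    binary mode, so line endings such as '\r\n' are written back
    exactly as they were read.
    """
    with open(src_path, 'rb') as infile, open(dst_path, 'wb') as outfile:
        data = infile.read()                     # whole file as bytes
        outfile.write(data.replace(b"-", b""))   # drop every hyphen byte
```

It could be called as `strip_hyphens_binary(r'c:\NDCHypen.txt', r'c:\NDCOnly.txt')`, using the paths from the question.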

pts
  • Is there any way to rewrite the initial file that is opened? For example, how do you remove text or a row from `'c:\NDCHyphen.txt'`? – oldboy Aug 19 '19 at 00:40
  • @BugWhisperer: Yes, there are multiple options, some of them are atomic. Please ask a separate question for that. – pts Aug 19 '19 at 16:16
12

You can do it easily with a shell script, which would be much faster than a Python implementation. Unless the script needs to do something more, you should go with the shell script version.

However, with Python it would be:

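# Raw strings (r'...') keep the backslashes in the Windows paths literal.
with open(r'c:\NDCHypen.txt', 'r') as infile, open(r'c:\NDCOnly.txt', 'w') as outfile:
    # Read the whole file, drop every hyphen, and write the result.
    temp = infile.read().replace("-", "")
    outfile.write(temp)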
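Since the question worries that the hyphen position varies, here is a small sketch showing that `str.replace` removes every hyphen wherever it occurs. The helper name `strip_hyphens` is hypothetical; the sample values are the ones from the question.

```python
def strip_hyphens(ndc):
    """Return the NDC string with all hyphens removed (hypothetical helper)."""
    return ndc.replace("-", "")


# Sample values taken from the question; hyphens sit at different positions.
samples = ["68817-0134-50", "8290-033010"]
for ndc in samples:
    print(strip_hyphens(ndc))
```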
neeagl
  • Not sure who gave the negative vote or why, but this works like a charm too. Thank you! – Shaji Oct 05 '13 at 21:12
  • I think it's hard to predict (and it's also system-dependent) whether the shell script or Python implementation is faster. For large files both of them would be I/O-bound, so the wall time difference should be negligible. The Python implementation uses more memory (it keeps 2 copies of the input in memory), which is wasteful for large files. – pts Aug 19 '19 at 16:19
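Prompted by the memory point in the last comment, here is a minimal sketch of a line-by-line variant that avoids holding the whole file in memory. It is not taken from either answer, and the function name `strip_hyphens_streaming` is hypothetical.

```python
def strip_hyphens_streaming(src_path, dst_path):
    """Copy src_path to dst_path line by line, removing hyphens.

    Hypothetical helper for illustration. Only one line is held in
    memory at a time, which matters for very large files.
    """
    with open(src_path, 'r') as infile, open(dst_path, 'w') as outfile:
        for line in infile:                       # iterate lazily over lines
            outfile.write(line.replace("-", ""))  # keep the line ending intact
```

The files are opened in text mode here, as in the answers; to preserve line endings exactly, the binary modes 'rb' and 'wb' together with `b"-"` and `b""` could be used instead.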