input.txt is tab-delimited.

I know a simple way to do a plain text replacement:

import fileinput
for line in fileinput.FileInput("input.txt",inplace=1):
    line = line.replace("AA","0")
    print line,

However, I want to replace cells in only the 3rd column of input.txt (not the whole file). A cell should be replaced by 0 if it is exactly one of AA, AAA, BB, or BBB, and by 1 if it is not any one of them.

Here I mean replacement in the "Match entire cell contents" sense.

By "Match entire cell contents" I mean that a cell (such as the (2,3) element of input.txt) counts as a match only when it is exactly AA, AAA, BB, or BBB. A cell such as "AAs" does not count as a match.

By contrast, if "Match entire cell contents" is not applied, a replacement happens whenever a cell merely contains AA, AAA, BB, or BBB. So a cell "AAhaha" would be replaced by "0haha".
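To make the distinction concrete, here is a minimal sketch contrasting the two behaviours on a single cell. The function names `recode_exact` and `replace_substring` are hypothetical, chosen only for illustration. Under the rule stated in the question, a non-matching cell such as "AAs" becomes 1.

```python
TARGETS = {"AA", "AAA", "BB", "BBB"}

def recode_exact(cell):
    """Whole-cell match: '0' only if the cell equals one of TARGETS, else '1'."""
    return "0" if cell in TARGETS else "1"

def replace_substring(cell):
    """Substring replacement, shown only for contrast with the exact match."""
    return cell.replace("AA", "0")
```

With these definitions, `recode_exact("AA")` gives "0" and `recode_exact("AAs")` gives "1", while `replace_substring("AAhaha")` gives "0haha".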

To repeat: I want to change only the 3rd column of input.txt (not the whole file), replacing a cell by 0 if it is exactly AA, AAA, BB, or BBB and by 1 otherwise, in the "Match entire cell contents" sense.

user1849133
  • @MartijnPieters: if it's a CSV file (well, TSV). I have sometimes encountered tab-delimited data that isn't TSV. – Steve Jessop Nov 01 '13 at 12:54
  • @MartijnPieters My input will be txt, tab-delimited, UTF8 without BOM. A txt file can be csv, too? Then how can I check if my input is csv? – user1849133 Nov 01 '13 at 13:04
  • @user2604484: CSV is a text format; it is any textual file that contains columns of data delimited by a delimiter, be that a comma, a pipe symbol, a tab or anything else. – Martijn Pieters Nov 01 '13 at 13:12
  • @user2604484: The `csv` module lets you read and write your format, simply by setting the delimiter to `\t`. – Martijn Pieters Nov 01 '13 at 13:12
  • Well, that's all there is to it if you set `csv.QUOTE_NONE` on the reader. Otherwise csv is not that simple. The questioner needs to find out what the intended meaning is of any `"` characters in the file, and parse the file accordingly. – Steve Jessop Nov 01 '13 at 13:13

2 Answers

import fileinput

# Python 2: with inplace=1, whatever is printed is written back into input.txt.
for line in fileinput.FileInput("input.txt",inplace=1):
    cells = line.split('\t')
    cells[2] = '0' if cells[2] in ('AA', 'AAA', 'BB', 'BBB') else '1'
    print '\t'.join(cells),

Beware, though, that I've taken a simplistic view of tab-delimited data. If your file makes use of the whole CSV/TSV format, with quoted cells containing tab characters and/or newlines, then you need the csv module, which is a proper CSV parser.

Conversely, if you want a cell in column 0 whose text is "a", quote marks included, to be output as "a" with the quote marks intact, then you must not use csv with its default settings: it removes the quote marks when reading and does not re-insert them when writing, because they aren't needed for that cell.

So, first you must be sure how the file format is defined, then you can choose how to read and write it. Either way though, modifying it will be about the same.
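As a rough illustration of why the format definition matters, the sketch below parses the same line twice with Python 3's csv module: once with the default quoting rules and once with `csv.QUOTE_NONE`, which treats quote marks as ordinary text. The sample line and the helper names `read_rows` and `write_rows` are assumptions made up for this example.

```python
import csv
import io

# Hypothetical sample line: the first cell is written with quote marks.
SAMPLE = '"a"\tb\tAA\n'

def read_rows(text, **reader_options):
    """Parse tab-delimited text with csv.reader and return the rows as lists."""
    return list(csv.reader(io.StringIO(text), delimiter="\t", **reader_options))

def write_rows(rows):
    """Write rows with csv.writer's default (minimal) quoting."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()

as_tsv = read_rows(SAMPLE)                             # quote marks act as CSV quoting
as_plain = read_rows(SAMPLE, quoting=csv.QUOTE_NONE)   # quote marks are ordinary text
round_trip = write_rows(as_tsv)                        # quote marks are not re-inserted
```

Here `as_tsv` holds the first cell as `a`, `as_plain` holds it as `"a"` with the quote marks, and `round_trip` no longer contains any quote marks.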

One other niggle: I haven't done anything about the linebreak, so it will just sit in the last cell. Therefore, if the third cell is the last cell, the linebreak will be removed when the cell is replaced by "0" or "1", which probably isn't what you want. And while we're talking about the number of cells, this code will of course throw an exception if any line has fewer than 3 cells. You should decide how you want to handle that. In particular, it's not that uncommon to find a blank line at the end of a text file.
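One possible way to deal with both points is sketched below for Python 3. It takes the linebreak off before splitting, puts it back afterwards, and passes short or blank lines through unchanged. The helper name `recode_line` and the pass-through policy are assumptions for illustration, not part of the original code. Note that with the linebreak still attached, a last cell such as "AA\n" would also fail the exact-match test against "AA".

```python
TARGETS = ("AA", "AAA", "BB", "BBB")

def recode_line(line, column=2):
    """Return the line with the cell at index `column` recoded to '0' or '1'."""
    # Separate the line ending so it cannot end up inside the last cell.
    body = line.rstrip("\r\n")
    ending = line[len(body):]
    cells = body.split("\t")
    if len(cells) <= column:
        # Assumption: lines with too few cells (including blank lines)
        # are passed through unchanged rather than raising IndexError.
        return line
    cells[column] = "0" if cells[column] in TARGETS else "1"
    # Re-attach the original line ending after the recoded cells.
    return "\t".join(cells) + ending
```

Inside a `fileinput` loop this could be used as `sys.stdout.write(recode_line(line))`. Keep in mind that with `inplace` set, the result overwrites input.txt itself.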

Steve Jessop
  • @Steve_Jessop "if the third cell is the last cell it will get removed when the cell is replaced by "0" or "1", which probably isn't what you want." Oh, the 3rd column is indeed likely to be the last column. What should I do then? – user1849133 Nov 01 '13 at 13:02
  • My input will be txt, tab-delimited, UTF8 without BOM. A txt file can be csv, too? Then how can I check if my input is csv? – user1849133 Nov 01 '13 at 13:03
  • @user2604484: "What should I do then?" -- probably best to take the linebreak off before splitting on `\t`, then put it back on when printing. – Steve Jessop Nov 01 '13 at 13:07
  • "how can I check if my input is csv?". You don't check whether it's CSV (noting that "tab-separated values" is a variant of CSV that uses a different delimiter instead of comma, so counts as CSV for these purposes). You need to agree with whoever supplies the file what format it will be in. Two identical files can have different meaning according to whether they are designated as TSV, or designated as simple tab-delimited data with one record per file of the file. – Steve Jessop Nov 01 '13 at 13:09
  • per line of the file, I mean. – Steve Jessop Nov 01 '13 at 13:16
  • I ran your program and it seems working. The replaced column was indeed the last column. But I am still concerned about what you wrote "What should I do then?" -- probably best to take the linebreak off before splitting on \t, then put it back on when printing." I am still concerned because I couldn't see how I should change the code to take your suggestion seriously. – user1849133 Nov 01 '13 at 13:39
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/40375/discussion-between-user2604484-and-steve-jessop) – user1849133 Nov 01 '13 at 13:41
  • @Steve_Jessop I just found that, although the output looks pretty and good, the input turns weird after running the program. – user1849133 Nov 02 '13 at 02:58
  • @user2604484: if you don't want to modify the input file, you should not have specified the `inplace` parameter to `FileInput`. – Steve Jessop Nov 02 '13 at 15:45

You should be using the csv module for this:

import csv
# Python 2: the csv module expects files opened in binary mode ("rb"/"wb").
with open("input.txt", "rb") as infile, open("output.txt", "wb") as outfile:
    reader = csv.reader(infile, delimiter="\t")
    writer = csv.writer(outfile, delimiter="\t")
    for row in reader:
        row[2] = "0" if row[2] in ("AAA", "AA", "BBB", "BB") else "1"
        writer.writerow(row)
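For Python 3, the same idea might look roughly like the sketch below. Files are opened in text mode with `newline=""` rather than in binary mode. The function name `recode_third_column`, the UTF-8 encoding, the "\n" line terminator, and the choice to copy short rows unchanged are all assumptions for illustration.

```python
import csv

TARGETS = {"AA", "AAA", "BB", "BBB"}

def recode_third_column(in_path, out_path):
    """Copy a tab-delimited file, recoding the 3rd column to '0' or '1'."""
    # newline="" lets the csv module manage line endings itself (Python 3).
    with open(in_path, newline="", encoding="utf-8") as infile, \
         open(out_path, "w", newline="", encoding="utf-8") as outfile:
        reader = csv.reader(infile, delimiter="\t")
        # Assumption: "\n" line endings are wanted; csv.writer defaults to "\r\n".
        writer = csv.writer(outfile, delimiter="\t", lineterminator="\n")
        for row in reader:
            if len(row) >= 3:  # assumption: shorter rows are copied unchanged
                row[2] = "0" if row[2] in TARGETS else "1"
            writer.writerow(row)
```

Calling `recode_third_column("input.txt", "output.txt")` reads one file and writes another, so input.txt is left as it was.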

Tim Pietzcker
  • I ran your program, and it seems the content of input.txt is erased after I run your program. The output.txt seems correct though. So if your program can keep the input.txt as just as it was, then it will be perfect :) – user1849133 Nov 02 '13 at 03:31
  • @user2604484: I can't imagine why this would happen since I'm opening `input.txt` for reading only. Can you recheck? – Tim Pietzcker Nov 02 '13 at 08:21