2

i have a huge file and as an output some columns doesn't have a value, i need to fill these columns with 0 for further analysis. I can separate the columns with space or tab, now below it is seen separated with tab.

alt text

berkay
  • 3,907
  • 4
  • 36
  • 51
  • Just a question, have you examined the last blank column in an editor? Is the column data (not the delimeter) an actual \t ? –  Jan 11 '11 at 22:38

5 Answers5

6

This is really a job for a CSV parser, but if it has to be a regex, and you never have tabs within quoted CSV entries, you could search for

(^|\t)(?=\t|$)

and replace with

$10

So, in Perl:

(ResultString = $subject) =~ 
s/(    # Match either...
   ^   # the start of the line (preferably)
   |   # or
   \t  # a tab character
  )    # remember the match in backreference no. 1
  (?=  # Then assert that the next character is either
   \t  # a(nother) tab character
   |   # or
   $   # the end of the line
  )    # End of lookahead assertion
/${1}0/xg;

This will transform

1   2       4           7   8
    2   3       5   6   7   

into

1   2   0   4   0   0   7   8   
0   2   3   0   5   6   7   0
Tim Pietzcker
  • 328,213
  • 58
  • 503
  • 561
  • "...if it has to be a regex..." - the original poster never mentioned using a regex? – plusplus Jan 11 '11 at 15:39
  • 1
    @mfontani: Oops. Of course... - @plusplus: Yes, he did, it's the first tag he used. – Tim Pietzcker Jan 11 '11 at 16:52
  • @Tim Pietzcker, perl -p -e "s/(^|\t)(?=\t|$)/${1}0/xg" file , does not seem to work, any idea? zero's are just put next to the column members – berkay Jan 11 '11 at 20:15
  • I'm sorry, I don't know Perl (I got the code snippet from RegexBuddy which translated my regex into Perl code), so I can't tell what might have gone wrong. It does expect the file to be tab-separated. Can you post an excerpt of your file in your question (instead of a screenshot)? One or two lines should be enough. – Tim Pietzcker Jan 11 '11 at 21:14
  • @Tim Pietzcker, perl -p -e "s/(^|\t)(?=\t|$)/${1}\t0/g" works fine. – berkay Jan 11 '11 at 21:47
  • This is strange since the `${1}` should have replaced the tab already. But it's great if you got it working anyway. – Tim Pietzcker Jan 11 '11 at 22:03
6

For a tab-separated file, this AWK snippet does the trick:

BEGIN { FS = "\t"; OFS="\t" }
{
    for(i = 1; i <= NF; i++) {
         if(!$i) { $i = 0 }
    }
    print $0
}
arnsholt
  • 851
  • 7
  • 17
  • Will this also work if the empty column is at the start or at the end of a line, i. e. not between two tab characters? – Tim Pietzcker Jan 11 '11 at 11:45
  • Yes. I've tested it on a small (3 column) file, with a fully populated row, one with a missing value in the middle, one with a missing value at the start and one with the missing at the end, and it handles all four lines correctly. – arnsholt Jan 11 '11 at 13:01
  • 1
    The `$0` is superfluous: `print` is sufficient. – glenn jackman Jan 11 '11 at 15:25
  • 1
    That's true. But I prefer to be explicit when the difference in verbosity is as small as this. – arnsholt Jan 11 '11 at 15:47
1

Deleting my answer after re-reading the original post. There are no tabs as data, just delimeters. If there is no data, a double delimeter will apear to align the columns.
It can't be any other way. So if a single delimeter is there, it will separate two empty fields. "" = 1 empty field, "\t" = 2 empty fields. I got it now.

Tim Pietzcker has the correct answer all along. +1 for him.
It could be written alternatively as s/ (?:^|(?<=\t)) (?=\t|$) /0/xg;, but its the same thing.

  • The output is given by a tool and it has tab and comma option.what's the function of -i.txt here? Also my file column number can change later depending output – berkay Jan 11 '11 at 19:08
  • @berkay - c:\> perl -h will give you all the switches. -i[extension] is in-place editing, where [extension] is added to your input filename (here its 'd.txt') as a filename for a "backup" of your input file (which is being modified). You can always run this as a Perl program and pass in the filename and number. In that case the regex would be `s!(?:^|(?<=\t))\t!$number\t!g`. Or you can do 1 liner as a batch parameter. Not sure what you are doing. –  Jan 11 '11 at 20:02
  • thanks but i'm having problem with the regex. s should be followed by '/, s/// is this windows style? – berkay Jan 11 '11 at 20:14
  • @berkay on unix you should use the single quote instead. This is new regex, if nothing works for you: `perl -pe 's/(?:^|(?<=\t))(?:(?=\t$)\t|(\t))/0$1/g' d.txt` –  Jan 11 '11 at 22:30
  • now it works but also check the accepted answer, it's more understandable. Thanks sln. +1 – berkay Jan 11 '11 at 23:01
1

Here's a sed solution. Note that some versions of sed don't like \t.

sed 's/^\t/0\t/;:a;s/\t\t/\t0\t/g;ta;s/\t$/\t0/' inputfile

or

sed -e 's/^\t/0\t/' -e ':a' -e 's/\t\t/\t0\t/g' -e 'ta' -e 's/\t$/\t0/' inputfile

Explanation:

s/^\t/0\t/    # insert a zero before a tab that begins a line
:a            # top of the loop
    s/\t\t/\t0\t/g    # insert a zero between a pair of tabs
ta            # if a substitution was made, branch to the top of the loop
s/\t$/\t0/    # insert a zero after a tab that ends a line
Dennis Williamson
  • 346,391
  • 90
  • 374
  • 439
  • To clarify, a loop is needed if you have cases like `a\t\t\tb`. Regular expression matches never overlap. So if the loop is not available, `a\t\t\tb` will change to `a\t0\t\tb` and not `a\t0\t0\tb` – kvantour Jul 09 '19 at 13:14
0

If and only if your data only contains numbers and you have clear defined field separator FS, you can use the following trick:

awk 'BEGIN{FS=OFS="\t"}{for(i=1;i<=NF;++i) $i+=0}1' file

By adding zero, we convert strings to numbers. Empty strings will be converted to the number zero. You can define your field separator to anything you like.

This, however, might be a bit slow since it will reparse $0 and split it into fields, every time you reassign a field $i.

A faster way is the solution of Dennis Williamson

kvantour
  • 25,269
  • 4
  • 47
  • 72