
I have multiple (1086) .dat files, and each file has 5 columns and 6384 lines. I also have a single file named "info.txt" which contains 2 columns and 6883 lines. The first column gives line numbers (to delete in the .dat files) and the 2nd column gives a number.

1 600
2 100
3 210
4 1200

etc... I need to read info.txt, find every line number whose value in the 2nd column is less than 300 (so lines 2 and 3 in the example above). Then I need to feed those line numbers to sed, awk, or grep and delete those lines from each .dat file. (So in the example above I would delete the 2nd and 3rd row of every .dat file.)

A more general form of the question would be (I suppose): how do I read numbers from a file and then use them as the row numbers to delete from multiple files?

I am using bash but ksh help is also fine.

Vijay
    Please come up with a more minimal example, showing the input files and your desired output. – glenn jackman Jun 04 '14 at 13:30
  • In your example, only the first row has a value greater than 300 in the 2nd column, so it looks like to me that you'd only delete line 1 from your data files, not lines 2 and 3. – SeeJayBee Jun 04 '14 at 14:15
  • Sorry, it should be values smaller than 300. – user2721844 Jun 04 '14 at 14:16
  • Can you edit it and clean that up then? Also, in your second sentence you say "2 rows and 6883 lines". I assume you actually mean "2 columns and 6883 lines". – SeeJayBee Jun 04 '14 at 14:17

4 Answers

sed -i "$(awk '$2 < 300 { print $1 "d" }' info.txt)" *.dat

The awk script creates a simple sed script to delete the selected lines; the resulting script is then run on all the *.dat files.
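With the four-line info.txt shown in the question, the generated sed script can be inspected on its own (sample data recreated here for illustration):

```shell
# recreate the sample info.txt from the question
printf '1 600\n2 100\n3 210\n4 1200\n' > info.txt

# show the sed script that awk generates: one "Nd" delete command
# per line whose 2nd column is below 300
awk '$2 < 300 { print $1 "d" }' info.txt
# prints:
# 2d
# 3d
```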

(If your sed lacks the -i option, you will need to write to a temporary file in a loop. On OSX and some *BSD you need -i "" with an empty argument.)
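For a sed without `-i`, the temporary-file loop might look like the following minimal sketch (the sample file names are hypothetical):

```shell
# hypothetical sample data for illustration
printf '1 600\n2 100\n3 210\n4 1200\n' > info.txt
printf 'r1\nr2\nr3\nr4\n' > sample.dat

# build the delete script once, then rewrite each .dat file via a temp copy
script=$(awk '$2 < 300 { print $1 "d" }' info.txt)
for f in *.dat; do
    sed "$script" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done
# sample.dat now contains only r1 and r4
```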

tripleee
  • I think there's a problem with this, but I'm not sure. After sed deletes line 1, won't line 2 now be line 1? So, if you delete line 2, aren't you now deleting line 3? I think you have to do a reverse sort of the line numbers to delete somewhere. – SeeJayBee Jun 04 '14 at 14:59
  • Also, you need to guarantee uniqueness. If you delete line 25, you don't want to delete line 25 again. – SeeJayBee Jun 04 '14 at 15:01
  • Why do you think so? Did you try? `sed` refers to line numbers in the original input file. – tripleee Jun 04 '14 at 15:02
  • Obviously, I didn't try, that's why I said "I'm not sure". But if it always refers to the original file, then there should be no problem. – SeeJayBee Jun 04 '14 at 15:02
  • Well to clarify if it works or not: This worked out GREAT!! I really appreciate it ! Thanks a lot tripleee! – user2721844 Jun 05 '14 at 11:56

This might work for you (GNU sed):

sed -rn 's/^(\S+)\s*([1-9]|[1-9][0-9]|[12][0-9][0-9])$/\1d/p' info.txt | 
sed -i -f - *.dat

This builds a script of the lines to delete from the info.txt file and then applies it to the .dat files.

N.B. the regexp matches numbers ranging from 1 to 299, as per the OP's request.
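To see what the first stage emits, it can be run alone on the sample info.txt from the question (GNU sed assumed, for `-r` and `\S`/`\s`):

```shell
printf '1 600\n2 100\n3 210\n4 1200\n' > info.txt

# rows with a 2nd column of 1-299 become "Nd" sed delete commands
sed -rn 's/^(\S+)\s*([1-9]|[1-9][0-9]|[12][0-9][0-9])$/\1d/p' info.txt
# prints:
# 2d
# 3d
```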

potong
# create action list (read via redirection, not a cat pipe,
# so ActionReq survives the loop)
while read LineRef Index
 do
   if [ "${Index}" -lt 300 ]
    then
      ActionReq="${ActionReq}${LineRef} b
"
    fi
 done < info.txt

# apply action on files
for EachFile in *.dat
 do
   sed -i -n -e "${ActionReq}p" "${EachFile}"
 done

(Not tested, no Linux here.) A limitation with sed is that the selection of lines whose second value is below 300 still has to happen in the shell loop; awk is more efficient at this kind of operation. I use sed in the second loop to avoid reading/writing each file once per line to delete. I think the second loop could also be avoided by giving sed the whole list of files instead of one file at a time.
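For comparison, the same filtering in a single awk pass per file might look like this sketch (hypothetical sample data; temp files stand in for `-i`):

```shell
# hypothetical sample data for illustration
printf '1 600\n2 100\n3 210\n4 1200\n' > info.txt
printf 'r1\nr2\nr3\nr4\n' > sample.dat

for f in *.dat; do
    # first pass (FNR==NR) records line numbers to delete;
    # second pass prints only lines whose number was not recorded
    awk 'FNR==NR { if ($2 < 300) del[$1]; next }
         !(FNR in del)' info.txt "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done
# sample.dat now contains only r1 and r4
```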

NeronLeVelu
  • The [useless use of `cat`](/questions/11710552/useless-use-of-cat) is an antipattern. The `for` loop is a syntax error. – tripleee Jun 19 '19 at 12:23

This should create new files named oldname.dat_new.dat, but I haven't tested it:

awk 'FNR==NR { if ($2 < 300) a[$1] = $1; next }
     !(FNR in a) { print > (FILENAME "_new.dat") }' info.txt *.dat
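A quick sanity check of this approach with the sample data from the question (file names hypothetical):

```shell
printf '1 600\n2 100\n3 210\n4 1200\n' > info.txt
printf 'r1\nr2\nr3\nr4\n' > sample.dat

# lines 2 and 3 (2nd column below 300) are dropped from the copy
awk 'FNR==NR { if ($2 < 300) a[$1] = $1; next }
     !(FNR in a) { print > (FILENAME "_new.dat") }' info.txt sample.dat

cat sample.dat_new.dat
# prints:
# r1
# r4
```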
Vijay