
I have a directory with many files in it and want to edit each file to only contain a select few columns.

I have the following code, which prints only the first column:

for i in /directory_path/*.txt; do awk -F "\t" '{ print $1 }' "$i"; done

but if I try to edit each file in place by adding > "$i", as below, I lose all the information in my files:

for i in /directory_path/*.txt; do awk -F "\t" '{ print $1 }' "$i" > "$i"; done

However, I want to remove all but a select few columns in each file, for example keeping only columns 1 and 3.

dawg

2 Answers


Given:

cat file
1 2 3
4 5 6

You can do in place editing with sed:

sed -i.bak -E 's/^([^[:space:]]*).*/\1/' file 

cat file
1
4

If you want the freedom to work with multiple columns and still edit in place, use GNU awk (4.1.0 or later), which also supports in-place editing:

gawk -i inplace '{print $1, $3}' file

cat file 
1 3
4 6

If you only have POSIX awk, or want to use cut, you generally do this:

  1. Modify the file with awk, cut, sed, etc
  2. Redirect the output to a temp file
  3. Rename the temp file back to the original file name.

Like so:

awk '{print $1, $3}' file >tmp_file && mv tmp_file file

Or with cut:

cut -d ' ' -f 1,3 file >tmp_file && mv tmp_file file
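Since the files in the question are tab-separated, a minimal sketch of the same temp-file pattern with cut might look like this. cut splits on tabs by default, so no -d option is needed. The sample file and the variable names demo and tmp_file are placeholders made up for illustration.

```shell
# Hypothetical sample file with three tab-separated columns.
demo=$(mktemp)
printf 'a\tb\tc\nd\te\tf\n' > "$demo"

# Write the selected columns to a temp file, then rename it over the
# original only if cut succeeded.
tmp_file=$(mktemp)
cut -f 1,3 "$demo" > "$tmp_file" && mv "$tmp_file" "$demo"

cat "$demo"
```

Note that cut always emits fields in their original order, so -f 3,1 gives the same result as -f 1,3. Reordering columns needs awk.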

To do a loop on files in a directory, you would do:

for fn in /directory_path/*.txt; do
    awk -F '\t' '{ print $1 }' "$fn" >tmp_file &&
    mv tmp_file "$fn"
done
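For the columns 1 and 3 case asked about, a somewhat more defensive version of that loop could look like the sketch below. It assumes tab-separated input, uses mktemp to get a unique temp file per input, and replaces the original only when awk exits successfully. The demo directory and its file contents are invented for illustration.

```shell
# Hypothetical demo directory; substitute your own /directory_path.
dir=$(mktemp -d)
printf 'a\tb\tc\nd\te\tf\n' > "$dir/one.txt"
printf '1\t2\t3\n' > "$dir/two.txt"

for fn in "$dir"/*.txt; do
    # Create the temp file next to the original so mv is a plain rename.
    tmp=$(mktemp "$fn.XXXXXX") || continue
    # Keep columns 1 and 3, tab-separated on input and output.
    if awk -F '\t' -v OFS='\t' '{ print $1, $3 }' "$fn" > "$tmp"; then
        mv "$tmp" "$fn"
    else
        rm -f "$tmp"    # leave the original untouched on failure
    fi
done

cat "$dir"/*.txt
```

One thing to be aware of: mktemp creates files readable only by their owner, so the rewritten files may end up with tighter permissions than the originals.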
dawg

Just to add a little to @dawg's answer, which works perfectly well, based on my use case.

I was dealing with CSV files, and standard CSV allows a , inside a value as long as the value is enclosed in double quotes. For example, the row below is a valid CSV row:

col1,col2,col3

1,abc,"abc, inc"

But the command above treated the , between the double quotes as a delimiter too.
Also, the command did not specify the delimiter for the output file.

These are the modifications I had to make for it to handle the two problems above:

for fn in /home/ubuntu/dir/*.csv; do
    gawk 'BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")"; OFS = "," } { print $1, $2 }' "$fn" >tmp_file 
    mv tmp_file "$fn"
done

OFS sets the delimiter used in the output file.
FPAT, a gawk extension, handles the case of a , between quotation marks.

The regex and the explanation behind it are in the official gawk documentation, section 4.7, Defining Fields by Content.
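To see the difference FPAT makes, here is a small sketch that runs both approaches on a sample modeled on the rows above. It assumes gawk is installed. The sample file and the variable names csv, naive and by_content are made up for the demonstration.

```shell
# Hypothetical sample file with a quoted value containing a comma.
csv=$(mktemp)
printf '%s\n' 'col1,col2,col3' '1,abc,"abc, inc"' > "$csv"

if command -v gawk >/dev/null 2>&1; then
    # Plain comma splitting: the quoted value is cut at its embedded comma.
    naive=$(gawk -F ',' -v OFS=',' '{ print $1, $3 }' "$csv")
    # FPAT describes what a field looks like, so the quoted value stays whole.
    by_content=$(gawk 'BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")"; OFS = "," } { print $1, $3 }' "$csv")
    printf 'FS=",":\n%s\nFPAT:\n%s\n' "$naive" "$by_content"
else
    echo "gawk not found; FPAT is a gawk extension" >&2
fi
```

With plain comma splitting, the third field of the data row comes out as "abc (cut off at the comma), while the FPAT version prints the full "abc, inc" value.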

I was led to that solution through this answer.

saadi