Bash: concatenated variables derived from text file using grep gives confused output

Question

In my directory, I have multiple NIfTI files (e.g., WIP944_mp2rage-0.75iso_TR5.nii) from my MRI scanner, each accompanied by a text file (e.g., WIP944_mp2rage-0.75iso_TR5_info.txt) containing the acquisition parameters (e.g., "Series description: WIP944_mp2rage-0.75iso_TR5_INV1_PHS_ND"). Based on these parameters (e.g., INV1_PHS_ND), I need to rename the NIfTI file, whose current name is stored in $niftibase. I used grep to extract the parameters. When I echo the variables individually, I get what I want, but when I try to concatenate them into one filename, the variables are mixed together instead of being delimited by dots.

I tried multiple forms of sed to cut away potentially invisible characters and narrowed down the source of the problem: the "INV1_PHS_ND" part of the series description, which is the $struct component, gives me trouble, potentially because the number of extracted fields varies. Sometimes there are 3 (as in INV1_PHS_ND), but there can also be 2 (INV1_ND). When I introduce this variable into the filename, everything goes haywire.

for infofile in ${PWD}/*.txt; do

  # General characteristics of subjects (i.e., date of session, group number, and subject number)
  reco=$(grep -A0 "Series description:" ${infofile} | cut -d ' ' -f 3 | cut -d '_' -f 1)
  date=$(grep -A0 "Series date:" ${infofile} | cut -c 16-21)
  group=$(grep -A0 "Subject:" ${infofile} | cut -d '^' -f 2 | cut -d '_' -f 1 )
  number=$(grep -A0 "Subject:" ${infofile} | cut -d '^' -f 2 | cut -d '_' -f 2)
  ScanNr=$(grep -A0 "Series number:" ${infofile} | cut -d ' ' -f 3)


  # Change name if reco has structural prefix
  if [[ $reco = *WIP944* ]]; then

    struct=$(grep -A0 "Series description: WIP944" ${infofile} | cut -d '_' -f 4,5,6)
    niftibase=$(basename $infofile _info.txt).nii

    #echo ${subStudy}.struct.${date}.${group}.${protocol}.${paradigm}.nii
    echo ${subStudy}.struct.${struct}.${date}.${group}.${protocol}${number}.${paradigm}.n${ScanNr}.nii

    #mv ${niftibase} ${subStudy}.struct.${struct}.${date}.${group}.${protocol}${number}.${paradigm}.n${ScanNr}.nii

  fi

done

This gives me output like this:

.niit47.n4lot.Noc002
.niit47.n5lot.Noc002D
.niit47.n6lot.Noc002
.niit47.n8lot.Noc002
.niit47.n9lot.Noc002
.niit47.n10ot.Noc002
.niit47.n11ot.Noc002D

for all 7 WIP944 files. However, it should look something like this: H1.struct.INV2_PHS_ND.190523.Pilot.Noc001.Heat47.n11.nii, where H1, Noc, and Heat47 are loaded in from a setup file.

EDIT: I tried to use awk in the following way:

  reco=$(awk 'FNR==8 {print;exit}' $infofile | cut -d ' ' -f 3 | cut -d '_' -f 1)
  date=$(awk 'FNR==2 {print;exit}' $infofile | cut -c 15-21)
  group=$(awk 'FNR==6 {print;exit}' $infofile | cut -d '^' -f 2 | cut -d '_' -f 1 )
  number=$(awk 'FNR==6 {print;exit}' $infofile | cut -d '^' -f 2 | cut -d '_' -f 2)
  ScanNr=$(awk 'FNR==14 {print;exit}' $infofile | cut -d ' ' -f 3)

which again gave me the correct output when echoing the variables individually, but not when I tried to combine them: .niit47.n11022_PHS_ND.

I used echo "$struct" | tr -dc '[:print:]' | od -c to see if there were hidden characters due to line endings, which resulted in:

0000000    I   N   V   2   _   P   H   S   _   N   D
0000013

EDIT: This is what the text file looks like:

Series UID: 1.3.12.2.1107.5.2.34.18923.2019052316005066316714852.0.0.0
Study date: 20190523
Study time: 153529.718000
Series date: 20190523
Series time: 160111.750000
Subject: MDC-0153,pilot_003^pilot_003
Subject birth date: 19970226
Series description: WIP944_mp2rage-0.75iso_TR5_INV1_PHS_ND
Image type: ORIGINAL\PRIMARY\P\ND
Manufacturer: SIEMENS
Model name: Investigational_Device_7T
Software version: syngo MR B17
Study id: 1
Series number: 5
Repetition time (ms): 5000
Echo time[1] (ms): 2.51
Inversion time (ms): 900
Flip angle: 7
Number of averages: 1
Slice thickness (mm): 0.75
Slice spacing (mm): 
Image columns: 320
Image rows: 320
Phase encoding direction: ROW
Voxel size x (mm): 0.75
Voxel size y (mm): 0.75
Number of volumes: 1
Number of slices: 240
Number of files: 240
Number of frames: 0
Slice duration (ms) : 0
Orientation: sag
PixelBandwidth: 248

I have one of these for each nifti file. subStudy is hardcoded in a setup file, which is loaded in prior to running the for loop. When I echo this, it shows the correct value. I need to change the names of multiple files with a specific prefix, which are stored in $reco.

Looks like you have Windows line endings (CR LF), causing the text to appear overwritten. — muru, Jun 04 '19 at 08:29
This looks very much like you should try to learn the basics of Awk. — tripleee, Jun 04 '19 at 08:38
While I do agree with learning the basics of Awk (never used it before), my system uses unix LF line endings. — Jurjen Heij, Jun 04 '19 at 08:45
Because of `*.txt;` you should escape the filename like `"${infofile}"` or `"$infofile"`. And I vote for awk too. — Wiimm, Jun 04 '19 at 09:28
You remove all non-printable characters with `tr -dc '[:print:]'` to look for non-printable characters in the output... nice one. ;-) Try `echo "$struct" | hexdump -C`. — cbley, Jun 04 '19 at 11:58
bash-3.2$ echo "$struct" | hexdump -C 00000000 49 4e 56 32 5f 50 48 53 5f 4e 44 0d 0a |INV2_PHS_ND..| 0000000d — Jurjen Heij, Jun 04 '19 at 12:03

tripleee · Answer 1 · 2019-06-04T12:30:34.700

As confirmed in comments, the input files have DOS carriage returns, which are basically invalid in Unix files. Also, you should pay attention to proper quoting.
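To illustrate why a trailing carriage return scrambles the concatenated name, here is a minimal sketch. The value of `struct` is simulated rather than read from a real info file, and the variable names `cr`, `clean`, and `clean_tr` are only illustrative. It sticks to POSIX shell features so it also runs in Bash 3.2.

```shell
# A literal carriage return; command substitution only strips trailing newlines.
# (In Bash, $'\r' would give the same character.)
cr=$(printf '\r')

# Simulate a value extracted from a CRLF file: it ends in a carriage return.
struct="INV2_PHS_ND${cr}"

# On a terminal, the \r inside this string sends the cursor back to column 0,
# so the text after it overwrites the start of the line.
broken="H1.struct.${struct}.190523.nii"

# Strip a single trailing carriage return with parameter expansion ...
clean=${struct%"$cr"}
# ... or delete every carriage return in a stream with tr.
clean_tr=$(printf '%s' "$struct" | tr -d '\r')

printf '%s\n' "H1.struct.${clean}.190523.nii"
```

Either form can be applied to each extracted variable; alternatively, the carriage returns can be removed once, while reading the file, before any field is extracted.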

As a general overhaul, I would recommend replacing the entire Bash script with a simple Awk script, which is both simpler and more idiomatic.

for infofile in ./*.txt; do  # no need to use ${PWD}
   # Pre-filter with a simple grep; skip files that are not WIP944 series
   grep -q '^Series description: [^ _]*WIP944' "$infofile" || continue
   # Still here? Means we want to rename
   suffix="$(awk -F : '
     # Build a lookup set whose indices are the keys we care about
     BEGIN { n = split("Series description:Series date:Subject:Series number", k, /:/)
             for (i = 1; i <= n; i++) f[k[i]] = 1 }
     { sub(/\r$/, "") }  # get rid of pesky DOS carriage return
     $1 in f { x[$1] = substr($0, length($1)+3) }  # value after the ": "
     END {
       n = split(x["Series description"], t, /_/); reco = t[1]
       # Join fields 4..n so that both INV1_ND and INV1_PHS_ND work
       struct = t[4]; for (i = 5; i <= n; i++) struct = struct "_" t[i]
       date = substr(x["Series date"], 3, 6)  # 20190523 -> 190523
       split(x["Subject"], s, /\^/); split(s[2], tt, /_/); group = tt[1]
       number = tt[2]
       ScanNr = x["Series number"]
       ### FIXME: protocol and paradigm are still undefined
       print struct "." date "." group "." protocol number "." paradigm ".n" ScanNr
     }' "$infofile")"
   niftibase="${infofile%_info.txt}.nii"  # the NIfTI file that belongs to this info file
   echo mv "$niftibase" "$subStudy.struct.$suffix.nii"
done

This probably still requires some tweaking (at least "protocol" and "paradigm" are still undefined). Once it seems to print the correct values, you can remove the echo before mv and have it actually rename files for you.

(It is probably still best to test this on a copy of your real data files first!)
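On the question's worry that the trailing part of the series description can have either two or three fields: a small sketch of one way to make the extraction independent of the field count is to take everything from the fourth underscore-separated field onward. The function name `struct_suffix` is hypothetical and not part of the original script.

```shell
# Hypothetical helper: drop carriage returns, then print field 4 and
# everything after it, however many fields follow.
struct_suffix() {
  printf '%s\n' "$1" | tr -d '\r' | cut -d '_' -f 4-
}

struct_suffix "WIP944_mp2rage-0.75iso_TR5_INV1_PHS_ND"   # prints INV1_PHS_ND
struct_suffix "WIP944_mp2rage-0.75iso_TR5_INV1_ND"       # prints INV1_ND
```

This assumes the first three fields (WIP944, the sequence name, and the TR label) never contain an underscore themselves, as in the sample file shown in the question.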

I've added the content of a text file in the OP. I have a text file for each scan, in which the variables I want to store in the name are the aspects of the text file that separate the nifti files. — Jurjen Heij, Jun 04 '19 at 12:08
See update now. I trust you will be able to finish the missing pieces on your own with a little bit of googling. — tripleee, Jun 04 '19 at 12:31
