-1

I have a MetaData.csv file that contains the values needed to perform an analysis. All I want is to: 1- Read the column names and create variables named after them. 2- Store the value of each column in its variable, as a number that other commands can read: column_name=its_value

MetaData.csv:

MAF,HWE,Geno_Missing,Inds_Missing
0.05,1E-06,0.01,0.01

I wrote the following codes but it doesn't work well:

#!/bin/bash
Col_Names=$(head -n 1 MetaData.csv) # Cut header (comma sep)
Col_Names=$(echo ${Col_Names//,/ }) # Convert header to space sep
Col_Names=($Col_Names) # Convert header to an array 

for i in $(seq 1 ${#Col_Names[@]}); do
N="$(head -1 MetaData.csv | tr ',' '\n' | nl |grep -w 
"${Col_Names[$i]}" | tr -d " " | awk -F " " '{print $1}')";
${Col_Names[$i]}="$(cat MetaData.csv | cut -d"," -f$N | sed '1d')";
done

Output:

HWE=1E-06: command not found
Geno_Missing=0.01: command not found
Inds_Missing=0.01: command not found
cut: 2: No such file or directory
cut: 3: No such file or directory
cut: 4: No such file or directory
=: command not found

Expected output:

MAF=0.05
HWE=1E-06
Geno_Missing=0.01
Inds_Missing=0.01

Problems:

1- I want to use the array length (${#Col_Names[@]}) as the final iteration, which is 5, but array indices start from 0 (0-4), so the MAF column is not captured by the loop. The loop also seems to iterate twice (once over 0-4 and again over 2-4!).
2- When I tried to read the values of the variables (echo $MAF), they were empty!

Any solution is really appreciated.

  • Copy/paste your script into http://shellcheck.net and fix the issues it tells you about. Having said that, there doesn't seem to be anything in the shell script you posted that couldn't instead be handled in a single call to awk, so if you want help doing whatever it is you're trying to do the right way, then please post a [mcve] so we can help you. – Ed Morton Jan 03 '21 at 16:54
  • @Ed Morton, I used shellcheck.net but couldn't find the problem. I provided the data (MetaData.csv) right after the question, just copy and paste it to a file, please. Thanks in advance for your reply. – Mehdi Esmaeilifard Jan 04 '21 at 12:22
  • Assuming `MetaData.csv` is the sample input, you forgot to post the expected output. I didn't really expect shellcheck.net to completely solve your problem, just to help you get your code to a point where we wouldn't be looking at code with obvious problems that shellcheck could detect, so we could focus on whatever is left. So, again, please run your code through shellcheck, fix the issues it tells you about, and then post **that** as the code in your question, not the current code with all of its obvious issues. – Ed Morton Jan 04 '21 at 14:30
  • @Ed Morton, I checked the code again using shellcheck.net. It only suggested some minor modifications that didn't change the output, such as: for i in `seq 1 ${#Col_Names[@]}`; >>> for i in $(seq 1 ${#Col_Names[@]}); Expected output: MAF=0.05 HWE=1E-06 Geno_Missing=0.01 Inds_Missing=0.01. However, based on the comments I'm considering another language to do the job. Thanks. – Mehdi Esmaeilifard Jan 05 '21 at 08:59
  • Please fix all of the issues shellcheck tells you about, not just some of them. In any case, if you post the expected output **in your question** then we can probably start trying to help you to solve your problem, as opposed to trying to help you ask your question. – Ed Morton Jan 05 '21 at 15:40
  • I edited the question. I also posted the edited code. Fixing all the issues makes the code non-executable. Thanks – Mehdi Esmaeilifard Jan 06 '21 at 15:43
  • If the code is now non-executable then by definition you didn't fix it and there are clearly still multiple issues with the code in your question that shellcheck would tell you about, e.g. no shebang, unquoted variables, UUOC, use of ${var}=, etc. – Ed Morton Jan 06 '21 at 15:57

3 Answers

2

This produces the expected output you posted from the sample input you posted:

$ awk -F, -v OFS='=' 'NR==1{split($0,hdr); next} {for (i=1;i<=NF;i++) print hdr[i], $i}' MetaData.csv
MAF=0.05
HWE=1E-06
Geno_Missing=0.01
Inds_Missing=0.01

If that's not all you need then edit your question to clarify your requirements.
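If the name=value pairs are also needed as variables in the calling shell, one possible sketch is to read the awk output back in a loop and assign each pair with `printf -v`. This assumes bash, that every header is a valid shell identifier, and that the file has a single data row. The identifier check and the temporary copy of MetaData.csv are additions made here only so the snippet is safe and runs on its own.

```shell
#!/bin/bash
# Sketch only: recreate the sample file from the question in a temp directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'MAF,HWE,Geno_Missing,Inds_Missing' '0.05,1E-06,0.01,0.01' > "$tmpdir/MetaData.csv"

# Read each "name=value" line produced by awk and turn it into a shell variable.
# Process substitution keeps the loop in the current shell, so the variables
# are still set after the loop ends.
while IFS='=' read -r name value; do
    # Accept only names that are valid shell identifiers (an added safety check).
    [[ $name =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]] || continue
    printf -v "$name" '%s' "$value"
done < <(awk -F, -v OFS='=' 'NR==1{split($0,hdr); next} {for (i=1;i<=NF;i++) print hdr[i], $i}' "$tmpdir/MetaData.csv")

echo "MAF=$MAF HWE=$HWE"    # prints: MAF=0.05 HWE=1E-06
rm -rf "$tmpdir"
```

Piping awk into `while read` instead would run the loop in a subshell in bash's default configuration, and the assignments would be lost when it exits.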

Ed Morton
  • @Ed Morton, I reproduced the output that you posted, and it is exactly what I want. But after running the code, when I read the values (echo $MAF, etc.), nothing is returned. Thanks. – Mehdi Esmaeilifard Jan 07 '21 at 06:43
  • `MAF` isn't a variable, it's part of the string `MAF=0.05` output by awk. Why are you trying to do `echo "$MAF"`? – Ed Morton Jan 07 '21 at 21:10
0

I don't really think you can implement a robust CSV reader/parser in Bash, but you can implement one that works to some extent with simple CSV files. For example, a very simple Bash-implemented CSV reader might look like this:

#!/bin/bash

set -e

ROW_NUMBER='0'
HEADERS=()
while IFS=',' read -ra ROW; do
    if test "$ROW_NUMBER" == '0'; then
        for (( I = 0; I < ${#ROW[@]}; I++ )); do
            HEADERS["$I"]="${ROW[I]}"
        done
    else
        declare -A DATA_ROW_MAP
        for (( I = 0; I < ${#ROW[@]}; I++ )); do
            DATA_ROW_MAP[${HEADERS["$I"]}]="${ROW[I]}"
        done
# DEMO {
        echo -e "${DATA_ROW_MAP['Fnames']}\t${DATA_ROW_MAP['Inds_Missing']}"
# } DEMO
        unset DATA_ROW_MAP
    fi
    ROW_NUMBER=$((ROW_NUMBER + 1))
done

Note that it has multiple disadvantages:

  • it only works with ,-separated fields (truly "C"SV);
  • it cannot handle multiline records;
  • it cannot handle field escapes;
  • it considers the first row always represents a header row.

This is why many commands can produce and consume \0-delimited data: that control character is simply easier to handle as a delimiter. As for external commands, test is a builtin in bash, so the script above does not execute any external program; the check could equally be written with case.
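Whether test runs as an external program can be checked with bash's `type` builtin. The short sketch below does that and also shows the same header check written with `case`, which is shell syntax rather than a command. The names `row_number` and `row_kind` are made up for this illustration.

```shell
#!/bin/bash
# "type -t" prints "builtin" for commands that bash runs itself,
# without starting a separate program.
type -t test      # builtin
type -t [         # builtin

# The header check rewritten with case (hypothetical variable names).
row_number=0
case $row_number in
    0) row_kind=header ;;
    *) row_kind=data ;;
esac
echo "$row_kind"  # header
```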

Example of use (with the demo output):

./read-csv.sh < MetaData.csv
19.vcf.gz    0.01
20.vcf.gz
21.vcf.gz
22.vcf.gz

I wouldn't recommend using this parser at all; I would recommend a more CSV-oriented tool instead. Python would probably be the easiest choice, or, if your favorite language is R, as you mentioned, this is probably another option for you: Run R script from command line.

  • Dear Fluffy, thanks for your time and effort. I did the job with different code, but it's not as automated as I'd like. I will try to solve the problems. About using Python or R, the question is: how can I pass the values into bash as variables without writing the Python/R output to a file and reading it back into bash? Thanks – Mehdi Esmaeilifard Jan 04 '21 at 17:07
  • Please see [why-is-using-a-shell-loop-to-process-text-considered-bad-practice](https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice) for some of the reasons why this should be done with an awk script instead of a shell read loop if mandatory Unix tools are required. – Ed Morton Jan 04 '21 at 17:59
0

If I'm understanding your requirements correctly, would you please try something like:

#!/bin/bash

nr=1                                    # initialize input line number to 1
while IFS=, read -r -a ary; do          # split the line on "," then assign "ary" to the fields
    if (( nr == 1 )); then              # handle the header line
        col_names=("${ary[@]}")         # assign column names
    else                                # handle the body lines
        for (( i = 0; i < ${#ary[@]}; i++ )); do
            printf -v "${col_names[i]}" '%s' "${ary[i]}"
                                        # assign the variable "${col_names[i]}" to the input field
        done
        # now you can access the values via its column name
        echo "Fnames=$Fnames"
        echo "MAF=$MAF"
        fname_list+=("$Fnames")         # create a list of Fnames
    fi
    (( nr++ ))                          # increment the input line number
done < MetaData.csv
echo "${fname_list[@]}"                 # print the list of Fnames

Output:

Fnames=19.vcf.gz
MAF=0.05
Fnames=20.vcf.gz
MAF=
Fnames=21.vcf.gz
MAF=
Fnames=22.vcf.gz
MAF=
19.vcf.gz 20.vcf.gz 21.vcf.gz 22.vcf.gz
  • The statement IFS=, read -r -a ary is mostly equivalent to your first three lines; it splits the input line on "," and assigns the fields to the array variable ary.
  • There are several ways to use a variable's value as a variable name (indirect variable references). printf -v VarName Value is one of them.
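A minimal illustration of the indirect-reference idea, assuming bash: `printf -v` writes into the variable whose name is held in another variable, and `${!name}` reads it back. Passing the value through a '%s' format keeps characters such as % in the data from being interpreted as format directives. The names `name`, `value` and `pct` are placeholders for this sketch.

```shell
#!/bin/bash
name='MAF'          # variable name taken from the data (placeholder)
value='0.05'

printf -v "$name" '%s' "$value"   # indirect assignment: MAF=0.05
echo "$MAF"                       # direct read, prints 0.05
echo "${!name}"                   # indirect read through $name, prints 0.05

# declare is another way to assign to a computed name.
declare "HWE=1E-06"
echo "$HWE"                       # prints 1E-06

# Why '%s' matters: a value containing % would otherwise be parsed as a format.
printf -v pct '%s' '50%'
echo "$pct"                       # prints 50%
```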

[EDIT]

Based on the OP's updated input file, here is another version:

#!/bin/bash

nr=1                                    # initialize input line number to 1
while IFS=, read -r -a ary; do          # split the line on "," then assign "ary" to the fields
    if (( nr == 1 )); then              # handle the header line
        col_names=("${ary[@]}")         # assign column names
    else                                # handle the body lines
        for (( i = 0; i < ${#ary[@]}; i++ )); do
            printf -v "${col_names[i]}" '%s' "${ary[i]}"
                                        # assign the variable "${col_names[i]}" to the input field
        done
    fi
    (( nr++ ))                          # increment the input line number
done < MetaData.csv

for n in "${col_names[@]}"; do          # iterate over the variable names
    echo "$n=${!n}"                     # print variable name and its value
done

# you can also specify the variable names literally as follows:
echo "MAF=$MAF HWE=$HWE Geno_Missing=$Geno_Missing Inds_Missing=$Inds_Missing"

Output:

MAF=0.05
HWE=1E-06
Geno_Missing=0.01
Inds_Missing=0.01
MAF=0.05 HWE=1E-06 Geno_Missing=0.01 Inds_Missing=0.01

As for the output, the first four lines are printed by echo "$n=${!n}" and the last line is printed by echo "MAF=$MAF ...". You can choose either statement depending on how you use the variables in the code that follows.
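One caveat when the variables are used in later code: bash arithmetic is integer-only, so values such as 0.05 or 1E-06 are held as strings and cannot be compared with (( ... )). A hedged sketch of one workaround is to hand the comparison to awk. The `threshold` value and the names `maf_check` and `hwe_decimal` are invented for this example and do not come from the question.

```shell
#!/bin/bash
MAF='0.05'
HWE='1E-06'
threshold='0.1'     # hypothetical cutoff, not from the question

# awk converts the strings to floating point; "+0" forces a numeric comparison.
# Exit status 0 means the comparison was true.
if awk -v a="$MAF" -v b="$threshold" 'BEGIN { exit !(a+0 < b+0) }'; then
    maf_check='below'
else
    maf_check='not below'
fi
echo "MAF is $maf_check $threshold"   # prints: MAF is below 0.1

# Scientific notation is parsed as well: print HWE as a plain decimal.
hwe_decimal=$(awk -v x="$HWE" 'BEGIN { printf "%.6f", x+0 }')
echo "$hwe_decimal"                   # prints: 0.000001
```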

tshiono
  • Thanks for your answer, I need [colname=its value] as a variable. It seems that this can not be done for the first column (Fnames), but it may be possible for the rest of the columns because each column just has one value. Thanks. – Mehdi Esmaeilifard Jan 05 '21 at 09:12
  • Thank you for the response, but I'm afraid I cannot understand your problem. For instance, the variable `Fnames` (1st column name) is assigned to `19.vcf.gz` (1st column value) just after processing the 2nd line of MetaData.csv. If this is not what you want, could you please elaborate on your expectations? BR. – tshiono Jan 05 '21 at 10:39
  • Please ignore the first column and consider the rest as data. Column names >>> MAF, HWE, Geno_Missing, Inds_Missing Correspond values >>> 0.05, 1E-06, 0.01, 0.01 I need to make variables which are the column's name and put the corresponding value of each column into variables. e.g. MAF=0.05 and ... Please see the output of my posted code. Thanks. – Mehdi Esmaeilifard Jan 05 '21 at 14:10
  • Again, my code *does* assign each column value to a variable named after its column, not only for the 1st column but for every column, as you expect. The statement `printf -v ..` does the magic. I printed only `Fnames` and `MAF` as an example, but the other variables are assigned as well. Please try adding a line such as `echo "$HWE $Geno_Missing $Inds_Missing"` after the `echo "MAF=$MAF"` line to see the result. – tshiono Jan 05 '21 at 23:29
  • Obviously these variables are overwritten line by line. How do you want to deal with the 3rd and following lines? Just stop at the 2nd line, or ignore empty column values? I still don't see your overall expectations and requirements. You say "Please see the output of my posted code.", but your code includes a lot of errors and produces no meaningful output, as you know. BR. – tshiono Jan 05 '21 at 23:30
  • Actually, my problem is those errors. I ran the code again, and it assigns each column value to its name. But when I read the values ($MAF, $HWE, etc.) they are empty. As you said, the variables are overwritten, so I ran the code without the first column and the result was the same. Maybe the code should stop at the first line. Thanks – Mehdi Esmaeilifard Jan 06 '21 at 14:34
  • As you have modified the input file `MetaData.csv`, I have updated my answer accordingly. Would you please test the new script? BR. – tshiono Jan 07 '21 at 00:41
  • @tshiono, the code worked well. It's exactly what I want. Thank you so much. – Mehdi Esmaeilifard Jan 07 '21 at 06:32
  • Good to know that. Thanks. – tshiono Jan 07 '21 at 06:35