
The title of my question is very similar to other posts, but I haven't found anything on my specific example. I have to read in a text file as "$1", then put the values into an array line by line. Example:

myscript.sh /path/to/file

My question is would this approach work?

1   #!/bin/bash
2   file="$1"
3   readarray array < file

Would this code treat "/path/to/file" as "$1" and then place that path into the variable "file"? And if that part works correctly, I believe line 3 should properly put the lines into an array, correct?

This is the contents of the text file:

$ head short-rockyou.txt
290729 123456
79076 12345
76789 123456789
59462 password
49952 iloveyou
33291 princess
21725 1234567
20901 rockyou
20553 12345678
16648 abc123
.
.
.

I hope this is enough information to help.

joshpo
  • Correct answer depends on what happens to the file contents after they become an array. Also you need to write `"$file"` on the third line. – karakfa Nov 30 '18 at 22:26
  • After the file becomes an array I will have to sort it and then pull out specific lines, which I believe I know how to do. I've just been stuck because I'm supposed to read in using "$1"; I can't use read -r. – joshpo Nov 30 '18 at 22:32
  • Then, this is most likely a wrong approach! You can sort the file and extract lines easily as well. – karakfa Nov 30 '18 at 22:36
  • Have you already corrected your code based on @karakfa's feedback? Are you stuck on anything else? – that other guy Nov 30 '18 at 23:56
  • Don't use 'file' as a variable name (e.g. use f1). A simple test (replacing "file" with "f1") and `readarray array < $f1` and adding a line `echo "${array[@]}"` seems to list the contents with one line per entry. –  Dec 01 '18 at 00:32
  • If you have to sort the lines after, it might not be the best solution then. Your code works, provided you correct it with "$file", but each index of your array will hold one whole line (both columns). You will have to sort them according to what? The first column? If so, is it always and only numerical? – Andre Gelinas Dec 01 '18 at 02:55
  • @Andy, `< "$f1"` is reliable on more shells than `< $f1`; many common versions of bash still in wide use today didn't suppress string-splitting on names used for redirection, so can give a "bad redirection" error when a filename contains spaces or glob characters unless the quotes are used. – Charles Duffy Dec 01 '18 at 22:24
  • BTW, is there a reason to sort the file *after* you read it, instead of before? `readarray -t array < <(sort file)` (note that there's a space between the two `<`s) will sort, and *then* read each line into an array, already in sort order. – Charles Duffy Dec 01 '18 at 22:41
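A minimal sketch of the sort-first suggestion above, assuming the two-column "count word" layout shown in the question and assuming the lines should be ordered numerically by the first column, largest first (the sort key is an assumption, since Andre Gelinas's question about which column to sort on is still open). The `sample` file is a hypothetical stand-in for "$1".

```shell
#!/usr/bin/env bash
# Hypothetical sample data standing in for the file passed as "$1"
sample=$(mktemp)
printf '%s\n' '20901 rockyou' '290729 123456' '59462 password' > "$sample"

# Sort numerically (n), descending (r), on the first field only (-k1,1),
# then read each sorted line into one array element (-t strips the newline).
# Note the space between the two '<' characters.
readarray -t array < <(sort -k1,1nr "$sample")

printf '%s\n' "${array[@]}"
rm -f "$sample"
```

Sorting before reading keeps the array in final order, so no in-shell sort is needed afterwards.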

2 Answers


I use the following for placing the lines of a file in an array:

IFS=$'\r\n' GLOBIGNORE='*' command eval  'array=($(<filename))'

This gets all the columns and you can later work with it.

Edit: Explanation of the procedure above:

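  • IFS=$'\r\n': IFS stands for "internal field separator". The shell uses it to determine how to do word splitting, i.e. how to recognize word boundaries. Setting it to only carriage return and newline means a line is split at line breaks, not at the spaces inside it.
  • GLOBIGNORE='*': From the bash manual page: A colon-separated list of patterns defining the set of filenames to be ignored by pathname expansion. If a filename matched by a pathname expansion pattern also matches one of the patterns in GLOBIGNORE, it is removed from the list of matches.
  • command eval: eval runs the array assignment in the current shell, so array still exists after the line, while the IFS and GLOBIGNORE assignments in front of the command apply only while it runs. The command prefix keeps them temporary even when bash is in POSIX mode, where assignments before a special builtin such as eval would otherwise persist.
  • array=(...): the array assignment itself. $(<filename) expands to the contents of the file, which is then split into elements using the IFS above.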

There are several threads on Stack Overflow and Unix & Linux Stack Exchange with more details on this: https://unix.stackexchange.com/questions/184863/what-is-the-meaning-of-ifs-n-in-bash-scripting, https://unix.stackexchange.com/questions/105465/how-does-globignore-work, and "Read lines from a file into a Bash array".
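A minimal sketch for checking these points yourself, assuming bash 4+ and a data file with Unix (LF) line endings; the file `f` and the `saved_ifs` variable are hypothetical. Because newline counts as IFS whitespace, consecutive newlines act as a single separator, so on such a file an empty line does not produce an empty element (this matches what that other guy reports in the comments). A file with Windows (CRLF) endings may behave differently, because \r is not IFS whitespace.

```shell
#!/usr/bin/env bash
# Hypothetical test file with LF line endings, including one empty line
f=$(mktemp)
printf '%s\n' '290729 123456' '' '79076 12345' > "$f"

saved_ifs=$IFS
# The one-liner from this answer, pointed at "$f" instead of "filename"
IFS=$'\r\n' GLOBIGNORE='*' command eval 'array=($(<"$f"))'

declare -p array    # each non-empty line is one element; spaces are kept
[ "$IFS" = "$saved_ifs" ] && echo "IFS is unchanged after the command"
rm -f "$f"
```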

Then I just loop around the array like this:

for (( b = 0; b < ${#array[@]}; b++ )); do
    # Do something with "${array[b]}"
done
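If the index b is not needed inside the loop body, a sketch of an alternative is to iterate over the elements themselves; quoting "${array[@]}" keeps each element, including any spaces in it, as one word. The sample array and the `count` variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical array standing in for the one built above
array=('290729 123456' '79076 12345')

count=0
for line in "${array[@]}"; do
    # Do something with "$line"; here it is printed and counted
    printf '%s\n' "$line"
    count=$((count + 1))
done
```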

This could be a matter of opinion; please wait for more comments.

Edit: Use case with empty lines and globs

After yesterday's comments, I finally had time to test the suggestions (empty lines, lines with globs).

In both cases the array works fine in conjunction with awk. The following example prints only column 2 into a new text file:

IFS=$'\r\n' GLOBIGNORE='*' command eval  'array=($(<'$1'))'
for (( b = 0; b < ${#array[@]}; b++ )); do    
echo "${array[b]}" | awk -F "/| " '{print $2}' >> column2.txt
done

Starting with the following text file:

290729 123456
79076 12345
76789 123456789
59462 password
49952 iloveyou
33291 princess
21725 1234567
20901 rockyou
20553 12345678
16648 abc123





20901 rockyou
20553 12345678
16648 abc123
/*/*/*/*/*/*
20901 rockyou
20553 12345678
16648 abc123

Note the empty lines and the glob line in the file. The result of the execution is the following:

123456
12345
123456789
password
iloveyou
princess
1234567
rockyou
12345678
abc123





rockyou
12345678
abc123
*
rockyou
12345678
abc123

This is clear evidence that the array works as expected.

Execution example:

adama@galactica:~$ ./processing.sh test.txt
adama@galactica:~$ cat column2.txt
123456
12345
123456789
password
iloveyou
princess
1234567
rockyou
12345678
abc123





rockyou
12345678
abc123
*
rockyou
12345678
abc123

To remove empty lines (it makes no sense to me to have them in the output), we can do it in awk by changing the following line:

echo "${array[b]}" | awk -F "/| " '{print $2}' >> column2.txt

by adding /./, a pattern that matches only non-empty lines:

echo "${array[b]}" | awk -F "/| " '/./ {print $2}' >> column2.txt

End Result:

123456
12345
123456789
password
iloveyou
princess
1234567
rockyou
12345678
abc123
rockyou
12345678
abc123
*
rockyou
12345678
abc123
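Calling awk once per array element starts a new process for every line. A sketch of an alternative, assuming the same array and the same -F "/| " separator as above, pipes all elements into a single awk invocation; the /./ pattern still skips empty elements. The sample array is hypothetical, and > is used instead of >> because there is only one write.

```shell
#!/usr/bin/env bash
# Hypothetical array standing in for the one read from "$1"
array=('290729 123456' '' '20901 rockyou' '/*/*/*')

# One awk process for all elements: printf writes one element per line,
# -F "/| " splits on a slash or a space, and /./ skips empty lines.
printf '%s\n' "${array[@]}" | awk -F "/| " '/./ {print $2}' > column2.txt
```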

Should you wish to apply it to the whole file (not column by column), you can take a look at the following thread: AWK remove blank lines

Edit: Security concern on rm:

I went ahead and placed $(rm -rf ~) in the test file, on a virtual machine, to see what would happen:

Contents of test.txt now:

290729 123456
79076 12345
76789 123456789
59462 password
49952 iloveyou
33291 princess
21725 1234567
20901 rockyou
20553 12345678
16648 abc123
$(rm -rf ~)





20901 rockyou
20553 12345678
16648 abc123
/*/*/*/*/*/*
20901 rockyou
20553 12345678
16648 abc123

Execution:

adama@galactica:~$ ./processing.sh test.txt
adama@galactica:~$ ll
total 28
drwxr-xr-x 3 adama adama 4096 dic  1 22:41 ./
drwxr-xr-x 3 root  root  4096 dic  1 19:27 ../
drwx------ 2 adama adama 4096 dic  1 22:38 .cache/
-rw-rw-r-- 1 adama adama  144 dic  1 22:41 column2.txt
-rwxr-xr-x 1 adama adama  182 dic  1 22:41 processing.sh*
-rw-r--r-- 1 adama adama  286 dic  1 22:39 test.txt
-rw------- 1 adama adama 1545 dic  1 22:39 .viminfo
adama@galactica:~$ cat column2.txt
123456
12345
123456789
password
iloveyou
princess
1234567
rockyou
12345678
abc123
-rf




rockyou
12345678
abc123
*
rockyou
12345678
abc123

There was no effect on the system. Note: I am using Ubuntu 18.04 x64 LTS on a VM. It is best not to test the security issue as root.
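To repeat this check without risking a home directory, here is a sketch assuming a harmless payload: the data line tries to create a file named pwned (a hypothetical name) inside a scratch directory. With the single-quoted eval string, the output of the command substitution is only word-split, never re-parsed as shell code, so the payload stays data. As Charles Duffy notes in the comments, switching to double quotes would let the outer shell paste the file contents into the eval string, and that variant is exploitable, so it is deliberately not run here.

```shell
#!/usr/bin/env bash
# Scratch directory so nothing outside it can be touched
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Hypothetical data file whose second line is a harmless injection attempt
printf '%s\n' '16648 abc123' '$(touch pwned)' > test.txt

# Same single-quoted form as in the answer
IFS=$'\r\n' GLOBIGNORE='*' command eval 'array=($(<test.txt))'

payload_ran=no
[ -e pwned ] && payload_ran=yes
echo "payload ran: $payload_ran"
printf '%s\n' "${array[1]}"    # the literal text $(touch pwned)

cd / && rm -rf "$scratch"
```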

Edit: set -f necessity:

adama@galactica:~$ ./processing.sh a
adama@galactica:~$ cat column2.txt
[a]
adama@galactica:~$

It works perfectly without set -f.

BR

Ivo Yordanov
  • OP's solution does essentially the same thing, but handles empty lines and globs better – that other guy Nov 30 '18 at 23:30
  • Haven't had a problem with this. Will fill a file with empty lines to see what happens. I haven't thought of empty lines, tbh. – Ivo Yordanov Nov 30 '18 at 23:43
  • Also try a line with `/*/*/*/*/*/*` – that other guy Nov 30 '18 at 23:47
  • I understand what the code is doing as a whole. Could you explain what " GLODIGNORE='*' command eval 'array=($( – joshpo Nov 30 '18 at 23:50
  • Thx! Will do! Will return tomorrow. – Ivo Yordanov Nov 30 '18 at 23:51
  • @thatotherguy just finished testing empty lines and globs. Both seem to be working fine. Please take a look at my edited answer. – Ivo Yordanov Dec 01 '18 at 22:07
  • @joshpo Just finished editing my answer with a further use case and an explanation on your questions. Please, let me know if there is something else that isn't clear. – Ivo Yordanov Dec 01 '18 at 22:08
  • @IvoYordanov On my system it skips the empty lines and takes several minutes to process the line `/*/*/*/*/*/*` – that other guy Dec 01 '18 at 22:18
  • @thatotherguy What OS are you using? I am on Ubuntu Server 18.04 x64 LTS. What is your bash version? Mine is 4.4.19. – Ivo Yordanov Dec 01 '18 at 22:20
  • `eval` introduces security vulnerabilities here, and for no good reason; the code works fine without it. If your data file contains `$(rm -rf ~)` as a string, you **really** don't want it to be parsed as syntax rather than data. – Charles Duffy Dec 01 '18 at 22:24
  • BTW, have you considered `readarray -t array – Charles Duffy Dec 01 '18 at 22:27
  • @CharlesDuffy If we are going to discuss the solution, I must in all fairness point to the following thread where I originally found it: https://stackoverflow.com/questions/11393817/read-lines-from-a-file-into-a-bash-array I think the security issue is already discussed there. – Ivo Yordanov Dec 01 '18 at 22:29
  • On a closer look, the use of single rather than double quotes makes this less dangerous than I thought it was on initial glance. Still not sure the extra complexity is adding any value that's relevant to the user here -- we don't want folks using `eval` when they don't need it, because it's very easy to make mistakes (change the quoting types to double quotes and it **would** be exploitable). – Charles Duffy Dec 01 '18 at 22:36
  • You can also eliminate the `eval` by simply using redirection within a *command substitution* which would be an improvement, e.g. `IFS=$'\r\n'; array=( $( – David C. Rankin Dec 01 '18 at 22:39
  • ...the above approach is safer, by the way, if you turn off globbing: `set -f` will turn off glob expansion, then `set +f` can be used to turn it back on later. – Charles Duffy Dec 01 '18 at 22:44
  • @CharlesDuffy tested the security concern on a VM. It had no effect. Look at my edit for details. – Ivo Yordanov Dec 01 '18 at 22:44
  • Yes, as I said, the security is only a concern with double quotes (```eval command "array=($( – Charles Duffy Dec 01 '18 at 22:45
  • @CharlesDuffy I wanted to test it as soon as I saw it. I have it this way in several scripts that run automatically. – Ivo Yordanov Dec 01 '18 at 22:46
  • BTW, to demonstrate that `set -f` can still be needed with `GLOBIGNORE='*'`, try creating a file in the current directory with `touch a`, and then having a line containing `[abcde]` in the file. – Charles Duffy Dec 01 '18 at 22:49
  • @CharlesDuffy works without set -f. See edit for more details. Note: switch the column printed to $1. – Ivo Yordanov Dec 01 '18 at 22:54
  • It works without GLOBIGNORE. Nevertheless, I am keeping it as is, in accordance with the following thread and the accepted answer there: https://stackoverflow.com/questions/11393817/read-lines-from-a-file-into-a-bash-array – Ivo Yordanov Dec 01 '18 at 23:01
  • I will be testing readarray instead of `IFS=$'\r\n' GLOBIGNORE='*' command eval 'array=($(<'$1'))'` though – Ivo Yordanov Dec 01 '18 at 23:04
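A sketch of the eval-free variant suggested in the comments (David C. Rankin's command-substitution form combined with Charles Duffy's set -f), assuming bash; the `old_ifs` variable and the sample file are hypothetical. Because IFS is assigned as an ordinary variable here rather than as a command prefix, it is saved and restored explicitly, and set +f assumes globbing was on beforehand. Empty lines are still collapsed by this form, as with the original one-liner.

```shell
#!/usr/bin/env bash
# Hypothetical sample file with a glob-like line
f=$(mktemp)
printf '%s\n' '16648 abc123' '/*/*/*/*/*/*' > "$f"

old_ifs=$IFS
set -f                  # disable pathname expansion (globbing)
IFS=$'\r\n'             # split only on CR/LF, not on spaces
array=($(<"$f"))        # no eval needed
IFS=$old_ifs            # restore the previous field separator
set +f                  # re-enable globbing

printf '%s\n' "${array[@]}"
rm -f "$f"
```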

Very close. :)

#!/usr/bin/env bash
case $BASH_VERSION in ''|[123].*) echo "ERROR: Bash 4.0 needed" >&2; exit 1;; esac

file="$1"
readarray -t array <"$file"

declare -p array >&2 # print the array to stderr for demonstration/proof-of-concept

Note the use of the -t argument to readarray (to discard the trailing newline from each line), and the use of "$file" rather than just file.
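A sketch of the same readarray -t approach run against a small hypothetical file containing an empty line and glob-like lines (the cases raised under the other answer). Since readarray performs no word splitting or pathname expansion on the lines it reads, each line, including the empty one, should become exactly one element.

```shell
#!/usr/bin/env bash
case $BASH_VERSION in ''|[123].*) echo "ERROR: Bash 4.0 needed" >&2; exit 1;; esac

# Hypothetical sample file standing in for "$1"
file=$(mktemp)
printf '%s\n' '290729 123456' '' '/*/*/*/*/*/*' '[abcde]' > "$file"

readarray -t array <"$file"

declare -p array >&2    # print the array to stderr for inspection
rm -f "$file"
```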

Charles Duffy