24

I have searched for a similar question here, but surprisingly could not find any.

In GNU bash, there is (a construct? a structure? a data type?) called "arrays". Arrays are well documented in the bash documentation, so I think that I understand the basics.

But suddenly, in the documentation there also comes up the term "list". For example, it is used when talking about filename expansion (emphasis is mine):

"If one of these characters appears, then the word is regarded as a pattern, and replaced with an alphabetically sorted **list** of filenames matching the pattern (see Pattern Matching)."

Therefore, I have three questions:

  1. What does "list" mean here?
  2. Is it used with the same meaning as in the description of the for loop?
  3. I am somewhat lost in the world of whitespace in bash. If this "list" is a concept separate from arrays (as I think it is), is it treated specially when it comes to whitespace and IFS, or in the same way as an array?

There is another use of the term "list" when talking about a sequence of one or more pipelines, but I am aware that it most probably means a different kind of list.


UPDATE

  1. Since I see that the way that this "list structure" works is very similar to how arrays work – what are the differences between them?

UPDATE 2

  1. What are the use cases in which "lists" are preferred over arrays? For comparison, let us create two files:

    $ touch file1.txt file2.txt

When it comes to lists, I can do the following:

$ A=*.txt ; echo $A
file1.txt file2.txt
$ 

And when it comes to arrays, I can do the following:

$ B=(*.txt) ; echo ${B[@]}
file1.txt file2.txt
$ 

While these two results are exactly the same, are there any cases when arrays and lists return different results?


UPDATE 3

I might have confused something, because in the above example it seems to be a list "wrapped" in an array. I do not know whether it makes a difference.

codeforester
Silv
  • In update 2, they are only the same because you have two file names that contain no whitespace. Try creating some files that *do* have whitespace in their names, then compare `printf '%s\n' $A`, `printf '%s\n' ${B[@]}`, `printf '%s\n' "$A"`, and `printf '%s\n' "${B[@]}"`. – chepner Aug 22 '19 at 16:43
  • Thanks, @chepner. I see that another point is what tools are used to display. This would fit in my understanding of the list from my second last comment to the **codeforester** answer, that Bash is made to **treat** something as a list, not to **define** something as a list. — Btw., well, it has been some time, so after your comment, I had to recall all this thread. But it is nice to recognize that I already (almost) understand it. ;) – Silv Aug 25 '19 at 20:44

3 Answers

19

There is no data type called list in Bash. We just have arrays. In the documentation that you have quoted, the term "list" doesn't refer to a data type (or anything technical) - it just means a sequence of file names.

However, glob expansions work very similarly to array elements as far as sequential looping is concerned:

for file in *.txt; do          # loop through the matching files
                               # no need to worry about white spaces or glob characters in file names
  echo "file=$file"
done

is the same as

files=(*.txt)                  # put the list of matching files in an array
for file in "${files[@]}"; do  # loop through the array
  echo "file=$file"
done
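
To check that the two loops really see the same items, here is a minimal sketch. It assumes a scratch directory made with mktemp and two made-up file names, one of which contains a space; none of these come from the question.

```shell
# Bash sketch; the scratch directory and file names are made up for illustration.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch "plain.txt" "with space.txt"

# Loop directly over the glob: each match is one word, spaces included.
glob_count=0
for file in *.txt; do
  glob_count=$((glob_count + 1))
done

# Store the same matches in an array, then loop over the quoted expansion.
files=(*.txt)
array_count=0
for file in "${files[@]}"; do
  array_count=$((array_count + 1))
done

echo "glob loop: $glob_count, array loop: $array_count"

# Clean up the scratch directory.
cd - >/dev/null && rm -rf "$dir"
```

Both counters should end up at 2, so the name containing a space is never broken apart in either loop.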

If you were to hardcode the file names, though, you would need quotes to prevent word splitting and globbing:

for file in verycramped.txt "quite spacious.txt" "too much space.txt" "*ry nights.txt"; do ...

or

files=(verycramped.txt "quite spacious.txt" "too much space.txt" "*ry nights.txt")
for file in "${files[@]}"; do ...
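
The quotes around the array expansion matter as well. The sketch below counts the words that reach a command with and without them. count_args is a hypothetical helper, not a builtin, and I left out the "*ry nights.txt" entry so the result does not depend on what is in the current directory. It also assumes IFS has its default value.

```shell
# Bash sketch; count_args is a hypothetical helper that prints how many words it received.
count_args() { echo "$#"; }

files=(verycramped.txt "quite spacious.txt" "too much space.txt")

quoted=$(count_args "${files[@]}")   # one word per array element
unquoted=$(count_args ${files[@]})   # each element is split again on whitespace

echo "quoted: $quoted, unquoted: $unquoted"
```

With the default IFS this prints "quoted: 3, unquoted: 6": the unquoted form turns three file names into six separate words.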

Read more about word splitting here:

codeforester
  • 6
    In other words, it's just used in the English sense to mean a sequence of items, and not in any technical sense. – that other guy Oct 19 '18 at 23:38
  • codeforester, @thatotherguy, thanks for such quick answers! Firstly: if a list means "a sequence", then I am still confused. A "sequence" for me is a list of strings split by whitespaces (spaces, tabs or newlines), where the strings themselves cannot contain whitespaces. But is it treated _exactly the same_ in bash? If so, then why does filename expansion return not a well-documented array, but a "list", defined nowhere? – Silv Oct 19 '18 at 23:46
  • Secondly: thanks for a good example. I think that I start to understand the difference between `files=(*.txt) ; for file in ${files[@]} ; do echo "$file" ; done` and `files=(*.txt) ; for file in "${files[@]}" ; do echo "$file" ; done` (the latter with double quotes around `${files[@]}`). – Silv Oct 19 '18 at 23:47
  • The double quotes are needed to prevent word splitting and globbing. Please see the link in the updated answer. – codeforester Oct 19 '18 at 23:52
  • 1
    The C language has specifier-qualifier lists in a declaration, function parameter lists, lists of replacement tokens in a macro: none of these are data structures in a C program. The word "list" is just used informally. – Kaz Oct 20 '18 at 00:07
  • @Kaz, thanks for mentioning – I remember something similar! Therefore I am confused about a "list" in the C language context, too… — codeforester, thanks for the resources, they are helpful. I have read them now (and the GNU documentation I have read earlier). But I still do not understand what is a "list" (or a "sequence"). There is an array: `A=(*.txt)`, where its elements are retrieved under _indices_ of A: `A[0]`, `A[1]` and `A[2]`. And there is a "list" – `A=*.txt`. Since they seem to produce _exactly the same_ results (as I checked), how values of a "list" are retrieved? – Silv Oct 20 '18 at 00:22
  • Sorry for the syntax. It should be `${A[0]}` and so on. – Silv Oct 20 '18 at 00:36
  • @silv They don't produce the same results: `A=*.txt` will assign the entire expansion to `A` as one string. – Kaz Oct 20 '18 at 02:31
  • @Kaz, that is exactly what I would like to hear as an answer for my first question. So, in my own words: "a list" == "a string". But, to be careful: is it documented somewhere? Is it happening in all the cases where one can say "list"? – Silv Oct 20 '18 at 02:38
  • No, it's intended specifically **not** to mean a single string. In this case it's **multiple** strings. I'm not sure why you're saying `A=*.txt` is a list. This could be part of an assignment list, but so could `A=foo` – that other guy Oct 20 '18 at 02:49
  • @thatotherguy, thank you. Yes, it is an assignment. To be clear: as I understand the documentation, a "list" is that with what bash replace a string `*.txt`; therefore, in this case, the `A` variable stores a "list". And adding your explanation, it stores… multiple strings? Without any "structure" that I can access, like an array? – Silv Oct 20 '18 at 03:00
  • @silv You would get all the answers if you look at the linked posts and documentation. – codeforester Oct 20 '18 at 04:35
  • 1
    Also, it is not a good practice to modify the post and add more questions after it has been answered. – codeforester Oct 20 '18 at 04:42
  • During expansion of `*.txt`, of course the shell will come up with a list of matches. This isn't visible as a list data structure. When `*.txt` appears as an ordinary argument (not in an assignment like `A=*.txt`) then it produces multiple arguments: it turns into an argument list. We can call a function `func *.txt` and then inside `func`, the expansions of `*.txt` occupy the positional parameters `$1`, `$2`, ... and their count is given by `$#`. That is sort of a list. You can pop the front element using `shift` or pass to another function/command using `"$@"`. – Kaz Oct 20 '18 at 04:51
  • 1
    codeforester, I have read them (maybe too quickly?). Please excuse my lack of understanding. They seem to answer a couple of my other questions, especially about whitespaces; they are very helpful, but the questions that I am asking are about a concept of "list", and such questions are not answered by any of those articles (or at least I do not see it there). And thank you for clarification about updates. But I did this because I thought that other users would be able to see my doubts; they just clarify my doubts from the first three questions. – Silv Oct 20 '18 at 14:21
  • @Kaz OK, when it comes to arguments, I think I understand. So, in my words, arguments may be a "list" in a sense arbitrary to bash because their splitting is documented. As I understand, the _first_ thing that bash does after reading a line (from a script, from stdin etc.) is to split the input sequence of characters by IFS, producing "words" or "tokens" (I don't know which term is more appropriate). So, this is a "list" in the sense of IFS, because by IFS is the string split to make a "list". Before splitting, it is not a "list", it is a "string" (that is, a "line" in the sense of the user). – Silv Oct 20 '18 at 14:40
  • @Kaz, and although I still do not know _when_ bash determines the first "token" or "word" as a file to execute and the rest as its arguments, that is another question. For now, I still do not understand **a "list" in a sense of assignment.** Since a variable is not an "interpreter", it does not just interpret this "list" that it is assigned to as another data structure (as bash does in my understanding), but it _stores_ it. So, how does it store it? In what structure, a string? For now, I start supposing that it stores it _in the same way_ as arrays, but it just does not provide a way to access it… – Silv Oct 20 '18 at 14:48
  • Oh, and I did not notice another answers. Maybe things will be a bit clearer, I will see in a minute. – Silv Oct 20 '18 at 14:50
  • codeforester, thanks for the last link, I have not noticed it. I will read it. – Silv Oct 20 '18 at 15:00
  • 1
    POSIX doesn't describe variables in terms of implementation details, but rather the operations and their results: the programming model. Variables look much like character strings. How they are stored is not visible to the shell programmer; you have to read the source code of the shell you're using. Bash could be representing them differently from zsh, from pdksh, dash, ... yet portable scripts behave the same way. – Kaz Oct 20 '18 at 16:59
  • 1
    codeforester, the link [I just assigned a variable, but echo $variable shows something else](https://stackoverflow.com/q/29378566/6862601) is really helpful. I now start to think that the term "list" in bash documentation in the sense of the result of filename expansions is used **not in a sense of how bash defines it, but in a sense of how bash interpret it**. In other words: arrays are defined, so we can say "This is an array [dot]". "Lists" are not defined, so we have to say "This is a list, _because_ bash _will_ somehow split it (e.g. by whitespace)." Am I right? – Silv Oct 20 '18 at 17:40
  • @Kaz, thanks. More or less, I understand. So, in bash, the value of a variable is a "string" indeed – in the sense of a sequence of characters, including all possible whitespaces? – Silv Oct 20 '18 at 17:44
5

The term "list" is not really a specific technical term in bash; it is used in the grammar to refer to a sequence of commands (such as the body of a for loop, or the contents of a script), and this use has shown up in the documentation of program structure, but that's a very specific type of list.

In the context you ask about, I'd say a "list" is a value that consists of any number (including 0) of shell words. The arguments to a single command are such a list.

A shell word, in turn, is what you might call a single string in another language. Normally, when you type a command line, it is separated into words at unquoted blanks (spaces and tabs) and other metacharacters, and the results of unquoted expansions are split again on the characters listed in $IFS (by default space, tab, and newline), but you can avoid that by any of the various quoting mechanisms and thus create shell words that contain whitespace or IFS characters.
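
A small sketch of how quoting and IFS change the number of words you get when a variable is expanded. count_args is a hypothetical helper and the sample string is made up.

```shell
# Bash sketch; count_args is a hypothetical helper that prints how many words it received.
count_args() { echo "$#"; }

line="one two:three four"

default_split=$(count_args $line)        # default IFS: "one" "two:three" "four"
no_split=$(count_args "$line")           # quoted: stays a single word
colon_split=$(IFS=:; count_args $line)   # IFS=':' -> "one two" "three four"

echo "default: $default_split, quoted: $no_split, colon: $colon_split"
```

This should print "default: 3, quoted: 1, colon: 2". The IFS change is made inside the command substitution, so it does not leak into the rest of the shell.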

If you wish to store a list in a shell parameter, that parameter must be an array; in that case, each word of the list becomes an element of the array. For example, the list of arguments passed into a command is available in the default array, which is accessed via $ followed by the index that would go in between the square brackets in a named array reference, e.g. "$@" for all the elements turned back into a list, "$1" for the first element, etc. ("$0" holds the name of the shell or script and is not part of "$@".)
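
To make the "default array" a bit more concrete, here is a sketch that fills the positional parameters with set -- and then pokes at them. The three values are made up.

```shell
# Bash sketch; the values passed to 'set --' are made up for illustration.
set -- "alpha beta" gamma delta   # replace the positional parameters

count=$#         # number of elements in the list
first=$1         # first element, embedded space preserved

shift            # drop the first element
remaining=$#     # one fewer than before

copy=("$@")      # store what is left in a named array, one element per word

echo "$count $first | $remaining ${copy[0]}"
```

This should print "3 alpha beta | 2 gamma".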

When an array is expanded back into a list of words, you have three options: the elements of the array can be kept as they originally were, irrespective of contents ("$@"); they can be concatenated together, joined by the first character of IFS (a space by default), into one big single shell word ("$*"); or each element can be split into words again using the usual IFS-delimiter rules, with any resulting glob patterns expanded ($@ or $* without the quotation marks).
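
The three ways of expanding can be compared side by side. This is only a sketch: count_args is a hypothetical helper, the two elements are made up, and IFS is assumed to have its default value except where it is set explicitly.

```shell
# Bash sketch; count_args is a hypothetical helper that prints how many words it received.
count_args() { echo "$#"; }

set -- "a b" c                    # two elements, the first contains a space

kept=$(count_args "$@")           # elements kept as they are
joined=$(count_args "$*")         # everything joined into one word
resplit=$(count_args $@)          # unquoted: "a b" is split into "a" and "b"

joined_text=$(IFS=,; echo "$*")   # "$*" joins with the first character of IFS

echo "kept: $kept, joined: $joined, resplit: $resplit, text: $joined_text"
```

This should print "kept: 2, joined: 1, resplit: 3, text: a b,c".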

Except for a few builtins like mapfile (a.k.a. readarray), bash doesn't have much support for arrays. For example, the environment can only contain strings, so you can't export an array. You can't pass an array into a function as an array, although you can certainly use the value of an array (or a slice of an array) as (some or all of) the list of arguments passed to a function. You can also pass the name of an array to a function, which can then use name-references and eval to manipulate that array in its caller's scope, but as with all mechanisms for reaching out of one's lexical scope in any language, this is generally considered bad practice. And of course, a function can't return an array, but then a bash function can't return anything but a one-byte numeric exit code. It can output text, but that text is unstructured; if the caller captures it with command or process substitution, it's up to that caller to parse the text however it desires – such as making an array containing one element for each line of output, which is the default behavior of mapfile/readarray.
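
For illustration only, a sketch of the two workarounds mentioned above: capturing a function's output into an array with mapfile, and passing an array by name with a name-reference. list_items and append_item are hypothetical functions. mapfile needs bash 4.0 or later and local -n needs bash 4.3 or later. As noted, reaching into the caller's scope this way is generally frowned upon; the sketch just shows the mechanism.

```shell
# Bash sketch; list_items and append_item are hypothetical functions.

# A function can only "return" text, here one item per line.
list_items() { printf '%s\n' "first item" "second item"; }

# The caller parses that text: one array element per line, newline stripped by -t.
mapfile -t items < <(list_items)

# Pass the array by name; 'local -n' makes 'ref' an alias for the caller's array.
append_item() {
  local -n ref=$1
  ref+=("$2")
}
append_item items "third item"

echo "${#items[@]} items, last: ${items[2]}"
```

This should print "3 items, last: third item".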

Anyway, the point is, lists in this context are values, while arrays are containers that store list values. Technically, shell parameters (a.k.a. "variables") can be arrays, and as arrays they can hold lists; they can't be lists, and it doesn't really make sense to refer to an "array value". But informally, "array" and "list" are often used interchangeably; that's the nature of lazy humans and the shell's fluidity.
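
Applied to the example from the question, the distinction can be made visible with declare -p. This is a sketch that assumes a scratch directory from mktemp, the two file names from the question, and that the nullglob option is off (the default).

```shell
# Bash sketch; runs in a scratch directory so it does not touch real files.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch file1.txt file2.txt

A=*.txt      # plain string assignment: no filename expansion happens here
B=(*.txt)    # array assignment: the glob is expanded right now

declare -p A B   # shows A as a plain string holding *.txt and B as an indexed array

rm file1.txt file2.txt

after_A=$(echo $A)          # the glob is evaluated only now and matches nothing
after_B=$(echo "${B[@]}")   # the array still holds the names captured earlier

echo "A: $after_A"
echo "B: $after_B"

# Clean up the scratch directory.
cd - >/dev/null && rm -rf "$dir"
```

After the files are deleted, the A line should show the literal pattern *.txt, while the B line still shows file1.txt file2.txt.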

Mark Reed
  • MarkReed, thanks for the answer. My understanding is now greater. So, in my own words – please, confirm it or not: `*.txt` produces a "list", but it is a "list" in the sense of human, yes? By bash it is treated as a "string" (in an _arbitrary_ sense), which is to be split by the characters that bash is aware of in particular situation (space, tab, newline, colon, slash etc.)? So, in conclusion, you mean a "value" == a "string", arbitrary understood? (By the way, I have now new questions about `$@` and `$*` that you mentioned, but it is outside the scope of the main questions about "list".) – Silv Oct 20 '18 at 15:13
  • Well, in this context, `*.txt` only "produces" what bash causes it to produce, so it doesn't make sense to talk about it "producing" something that bash then parses. Bash parses the command line you type; if that includes the literal 5-character sequence `*.txt` outside of any quotation marks, it will replace that sequence with a list of all the filenames matching that pattern. Now here we are talking about a list of words in shell terms, as an actual artifact in bash's memory, not just a human list. – Mark Reed Oct 20 '18 at 21:13
  • OK. I think that the point is that I am still thinking about a "list" in an assignment. And this thinking is wrong. I have read in another stackoverflow thread (do not remember where exactly) that such a sequence of characters in an assignment is treated just as a value, like you said, that is, a "string" in my understanding. It is treated as a list _only when_ bash reads the value of this variable, then splitting it etc. Is it true? – Silv Oct 20 '18 at 21:40
  • I'm not sure where your confusion lies. Strings and lists are both values, and bash parameters (commonly called "variables") can hold either type of value. But they are two types of parameters; the kind that holds a list we call an array. The assignment of a list to an array looks like `array=(list goes here)`, but note that the parentheses are part of the assignment syntax and not part of any general syntax for list literals, which bash does not have. – Mark Reed Oct 20 '18 at 22:21
  • My confusion lies in that when I do `A=*.txt`, the `A` variable just holds `*.txt`. Just the five characters. It is neither a "list" nor any array. Just a string. The value of the variable `A` is treated as a "list" only when `A` is being parsed using `$A` (using, of course, the correct expansions). – Silv Oct 20 '18 at 22:29
  • 1
    Yup; `A=*.txt` is a simple string assignment. In shell terms, it results in `$A` containing a single word. After that assignment, the shell sees the two command lines `ls *.txt` and `ls $A` as EXACTLY THE SAME. In each case you type only two shell words, but the number of words actually passed to `ls` depends on the contents of the directory. On the other hand, A=(*.txt) will consult the directory when you do the assignment; then A is an array containing the list of matching filenames, and even if you then delete them `ls "${A[@]}"` will pass all of them as arguments to `ls`. – Mark Reed Oct 21 '18 at 02:31
  • So, to refer to my initial 3 questions, can you confirm what I think now? 1) In the sense of [filename expansion](https://www.gnu.org/software/bash/manual/html_node/Filename-Expansion.html#Filename-Expansion), "list" means "product of expansion"; 2) In the sense of [for loop](https://www.gnu.org/software/bash/manual/html_node/Looping-Constructs.html#Looping-Constructs) it also means "product of expansion"; 3) A "list" may indeed be treated specially in terms of IFS and whitespaces, because _it is always a product of some expansion_. Can you confirm my answers to my own questions? – Silv Oct 21 '18 at 13:56
  • 1
    Even an array is really just syntactic sugar for managing *separate* variables, each of which has a string value. There is no array *value* anywhere. – chepner Aug 22 '19 at 16:26
1

A list in bash is a sequence of one or more pipelines separated by control operators. From man bash:

Lists

   A list is a sequence of one or more pipelines separated by one of the 
   operators ;, &, &&, or ||, and optionally terminated by one of ;, &, or 
   <newline>. 

   Of these list operators, && and || have equal precedence, followed by 
   ; and &, which have equal precedence.

   A sequence of one or more newlines may appear in a list instead of a 
   semicolon to delimit commands.

   If a command is terminated by the control operator &, the shell 
   executes the command in the background in a subshell. The shell does 
   not wait for the command to finish, and the return status is 0. 
   Commands separated by a ; are executed sequentially; the shell waits 
   for each command to terminate in turn. The return status is the exit 
   status of the last command executed.

   AND and OR lists are sequences of one or more pipelines separated by 
   the && and || control operators, respectively. AND and OR lists are 
   executed with left associativity. An AND list has the form

          command1 && command2

   command2 is executed if, and only if, command1 returns an exit status 
   of zero.

   An OR list has the form

          command1 || command2

   command2 is executed if and only if command1 returns a non-zero exit 
   status. The return status of AND and OR lists is the exit status of 
   the last command executed in the list.

A List is used in forming Compound Commands (see man bash).
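
A quick sketch of AND and OR lists in action, using the true and false commands so nothing depends on the environment. The variable names are made up.

```shell
# Bash sketch; variable names are made up for illustration.
unset and_result or_result skipped

true && and_result="ran"     # AND list: right side runs because 'true' exits with 0
false || or_result="ran"     # OR list: right side runs because 'false' exits non-zero
false && skipped="ran"       # AND list: right side is not run

echo "and=$and_result or=$or_result skipped=${skipped:-unset}"
```

This should print "and=ran or=ran skipped=unset".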

There is another use of the term "list" when talking about a sequence of one or more pipelines, but I am aware that it most probably means a different kind of list.

Both:

$ A=*.txt ; echo $A

and

$ B=(*.txt) ; echo ${B[@]}

are technically Lists in bash, since each is two commands separated by ;.

David C. Rankin
  • 1
    DavidCRankin, thanks for the answer. I now see that a term "list" is more ambiguous in bash than one may think. To be clear, I am rather asking about the result of `*.txt`, for example in an assignment, not about the whole line. But, I think that it is very good that you confirmed that sense of "lists". +1 – Silv Oct 20 '18 at 15:17