Miller - Ignore valid field names when using -N

Question

I'm using miller to process some CSV files like so:

mlr --mmap --csv --skip-comments -N cut -f 2 my.csv

It works well, but some of the CSV files contain field names and some do not, which is why I'm using -N. In the files that have field names, the names get printed in the output. You would think that with --headerless-csv-output bundled into the -N flag they wouldn't be, but they are. Maybe it's a bug? Anyway, how do I prevent the field names from being printed? If the input needs to be altered somehow and piped in, that's fine, but the output is being uniquely processed.

Here's the documentation I've been referencing:

my.csv

################################################################
#                                                              #
#                                                              #
#                      BIG OL' COMMENT BLOCK                   #
#                                                              #
#                                                              #
################################################################
#
"first_seen_utc","dst_ip","dst_port","c2_status","last_online"
"2021-01-17 07:30:05","67.213.75.205","443","online","2021-06-24"
"2021-01-17 07:44:46","192.73.238.101","443","online","2021-06-24"

Expected output

67.213.75.205
192.73.238.101

Present output

dst_ip
67.213.75.205
192.73.238.101

Could you add some rows of the input, and the example output you want? — aborruso, Jun 24 '21 at 19:53

aborruso · Accepted Answer · 2021-06-24T21:58:05.163

1

If the first field of every data row is always a date, you can filter on it to drop the header row:

mlr --csv --skip-comments -N filter -S '$1=~"^[0-9]{4}-"' then cut -f 2 input.txt
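To check that this one command behaves the same on both kinds of file, here is a minimal sketch using the sample rows from the question. The file names `sample_with_header.csv` and `sample_no_header.csv` and the helper `extract_ips` are made up for this illustration, and the sketch assumes every data row starts with a four-digit year.

```shell
# Sample data copied from the question (comment block shortened).
# The file names are hypothetical and used only in this sketch.
cat > sample_with_header.csv <<'EOF'
# comment block
"first_seen_utc","dst_ip","dst_port","c2_status","last_online"
"2021-01-17 07:30:05","67.213.75.205","443","online","2021-06-24"
"2021-01-17 07:44:46","192.73.238.101","443","online","2021-06-24"
EOF

# Same data without the header line.
grep -v 'first_seen_utc' sample_with_header.csv > sample_no_header.csv

# -N treats every line as data and names the fields 1, 2, 3, ...
# The filter keeps only rows whose field 1 starts with a four-digit year,
# which drops the header row when there is one.
extract_ips() {
  mlr --csv --skip-comments -N filter -S '$1=~"^[0-9]{4}-"' then cut -f 2 "$1"
}

with_header=$(extract_ips sample_with_header.csv)
no_header=$(extract_ips sample_no_header.csv)

# Both files should give the same two addresses.
printf '%s\n' "$with_header"
printf '%s\n' "$no_header"
```

Both calls should print only the two addresses listed under "Expected output" in the question. If a header name ever starts with four digits and a hyphen, the filter would let it through, so the regex may need tightening for other data.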

edited Jun 24 '21 at 21:58

answered Jun 24 '21 at 21:51

aborruso


score 0 · Answer 2 · answered Jun 24 '21 at 21:31


If you use -N on a CSV that has a header, Miller adds an automatic numeric header and the original header becomes a data row, because -N also implies --implicit-csv-header:

+---------------------+----------------+----------+-----------+-------------+
| 1                   | 2              | 3        | 4         | 5           |
+---------------------+----------------+----------+-----------+-------------+
| first_seen_utc      | dst_ip         | dst_port | c2_status | last_online |
| 2021-01-17 07:30:05 | 67.213.75.205  | 443      | online    | 2021-06-24  |
| 2021-01-17 07:44:46 | 192.73.238.101 | 443      | online    | 2021-06-24  |
+---------------------+----------------+----------+-----------+-------------+
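A view like the one above can probably be reproduced as follows. This is a sketch that assumes the question's sample is saved as `input.txt`; the `--opprint --barred` output flags are used here only to make the numeric header visible.

```shell
# Sample data copied from the question (comment block shortened).
cat > input.txt <<'EOF'
# comment block
"first_seen_utc","dst_ip","dst_port","c2_status","last_online"
"2021-01-17 07:30:05","67.213.75.205","443","online","2021-06-24"
"2021-01-17 07:44:46","192.73.238.101","443","online","2021-06-24"
EOF

# --implicit-csv-header is the input half of -N: the first line is read
# as data and the fields are named 1, 2, 3, ...
# --opprint --barred prints the records as a bordered table.
numbered=$(mlr --icsv --opprint --barred --skip-comments --implicit-csv-header cat input.txt)
printf '%s\n' "$numbered"
```

The original field names show up as the first data row, which is why `cut -f 2` with -N prints `dst_ip` along with the addresses.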

If you only want headerless output, use --headerless-csv-output on its own, without -N. If you run

mlr --csv --skip-comments --headerless-csv-output cut -f dst_ip input.txt

you will have

67.213.75.205
192.73.238.101

answered Jun 24 '21 at 21:31

aborruso


My situation is that I have multiple CSV files, and some have valid headers while others have none. I want one command that will work for them all, and only require that a numeric column is designated. I'm aware of the functionality you've described. – T145 Jun 24 '21 at 21:43
@T145 could you also share a wrong CSV input? – aborruso Jun 24 '21 at 21:49
There is no "right" or "wrong" CSV input: There are only valid CSV files that either have a valid header or don't. I would like _one_ command that works in either situation. The `-N` flag covers the invalid header situation. Now I need my present command edited to cover the valid header situation, not either/or. – T145 Jun 24 '21 at 21:53
@T145 I only asked so that I could build a command valid for all kinds of input. I have added another answer – aborruso Jun 24 '21 at 21:53
