0

I am trying to simply count the lines in the .CSV per column, while at the same time ignoring empty lines.

I use below and it works for the 1st column:

cat /path/test.csv | cut -d, -f1 | grep . | wc -l` >> ~/Desktop/Output.csv
#Outputs:  8 

And below for the 2nd column:

cat /path/test.csv | cut -d, -f2 | grep . | wc -l` >> ~/Desktop/Output.csv
#Outputs: 6

But when I try to count 3rd column, it simply Outputs the Total number of lines in the whole .CSV.

cat /path/test.csv | cut -d, -f3 | grep . | wc -l` >> ~/Desktop/Output.csv
    #Outputs: 33
    #Should be: 19?

I've also tried to use awk instead of cut, but get the same issue. I have tried creating new file thinking maybe it had some spaces in the lines, still the same.

Can someone clarify what is the difference? Betwen reading 1-2 column and the rest?

20355570_01.tif,,
20355570_02.tif,,
21377804_01.tif,,
21377804_02.tif,,
21404518_01.tif,,
21404518_02.tif,,
21404521_01.tif,,
21404521_02.tif,,
,22043764_01.tif,
,22043764_02.tif,
,22095060_01.tif,
,22095060_02.tif,
,23507574_01.tif,
,23507574_02.tif,
,,23507574_03.tif
,,23507804_01.tif
,,23507804_02.tif
,,23507804_03.tif
,,23509247_01.tif
,,23509247_02.tif
,,23509247_03.tif
,,23527663_01.tif
,,23527663_02.tif
,,23527663_03.tif
,,23527908_01.tif
,,23527908_02.tif
,,23527908_03.tif
,,23535506_01.tif
,,23535506_02.tif
,,23535562_01.tif
,,23535562_02.tif
,,23535636_01.tif
,,23535636_02.tif

csv in spreadsheet software

Ryul
  • 139
  • 12
  • Please post the actual `.csv` file (edit your answer) instead of a screenshot. Could it be that the "empty" cells in the third column contain a space or something like `""`? – Socowi May 15 '19 at 12:18
  • Self-contained posts are better than links, especially since I cannot open your link without a google account. Open the `csv` file in a text editor, copy everything, paste it here, and format it as code. If you give important information like this, edit your answer instead of posting comments. – Socowi May 15 '19 at 12:21
  • 1
    Thank you for posting the `csv` file. I cannot reproduce your problem. The last command returns `19` for me on `netbsd`, `arch`, and `ubuntu`. I guess your last column contains trailing spaces and they got lost here on StackOverflow. You can confirm this by running `grep ' ' /path/test.csv`. If there is any output then there are spaces. – Socowi May 15 '19 at 12:32

3 Answers3

2

That happens when input file has DOS line endings (\r\n). Fix your file using dos2unix and your command will work for 3rd column too.

dos2unix /path/test.csv

Or, you can remove the \r at the end while counting non-empty columns using awk:

awk -F, '{sub(/\r/,"")} $3!=""{n++} END{print n}' /path/test.csv
oguz ismail
  • 1
  • 16
  • 47
  • 69
  • Hey! I didn't have dos2unix on my workmachine. But the answer is still correct the problem was within newlines. I had to convert from CR to LF line terminators. By going into terminal and typing `file test.csv` it confirmed that the file indeed had CR newlines. I managed to fix it with `perl -pi -e 's/\r\n|\n|\r/\n/;' $*`. – Ryul May 17 '19 at 11:42
  • 1
    I noticed that CR only happens when I output into .csv with Javascript. I ended up using: `awk -v RS='\r\n' 'END{print NR}'`. Which doesn't require conversion. As with `perl -pi -e 's/\r\n|\n|\r/\n/;' $*` it only read per line, and needs to be looped etc.. Anyway thanks, for pointing out the difference between DOS/UNIX line endings! – Ryul May 17 '19 at 15:13
1

edit: I think oguz ismail found the actual reason in their answer. If they are right and your file has windows line endings you can use one of the following commands without having to convert the file.

cut -d, -f3 yourFile.csv cut | tr -d \\r | grep -c .
cut -d, -f3 yourFile.csv | grep -c $'[^\r]' # bash only

old answer: Since I cannot reproduce your problem with the provided input I take a wild guess:

The "empty" fields in the last column contain spaces. A field containing a space is not empty altough it looks like it is empty as you cannot see spaces.

To count only fields that contain something other than a space adapt your regex from . (any symbol) to [^ ] (any symbol other than space).

cut -d, -f3 yourFile.csv | grep -c '[^ ]'
Socowi
  • 25,550
  • 3
  • 32
  • 54
1

The problem is in the grep command: the way you wrote it will return 33 lines when you count the 3rd column.

It's better instead to use the following command to count number of lines in .CSV for each column (example below is for the 3rd column):

cat /path/test.csv | cut -d , -f3 | grep -cve '^\s*$'

This will return the exact number of lines for each column and avoid of piping into wc. See previous post here: count (non-blank) lines-of-code in bash

T1B0
  • 11
  • 3