I'm trying to compare two CSV files by reading the first one line by line and grepping the second file for a match. Using diff is not a viable solution. The problem seems to be with the email address stored in a variable when I grep the second file.

#!/bin/bash

LANG=C
head -2 $1 | tail -1 | while read -r line; do
  line=$( echo $line | sed 's/\n//g' )
  echo $line
  cat $2 | cut -d',' -f1 | grep -iF "$line"
done

Variable $line contains an email address that DOES exist in file $2, but I'm not getting any results.

What am I doing wrong?

File1

Email
email@verizon.net
email@gmail.com
email@yahoo.com

File2

email,,,,
email@verizon.net,,,,
email@gmail.com,,,,
email@yahoo.com,,,,
Joshua Cook
  • Can you post your entire implementation? – Joshua Cook Nov 02 '15 at 14:03
  • Ok. Updated the post. This version is reading just the second line of the first file, because I know that it is a match. – user3204352 Nov 02 '15 at 14:12
  • Probably a simple `awk` would be better than piping so many things! See something similar: [Remove duplicates from text file based on second text file](http://stackoverflow.com/q/30820894/1983854) – fedorqui Nov 02 '15 at 14:16
  • Well I could just do `grep -iF "$line" $2` But that's not working either. So, the cut and cat statements are just for removing variable behavior. – user3204352 Nov 02 '15 at 14:34
  • Maybe try this: `cmp -s filename_1 filename_2 > /dev/null; if [ $? -eq 1 ]; then echo is different; else echo is not different; fi` – Noproblem Nov 02 '15 at 14:55
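
The `awk` approach suggested in the comments could look roughly like the sketch below. It is not the commenter's code. It assumes the address is the first comma-separated field in both files, that matching should ignore case (as `grep -i` does), and that stray carriage returns from Windows line endings should be ignored; none of this is confirmed by the question. The function name `awk_match` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the rows of the second CSV whose first field
# appears in the first column of the first CSV (case-insensitive).
awk_match() {
  awk -F',' '
    { sub(/\r$/, "") }            # strip a trailing carriage return, if present
    NR == FNR {                   # still reading the first file
      if (FNR > 1 && $1 != "")    # skip its header row and blank lines
        seen[tolower($1)] = 1     # remember the address
      next
    }
    tolower($1) in seen           # second file: print rows whose first field was seen
  ' "$1" "$2"
}
```

Called as `awk_match file1.csv file2.csv`, this compares whole first fields, so unlike a plain `grep` it does not report substring matches.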

1 Answer

Given:

# csv_0.csv
email
me@me.com
you@me.com
fee@me.com

and

# csv_1.csv
email,foo,bar,baz,bim
bee@me.com,3,2,3,4
me@me.com,4,1,1,32
you@me.com,7,4,6,6
gee@me.com,1,2,2,6
me@me.com,5,7,2,34
you@me.com,22,3,2,33

I ran

$ pattern=$(head -2 csv_0.csv | tail -1 | sed 's/,.*//')
$ grep -F -- "$pattern" csv_1.csv
me@me.com,4,1,1,32
me@me.com,5,7,2,34

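To do this for each line in csv_0.csv:

#!/bin/bash

LANG=C
filename="$1"
{
  read -r # skip the CSV header row
  while IFS= read -r line
  do
      # keep only the first field of the line
      pattern=$(printf '%s\n' "$line" | sed 's/,.*//')
      # an empty pattern would match every line of the second file
      [ -z "$pattern" ] && continue
      # -F treats the address as a literal string, not a regular expression
      grep -F -- "$pattern" "$2"
  done
} <"$filename"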

Then

$ ./csv_read.sh csv_0.csv csv_1.csv
me@me.com,4,1,1,32
me@me.com,5,7,2,34
you@me.com,7,4,6,6
you@me.com,22,3,2,33
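
If the same approach prints nothing on other data, or prints every row when it reaches a blank line, two things are worth ruling out: carriage returns left over from Windows (CRLF) line endings, and empty lines in the first file. Neither is confirmed for the files in the question, so the following is only a sketch under those assumptions. It also replaces the loop with a single `grep -f` call; reading the pattern list from standard input with `-f -` is assumed to be available (it is in GNU grep). The function name `match_emails` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the rows of the second CSV that contain an
# address listed in the first column of the first CSV.
# Assumptions (not confirmed by the question): the first file may contain
# carriage returns (CRLF line endings) and blank lines.
match_emails() {
  list="$1"
  data="$2"
  tail -n +2 "$list" |          # skip the header row
    tr -d '\r' |                # drop carriage returns, if any
    cut -d',' -f1 |             # keep only the first column
    grep -v '^[[:space:]]*$' |  # drop blank lines (an empty pattern matches everything)
    grep -iF -f - -- "$data"    # literal, case-insensitive match; patterns come from stdin
}
```

Called as `match_emails csv_0.csv csv_1.csv`, this prints matching rows in the order they appear in the second file rather than grouped by address, and it matches the address anywhere on the row, not only in the first column.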
Joshua Cook
  • Hmm... It seems to keep spitting out the @yahoo.com email addys over and over, but does not spit out the exact email address matches... – user3204352 Nov 02 '15 at 17:35
  • I have no context for this error. The above works perfectly on my machine. I don't know what your specific error is. – Joshua Cook Nov 02 '15 at 18:36
  • Let me be more clear. I am perfectly willing to help, but you make reference to specifics about your issue I would have no way of knowing about (@yahoo.com addresses?). What I provided you is a general, go-right path demonstration of the underlying question you asked. I am of the opinion that it meets the criteria of an answer. If you need further assistance, please be detailed about what you are asking. If my answer meets the criteria of a response, please accept the answer. – Joshua Cook Nov 02 '15 at 18:54
  • Actually, I apologize. It is not printing just the yahoo.com emails; it is matching NO emails except for empty lines. When it encounters an empty line, it matches EVERY line of file 2. So basically it's not printing any matches. – user3204352 Nov 02 '15 at 19:46
  • @user3204352 Sounds like this is better handled in chat: https://chat.stackoverflow.com/rooms/94018/bash-grep-for-specific-email-address-in-csv – Joshua Cook Nov 02 '15 at 19:55
  • My rank is too weak for chat :( – user3204352 Nov 02 '15 at 20:06
  • Ok. Have you tried to run my script using the files I included on your system? Did it work? – Joshua Cook Nov 02 '15 at 20:08
  • Ok. So it did work on your samples. The question then would be, why would it not work on my files? The only thing I can think of is that File 1 is JUST email addresses. File 2 is in this format: Email,,,, (4 trailing commas). But grep should still be able to match it, correct? But it's not. Hmm... – user3204352 Nov 02 '15 at 20:22
  • Hmmm. I edited the data above to match your format. The script works fine for me. Can you post actual (or slightly edited to remove sensitive information) lines from your files? Just a few should suffice. Add to your question. – Joshua Cook Nov 02 '15 at 20:29
  • So I tested my script against your data and it works with one exception. Per this thread: (http://stackoverflow.com/questions/12916352/shell-script-read-missing-last-line) you need to have a new line at the end of your data file. – Joshua Cook Nov 03 '15 at 00:01
  • Not sure how else to help. I'd appreciate you accepting my answer. – Joshua Cook Nov 03 '15 at 00:02
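
The missing-final-newline behaviour mentioned in the comments can also be handled in the loop itself instead of editing the data file. A minimal sketch, assuming a POSIX-style `read`; the function name `print_lines` is hypothetical and only echoes each line to show that the last one is not lost.

```shell
#!/bin/bash
# Hypothetical helper: echo every line of a file, including a final line
# that is not terminated by a newline.
print_lines() {
  # read returns non-zero at end of input but still stores any unterminated
  # text in "line", so the extra test lets the loop body run for it.
  while IFS= read -r line || [ -n "$line" ]; do
    printf '%s\n' "$line"
  done < "$1"
}
```

The same `|| [ -n "$line" ]` test can be added to any `while read` loop that processes a CSV file whose last row may lack a trailing newline.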