Good Evening.

I encountered a strange phenomenon when dealing with awk's last field that I want to share with you.
I have a log file for social networks which contains fields separated by |. The fields themselves are not important, but each row has this format:
id|name|lastname|...|Social_Media_Used(nothing)
There are 9 separate fields.

Every row contains one user, e.g. ^random_numbers|Aris|something|...|Facebook$

The goal is to compute a total for every social medium used. I have done this using the code below.

grep -v '^#' $3 | awk -F\| '{print $9}' | sort | uniq -c | awk '{print $1$2}'  

The first command removes the lines starting with #, which are considered comments.

The second command prints field 9, which corresponds to the field Social_Media_Used. This is the last field, so I guess it will have \n at the end.

After that I sort and count the field, and the last awk prints the output like this:

884Blogger  
1105Facebook  
1326Flickr  
1104Google+  
1105Instagram  
1105LinkedIn  
1325Twitter  
1546Youtube  

If I use this as the last command instead:
awk '{print $2$1}' then something strange happens.
If I store the output in a file, it looks like this:

Blogger  
 884  
Facebook  
 1105  
Flickr  
 1326  
Google+  
 1104  
Instagram  
 1105
LinkedIn  
 1105  
Twitter  
 1325  
Youtube  
 1546  

If, however, I view the output in the terminal, I see this:

884gger  
1105book  
1326kr  
1104le+  
1105agram  
1105edIn  
1325ter  
1546ube  

DESIRED OUTPUT IS:
Blogger 884
Facebook 1105
Flickr 1326
Google+ 1104
Instagram 1105
LinkedIn 1105
Twitter 1325
Youtube 1546

I searched everything about sed or awk's RS, ORS or FRS, and I also tried both printf and print, but I couldn't find anything that worked or even came close to putting word-space-number on the same line, no matter how I print these lines. However, when I process a dummy file of 20 lines that I copy-pasted from the main file, everything goes smoothly. Everything also goes smoothly if I print field 8 or 7 instead.

Where does the solution to this problem lie? In the length of the file (9500 lines)? Or in the fact that a newline exists after the word? What do you think?

Aris Barlos

2 Answers

Your data most likely includes \r\n line endings. First run dos2unix file.
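With \r\n line endings, the last field ends in a carriage return, so print $2$1 emits "Blogger\r884". A terminal moves the cursor back to column 0 at the \r and the number overwrites the start of the word, which gives "884gger". A minimal sketch for checking this and for stripping the carriage returns when dos2unix is not installed; the sample rows and the make_sample helper are made up for illustration and use 4 fields instead of 9:

```shell
# Hypothetical sample data with DOS (\r\n) line endings.
make_sample() { printf '1|Aris|x|Facebook\r\n2|Bob|y|Twitter\r\n'; }

# Count the lines that end in a carriage return; non-zero means CRLF data.
cr=$(printf '\r')
crlf_lines=$(make_sample | grep -c "${cr}\$")
echo "lines ending in CR: $crlf_lines"

# Strip every carriage return with tr, then print last field and id.
result=$(make_sample | tr -d '\r' | awk -F'|' '{print $4, $1}')
echo "$result"
```

Note that tr -d '\r' removes every carriage return in the input, not only those at the end of a line, which is usually fine for a log file like this one.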

You can eliminate most of the pipes with this as well:

$ awk -F\| '!/^#/{a[$9]++} END{for(k in a) print k,a[k]}' file | sort 
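If converting the file first is not an option, the carriage return can also be dropped inside the same awk program. This is only a sketch on made-up data: make_log is a hypothetical stand-in for the real file, and $NF (the last field) is used instead of $9 because the sample rows have 4 fields.

```shell
# Hypothetical stand-in for the real log, with CRLF endings and a comment line.
make_log() {
    printf '# comment line\r\n'
    printf '1|Aris|a|Facebook\r\n'
    printf '2|Bob|b|Twitter\r\n'
    printf '3|Eve|c|Facebook\r\n'
}

totals=$(make_log | awk -F'|' '
    { sub(/\r$/, "") }          # drop a trailing carriage return, if any
    !/^#/ { count[$NF]++ }      # tally the last field of non-comment lines
    END { for (k in count) print k, count[k] }
' | sort)
echo "$totals"
```

The order of for (k in count) is unspecified in awk, which is why the output is still piped through sort.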
karakfa

With GNU awk, replace

awk '{print $2$1}'

with

awk -v RS='\r*\n' '{print $2$1}'

to handle Unix and DOS/Windows line endings.
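A regular expression as RS is an extension (GNU awk supports it; POSIX leaves multi-character RS unspecified). A sketch of an alternative that should work in any awk is to remove the trailing \r explicitly. It also uses print $2, $1 so that a space separates word and number as in the desired output. The counted helper below is hypothetical and just imitates what uniq -c would emit for CRLF input:

```shell
# Hypothetical imitation of `uniq -c` output on data with CRLF endings.
counted() { printf '    884 Blogger\r\n   1105 Facebook\r\n'; }

# Remove the trailing carriage return, then print word, space, count.
fixed=$(counted | awk '{ sub(/\r$/, ""); print $2, $1 }')
echo "$fixed"
```

The comma in print $2, $1 inserts the output field separator (a space by default), whereas $2$1 concatenates the two fields with nothing in between.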

Cyrus