6

I have this large 294,000 row csv with urls in column 1 and numbers in column 2.

I need to sort them from the smallest number to the largest number. I have loaded it into the software 'CSVed' and it handles it okay, it doesn't crash or anything but when I click the top of the column to sort it, it doesn't make it in order from smallest to largest, it's all just muddled up.

Anyone have any ideas? I've been searching around all day, I thought I might ask here.

Thanks.

Ray Lovelock
  • 63
  • 1
  • 3
  • Is there _no_ order after you do a sort? This sounds strange to me. Do you have access to tools such as Java or maybe R? – Tim Biegeleisen Jan 02 '17 at 08:31
  • There is an order, but it's not the right sort of order, it's still muddled up – Ray Lovelock Jan 02 '17 at 08:36
  • For example it'll go like 170 then 17050 then 17 etc. the first numbers will match but it goes from small number to large to small to small to large if that makes sense – Ray Lovelock Jan 02 '17 at 08:37
  • Is it sorting the numbers as if they were words, not numbers? Like 111 would come before 19? – user12341234 Jan 02 '17 at 08:38
  • Perhaps, it's hard to tell really. Is there anything else I can try with CSVed? I can't seem to find any manual around anywhere. – Ray Lovelock Jan 02 '17 at 08:39
  • 1
    Your data is being sorted lexicographically: https://en.wikipedia.org/wiki/Lexicographical_order. Somehow you need to convince your program that the second column is numeric not text. – user12341234 Jan 02 '17 at 08:39
  • do you need to code it ? or you are just looking for tool to do it ? + @user12341234 raise good question , you can code it pretty easily – LordTitiKaka Jan 02 '17 at 08:39
  • Do you have MS Excel? Not my favorite tool, but given enough time it should be able to handle this. – Tim Biegeleisen Jan 02 '17 at 08:40
  • I think there's a limit in Excel and in OpenOffice, that's the only reason I'm not using them. If there weren't a limit they'd work perfectly. – Ray Lovelock Jan 02 '17 at 08:41
  • I was recommended CSVed once on here for handling larger files. If only I could figure out how to sort it from smallest to largest. – Ray Lovelock Jan 02 '17 at 08:43

2 Answers2

5

If you have access to a unix system (and your urls don't have commas in them) this should do the trick:

sort -t',' -n -k2 filename

Where -t says columns are delimited by commas, -n says the data is numeric, and -k2 says to sort based on the second column.

user12341234
  • 6,573
  • 6
  • 23
  • 48
3

You can use gnu sort. It takes has small memory footprint and can even use multiple CPUs for sort.

sort -t ,  -k 2n file.csv

Gnu sort is available by default in most of linux distributions as well as for MacOS by default (though later has slightly different options). You can install it for windows as well, for example from CoreUtils for Windows page.

For more information about sort invocation use the manual

Robert Navado
  • 1,319
  • 11
  • 14