
I downloaded a JSON file using Curl on my server from https://api.data.gov/ed/collegescorecard/v1/schools?api_key=[my_API_key]

(I have uploaded the file to TinyUpload if you want to play around with it.)

The downloaded file is 1.5MB and has a very large (and valid) JSON object. However, on the server when I run the command wc -l against the file, it returns 0. Running wc -c returns the correct byte count.

I opened the file in TextEdit and it looked fine. I did notice that man wc on my server (CentOS 5.5) and man wc on my Mac (Yosemite) seem to have different descriptions for what the -l flag does:

CentOS 5.5:

print the newline counts

OSX 10.10.5 Yosemite

The number of lines in each input file is written to the standard output.

Which manual is correct? Does wc -l count lines or newlines? If it counts lines and not newlines, is there ever a case where wc -l could return 0 even when there is a line in the file?

Is it also possible that Mark's comment regarding Windows-based characters on this related SO post is the correct diagnosis? I ran cat -vet against my file, but couldn't find ^M using grep, and it's far too much text to search manually.
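Rather than searching `cat -vet` output by eye, the line-ending bytes can be counted directly. Below is a minimal Python sketch. The function name `line_ending_report` and the filename `schools.json` are hypothetical. It tallies CRLF pairs, bare LFs, and every CR byte, which is what `cat -v` displays as `^M`.

```python
from pathlib import Path


def line_ending_report(data: bytes) -> dict:
    """Tally line-ending bytes in raw file contents (hypothetical helper)."""
    crlf = data.count(b"\r\n")  # Windows-style endings (CR followed by LF)
    return {
        "crlf": crlf,
        "lf_only": data.count(b"\n") - crlf,  # Unix-style endings
        "cr_total": data.count(b"\r"),  # every byte cat -v would print as ^M
    }


def report_for_file(path: str) -> dict:
    # Read the file as raw bytes so no newline translation happens.
    return line_ending_report(Path(path).read_bytes())
```

Usage would be something like `report_for_file("schools.json")`. CRLF endings still contain an LF byte, so on their own they would not make `wc -l` report 0. Only bare-CR endings would do that, which shows up here as `crlf` and `lf_only` both 0 while `cr_total` is greater than 0.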

  • Why do you want to count the lines in a JSON file? –  Oct 27 '15 at 18:26
  • @Evert massive legacy script, traditionally used on CSVs, that I don't have the time to replace. Part of the script checks whether the file it is processing has no data. I'll probably change it from `wc -l` to `wc -c`, but I'm just trying to cover all my bases. – Matthew Herbst Oct 27 '15 at 18:32
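For the script's "does this file have no data?" check, here are two options sketched in Python. They assume "no data" means either zero bytes or nothing but whitespace, and both function names are hypothetical. Neither depends on the file ending with a newline, so minified single-line JSON is handled correctly.

```python
import os


def is_zero_bytes(path: str) -> bool:
    # Same test as checking that `wc -c` reports 0.
    return os.path.getsize(path) == 0


def has_non_whitespace(data: bytes) -> bool:
    # Stricter: a file holding only spaces, tabs, or newlines counts as empty.
    # bytes.strip() with no argument removes ASCII whitespace from both ends.
    return bool(data.strip())
```

A valid but empty payload such as `{}` passes both checks. Detecting "no records" inside the JSON itself would require actually parsing it.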

1 Answer


The manpage on OS X also says (first paragraph in the description):

A line is defined as a string of characters delimited by a < newline> character.

So there is no contradiction between the two versions of the manpages.

Since your file does not contain a single newline character, wc -l correctly returns 0.
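To make that definition concrete, here is a small Python sketch (the function name `wc_l` is hypothetical) that mimics what `wc -l` reports. It counts newline bytes, regardless of how much text surrounds them.

```python
def wc_l(data: bytes) -> int:
    """Mimic `wc -l`: count newline (LF) bytes, not lines of text."""
    return data.count(b"\n")


# A one-line JSON document with no trailing newline has zero newline bytes.
print(wc_l(b'{"metadata": {}, "results": []}'))    # 0
print(wc_l(b'{"metadata": {}, "results": []}\n'))  # 1
print(wc_l(b"first\nsecond"))                      # 1: the unterminated last line is not counted
```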

  • Why does every post I find about counting lines in a file have people use `wc -l` then? Every line in your file could have a newline after it, except the last one, and then `wc -l` would return a value off by one. – Matthew Herbst Oct 27 '15 at 18:17
  • Because most files do end with a newline. `wc -l` is also often used in pipes, where this is generally no problem either. –  Oct 27 '15 at 18:24
  • See also one of the related links in the side bar: http://stackoverflow.com/questions/729692/why-should-files-end-with-a-newline?rq=1 , which tells you it's a POSIX definition for a line. –  Oct 27 '15 at 18:24
  • I guess the logical conclusion is: don't try and count lines in a JSON file. In fact, why would you want to do that? JSON files can be formatted in multiple ways with the content being the same, but with 0 or 100s of (new)lines in it. –  Oct 27 '15 at 18:26
  • Not sure what version of OSX you are on; Yosemite's description even explicitly calls out a newline character: `A line is defined as a string of characters delimited by a <newline> character` – Matthew Herbst Oct 27 '15 at 18:53
  • Argh: fooled by Markdown. I had the < newline> string in my answer, but of course MD turns that into (invisible) HTML. Put in an extra space to prevent that (escaping the < with a \ doesn't appear to work). –  Oct 27 '15 at 19:04
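If the off-by-one raised in the comments matters, one option is to count a trailing unterminated fragment as an extra line. Here is a minimal Python sketch with a hypothetical function name. This deliberately departs from the POSIX definition, under which that fragment is not a line at all.

```python
def count_lines_lenient(data: bytes) -> int:
    """Count LF-terminated lines, plus a final line that lacks a newline."""
    count = data.count(b"\n")
    if data and not data.endswith(b"\n"):
        count += 1  # unterminated last line, which `wc -l` would skip
    return count
```

For a non-empty file with no newline at all, like the JSON file in the question, this returns 1 where `wc -l` returns 0.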