
I have a large number of .txt files (N > 1000) containing data of interest. I want to identify the files whose "mean" value exceeds a given threshold (say, 0.5) and print the name of each such file. The data in each file are organized like this:

[
    {
      "parameter": {
          "max": 0.6640571758027143,
          "mean": 0.13404294175225137,
          "min": 0.0,
          "std": 0.09435715828616785
      },
      {
        "intensity": [
            {
                "max": [
                    3.1719575216784217
                ],
                "mean": [
                    -3.552713678800501e-17
                ],
                "min": [
                    -2.707115982837323
                ],
                "std": [
                    1.0000000000000004
                ]
                ...

To make matters slightly more complicated, I only wish to read the "mean" value for the "parameter" and not for "intensity".
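If the files are valid JSON, the two "mean" values live at different paths in the parsed data, so there is no need to pick out individual lines at all. A minimal sketch, assuming a hypothetical well-formed version of the sample in which "intensity" is a sibling key of "parameter" inside the first list element (the excerpt above is truncated, so the real nesting may differ):

```python
import json

# Hypothetical, well-formed stand-in for the excerpt above. The real files
# may nest "intensity" differently; adjust the keys to match.
sample = """
[
    {
        "parameter": {"max": 0.66, "mean": 0.134, "min": 0.0, "std": 0.094},
        "intensity": [
            {"max": [3.17], "mean": [-3.55e-17], "min": [-2.71], "std": [1.0]}
        ]
    }
]
"""

data = json.loads(sample)

# The "parameter" mean is a plain float...
param_mean = data[0]["parameter"]["mean"]
# ...while the "intensity" mean is a float wrapped in a one-element list.
intensity_mean = data[0]["intensity"][0]["mean"][0]

print(param_mean)  # 0.134
```

Because each value is reached by its own key path, the "intensity" mean can never be mistaken for the "parameter" mean.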

My idea was to read each file in a for loop, roughly like this:

subjects = [allmyfilenames]
for subj in subjects:
    with open('C:/%s.txt' % subj, 'r') as file:
        for line in file:
            print(line, end='')

From there, I am a bit lost. How might I identify the correct line to use in matching against my threshold (0.5)?
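If the files turn out not to be valid JSON, a line-by-line scan can still work for the layout shown. A minimal sketch, assuming (hypothetically) that the "parameter" key appears before its own "mean" line and that the first numeric "mean" after it is the one wanted; the helper name `parameter_mean_from_lines` is made up for illustration:

```python
import re

# Matches a line such as:   "mean": 0.13404294175225137,
# It does NOT match the "intensity" form, "mean": [ , because there the
# value sits on a later line inside a list rather than after the colon.
MEAN_LINE = re.compile(r'"mean"\s*:\s*(-?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)')


def parameter_mean_from_lines(lines):
    """Return the numeric "mean" found after the "parameter" key, or None.

    Hypothetical helper: relies on the line layout shown in the question,
    not on a real JSON parse.
    """
    in_parameter = False
    for line in lines:
        if '"parameter"' in line:
            in_parameter = True
        if in_parameter:
            match = MEAN_LINE.search(line)
            if match:
                return float(match.group(1))
    return None
```

Since iterating over an open file yields its lines, an open file object can be passed straight in, e.g. `parameter_mean_from_lines(file)` inside the loop above, and the result compared against 0.5 when it is not `None`.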

martineau
  • If that is a valid JSON file, which it looks like, [this](http://stackoverflow.com/a/2835672/322909) answer might be of some use to you. – John Nov 06 '12 at 03:18
  • Your input file is ill-formed -- all the brackets and braces don't come in matched sets. – martineau Nov 06 '12 at 03:30
  • yes, sorry, I extracted the input from a larger set for simplicity, but apparently I did not match the brackets and braces properly for this example. – user1801867 Nov 06 '12 at 03:33
  • @anijhaw: Your edits did not fix the input file -- it's not quite that simple...so I did a rollback to the OP's original version. – martineau Nov 06 '12 at 11:24
  • Yes I was thinking about that. – anijhaw Nov 06 '12 at 22:05
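Since whether the files are valid JSON decides which approach works, it may help to check them first. A minimal sketch using the standard library's `json.JSONDecodeError`, which carries `lineno`, `colno` and `msg` attributes; the helper name `json_problem` is hypothetical:

```python
import json


def json_problem(text):
    """Return None if text parses as JSON, else where and why parsing failed."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None
```

Running this over each file's contents would show whether the mismatched brackets are only in the shortened excerpt or in the real data as well.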

1 Answer


Try something like this. I wasn't entirely sure of your data format, but something like this might work for the format above. (Not tested.)

import json

subjects = [allmyfilenames]  # placeholder for your list of file names
for subj in subjects:
    with open('C:/%s.txt' % subj, 'r') as datafile:
        data = json.load(datafile)
        if data[0]['parameter']['mean'] > 0.5:
            print(subj)
anijhaw
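The same `json.load` idea can be wrapped so it finds the files itself and skips any that do not parse, which matters given the ill-formed excerpt. A minimal sketch, assuming each file is valid JSON shaped like `data[0]['parameter']['mean']`; the function name `files_above_threshold` and its `pattern` parameter are hypothetical, and `pathlib.Path.glob` is used in place of a hand-built file list:

```python
import json
from pathlib import Path


def files_above_threshold(directory, threshold=0.5, pattern="*.txt"):
    """Yield names of files whose "parameter" mean exceeds threshold.

    Assumes valid JSON shaped like data[0]["parameter"]["mean"]. Files that
    fail to parse or lack those keys are reported and skipped.
    """
    for path in sorted(Path(directory).glob(pattern)):
        try:
            with path.open() as handle:
                data = json.load(handle)
            mean = data[0]["parameter"]["mean"]
        except (json.JSONDecodeError, KeyError, IndexError, TypeError) as err:
            print(f"skipping {path.name}: {err!r}")
            continue
        if mean > threshold:
            yield path.name
```

Usage would look like `for name in files_above_threshold('C:/'): print(name)`. Skipping bad files instead of stopping means one malformed file out of 1000+ does not abort the whole scan.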