I have a text file with four columns: time, serial, domain, and server.

The contents of the text file are as follows:

15 14 google.com 8.8.8.8
19 45 google.com 8.8.4.4
98 76 google.com 208.67.222.222
20 23 intuit.com 8.8.8.8
45 89 intuit.com 8.8.4.4
43 21 intuit.com 208.67.222.222
78 14 google.com 8.8.8.8
92 76 google.com 8.8.4.4
64 54 google.com 208.67.222.222
91 18 intuit.com 8.8.8.8
93 74 intuit.com 8.8.4.4
65 59 intuit.com 208.67.222.222

What would be the best way to read this file and build a list of dicts like the following?

[{"server":"8.8.8.8", 
  "domains":[{"google.com":[{"time":15, "serial":14}, {"time":78, "serial":14}]},
             {"intuit.com":[{"time":20, "serial":23}, {"time":91, "serial":18}]}
            ]
},
{"server":"8.8.4.4", 
 "domains":[{"google.com":[{"time":19, "serial":45}, {"time":92, "serial":76}]},
            {"intuit.com":[{"time":45, "serial":89}, {"time":93, "serial":74}]}
           ]
},
{"server":"208.67.222.222", 
 "domains":[{"google.com":[{"time":98, "serial":76}, {"time":64, "serial":54}]},
            {"intuit.com":[{"time":43, "serial":21}, {"time":65, "serial":59}]}
           ]
}]

The order of the rows could change, but the columns always stay the same.

Amistad
  • You could try using [pandas.read_csv](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html). Use a space as the separator, supply the column names, and then convert the DataFrame to a dictionary using [pandas.DataFrame.to_dict](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_dict.html). You may have to do some grouping to get it exactly how you want it. – Stephen B Mar 20 '17 at 07:01
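A rough sketch of what that comment describes, assuming pandas is installed. The function name `load_grouped` and the inline `SAMPLE` text are made up for illustration, and the grouping loop is only one of several ways to reach the nested shape asked for in the question.

```python
import io

import pandas as pd

# Hypothetical sample: a few rows from the question, inlined for illustration.
SAMPLE = """\
15 14 google.com 8.8.8.8
19 45 google.com 8.8.4.4
98 76 google.com 208.67.222.222
20 23 intuit.com 8.8.8.8
78 14 google.com 8.8.8.8
"""


def load_grouped(source):
    """Read whitespace-separated rows and nest them by server, then domain."""
    # The file has no header row, so the column names are supplied here.
    df = pd.read_csv(source, sep=r"\s+", header=None,
                     names=["time", "serial", "domain", "server"])
    result = []
    # sort=False keeps servers and domains in order of first appearance.
    for server, by_server in df.groupby("server", sort=False):
        domains = []
        for domain, rows in by_server.groupby("domain", sort=False):
            # Cast to plain int so the result holds native Python numbers.
            records = [{"time": int(t), "serial": int(s)}
                       for t, s in zip(rows["time"], rows["serial"])]
            domains.append({domain: records})
        result.append({"server": server, "domains": domains})
    return result


grouped = load_grouped(io.StringIO(SAMPLE))
```

Passing a file path instead of the `io.StringIO` object should work the same way, since `read_csv` accepts either.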

1 Answer

Perhaps not the best way, but one that has some practical benefits:

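from pprint import pprint

servers = {}
file_path = './test.file'

# Open in text mode: binary mode ('rb') yields bytes on Python 3.
with open(file_path, 'r') as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        _time, serial, domain, ip = line.split()
        # setdefault creates the nested dict/list the first time a key is seen.
        times = servers.setdefault(ip, {}).setdefault(domain, [])
        # Values are kept as strings here; wrap them in int() if numbers are needed.
        times.append({"time": _time, "serial": serial})

# Intermediate form: {server: {domain: [records]}}
pprint(servers)
# Requested form: a list of {"server": ..., "domains": [{domain: [records]}, ...]}
pprint([{"server": ip, "domains": [{domain: records} for domain, records in domains.items()]}
        for ip, domains in servers.items()])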

Output:

{'208.67.222.222': {'google.com': [{'serial': '76', 'time': '98'},
                                   {'serial': '54', 'time': '64'}],
                    'intuit.com': [{'serial': '21', 'time': '43'},
                                   {'serial': '59', 'time': '65'}]},
 '8.8.4.4': {'google.com': [{'serial': '45', 'time': '19'},
                            {'serial': '76', 'time': '92'}],
             'intuit.com': [{'serial': '89', 'time': '45'},
                            {'serial': '74', 'time': '93'}]},
 '8.8.8.8': {'google.com': [{'serial': '14', 'time': '15'},
                            {'serial': '14', 'time': '78'}],
             'intuit.com': [{'serial': '23', 'time': '20'},
                            {'serial': '18', 'time': '91'}]}}

[{'domains': [{'intuit.com': [{'serial': '21', 'time': '43'},
                              {'serial': '59', 'time': '65'}]},
              {'google.com': [{'serial': '76', 'time': '98'},
                              {'serial': '54', 'time': '64'}]}],
  'server': '208.67.222.222'},
 {'domains': [{'intuit.com': [{'serial': '23', 'time': '20'},
                              {'serial': '18', 'time': '91'}]},
              {'google.com': [{'serial': '14', 'time': '15'},
                              {'serial': '14', 'time': '78'}]}],
  'server': '8.8.8.8'},
 {'domains': [{'intuit.com': [{'serial': '89', 'time': '45'},
                              {'serial': '74', 'time': '93'}]},
              {'google.com': [{'serial': '45', 'time': '19'},
                              {'serial': '76', 'time': '92'}]}],
  'server': '8.8.4.4'}]

The benefits: the nested dictionaries can be keyed into directly, and the file is looped over only once to insert every record.

The only downside is that the result is not yet in the requested format, so it has to be looped over one more time to convert it. That still beats scanning the list for every line being inserted.
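The two passes described above can also be sketched with `collections.defaultdict`, converting the numeric columns to `int` so the result matches the format in the question. This is a variation on the code above, not a replacement for it; the function name `group_by_server` and the inline `sample` rows are made up for illustration.

```python
from collections import defaultdict


def group_by_server(lines):
    """Group 'time serial domain server' rows by server, then by domain."""
    # Pass 1: servers[ip][domain] -> list of {"time": int, "serial": int}
    servers = defaultdict(lambda: defaultdict(list))
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed rows
        time_, serial, domain, ip = fields
        servers[ip][domain].append({"time": int(time_), "serial": int(serial)})

    # Pass 2: reshape the nested dicts into the requested list of dicts.
    return [
        {"server": ip,
         "domains": [{domain: records} for domain, records in domains.items()]}
        for ip, domains in servers.items()
    ]


# Hypothetical sample: a few rows from the question.
sample = [
    "15 14 google.com 8.8.8.8",
    "19 45 google.com 8.8.4.4",
    "20 23 intuit.com 8.8.8.8",
    "78 14 google.com 8.8.8.8",
    "",
]
result = group_by_server(sample)
```

Any iterable of lines works as input, so an open file object can be passed in place of the list. On Python 3.7 and later, servers and domains come out in order of first appearance because dicts preserve insertion order.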

jmunsch
  • @jmunsch Thanks a lot, that works. I have another part of the same question at http://stackoverflow.com/questions/42897984/selecting-only-max-value-in-python-list-of-dicts, which should be right up your alley. – Amistad Mar 20 '17 at 07:38
  • @Amistad I'd pull the comprehension on the last line out into a full double for loop and look at using something like http://stackoverflow.com/questions/12540817/finding-largest-value-in-a-dictionary. If you really need to speed it up, it can always be translated to use the faster data structures provided by numpy and friends. – jmunsch Mar 20 '17 at 07:42
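A minimal sketch of that last suggestion: the comprehension is unrolled into an explicit double for loop, with `max` picking one record per domain. It assumes "max" means the record with the largest `time`, which is a guess about the follow-up question, and the name `latest_per_domain` and the sample data are hypothetical.

```python
def latest_per_domain(servers):
    """Reshape {server: {domain: [records]}}, keeping one record per domain."""
    result = []
    for ip, domains in servers.items():
        domain_entries = []
        for domain, records in domains.items():
            # Assumption: "max" means the record with the largest time value.
            newest = max(records, key=lambda record: record["time"])
            domain_entries.append({domain: [newest]})
        result.append({"server": ip, "domains": domain_entries})
    return result


# Hypothetical intermediate data, with the numeric fields already cast to int.
servers = {
    "8.8.8.8": {
        "google.com": [{"time": 15, "serial": 14}, {"time": 78, "serial": 14}],
        "intuit.com": [{"time": 20, "serial": 23}, {"time": 91, "serial": 18}],
    },
    "8.8.4.4": {
        "google.com": [{"time": 19, "serial": 45}, {"time": 92, "serial": 76}],
    },
}
latest = latest_per_domain(servers)
```

If the values are still strings, as in the answer's output, the key would need to be `int(record["time"])`, because string comparison ranks `'9'` above `'15'`.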