I'm trying to write a program to sort and tag lines in a file. For example, suppose I have a .txt file of a health clinic with a variety of information about a patient. I want to tag the information. Suppose the data is given in the following order:

Patient ID  
Age 
Gender  
Height  
Weight  
HBA1C level 
Cholesterol 
Smoker status   
Systolic BP 
Diastolic BP

And suppose the file contains the following information (all of which is made up):

A31415  
54  
M   
180 
90  
6.7 
100 
No  
130 
65  
A32545  
62  
F   
160 
80  
7.2 
120 
Yes 
180 
92

My problem is trying to write a loop for each patient, with

A31415  
54  
M   
180 
90  
6.7 
100 
No  
130 
65

being one patient and

A32545  
62  
F   
160 
80  
7.2 
120 
Yes 
180 
92

being the second. I'm struggling to get the code to produce the following result:

<patient>       
<patientID> A31415  </patientID>    
<clinic>    UIHC    </clinic>   
<age>   54  </age>  
<gender>    M   </gender>   
<height>    180 </height>   
<weight>    90  </weight>   
<hba1c> 6.7 </hba1c>    
<cholesterol>   100 </cholesterol>  
<smoker>    No  </smoker>   
<systolic>  130 </systolic> 
<diastolic> 65  </diastolic>    
</patient>  
<patient>       
<patientID> A32545  </patientID>    
<clinic>    UIHC    </clinic>   
<age>   62  </age>  
<gender>    F   </gender>   
<height>    160 </height>   
<weight>    80  </weight>   
<hba1c> 7.2 </hba1c>    
<cholesterol>   120 </cholesterol>  
<smoker>    Yes </smoker>   
<systolic>  180 </systolic> 
<diastolic> 92  </diastolic>    
</patient>

Any help would be greatly appreciated.

jboyda5

1 Answer

This seems quite feasible. I think something like this should work...

file_keys = ['Patient ID', 'Age', 'Gender',
             'Height', 'Weight', 'HBA1C level',
             'Cholesterol', 'Smoker status',
             'Systolic BP', 'Diastolic BP']

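with open('datafile') as fin:
    # zip stops after len(file_keys) lines, so this reads exactly one patient.
    # Each value keeps its trailing newline; call .strip() before using it.
    user_info = dict(zip(file_keys, fin))
    # Now process user_info into your xml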

Of course this takes only one user from the file. To get them all, you'll need a loop. You'll know you've got all your users once the user_info returned is an empty dictionary.

with open('datafile') as fin:
    while True:
        user_info = dict(zip(file_keys, fin))
        if not user_info:  # empty dict.  we're done.
            break
        # Now process user_info into your xml

This works because zip truncates at the shorter of its two input iterables. It takes one element from file_keys and pairs it with one line from the file. When file_keys runs out, zip stops without reading another line, and the file object remembers its position for the next read.
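To make that concrete, here is a minimal sketch of the whole pipeline, not a definitive implementation. The tag names and the constant clinic value "UIHC" are copied from the desired output in the question; the helper names read_records and to_xml, the blank-line skipping, and the ValueError on a partial record are my own assumptions. Note that the key list has to be the first argument to zip: zip asks it for an element first, so once the keys run out it stops before pulling another line from the file.

```python
import io
from xml.sax.saxutils import escape

# Tag names copied from the desired output in the question, in file order.
TAGS = ['patientID', 'age', 'gender', 'height', 'weight',
        'hba1c', 'cholesterol', 'smoker', 'systolic', 'diastolic']
CLINIC = 'UIHC'  # assumption: a constant, as shown in the desired output


def read_records(lines, tags=TAGS):
    """Yield one dict per patient, consuming len(tags) lines at a time."""
    # Assumption: blank lines carry no data, so skip them and strip the rest.
    lines = (ln.strip() for ln in lines if ln.strip())
    while True:
        # tags comes first, so zip stops before reading an extra line.
        record = dict(zip(tags, lines))
        if not record:  # no lines left
            return
        if len(record) < len(tags):
            raise ValueError('incomplete record: %r' % record)
        yield record


def to_xml(record):
    """Render one patient record as the XML-like block from the question."""
    parts = ['<patient>']
    for tag in TAGS:
        parts.append('<{0}>{1}</{0}>'.format(tag, escape(record[tag])))
        if tag == 'patientID':
            parts.append('<clinic>{}</clinic>'.format(CLINIC))
    parts.append('</patient>')
    return '\n'.join(parts)


# Stand-in for open('datafile'): the two made-up patients from the question.
sample = io.StringIO(
    "A31415\n54\nM\n180\n90\n6.7\n100\nNo\n130\n65\n"
    "A32545\n62\nF\n160\n80\n7.2\n120\nYes\n180\n92\n"
)
records = list(read_records(sample))
for record in records:
    print(to_xml(record))
```

With a real file, pass the open file object to read_records instead of the StringIO stand-in. A plain list of lines would also work here, because the generator expression inside read_records turns it into a single iterator that keeps its position between records.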

mgilson
  • This would work, and would be how I'd approach this. You may want to explain why this works, though. – Martijn Pieters Dec 07 '13 at 20:10
  • Only works for one patient though... maybe `for patient in iter(lambda: dict(zip(file_keys, fin)), {})` instead – Jon Clements Dec 07 '13 at 20:12
  • @MartijnPieters, are you sure? `zip` is documented as truncating after the smaller sequence is exhausted. – Mark Ransom Dec 07 '13 at 20:12
  • @MarkRansom: *Exactly*. So it'll only read `len(file_keys)` lines from `fin`. That's **why** this works. Provided you add a loop, of course. – Martijn Pieters Dec 07 '13 at 20:12
  • @MartijnPieters The problem the OP appears to be having is that they want to be able to extract *multiple* patients in that format from the file... Therefore, we're missing a loop on this one – Jon Clements Dec 07 '13 at 20:13
  • @MartijnPieters indeed... sorry - but I'd amended my comment while you were typing that you'd amended yours :p – Jon Clements Dec 07 '13 at 20:15
  • Indeed, `iter()` is the way to go here. – Martijn Pieters Dec 07 '13 at 20:15
  • Don't tell me, mgilson, that you don't know how to use `iter()` with a sentinel? Jon's comment gives you a solution on a plate, better than the `while` loop. :-) – Martijn Pieters Dec 07 '13 at 20:18
  • @MartijnPieters -- I don't think that `iter` is the way to go here. Do you really think that 2 argument `iter` + `lambda` is more readable than an extremely simple `while` loop that you can comment? – mgilson Dec 07 '13 at 20:20
  • @mgilson: Perhaps, but I do have a soft spot for `iter()` with a sentinel... `[-_-]~` – Martijn Pieters Dec 07 '13 at 20:21
  • @mgilson It's not entirely unreadable and doesn't require an extra level of indentation – Jon Clements Dec 07 '13 at 20:23
  • As an example of a recent monstrosity where I used `iter` with a sentinel ... check out [this](http://stackoverflow.com/a/20249240/748858) beauty. – mgilson Dec 07 '13 at 20:23
  • @mgilson: Pull out the `lambda` into a variable: `read_record = lambda: dict(zip(file_keys, fin))`, then use `for record in iter(read_record, {}):`? – Martijn Pieters Dec 07 '13 at 20:23
  • @MartijnPieters -- Yeah, that would work and certainly isn't bad code. I'm just not convinced that it's *better* than an extremely simple while loop. – mgilson Dec 07 '13 at 20:26
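
For comparison, the two-argument `iter()` form discussed in these comments can be sketched as below. This is only an illustration under assumptions: `fin` is an in-memory stand-in for the open file, `read_record` is the name suggested in the comments (written here as a `def` rather than a `lambda`), and the `.strip()` call is my addition. `iter(callable, sentinel)` calls `read_record` repeatedly and stops as soon as it returns a value equal to the sentinel, here the empty dict.

```python
import io

file_keys = ['Patient ID', 'Age', 'Gender',
             'Height', 'Weight', 'HBA1C level',
             'Cholesterol', 'Smoker status',
             'Systolic BP', 'Diastolic BP']

# Stand-in for open('datafile'): the two made-up patients from the question.
fin = io.StringIO(
    "A31415\n54\nM\n180\n90\n6.7\n100\nNo\n130\n65\n"
    "A32545\n62\nF\n160\n80\n7.2\n120\nYes\n180\n92\n"
)


def read_record():
    """Read the next len(file_keys) lines from fin into a dict."""
    return {key: line.strip() for key, line in zip(file_keys, fin)}


# Calls read_record() until it returns {} (nothing left to read).
patients = list(iter(read_record, {}))
for patient in patients:
    print(patient['Patient ID'], patient['Smoker status'])
```

Whether this reads better than the `while True` loop in the answer is a matter of taste, as the thread shows; both consume the file in the same way.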