I realize that there is a way to add a column using 'awk'.

But I'm not very familiar with that alternative, so I thought I'd ask whether there's a way to add a column to a tab-delimited text file using Python.

Specifically, here's the scenario I need to add a column in:

I have data in the following format (I realize looking at it that the format may not be so clear, but the phone, email, and website correspond to different columns):

name    phone   email   website
D G Albright M.S.           
Lannister G. Cersei M.A.T., CEP 111-222-3333    cersei@got.com  www.got.com
Argle D. Bargle Ed.M.           
Sam D. Man Ed.M.    000-000-1111    dman123@gmail.com   www.daManWithThePlan.com
Sam D. Man Ed.M.            
Sam D. Man Ed.M.    111-222-333     dman123@gmail.com   www.daManWithThePlan.com
D G Bamf M.S.           
Amy Tramy Lamy Ph.D.    

I'm writing a parser for the first column. I want to move the 'area of practice' (for example, 'CEP') into a new column titled 'area'. I iterate through the file and use the pop function to separate the area from the rest of the first column. Then I add it to a list, which is simply discarded when the function returns because it is never written to the spreadsheet.

Here's my script:

def parse_ieca_gc(s):  

    ### HANDLE NAME ELEMENT ######

    degrees = ['M.A.T.','Ph.D.','MA','J.D.',
               'Ed.M.', 'M.A.', 'M.B.A.', 
               'Ed.S.', 'M.Div.', 'M.Ed.', 
               'RN', 'B.S.Ed.', 'M.D.', 'M.S.']
    degrees_list = []

    # check whether the name string has 
    # an area of practice by 
    # checking if there's a comma separator
    if ',' in s['name']:

        # separate area of practice from name 
        # and degree and bind this to var 'area'
        split_area_nmdeg = s['name'].split(',')
        area = split_area_nmdeg.pop()

        # Split the name and deg by spaces. 
        # If there's a deg, it will match with one 
        # of elements and will be stored deg list.
        # The deg is removed name_deg list 
        # and all that's left is the name.
        split_name_deg = re.split('\s',split_area_nmdeg[0])
        for word in split_name_deg:
            for deg in degrees:
                if deg == word:
                    degrees_list.append(split_name_deg.pop())
                name = ' '.join(split_name_deg)

Expected output

name    phone   email   website    area   degrees
D G Albright                                                                      M.S.          
Lannister G. Cersei 111-222-3333    cersei@got.com  www.got.com    CEP    M.A.T.
Argle D. Bargle                                                             Ed.M.           
Sam D. Man  000-000-1111    dman123@gmail.com   www.daManWithThePlan.com   Ed.M.
Sam D. Man                                                                        Ed.M.         
Sam D. Man  111-222-333     dman123@gmail.com   www.daManWithThePlan.com      Ed.M.
D G Bamf                                                                          M.S.          
Amy Tramy Lamy                                                                   Ph.D.  

This code is also not working:

fieldnames = ['name','degrees','area','phone','email','website']
with open('ieca_first_col_fake_text.txt','r') as input:
    with open('new_col_dict.txt','w') as output:
        dict_writer = csv.DictWriter(output, fieldnames, delimiter = '\t')
        dict_reader = csv.DictReader(input, delimiter = '\t')
        #dict_writer.writeheader(fieldnames)
        for row in dict_reader:
            print row
            dict_writer.writerow(fieldnames)
            dict_writer.writerow(row)
oz123
goldisfine
  • What's the expected output? – Ashwini Chaudhary Jul 08 '13 at 14:38
  • possible duplicate of [How to add a new column to a CSV file using Python?](http://stackoverflow.com/questions/11070527/how-to-add-a-new-column-to-a-csv-file-using-python) – Daenyth Jul 08 '13 at 14:47
  • what do you mean by `in this case an ex would be 'CEP', to a new column entitled 'area'.` ? – oz123 Jul 08 '13 at 15:36
  • @goldisfine, unrelated, but please format your Python code according to PEP 8 or something similar. Code that is longer than 80 columns is hard to read. – oz123 Jul 08 '13 at 15:37
  • @goldisfine, also not related. Gold Is Not fine. Make a short search about the environmental damages of gold mining. You will be surprised. – oz123 Jul 08 '13 at 15:38
  • @Oz123, first nonrelated: will format like that in the future. Second nonrel: rearrangement of my name. And I realize that many precious metals have trafficking and environmental things associated with them. – goldisfine Jul 08 '13 at 15:42
  • @goldisfine, what about my question regarding English? I don't understand your goal. I might be able to further help, if I understand your conditioning ... – oz123 Jul 08 '13 at 15:46

2 Answers

3

See the answer linked below. A tab-delimited file is just a CSV file that uses a tab as the separator.

How to add a new column to a CSV file using Python?
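
As a minimal sketch of that idea (Python 3; the function name `add_column`, the `value_for_row` callback, and the sample data are made up for illustration), `csv.reader` and `csv.writer` both accept `delimiter='\t'`, so appending a column to tab-separated data could look like this:

```python
import csv
import io


def add_column(src, dst, header, value_for_row):
    """Copy tab-separated rows from src to dst, appending one column.

    `value_for_row` is a hypothetical callback that receives a row
    (a list of strings) and returns the value for the new column.
    """
    reader = csv.reader(src, delimiter='\t')
    writer = csv.writer(dst, delimiter='\t', lineterminator='\n')
    writer.writerow(next(reader) + [header])  # header row gets the new name
    for row in reader:
        writer.writerow(row + [value_for_row(row)])


# In-memory stand-ins for real files (which would be opened with newline='').
src = io.StringIO("name\tphone\nSam D. Man Ed.M.\t000-000-1111\n")
dst = io.StringIO()
add_column(src, dst, 'name_length', lambda row: len(row[0]))
result = dst.getvalue()
```

The only change from the comma-separated case is the `delimiter` argument, which has to be passed to both the reader and the writer.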

oz123
  • When I use this method, it doesn't appear to be tab separated. – goldisfine Jul 08 '13 at 14:10
  • The post that this refers to is not satisfactory and this will not be marked as an answer until it refers to a method to create a tab-separated column. – goldisfine Jul 08 '13 at 15:10
  • @goldisfine csv readers accept a parameter specifying which delimiter to use. By default it is a comma, but if you just add the kwarg `delimiter = "\t"` it will work exactly the same. That's a minuscule edit and this answer should be accepted. – Slater Victoroff Jul 08 '13 at 15:18
1

This is what I ended up doing:

import csv

with open('ieca_first_col_fake_text.txt', 'r') as infile, \
        open('new_col_dict.txt', 'w') as outfile:
    dict_reader = csv.DictReader(infile, delimiter='\t')
    # Extend the header read from the input with the two new columns.
    dict_reader.fieldnames.append('area')
    dict_reader.fieldnames.append('degrees')

    dict_writer = csv.DictWriter(outfile,
                                 fieldnames=dict_reader.fieldnames,
                                 delimiter='\t')
    # Write the header once, before the rows, not once per row.
    dict_writer.writeheader()
    for row in dict_reader:
        print(row)
        dict_writer.writerow(row)
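
The snippet above adds the 'area' and 'degrees' columns but leaves them empty. Here is a hedged sketch of how the name parsing from the question could fill them (Python 3; `split_name`, `add_area_and_degrees`, and the in-memory sample rows are illustrative names and data, the degree list is copied from the question, and the sketch assumes at most one comma in the name field):

```python
import csv
import io

DEGREES = {'M.A.T.', 'Ph.D.', 'MA', 'J.D.', 'Ed.M.', 'M.A.', 'M.B.A.',
           'Ed.S.', 'M.Div.', 'M.Ed.', 'RN', 'B.S.Ed.', 'M.D.', 'M.S.'}


def split_name(raw):
    """Split 'Name Deg., AREA' into (name, degrees, area)."""
    # Everything after the first comma is treated as the area of practice.
    head, _, area = raw.partition(',')
    words = head.split()
    name = ' '.join(w for w in words if w not in DEGREES)
    degrees = ' '.join(w for w in words if w in DEGREES)
    return name, degrees, area.strip()


def add_area_and_degrees(src, dst):
    """Copy tab-separated rows, splitting 'name' into three columns."""
    reader = csv.DictReader(src, delimiter='\t')
    fieldnames = list(reader.fieldnames) + ['area', 'degrees']
    writer = csv.DictWriter(dst, fieldnames=fieldnames, delimiter='\t',
                            lineterminator='\n')
    writer.writeheader()  # header is written once, before the rows
    for row in reader:
        row['name'], row['degrees'], row['area'] = split_name(row['name'])
        writer.writerow(row)


# In-memory stand-ins for the input and output files.
src = io.StringIO(
    "name\tphone\temail\twebsite\n"
    "D G Albright M.S.\t\t\t\n"
    "Lannister G. Cersei M.A.T., CEP\t111-222-3333\tcersei@got.com\twww.got.com\n"
)
dst = io.StringIO()
add_area_and_degrees(src, dst)
lines = dst.getvalue().splitlines()
```

With real files, `src` and `dst` would be the handles returned by `open(..., newline='')`.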
oz123
goldisfine