Extract text files into multiple columns in python

Question

I have several text files and I want to extract their values into a CSV file. Each file has the following format:

main cost: 30
additional cost: 5

I managed to do that, but I want the values from each file to go into a separate column. I also want the number of text files to be a user-supplied argument.

This is what I'm doing now

  numFiles = sys.argv[1]
  d = [[] for x in xrange(numFiles+1)]
  for i in range(numFiles): 
      filename = 'mytext' + str(i) + '.text'
      with open(filename, 'r') as in_file:
      for line in in_file:
        items = line.split(' : ')
        num = items[1].split('\n')

        if i ==0:
            d[i].append(items[0])

        d[i+1].append(num[0])

        grouped = itertools.izip(*d[i] * 1)
        if i == 0:
            grouped1 = itertools.izip(*d[i+1] * 1)

        with open(outFilename, 'w') as out_file:
            writer = csv.writer(out_file)
            for j in range(numFiles):
                for val in itertools.izip(d[j]):
                    writer.writerow(val)

This is what I'm getting now, everything in one column

main cost   
additional cost   
30   
5   
40   
10

And I want it to be

main cost        | 30  | 40
additional cost  | 5   | 10

Where does the last column come from in the desired output? Are there only two lines in each input file? – wwii, Jul 29 '16 at 22:57
I'm assuming the input file looks something like: main cost: 30 additional cost: 5 main cost: 40 additional cost: 10 – Michael, Jul 29 '16 at 22:57

score 2 · Answer 1 · edited May 23 '17 at 11:44

You could use a dictionary for this, where each key is the "header" you want to use and each value is a list.

So it would look like someDict = {'main cost': [30,40], 'additional cost': [5,10]}

edit2: Went ahead and cleaned up this answer so it makes a little more sense.

You can build the dictionary and iterate over it like this:

from collections import OrderedDict

in_file = ['main cost : 30', 'additional cost : 5', 'main cost : 40', 'additional cost : 10']
someDict = OrderedDict()

for line in in_file:
    key,val = line.split(' : ')
    num = int(val)
    if key not in someDict:
        someDict[key] = []

    someDict[key].append(num)

for key in someDict:
    print(key)
    for value in someDict[key]:
        print(value)

The code outputs:

main cost
30
40
additional cost
5
10

Should be pretty straightforward to modify the example to fit your desired output.
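One way that modification might look, as a minimal sketch assuming Python 3 and the same `someDict` layout as above: each key becomes the first cell of a row, and its values fill the remaining columns. The helper name `write_columns` and the file name `costs.csv` are made up for illustration.

```python
import csv
from collections import OrderedDict


def write_columns(some_dict, out_path):
    """Write one CSV row per key: the key first, then its values."""
    # newline='' lets the csv module control the line endings itself.
    with open(out_path, 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        for key, values in some_dict.items():
            writer.writerow([key] + list(values))


someDict = OrderedDict([('main cost', [30, 40]), ('additional cost', [5, 10])])
write_columns(someDict, 'costs.csv')  # 'costs.csv' is a hypothetical name
# costs.csv now holds two rows:
# main cost,30,40
# additional cost,5,10
```

Opened in a spreadsheet, each file's values land in their own column, which is the layout the question asks for.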

I based this on the example in "append multiple values for one key in Python dictionary", and thanks to @wwii for some suggestions.

I used an OrderedDict because a plain dict does not guarantee key order before Python 3.7.

You can run my example @ https://ideone.com/myN2ge

answered Jul 29 '16 at 21:55

Michael


For this solution, you can be sure that there are only two keys, so you could construct the dictionary beforehand with those two keys and an empty list for each value; then you can get rid of the `if/else` for the dictionary assignment. Alternatively, if you are not sure about the keys beforehand, you could use [`collections.defaultdict`](https://docs.python.org/3/library/collections.html#collections.defaultdict). – wwii Jul 29 '16 at 23:38
When you split text and plan on using the individual items later in your code, it is nice to give them names; it makes subsequent code easier to read. Take advantage of unpacking, in this case something like `key, value = line.split(':'); value = value.strip()` – wwii Jul 29 '16 at 23:46
Both great examples. For the first, I would probably keep it my way so in the future the file formats can change without having to modify the code. I agree with your second example. – Michael Jul 29 '16 at 23:53
Play around with `collections.defaultdict`; it solves the problem of trying to assign to a missing key without using `if/then`s or `try/except`s. – wwii Jul 29 '16 at 23:56
That works as well unless you want to use an OrderedDict, which is probably what OP wants. Otherwise, it won't always output in the same order. I'll edit my example to include your first suggestion though. It's much easier to read that way. – Michael Jul 30 '16 at 00:12
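
A minimal sketch of the `collections.defaultdict` variant discussed in these comments. It assumes Python 3.7 or later, where every dict (including `defaultdict`) preserves insertion order, so the keys still come out in file order. The sample lines are the same made-up data used in the answer.

```python
from collections import defaultdict

in_file = ['main cost : 30', 'additional cost : 5',
           'main cost : 40', 'additional cost : 10']

# A missing key is created on first access with an empty list,
# so no "if key not in ..." check is needed.
someDict = defaultdict(list)

for line in in_file:
    key, value = line.split(':')
    someDict[key.strip()].append(int(value))  # int() tolerates the spaces

for key, values in someDict.items():
    print(key, values)
```

On older Python versions the ordering concern raised above still applies, and the `OrderedDict` version is the safer choice.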

beroe · Answer 2 · 2016-07-30T06:06:11.870

This is how I might do it. Assumes the fields are the same in all the files. Make a list of names, and a dictionary using those field names as keys, and the list of values as the entries. Instead of running on file1.text, file2.text, etc. run the script with file*.text as a command line argument.

#! /usr/bin/env python

import sys

if len(sys.argv) < 2:
    print("Give file names to process, with wildcards")
else:
    FileList = sys.argv[1:]
    FileNum = 0
    outFilename = "myoutput.dat"
    NameList = []   # field names, in the order they appear in the first file
    ValueDict = {}  # field name -> list of values, one per file
    for InfileName in FileList:
        with open(InfileName, 'r') as Infile:
            for Line in Infile:
                if ":" not in Line:
                    continue  # skip blank lines and lines without a field
                Name, Value = Line.split(":", 1)
                # Strip the name before using it as a key, so that
                # "main cost : 30" and "main cost: 30" give the same key.
                Name = Name.strip()
                if FileNum == 0:
                    NameList.append(Name)
                ValueDict[Name] = ValueDict.get(Name, []) + [Value.strip()]
        FileNum += 1  # the last statement in the file loop
    # print(NameList)
    # print(ValueDict)

    with open(outFilename, 'w') as out_file:
        for N in NameList:
            OutString = "{},{}\n".format(N, ",".join(ValueDict[N]))
            out_file.write(OutString)

Output for my four fake files was:

main cost,10,10,40,10
additional cost,25.6,25.6,55.6,25.6

Thanks @beroe, but I want the output to be saved in a CSV file, with the `|` representing a separate column – Lily, Jul 30 '16 at 00:05
This is what I get when I try the above code: `TypeError: can only join an iterable` – Lily, Aug 01 '16 at 13:45
Insert a line that prints ValueDict and see what it says. Each value should be a list of strings (numbers) if the data match your example. If there are blank lines or header lines, you could insert a check in the loop before the `ValueDict[Name]=` part... – beroe, Aug 01 '16 at 14:49
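
To get an actual CSV file, with the number of input files passed on the command line as the question asks, a sketch along these lines may work. It assumes Python 3 and the `mytext0.text`, `mytext1.text`, ... naming from the question; the function names `collect` and `write_csv` and the output name `output.csv` are illustrative and do not come from the original posts. Note that `sys.argv[1]` is a string and has to be converted with `int()` before it can be passed to `range()`.

```python
import csv
import sys
from collections import OrderedDict


def collect(filenames):
    """Map each field name to a list of values, one per input file."""
    columns = OrderedDict()
    for filename in filenames:
        with open(filename) as in_file:
            for line in in_file:
                if ':' not in line:
                    continue  # skip blank lines and lines without a field
                name, value = line.split(':', 1)
                columns.setdefault(name.strip(), []).append(value.strip())
    return columns


def write_csv(columns, out_path, delimiter=','):
    """Write one row per field: the name, then one cell per file."""
    with open(out_path, 'w', newline='') as out_file:
        writer = csv.writer(out_file, delimiter=delimiter)
        for name, values in columns.items():
            writer.writerow([name] + values)


def main(argv):
    num_files = int(argv[1])  # command-line arguments arrive as strings
    names = ['mytext{}.text'.format(i) for i in range(num_files)]
    write_csv(collect(names), 'output.csv')  # hypothetical output name


if __name__ == '__main__' and len(sys.argv) == 2:
    main(sys.argv)
```

Saved under a name of your choice and run with the file count as its only argument, this writes one row per field. Passing `delimiter='|'` to `write_csv` would produce pipe-separated columns instead of commas.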
