
I would like to create a list for every column in a txt file. The file looks like this:

NAME  S1  S2  S3  S4
A     1   4   3   1
B     2   1   2   6
C     2   1   3   5

PROBLEM 1. How do I dynamically create as many lists as there are columns, so that I can fill them? Some files will have 4 columns, others 6 or 8...

PROBLEM 2. What is a Pythonic way to iterate through each column and make a list of its values, like this:

list_s1 = [1,2,2]

list_s2 = [4,1,1]

etc.

Right now I have read in the txt file and I have each individual line. As input I give the number of NAMES in a file (here HOW_MANY_SAMPLES = 4)

def parse_textFile(file):

    list_names = []
    with open(file) as f:
        header = next(f)  # first line holds the column names
        head_list = header.rstrip("\r\n").split("\t")
        for i in f:
            e = i.rstrip("\r\n").split("\t")
            list_names.append(e)

    # Pseudocode: "l+i" is not a valid name; the intent is one list per column.
    for i in range(1, HOW_MANY_SAMPLES):
        l+i = []
        l+i.append([a[i] for a in list_names])

I need a dynamic way of creating and filling the number of lists that correspond to the amount of columns in my table.


2 Answers


Problem 1:

You can use len(head_list) instead of having to specify HOW_MANY_SAMPLES.
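As a rough sketch of that idea: instead of trying to invent one variable name per column, keep the lists in a dictionary keyed by header, so the number of lists follows from len(head_list). The sample rows below are hypothetical stand-ins for what the question's parsing loop would produce.

```python
# Hypothetical data, shaped like head_list / list_names in the question.
head_list = ["NAME", "S1", "S2", "S3", "S4"]
list_names = [
    ["A", "1", "4", "3", "1"],
    ["B", "2", "1", "2", "6"],
    ["C", "2", "1", "3", "5"],
]

# One empty list per column; the count comes from the header itself.
columns = {name: [] for name in head_list}

# Fill each column list by position.
for row in list_names:
    for i in range(len(head_list)):
        columns[head_list[i]].append(row[i])

print(columns["S1"])  # ['1', '2', '2']
```

The same code works unchanged for files with 4, 6 or 8 columns, since nothing is hard-coded.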

You can also try Python's csv module, setting the delimiter to a space or a tab instead of a comma.

See this answer to a similar StackOverflow question.
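For illustration, here is a minimal sketch of reading tab-separated rows with csv.reader. The io.StringIO object and its contents are assumptions standing in for the real file; with a file on disk you would pass open(path, newline="") instead.

```python
import csv
import io

# Stand-in for an open file (hypothetical contents, tab-separated).
sample = io.StringIO("NAME\tS1\tS2\nA\t1\t4\nB\t2\t1\n")

reader = csv.reader(sample, delimiter="\t")
head_list = next(reader)   # first row: the column names
rows = list(reader)        # remaining rows, each a list of strings

print(len(head_list))  # 3
print(rows)            # [['A', '1', '4'], ['B', '2', '1']]
```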

Problem 2:

Once you have a list representing each row, you can use zip to create lists representing each column. See this answer.
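A small sketch of the zip approach, assuming the rows have already been split into fields (the sample rows are hypothetical and mirror the table in the question):

```python
# Hypothetical rows, already split into fields (header first).
rows = [
    ["NAME", "S1", "S2", "S3", "S4"],
    ["A", "1", "4", "3", "1"],
    ["B", "2", "1", "2", "6"],
    ["C", "2", "1", "3", "5"],
]

header, body = rows[0], rows[1:]

# zip(*body) pairs up the i-th field of every row, i.e. it transposes the table.
transposed = [list(col) for col in zip(*body)]

# Attach the header names so each column can be looked up by name.
by_name = dict(zip(header, transposed))

print(by_name["S1"])  # ['1', '2', '2']
print(by_name["S2"])  # ['4', '1', '1']
```

Note that zip stops at the shortest row, so a row with missing fields would silently shorten the result.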

With the csv module, you can follow this suggestion, which is another way to invert the data from row-based lists to column-based lists.

Sample:

import csv

# register a dialect that skips extra whitespace after each delimiter
csv.register_dialect('ignorespaces', delimiter=' ', skipinitialspace=True)

data = {}

# newline='' lets the csv module handle line endings itself (Python 3)
with open('data.txt', newline='') as infile:

    # read the file as a dictionary for each row ({header: value})
    reader = csv.DictReader(infile, dialect='ignorespaces')
    for row in reader:
        for header, value in row.items():
            # skip empty or missing headers (e.g. caused by trailing spaces)
            if header:
                data.setdefault(header, []).append(value)

for column in data:
    print(column + ": " + str(data[column]))

This yields (on Python 3.7+, where dictionaries keep insertion order):

NAME: ['A', 'B', 'C']
S1: ['1', '2', '2']
S2: ['4', '1', '1']
S3: ['3', '2', '3']
S4: ['1', '6', '5']
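Note that the csv module returns every value as a string, whereas the question asks for lists of numbers. A possible follow-up step, sketched here under the assumption that every column except NAME holds integers (the data dict is a hypothetical excerpt of the result above):

```python
# Hypothetical excerpt of the column dictionary built from the file.
data = {
    "NAME": ["A", "B", "C"],
    "S1": ["1", "2", "2"],
    "S2": ["4", "1", "1"],
}

# Convert the sample columns to integers; NAME is left out because it is text.
numeric = {
    name: [int(v) for v in values]
    for name, values in data.items()
    if name != "NAME"
}

print(numeric["S1"])  # [1, 2, 2]
```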

With pandas you can create a list of lists or a dict to get what you are looking for.

Create a DataFrame from your file, then iterate through each column and add it to a list or dict.

from io import StringIO
import pandas as pd

# sample data standing in for the file contents
TESTDATA = StringIO("""NAME   S1   S2   S3   S4
A   1    4   3   1
B   2    1   2   6
C   2    1   3   5""")

columns = []
c_dic = {}
# split on any run of whitespace, since the column gaps vary in width
df = pd.read_csv(TESTDATA, sep=r"\s+")
for column in df:
    columns.append(df[column].tolist())
    c_dic[column] = df[column].tolist()

Then you will have a list of lists for all the columns:

for x in columns:
    print(x)

Returns

['A', 'B', 'C']
[1, 2, 2]
[4, 1, 1]
[3, 2, 3]
[1, 6, 5]

and

for k, v in c_dic.items():
    print(k, v)

returns

NAME ['A', 'B', 'C']
S1 [1, 2, 2]
S2 [4, 1, 1]
S3 [3, 2, 3]
S4 [1, 6, 5]

which is useful if you need to keep track of the column names as well as the data.
