222

I am trying to read the lines of a text file into a list or array in Python. I just need to be able to individually access any item in the list or array after it is created.

The text file is formatted as follows:

0,0,200,0,53,1,0,255,...,0.

Where the ... is above, the actual text file has hundreds or thousands more items.

I'm using the following code to try to read the file into a list:

text_file = open("filename.dat", "r")
lines = text_file.readlines()
print(lines)
print(len(lines))
text_file.close()

The output I get is:

['0,0,200,0,53,1,0,255,...,0.']
1

Apparently it is reading the entire file into a list of just one item, rather than a list of individual items. What am I doing wrong?

codeforester
user2037744
  • 3
  • 2
    Just as a note. It looks like this question should be rephrased as how to read a csv file into a list in Python. But I defer to the OP's original intentions over 4 years ago which I don't know. – demongolem Jun 29 '17 at 13:32
  • Related, likely duplicate of: https://stackoverflow.com/questions/7844118/how-to-convert-comma-delimited-string-to-list-in-python, https://stackoverflow.com/questions/24662571/python-import-csv-to-list – AMC Feb 15 '20 at 01:12
  • 2
    Does this answer your question? [How to convert comma-delimited string to list in Python?](https://stackoverflow.com/questions/7844118/how-to-convert-comma-delimited-string-to-list-in-python) – AMC Feb 15 '20 at 01:12
  • **As asked** this is definitely a duplicate of the question @AMC found. **As titled**, this is clickbait for [How to read a file line-by-line into a list?](https://stackoverflow.com/questions/3277503). This should be deleted as it was poorly asked (did not correctly identify the issue, which is **not related to** the file reading), is a bad signpost (because of the titling and overall framing of the question), and the answers do not provide any unique insights (this material is so basic that there is nothing unique to offer). – Karl Knechtel Aug 29 '23 at 19:32

7 Answers

189

You will have to split your string into a list of values using split().

So,

lines = text_file.read().split(',')
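
As a minimal sketch of the split() approach (the function name read_values is made up for this example, and converting to float is an assumption based on the trailing "0." in the sample data), the following reads the file, treats line breaks as separators too, and skips empty fields:

```python
def read_values(path):
    """Read comma-separated numbers from a text file into one flat list of floats."""
    with open(path, "r") as text_file:
        content = text_file.read()
    # Treat line breaks like commas so a multi-line file does not
    # produce items such as '3\n4'; skip empty fields left by
    # trailing commas or the final newline.
    items = content.replace("\n", ",").split(",")
    return [float(item) for item in items if item.strip()]
```

After values = read_values("filename.dat"), any item can be accessed individually, for example values[2]. If the values should stay as strings, drop the float() call and use item.strip() instead.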

EDIT: I didn't realise there would be so much traction to this. Here's a more idiomatic approach.

import csv
with open('filename.csv', 'r', newline='') as fd:
    reader = csv.reader(fd)
    for row in reader:
        # do something with row, which is a list of strings
        print(row)
Achrome
  • 3
    I think that this answer could be improved... If you consider a multiline `.csv` file (as mentioned by the OP), e.g., a file containing the alphabetic characters 3 per row (`a,b,c`, `d,e,f`, etc.) and apply the procedure described above, what you get is a list like this: `['a', 'b', 'c\nd', 'e', ... ]` (note the item `'c\nd'`). I'd like to add that, the above problem notwithstanding, this procedure collapses data from individual rows into a single mega-list, which is usually not what I want when processing a record-oriented data file. – gboffi Jan 24 '17 at 18:52
  • 1
    split is going to leave the newlines. Don't do this, use `csv` module or some other existing parser – Jean-François Fabre May 24 '20 at 17:09
67

You can also use NumPy's loadtxt, like this:

from numpy import loadtxt
lines = loadtxt("filename.dat", comments="#", delimiter=",", unpack=False)
Thiru
  • 2
    I need this too. I noticed on a Raspberry Pi that numpy works really slow. For this application I reverted to open a file and read it line by line. – A.W. Sep 14 '13 at 15:51
  • 4
    This is useful for specifying format too, via `dtype : data-type` parameter. https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html Pandas read_csv is very easy to use. But I did not see a way to specify format for it. It was reading floats from my file, whereas I needed string. Thanks @Thiru for showing loadtxt. – Ozgur Ozturk Feb 06 '17 at 16:25
  • 1
    if txt files contains strings, then dtype should be specified, so it should be like lines = loadtxt("filename.dat", dtype=str, comments="#", delimiter=",", unpack=False) – Alex M981 Sep 20 '18 at 10:29
28

So you want to create a list of lists... We need to start with an empty list

list_of_lists = []

next, we read the file content, line by line

with open('data') as f:
    for line in f:
        inner_list = [elt.strip() for elt in line.split(',')]
        # alternatively, if you need to use the file content as numbers:
        # inner_list = [int(elt.strip()) for elt in line.split(',')]
        list_of_lists.append(inner_list)

A common use case is columnar data. Because the file was read row by row, our unit of storage is the row, so you may want to transpose your list of lists. This can be done with the following idiom:

by_cols = zip(*list_of_lists)

Another common use is to give a name to each column

col_names = ('apples sold', 'pears sold', 'apples revenue', 'pears revenue')
by_names = {}
for i, col_name in enumerate(col_names):
    by_names[col_name] = by_cols[i]

so that you can operate on homogeneous data items

mean_apple_prices = [money/fruits for money, fruits in
                     zip(by_names['apples revenue'], by_names['apples sold'])]

Most of what I've written can be sped up using the csv module from the standard library. There is also pandas, a third-party module that lets you automate most aspects of a typical data analysis (but has a number of dependencies).


Update: In Python 2, zip(*list_of_lists) returns a new, transposed list (of tuples), whereas in Python 3 it returns a zip object that is not subscriptable.

If you need indexed access you can use

by_cols = list(zip(*list_of_lists))

that gives you a list of tuples in both versions of Python.
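
As a small self-contained illustration of the transposition and column naming (the sample rows below are invented for the example and do not come from any real file):

```python
# Hypothetical sample data: each inner list is one parsed row of a file.
list_of_lists = [
    [10, 4, 25.0, 12.0],
    [20, 5, 40.0, 15.0],
]
col_names = ('apples sold', 'pears sold', 'apples revenue', 'pears revenue')

# list() makes the transposed result indexable in Python 3 as well.
by_cols = list(zip(*list_of_lists))

# Map each column name to its column (a tuple of values).
by_names = dict(zip(col_names, by_cols))

mean_apple_prices = [money / fruits for money, fruits in
                     zip(by_names['apples revenue'], by_names['apples sold'])]
print(mean_apple_prices)  # prints [2.5, 2.0]
```

Here by_cols[0] is the tuple (10, 20), i.e. the first column of the original rows.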

On the other hand, if you don't need indexed access and what you want is just to build a dictionary indexed by column names, a zip object is just fine...

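with open('some_data.csv') as file:
    # the first line holds the column names (this replaces the
    # undefined get_names() helper of the original snippet)
    names = [name.strip() for name in next(file).split(',')]
    columns = zip(*((x.strip() for x in line.split(',')) for line in file))
    d = {}
    for name, column in zip(names, columns):
        d[name] = column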
gboffi
  • The OP said they wanted a list of data from a CSV, not a "list of lists". Just use the `csv` module... – Blairg23 Mar 14 '18 at 05:29
8

This question is asking how to read the comma-separated value contents from a file into an iterable list:

0,0,200,0,53,1,0,255,...,0.

The easiest way to do this is with the csv module. Create the reader and iterate over it inside the with block, while the file is still open:

import csv
with open('filename.dat', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')
    # each row is a list of strings
    for row in spamreader:
        print(', '.join(row))

See documentation for more examples.
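
If a single flat list is wanted rather than one list per row, one possible sketch is to chain the rows together (the function name read_csv_flat is made up for this example):

```python
import csv
from itertools import chain

def read_csv_flat(path):
    """Return every field of a comma-separated file as one flat list of strings."""
    with open(path, newline='') as csvfile:
        reader = csv.reader(csvfile, delimiter=',')
        # Each row is a list of strings; chain them into a single list
        # before the file is closed.
        return list(chain.from_iterable(reader))
```

The items are strings; apply int() or float() to them if numbers are needed.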

Blairg23
3

I'm a bit late, but you can also read the text file into a DataFrame and then convert the corresponding column to a list.

import pandas as pd
lista = pd.read_csv('path_to_textfile.txt', sep=",", header=None)[0].tolist()

For example:

lista = pd.read_csv('data/holdout.txt', sep=',', header=None)[0].tolist()

Note: with header=None, the columns of the DataFrame are labelled with integers; I chose 0 because I was extracting only the first column.

Shreyas H.V
1

Better this way,

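def txt_to_lst(file_path):
    """Return the lines of the file as a list (None if reading fails)."""
    try:
        # the with block closes the file even if reading fails
        with open(file_path, "r") as text_file:
            lines = text_file.read().split('\n')
        print(lines)
        return lines
    except Exception as e:
        print(e)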
  • 1
    Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Oct 21 '21 at 07:53
0

You can use the built-in Python function eval():

with open('test.txt', 'r') as f:
    text = f.read()
    text_list = eval(text)

The output is:

text:     '[0,0,200,0,53,1,0,255]'
text_list: [0, 0, 200, 0, 53, 1, 0, 255]

Python's eval() allows you to evaluate arbitrary Python expressions from a string-based or compiled-code-based input. This function can be handy when you're trying to dynamically evaluate Python expressions from any input that comes as a string or a compiled code object. source, documentation
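
Because eval() will execute whatever expression the file contains, a more cautious variant is ast.literal_eval from the standard library, which only accepts Python literals. This is a swapped-in alternative to the eval() call above, not the original answer's method, and the function name parse_list_literal is made up for this sketch:

```python
import ast

def parse_list_literal(text):
    """Parse text such as '[0,0,200]' into a list without executing code."""
    # literal_eval raises ValueError or SyntaxError for anything
    # that is not a plain Python literal.
    value = ast.literal_eval(text.strip())
    # Unbracketed input such as '0,0,200' parses as a tuple.
    if isinstance(value, tuple):
        value = list(value)
    if not isinstance(value, list):
        raise ValueError("expected a list or comma-separated literal")
    return value
```

With the file from the example, parse_list_literal(f.read()) gives the same list as eval(text), while input such as "__import__('os')" is rejected with a ValueError instead of being run.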