2

Given the text file

sample.txt

2012-01-01  09:00   San Diego   Men's Clothing    214.05    Amex
2012-01-01  09:00   San Diego   Women's Clothing  153.57    Visa
2012-01-01  09:00   Omaha       Music             66.08     Cash

I want to be able to read only the text for the third column. This code

for line in open("sample.txt"):
    city = line.split()[2]
    print(city)

reads only part of the third column:

San
San
Omaha

but what I want is:

San Diego
San Diego
Omaha

How do I do this?

john tan
  • You need to specify what substring splits your string. Something like: `split("\t")`. It's explained [here](https://stackoverflow.com/a/743807/3103891) – GalAbra Jan 20 '18 at 16:23
  • Possible duplicate of [How to split a string into a list?](https://stackoverflow.com/questions/743806/how-to-split-a-string-into-a-list) – GalAbra Jan 20 '18 at 16:26
  • Possible duplicate of [python - Is it possible to convert a string and put it into a list \[\] containing tuple ()?](https://stackoverflow.com/questions/48335590/python-is-it-possible-to-convert-a-string-and-put-it-into-a-list-containing) – Rakesh Jan 20 '18 at 16:32

5 Answers

3

It looks like your file may be separated by tabs (\t).

Have you tried splitting on tabs?

Instead of city=line.split()[2], try city=line.split('\t')[2].

Anyway, it looks like this file was generated by Excel or a similar tool. Have you tried exporting it in CSV (comma-separated values) format instead of plain text?

Then you can simply split on commas, like city=line.split(',')[2].
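
If the file really is tab- or comma-delimited, the standard csv module can do the splitting. The following is only a sketch under that assumption: SAMPLE_TSV is a made-up tab-separated version of the data (the question does not confirm the file contains tabs), and third_column is a hypothetical helper name.

```python
import csv
import io

# Assumption: a tab-separated version of the sample data. The real file
# may use spaces instead, in which case this approach does not apply.
SAMPLE_TSV = (
    "2012-01-01\t09:00\tSan Diego\tMen's Clothing\t214.05\tAmex\n"
    "2012-01-01\t09:00\tSan Diego\tWomen's Clothing\t153.57\tVisa\n"
    "2012-01-01\t09:00\tOmaha\tMusic\t66.08\tCash\n"
)


def third_column(handle, delimiter="\t"):
    """Return the third field of every row read from an open text handle."""
    reader = csv.reader(handle, delimiter=delimiter)
    # Skip rows that are too short to have a third field.
    return [row[2] for row in reader if len(row) > 2]


cities = third_column(io.StringIO(SAMPLE_TSV))
print(cities)
```

For a real file, pass open("sample.txt", newline="") instead of the io.StringIO object, and use delimiter="," for a CSV export.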

Hope it helps

marcelotokarnia
1

It appears your input file has fixed-width fields. In that case you might be able to achieve your goal by slicing each line, e.g.

>>> for line in open('sample.txt'):
...     print(line[20:32])
...
San Diego
San Diego
Omaha

You could add .strip() to trim the trailing spaces if you need the value for further processing.
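
The slicing idea combined with .strip() might look like the sketch below. The offsets 20 and 32 come from the slice above and hold only if every line uses the same fixed column positions as the sample; SAMPLE_LINES and city_from_fixed_width are illustrative names, not part of the original answer.

```python
# Lines copied from the sample in the question (fixed column positions assumed).
SAMPLE_LINES = [
    "2012-01-01  09:00   San Diego   Men's Clothing    214.05    Amex",
    "2012-01-01  09:00   San Diego   Women's Clothing  153.57    Visa",
    "2012-01-01  09:00   Omaha       Music             66.08     Cash",
]

# Assumption: the city column always occupies characters 20 to 31.
CITY_START, CITY_END = 20, 32


def city_from_fixed_width(line):
    """Slice out the city column and trim the padding spaces."""
    return line[CITY_START:CITY_END].strip()


cities = [city_from_fixed_width(line) for line in SAMPLE_LINES]
print(cities)
```

A city name longer than 12 characters would be cut off by this slice, so the offsets need to match how the file was actually written.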

Jonathon McMurray
0

The columns in your text file are separated by at least two spaces, so you can split on two spaces and then use strip() to remove any leftover spaces at the ends.

with open('sample.txt', 'r') as file_handle:
    for line in file_handle:
        city=line.split('  ')[2].strip()
        print(city)

yields:

San Diego
San Diego
Omaha
zdgriffith
  • We don't know that it's guaranteed to leave at least two spaces between columns. What happens if the 3rd column contains a city 11 letters long? I suspect you'd get a single space before the next column. – David Knipe Jan 20 '18 at 17:26
0

Since the items in sample.txt are separated by at least two spaces, you need to use split('  ') (two spaces) instead. By default, split() splits on any run of whitespace, which turns "Men's Clothing" into ["Men's", "Clothing"], and that is not what you want.

First, you can inspect the items with:

with open('sample.txt') as in_file:
    for line in in_file:
        # Split on two spaces, drop the empty pieces, trim leftover spaces
        items = [x.strip() for x in line.strip().split('  ') if x]
        print(items)

Which outputs:

['2012-01-01', '09:00', 'San Diego', "Men's Clothing", '214.05', 'Amex']
['2012-01-01', '09:00', 'San Diego', "Women's Clothing", '153.57', 'Visa']
['2012-01-01', '09:00', 'Omaha', 'Music', '66.08', 'Cash']

Now, if you want to extract the third column, print this inside the loop instead:

print(items[2])

Which gives:

San Diego
San Diego
Omaha
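
As an alternative to filtering out the empty strings, a regular expression can treat any run of two or more whitespace characters as one separator. This is a sketch that assumes columns are always at least two spaces apart and that no field contains two consecutive spaces; split_columns and SAMPLE_LINES are illustrative names.

```python
import re

# Lines copied from the sample in the question.
SAMPLE_LINES = [
    "2012-01-01  09:00   San Diego   Men's Clothing    214.05    Amex",
    "2012-01-01  09:00   San Diego   Women's Clothing  153.57    Visa",
    "2012-01-01  09:00   Omaha       Music             66.08     Cash",
]


def split_columns(line):
    """Split a line wherever two or more whitespace characters occur."""
    return re.split(r"\s{2,}", line.strip())


rows = [split_columns(line) for line in SAMPLE_LINES]
cities = [row[2] for row in rows]
print(cities)
```

Single spaces inside a field, as in "San Diego" or "Men's Clothing", are left alone because the pattern needs at least two whitespace characters to match.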
RoadRunner
-1

You will need to preprocess your input file by adding a delimiter, which you then pass to split(). Like this:

2012-01-01,  09:00,   San Diego,   Men's Clothing,    214.05,    Amex
2012-01-01,  09:00,   San Diego,   Women's Clothing,  153.57,    Visa
2012-01-01,  09:00,   Omaha,       Music,             66.08,     Cash

Then

for line in open("sample.txt"):
    # strip() removes the padding spaces left around each field
    city = line.split(",")[2].strip()
    print(city)
  • Then how do you know where to put the commas? You're just pushing the hard part of the question on to someone else. – David Knipe Jan 20 '18 at 17:27
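
One possible way to automate the comma placement is sketched below. It assumes every column break is a run of two or more spaces, which the sample shows but the question does not guarantee; a column separated by a single space would be merged with its neighbour. The name to_csv_text is hypothetical.

```python
import csv
import io
import re

# Lines copied from the sample in the question.
SAMPLE_LINES = [
    "2012-01-01  09:00   San Diego   Men's Clothing    214.05    Amex",
    "2012-01-01  09:00   San Diego   Women's Clothing  153.57    Visa",
    "2012-01-01  09:00   Omaha       Music             66.08     Cash",
]


def to_csv_text(lines):
    """Convert space-aligned lines to CSV text (2+ spaces = column break)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for line in lines:
        writer.writerow(re.split(r" {2,}", line.strip()))
    return buffer.getvalue()


csv_text = to_csv_text(SAMPLE_LINES)
# Read the generated CSV back and pick the third field of each row.
cities = [row[2] for row in csv.reader(io.StringIO(csv_text))]
print(cities)
```

Writing through csv.writer rather than joining with "," by hand means a field that happens to contain a comma is quoted correctly.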