
My text file looks like this:

[0, "we break dance not hearts by Short Stack is my ringtone.... i LOVE that !!!.....\n"]
[1, "I want to write a . I think I will.\n"]
[2, "@va_stress broke my twitter..\n"]
[3, "\" "Y must people insist on talking about stupid politics on the comments of a bubblegum pop . Sorry\n"]
[4, "aww great  "Picture to burn"\n"]
[5, "@jessdelight I just played ur joint two s ago. Everyone in studio was feeling it!\n"]
[6, "http://img207.imageshack.us/my.php?image=wpcl10670s.jpg her s are so perfect.\n"]
[7, "cannot hear the new  due to geographic location. i am geographically undesirable. and tune-less\n"]
[8, "\" couples in public\n"]
[9, "damn wendy's commerical got that damn  in my head.\n"]
[10, "i swear to cheese & crackers @zyuuup is in Detroit like every 2 months & i NEVER get to see him!  i swear this blows monkeyballs!\n"]
[11, "\" getting ready for school. after i print out this\n"]

I want to read the second element of each list (that is, the tweet text) into an array.

I wrote

tweets = []
for line in open('tweets.txt').readlines():
    print line[1]
    tweets.append(line)

But when I look at the output, it just prints the second character of every line.

Viral Patel
  • that's because it's evaluating it as a string not a list – eagle Apr 11 '18 at 17:18
  • So what should I do to have it treated as a list? I want the second element (the tweet text) from each line of the text file. – Viral Patel Apr 11 '18 at 17:19
  • You probably want `ast.literal_eval` on the line first (but I would remove the newline characters, as they can confuse the parser). – mdurant Apr 11 '18 at 17:20
  • Why do you have this format? If it's something you're creating, a much smarter solution is to change your code that generates the file into something that uses a format you know how to parse back in. If it's something you're getting from elsewhere, you should look at how it's documented—for example, this could be JsonLines, in which case what you want to do is use a JsonLines parser, but it could just be accidentally very close to JsonLines, in which case you don't. – abarnert Apr 11 '18 at 17:21

4 Answers


When you read a text file in Python, the lines are just strings. They aren't automatically converted to some other data structure.
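To see why `line[1]` returned a single character, here is a minimal sketch that uses one line of the sample data as an in-memory string instead of reading the file. The variable names are illustrative only.

```python
import json

# One line as it would come back from reading the file: a plain string.
line = '[2, "@va_stress broke my twitter..\\n"]\n'

# Indexing a string yields a character, not a list element.
second_char = line[1]

# Parsing first gives a real list, so [1] is the tweet text.
parsed = json.loads(line)
tweet_text = parsed[1]

print(repr(second_char))
print(repr(tweet_text))
```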

In your case, it looks like each line of the file contains a JSON list, so you can parse each line with json.loads() first. This converts the string to a Python list, from which you can then take the second element:

import json
with open('tweets.txt') as fp:
    tweets = [json.loads(line)[1] for line in fp]
Erik Cederstrand
  • It works fine. Can you please look at my answer at https://stackoverflow.com/questions/49778665/how-can-i-get-the-nth-element-of-string-for-list-of-list-in-python/49779833#49779833 ? – Viral Patel Apr 11 '18 at 17:38
  • Or can you tell me how I can read the tweets.txt file so that tweets contains each line as an array and not as a string? – Viral Patel Apr 11 '18 at 17:42
  • Uhm, that's what my example code does? Just remove the `[1]` if you want all elements instead of just the second one. – Erik Cederstrand Apr 11 '18 at 17:43
  • My question is what if I want tweets[] as a full line from the text file and want to access the second element for the tweets[]? – Viral Patel Apr 11 '18 at 17:48
  • It will be good if you can look at my question at https://stackoverflow.com/questions/49778665/how-can-i-get-the-nth-element-of-string-for-list-of-list-in-python/49779833#49779833 – Viral Patel Apr 11 '18 at 17:50

Maybe you should consider using the json.loads method:

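import json

tweets = []
with open('tweets.txt') as fp:
    for line in fp:
        text = json.loads(line)[1]  # parse the line, then take the tweet text
        print(text)
        tweets.append(text)  # store the parsed text, not the raw line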

There is a more Pythonic way in @Erik Cederstrand's answer.

Emre Savcı
  • It's working perfectly fine. But, Can you tell me why it gives me this error? Traceback (most recent call last): File "k_means.py", line 28, in print json.loads(line)[1] File "C:\Python27\lib\encodings\cp437.py", line 12, in encode return codecs.charmap_encode(input,errors,encoding_map) UnicodeEncodeError: 'charmap' codec can't encode character u'\u2122' in position 78: character maps to – Viral Patel Apr 11 '18 at 17:35

Rather than guessing what format the data is in, you should find out.

  • If you're generating it yourself, and don't know how to parse back in what you're creating, change your code to generate something that can be easily parsed with the same library used to generate it, like JsonLines or CSV.
  • If you're ingesting it from some API, read the documentation for that API and parse it the way it's documented.
  • If someone handed you the file and told you to parse it, ask that someone what format it's in.

Occasionally, you do have to deal with some crufty old file in some format that was never documented and nobody remembers what it was. In that case, you do have to reverse engineer it. But what you want to do then is guess at likely possibilities, and try to parse it with as much validation and error handling as possible, to verify that you guessed right.

In this case, the format looks a lot like either JSON lines or ndjson. Both are slightly different ways of encoding multiple objects with one JSON text per line, with specific restrictions on those texts and the way they're encoded and the whitespace between them.

So, while a quick&dirty parser like this will probably work:

import json

with open('tweets.txt') as f:
    for line in f:
        tweet = json.loads(line)  # one JSON text per line
        dosomething(tweet)  # placeholder for your own processing

You probably want to use a library like jsonlines:

import jsonlines  # third-party package

with jsonlines.open('tweets.txt') as f:
    for tweet in f:
        dosomething(tweet)  # placeholder for your own processing

The fact that the quick&dirty parser works on JSON lines is, of course, part of the point of that format—but if you don't actually know whether you have JSON lines or not, you're better off making sure.
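As a rough illustration of "making sure", the sketch below parses each line and checks that it has the shape the sample suggests. It assumes Python 3 and that every record is a two-element list of an integer and a string, which is a guess from the sample rather than a documented format. The helper name `parse_tweet_line` is hypothetical, and the two sample lines are held in memory instead of being read from the file.

```python
import json

def parse_tweet_line(line, lineno):
    """Parse one line and check the assumed [index, text] shape."""
    try:
        record = json.loads(line)
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        raise ValueError("line %d is not valid JSON: %s" % (lineno, exc))
    if not (isinstance(record, list) and len(record) == 2):
        raise ValueError("line %d is not a two-element list" % lineno)
    index, text = record
    if not isinstance(index, int) or not isinstance(text, str):
        raise ValueError("line %d is not [int, str]" % lineno)
    return index, text

# Two lines taken from the question's sample, as they would be read from the file.
sample = [
    '[1, "I want to write a . I think I will.\\n"]\n',
    '[2, "@va_stress broke my twitter..\\n"]\n',
]

tweets = [parse_tweet_line(line, n)[1] for n, line in enumerate(sample, start=1)]
print(tweets)
```

A line that fails any check raises a ValueError naming the line number, so a wrong guess about the format shows up immediately instead of producing silently wrong data.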

abarnert

Since your input looks like Python expressions, I'd use ast.literal_eval to parse them.

Here is an example:

import ast

with open('tweets.txt') as fp:
    tweets = [ast.literal_eval(line)[1] for line in fp]

print(tweets)

Output:

['we break dance not hearts by Short Stack is my ringtone.... i LOVE that !!!.....\n', 'I want to write a . I think I will.\n', '@va_stress broke my twitter..\n', '" "Y must people insist on talking about stupid politics on the comments of a bubblegum pop . Sorry\n', 'aww great  "Picture to burn"\n', '@jessdelight I just played ur joint two s ago. Everyone in studio was feeling it!\n', 'http://img207.imageshack.us/my.php?image=wpcl10670s.jpg her s are so perfect.\n', 'cannot hear the new  due to geographic location. i am geographically undesirable. and tune-less\n', '" couples in public\n', "damn wendy's commerical got that damn  in my head.\n", 'i swear to cheese & crackers @zyuuup is in Detroit like every 2 months & i NEVER get to see him!  i swear this blows monkeyballs!\n', '" getting ready for school. after i print out this\n']
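json.loads and ast.literal_eval accept overlapping but different syntax, which matters if you aren't sure which format the file is in. Here is a small sketch of the difference. The first string is a line from the question; the other two are made up for illustration and are not from the file.

```python
import ast
import json

# A line like those in the question is valid both as JSON and as a Python literal.
line = '[2, "@va_stress broke my twitter..\\n"]\n'
assert json.loads(line) == ast.literal_eval(line.strip())

# JSON-only syntax: true/false/null are not Python literals.
json_only = '[1, true, null]'
json_result = json.loads(json_only)
try:
    ast.literal_eval(json_only)
    literal_eval_ok = True
except ValueError:
    literal_eval_ok = False

# Python-only syntax: single-quoted strings are not valid JSON.
python_only = "[1, 'single quotes']"
python_result = ast.literal_eval(python_only)
try:
    json.loads(python_only)
    json_ok = True
except ValueError:
    json_ok = False

print(json_result, literal_eval_ok)
print(python_result, json_ok)
```

If the file was written by a JSON encoder, json.loads is the safer match. If it was written with Python's repr or str of a list, ast.literal_eval is.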
Robᵩ