How to turn multiple lines into multiple lists in Python?

Question

I have a file with lines that look like this:

"[36.147315849999998, -86.7978174] 6 2011-08-28 19:45:11 @maryreynolds85 That is my life, lol."

"[37.715399429999998, -89.21166221] 6 2011-08-28 19:45:41 Ate more veggie and fruit than meat for the first time in my life"

I have tried stripping and splitting these lines, then stripping punctuation from each word in the resulting list:

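    with open('aabb.txt') as t:
        for Line in t:
            splitline = Line.strip()
            splitline2 = splitline.split()
            for words in splitline2:
                # strip leading/trailing punctuation, then lowercase
                words = words.strip(r"!#$%&'()*+,-./:;?@[\]^_`{|}~")
                words = words.lower()
                # the cleaned word is not stored anywhere yet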

What should I do to turn these lines into two lists that look like this:

["36.147315849999998", "-86.7978174", "6", "2011-08-28", "19:45:11", "maryreynolds85", "that", "is", "my", "life", "lol"]

["37.715399429999998", "-89.21166221", "6", "2011-08-28", "19:45:41", "ate", "more", "veggie", "and", "fruit", "than", "meat", "for", "the", "first", "time", "in", "my", "life"]

I don't know enough about Python, but could you use something from [Read a file line-by-line with python](https://stackabuse.com/read-a-file-line-by-line-in-python/) and combine it with `list = line.split(" ")`? — pensum, Nov 05 '19 at 05:11
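
On the `split(" ")` suggestion: for lines like these, calling `split()` with no argument is usually the safer choice, because it splits on any run of whitespace and discards the trailing newline, whereas `split(" ")` splits on every single space. A small comparison (the sample string is made up for illustration and deliberately contains a double space):

```python
line = "6  2011-08-28 19:45:11 That is my life, lol.\n"

# split() with no argument: splits on runs of any whitespace and drops the "\n"
by_whitespace = line.split()

# split(" "): splits on each single space, so the double space yields an empty
# string and the newline stays attached to the last token
by_single_space = line.split(" ")

print(by_whitespace)
print(by_single_space)
```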
You're trying to read what is essentially a whitespace-separated file (like a TSV, Tab-Separated Values, file but split on spaces rather than only tabs), except that it also contains `[...]` brackets. — smci, Nov 05 '19 at 05:11
Variable names should generally follow the `lowercase_with_underscores` style. — AMC, Nov 05 '19 at 05:13
Related: [parsing a tab-separated file in Python](https://stackoverflow.com/questions/11059390/parsing-a-tab-separated-file-in-python) — smci, Nov 05 '19 at 05:13
Where do these strings come from? What’s the general format, context, etc? — AMC, Nov 05 '19 at 05:21

Atreyagaurav · Answer 1 · 2019-11-05T05:18:14.493


Are all your lines in the same format? If so, you can use a regular expression from the `re` module.

    import re

    your_str = "[36.147315849999998, -86.7978174] 6 2011-08-28 19:45:11 @maryreynolds85 That is my life, lol."
    # group 1: text before the comma, group 2: text after it (leading spaces skipped),
    # group 3: everything after the closing bracket
    reg_data = re.compile(r"\[(.*?),\s*(.*?)\] (.*)")
    your_reg_grp = reg_data.match(your_str)
    if your_reg_grp:
        print(your_reg_grp.groups())
        # groups() returns the two coordinates separately, but everything after the
        # brackets comes back as one string, so split that last group on spaces
        grp1 = your_reg_grp.groups()
        grp2 = grp1[-1].split(" ")
        # combine the coordinates with the split words into a single list
        combined = list(grp1[:-1]) + grp2
        print(combined)
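
Going one step further with the regex approach, the sketch below also strips punctuation from the words and lowercases them, which is what the question's expected output asks for. It assumes every line starts with `[lat, lon]` followed by space-separated fields, as in the two sample lines; `parse_line` and `LINE_RE` are hypothetical names, and the stricter pattern is my own, not the one linked in the comments.

```python
import re
import string

# Hypothetical pattern: "[lat, lon]" made of digits, '.', '-', then the rest of the line
LINE_RE = re.compile(r"\[\s*([-\d.]+)\s*,\s*([-\d.]+)\s*\]\s+(.*)")


def parse_line(line):
    """Return [lat, lon, *cleaned_words], or None if the line doesn't match."""
    # Drop the newline and any surrounding double quotes (as displayed in the question)
    match = LINE_RE.match(line.strip().strip('"'))
    if match is None:
        return None
    lat, lon, rest = match.groups()
    words = []
    for word in rest.split():
        # Strip leading/trailing punctuation only, then lowercase
        cleaned = word.strip(string.punctuation).lower()
        if cleaned:  # skip tokens that were pure punctuation
            words.append(cleaned)
    return [lat, lon] + words


sample = "[36.147315849999998, -86.7978174] 6 2011-08-28 19:45:11 @maryreynolds85 That is my life, lol."
print(parse_line(sample))
```

Applied to the whole file, `rows = [parse_line(line) for line in t]` inside the `with open('aabb.txt') as t:` block would give one list per line. Because the coordinates are captured before any stripping happens, the minus sign on the longitude survives.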

edited Nov 05 '19 at 05:18

answered Nov 05 '19 at 05:11

Atreyagaurav



Adding to @Atreyagaurav, the following RegEx is more explicit: https://regex101.com/r/QRux5E/1 – jayg_code Nov 05 '19 at 05:18

Nice one, that seems useful. I didn't want to spend too much time figuring out the exact regex, so I made a general one. – Atreyagaurav Nov 05 '19 at 05:20
@Atreyagaurav thank you for your help. I tried your answer, but it seems some punctuation is missed, e.g. ['6', '2011-08-28', '19:11:58', 'wahhhhhh', 'i', 'need', 'to', 'figure', 'out', 'what', 'to', 'do', 'wifff', 'my', 'life', '#lost']. The "#" in front of the word "lost" is supposed to be removed. Could you show me how to solve the problem in that case? I'm kind of new to Python. Thank you for your help again. – jane998 Nov 09 '19 at 03:05
If such punctuation is only at the start or end of a word, your code ``words.strip("!#$%&'()*+,-./:;?@[\]^_`{|}~")`` should work fine; apply it to each item in your group, or write a function for that. If it can also appear in the middle of a word, you can write a function to remove those characters, which shouldn't be hard. – Atreyagaurav Nov 10 '19 at 04:53
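
For the "punctuation in the middle" case, one common way to write such a function is `str.translate` with a deletion table built by `str.maketrans`. This is a sketch, and `remove_punctuation` is a hypothetical name. Note that it deletes punctuation everywhere, so it should only be applied to the word fields: it would also remove the `-` and `:` from dates, times, and coordinates.

```python
import string

# Translation table that deletes every ASCII punctuation character
# (string.punctuation), wherever it appears in the word
REMOVE_PUNCT = str.maketrans("", "", string.punctuation)


def remove_punctuation(word):
    """Hypothetical helper: delete punctuation at the start, middle, or end."""
    return word.translate(REMOVE_PUNCT)


print(remove_punctuation("#lost"))       # lost
print(remove_punctuation("don't!"))      # dont
print(remove_punctuation("2011-08-28"))  # 20110828 (interior '-' removed too)
```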

Anuj Dekavadiya · Answer 2 · 2019-11-07T04:12:11.970

You are already producing the words you need; you just have to create a list and append each word to it.

    with open('aabb.txt') as t:
        for line in t:
            words_in_line = []  # one list per line (avoids shadowing the built-in `list`)
            for word in line.split():  # split() also drops the trailing newline
                # strip leading/trailing punctuation, then lowercase
                word = word.strip(r"!#$%&'()*+,-./:;?@[\]^_`{|}~").lower()
                words_in_line.append(word)
            print(words_in_line)
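
One thing to watch with this approach: `-` is in the strip set, so the longitude token `-86.7978174]` comes out as `86.7978174`, losing its sign, whereas the question expects `-86.7978174`. One way around it, sketched below, is to strip only the brackets and comma first and leave tokens that then look like plain numbers untouched. `clean_token` and the number pattern are my own assumptions, not part of the answer.

```python
import re

PUNCT = r"!#$%&'()*+,-./:;?@[\]^_`{|}~"  # same characters as the answer's strip()
NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")  # e.g. "6", "36.14", "-86.79"


def clean_token(token):
    """Hypothetical helper: keep signed numbers intact, clean everything else."""
    # Remove the brackets/comma around the coordinates first
    bare = token.strip("[],")
    if NUMBER_RE.fullmatch(bare):
        return bare  # a number: keep its minus sign and decimal point
    return token.strip(PUNCT).lower()


line = "[36.147315849999998, -86.7978174] 6 2011-08-28 19:45:11 @maryreynolds85 That is my life, lol."
words = [clean_token(tok) for tok in line.split()]
print(words)
```

Dates and times such as `2011-08-28` and `19:45:11` don't match the number pattern, so they go through the normal strip, which leaves their interior `-` and `:` alone.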

You can also build a list of lists, with one inner list per line, and use it however you need.

    with open('aabb.txt') as t:
        root_list = []  # will hold one inner list per line
        for line in t:
            temp_list = []
            for word in line.split():
                # strip leading/trailing punctuation, then lowercase
                word = word.strip(r"!#$%&'()*+,-./:;?@[\]^_`{|}~").lower()
                temp_list.append(word)
            root_list.append(temp_list)
    print(root_list)
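
The same list of lists can also be written as a nested list comprehension. It behaves exactly like the nested loops, including stripping `-` from the ends of tokens. In this sketch `io.StringIO` stands in for `open('aabb.txt')` so it runs on its own, and the two lines are the samples from the question.

```python
import io

STRIP_CHARS = r"!#$%&'()*+,-./:;?@[\]^_`{|}~"  # same set as the loop version

# io.StringIO stands in for open('aabb.txt') so the sketch is self-contained
t = io.StringIO(
    "[36.147315849999998, -86.7978174] 6 2011-08-28 19:45:11 @maryreynolds85 That is my life, lol.\n"
    "[37.715399429999998, -89.21166221] 6 2011-08-28 19:45:41 Ate more veggie and fruit than meat for the first time in my life\n"
)

# One inner list per line; each token is stripped of punctuation and lowercased
root_list = [[word.strip(STRIP_CHARS).lower() for word in line.split()] for line in t]
print(root_list)
```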

@Dulaj Kulathunga I had no idea you had formatted my code. When I edited it, it was still mashed up. — Anuj Dekavadiya, Nov 05 '19 at 06:21
@Anuj Dekavadiya I'm kind of new to Python; could you show me how to create a list of lists in this case? — jane998, Nov 06 '19 at 23:00