How to split items in list?

Question

I'm trying to scrape information from a website. I put the info into a list, but when I print the list, it looks something like this:

list =  ['text  \n\n  more text (1)  \n\n  even more text  \n\n']

As you can see, nothing is separated. I want the list to look something like this:

list = ['text','more text (1)', 'even more text']

I tried doing list = [i.split('\n\n') for i in list] but that didn't work. The result was:

list = [text  ','  more text (1)  ','  even more text]

How can I fix this?

Thank you in advance for taking the time to read my question and help in any way you can. I appreciate it.

If your issue is the extra space, you can use the `strip()` method — Ben Grossmann, Feb 24 '22 at 17:56
Looks like you already got the right result, you just need to include `.strip()` in your code — G. Anderson, Feb 24 '22 at 17:56
Should show the "put the info into a list" code if it isn't doing what you want. — Mark Tolonen, Feb 24 '22 at 17:56
don't call a list `list`. then take a look at `.split()` https://stackoverflow.com/questions/6696027/how-to-split-elements-of-a-list — , Feb 24 '22 at 17:56

PrinsEdje80 · Answer 1 · 2022-02-24T20:03:48.247

1

You're almost there. If you do the following, you should get the result you want:

the_list = ['text  \n\n  more text (1)  \n\n  even more text  \n\n']
final_list = list(filter(None, [i.strip() for i in the_list[0].split('\n\n')]))

My previous version failed for two reasons. First, the_list is a list of length 1, so the string to split is the_list[0]. Second, I had put the split in the wrong location.

I've also added the filter to "squeeze" an empty result at the end in case you want to remove those.
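If the scraped list ever holds more than one string, the same split, strip, and filter idea could be applied to every element. The following is only a sketch under that assumption; the helper name `split_items` and the second sample string are made up for illustration and are not part of the original answer.

```python
def split_items(items, sep="\n\n"):
    """Split every string in `items` on `sep`, strip each piece, drop empties."""
    result = []
    for item in items:
        pieces = (piece.strip() for piece in item.split(sep))
        # filter(None, ...) discards the empty strings left by a trailing separator
        result.extend(filter(None, pieces))
    return result


# the second element is hypothetical sample data
the_list = ['text  \n\n  more text (1)  \n\n  even more text  \n\n', 'a \n\n b']
print(split_items(the_list))
```

The result is one flat list of cleaned strings, regardless of how many strings the input list contains.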

edited Feb 24 '22 at 20:03

answered Feb 24 '22 at 17:56


Thank you for taking the time to answer my question! I got this error: `AttributeError: 'list' object has no attribute 'strip'`. I tried `final_list = [i.strip().split('\n\n') for i in list]` instead. The problem is that when I convert this to a dataframe, it still has everything in 1 row and doesn't split the text into rows. – shorttriptomars Feb 24 '22 at 18:09
Oops. I've done something wrong. I'll fix it. – PrinsEdje80 Feb 24 '22 at 19:58

score 1 · Answer 2 · answered Feb 24 '22 at 18:14

Here is a way to do it. I first split each string of your list and then remove any leading or trailing whitespace using the strip method.

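# sample input from the question (the original snippet assumed `liste` already existed)
liste = ['text  \n\n  more text (1)  \n\n  even more text  \n\n']
info = []
for i in liste:
    # drop a trailing separator so the split leaves no empty string at the end
    if i[-2:] == "\n\n":
        i = i[:-2]
    untrimmed = i.split("\n\n")
    # remove leading and trailing whitespace from each piece
    trimmed = [j.strip() for j in untrimmed]
    info.append(trimmed)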

The if statement removes a trailing "\n\n" from the input so that the split does not produce an empty string at the end.
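Note that the loop appends one inner list per input string, so `info` ends up as a list of lists. If a single flat list is wanted, for example to get one item per row later on, something like the sketch below might work. The sample value of `info` is assumed to be what the loop produces for the input in the question.

```python
from itertools import chain

# assumed result of the loop for the one-element input from the question
info = [['text', 'more text (1)', 'even more text']]

# chain.from_iterable concatenates the inner lists into a single flat list
flat = list(chain.from_iterable(info))
print(flat)
```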

Sathi Aiswarya · Answer 3 · 2022-03-02T04:51:58.400

1

I tried this code and it worked for me; maybe it helps you.

lst =  "text  \n\n  more text (1)  \n\n  even more text"

# split on the blank line, then strip the padding around each piece
x = [s.strip() for s in lst.split("\n\n")]

print("list=",x)

edited Mar 02 '22 at 04:51

answered Feb 24 '22 at 18:14


score 1 · Accepted Answer · answered Feb 24 '22 at 18:17

Try this code maybe:

import re
list =  ['text  \n\n  more text (1)  \n\n  even more text  \n\n']
list[0] = list[0].replace('  \n\n  ', '#').replace('  \n\n', '#')
list = re.split('#',list[0])

if list[len(list) - 1] == '':
  list.pop(len(list) - 1)

print(list)

Output:

['text', 'more text (1)', 'even more text']

First we replace every instance of '  \n\n  ' and '  \n\n' with '#'. Two replacements are needed because, although the elements are separated by '  \n\n  ', the string ends with '  \n\n' and no spaces after it, so that final instance has to be matched separately.

Afterwards, we split the string on every '#' and pop the final element if it is an empty string caused by the trailing '  \n\n'.
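For comparison, here is a minimal sketch of a one-step variant, assuming the separator is always a blank line ('\n\n') with optional whitespace around it. `re.split` accepts a pattern that consumes that padding, so no placeholder character such as '#' is needed (a placeholder would also split any text that already contains '#'). The variable names `items` and `parts` are illustrative and not from the answer.

```python
import re

items = ['text  \n\n  more text (1)  \n\n  even more text  \n\n']

# strip() removes the trailing separator; the pattern then matches each
# remaining blank line together with the whitespace on either side of it
parts = re.split(r'\s*\n\n\s*', items[0].strip())
print(parts)
```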

I hope this helped! Please let me know if you need any further clarification or details :)

This definitely helped! Thank you so much for taking the time to answer my question! I really appreciate it! — shorttriptomars, Feb 24 '22 at 18:21

score 1 · Answer 5 · answered Feb 24 '22 at 18:21

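list1 = ['text  \n\n  more text (1)  \n\n  even more text  \n\n']
print(list1)
joined = "".join(list1)
# split on the blank-line separator directly; replacing it with ',' first
# would also split any text that already contains a comma
words = [x.strip() for x in joined.split('\n\n')]
print(words)
# remove the empty string left by the trailing separator
while "" in words:
    words.remove("")
print(words)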

score 1 · Answer 6 · answered Feb 24 '22 at 18:22


Please, try this:

lista = ['text  \n\n  more text (1)  \n\n  even more text  \n\n']
aux = lista[0].split('\n\n')
list_final = [e.strip() for e in aux]
# the trailing '\n\n' leaves one empty string at the end; remove it
list_final.remove('')

answered Feb 24 '22 at 18:22

RubyLearning

