Removing '\n' from scraped data in Python

Question

I'm scraping repository names from GitHub like this:

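import urllib.request
import bs4 as bs  # assumed alias, matching the bs.BeautifulSoup call below
from django.shortcuts import render

repositorys = []
for url in user_repo_url:  # list of URLs like 'https://github.com/USER/?tab=repositories'
    source = urllib.request.urlopen(url).read()
    soup = bs.BeautifulSoup(source, 'lxml')
    # Collect the text of every repository-name <div> on the page
    repos = [repo.text for repo in soup.find_all('div', class_='d-inline-block mb-1')]
    repositorys.append(repos)

# Inside the Django view function:
return render(request, 'file.html', {'repositorys': repositorys})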

I'm using Django and everything works, but instead of getting clean text I get the names along with '\n' symbols. I tried using strip and the map function, but they didn't work. Do you have any suggestions as to why it doesn't work?

`repo.text.strip()` ? BS has also `repo.get_text(strip=True)` to remove `\n` between some elements. — furas, Feb 11 '20 at 22:55
@furas yes - it's strange but it was working only for hardcoded URL and only if this URL is not stored in list — Frendom, Feb 11 '20 at 23:00
Does this answer your question? [Remove all newlines from inside a string](https://stackoverflow.com/questions/13298907/remove-all-newlines-from-inside-a-string) — s3cur3, Feb 11 '20 at 23:04

score 2 · Accepted Answer · answered Feb 11 '20 at 22:56

If your goal is simply to remove all occurrences of \n, you can use repo.text.replace('\\n', ''). Note that you must escape the backslash if your string literally contains the two characters \ and n. If you are removing actual newline characters, use repo.text.replace('\n', '') instead.
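To make the distinction concrete, here is a minimal sketch that handles both cases at once. The helper name clean_repo_name and the sample strings are hypothetical and not from the question; the sketch assumes a repository name never legitimately contains a backslash.

```python
def clean_repo_name(raw: str) -> str:
    """Remove literal backslash-n sequences and real newlines, then trim.

    Hypothetical helper for illustration only.
    """
    # '\\n' is two characters: a backslash followed by the letter n.
    without_escapes = raw.replace('\\n', '')
    # '\n' is a single newline character.
    without_newlines = without_escapes.replace('\n', '')
    # Drop the indentation that surrounded the name in the HTML.
    return without_newlines.strip()


# Sample inputs (made up): one with real newlines, one with literal "\n" text.
real_newlines = '\n      my-repo\n    '
literal_escapes = r'\n      my-repo\n    '

print(clean_repo_name(real_newlines))
print(clean_repo_name(literal_escapes))
```

With this in place, the list comprehension from the question could call clean_repo_name(repo.text) instead of using repo.text directly.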

Cory Nezin


It works - no idea why it was treated like a hardcoded symbol – Frendom Feb 11 '20 at 23:24
