I'm scraping repository names from GitHub like this:

repositorys = []
for url in user_repo_url:  # each entry is a URL like 'https://github.com/USER/?tab=repositories'
    source = urllib.request.urlopen(url).read()
    soup = bs.BeautifulSoup(source,'lxml')
    repos = [repo.text for repo in soup.find_all('div',class_='d-inline-block mb-1')]
    repositorys.append(repos)

return render(request,'file.html',{'repositorys':repositorys})

I'm using Django and everything works, but instead of clean text I get each name surrounded by '\n' characters. I tried using strip and the map function, but they didn't work. Do you have any suggestions as to why?

eyllanesc
Frendom
  • `repo.text.strip()`? BeautifulSoup also has `repo.get_text(strip=True)`, which removes `\n` between elements. – furas Feb 11 '20 at 22:55
  • @furas Yes. Strangely, it worked only for a hardcoded URL, and only when that URL was not stored in a list. – Frendom Feb 11 '20 at 23:00
  • Does this answer your question? [Remove all newlines from inside a string](https://stackoverflow.com/questions/13298907/remove-all-newlines-from-inside-a-string) – s3cur3 Feb 11 '20 at 23:04

1 Answer

If your goal is simply to remove every newline, use `repo.text.replace('\n', '')`. If the string instead contains a literal backslash followed by the letter n, escape the backslash: `repo.text.replace('\\n', '')`.
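A minimal sketch of the difference, using made-up strings in place of the scraped `repo.text` values (the sample names and the nested-list shape are assumptions based on the `repositorys.append(repos)` call in the question). It also shows one possible reason `strip` appeared not to work: it is a string method, so it has to be applied to each name rather than to the inner lists.

```python
# Hypothetical sample data standing in for the scraped `repo.text` values.
# The question appends one list per user, so the result is a list of lists.
scraped = [["\n\n  project-one\n  ", "\nproject-two\n"], ["\n  dotfiles  \n"]]

# str.strip() works on strings, so apply it to every name in every inner list.
cleaned = [[name.strip() for name in repos] for repos in scraped]
print(cleaned)

# replace('\n', '') removes real newline characters (other whitespace stays).
real_newline = "\nproject-one\n"
print(real_newline.replace("\n", ""))

# replace('\\n', '') removes a literal backslash followed by the letter n.
literal = r"\nproject-one\n"
print(literal.replace("\\n", ""))
```

Unlike `replace`, `strip()` also drops the leading and trailing spaces, which is usually what you want for display in a template.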

Cory Nezin