
I have a question about SQLite, Python, and BeautifulSoup.

I scraped some data from the web and stored it in an SQLite database. When I view the text in DB Browser, it looks perfectly clean. However, when I select the column I want and retrieve the text in Python with cursor.fetchall(), each row is output as a tuple with one member, in this format:

('the text is here',)

I have noticed that whenever there is an apostrophe in the text itself, Python automatically switches to displaying the text with double quotes, like this:

("This sentence has it's quotes",)

The issue is that when the text contains both double quotes (") and apostrophes ('), Python escapes every ' character, like this:

("This sentence\'s apostrophes will be annoyingly escaped",)

I want to do some NLP work with the text, and I feel that my data is dirty: I have tried to output just the raw text, but the escape characters seem to be permanently in it.

Should I go back a few steps and try another way of storing the data, or is there a simple fix for this issue? I've done some digging and have been unable to find anything on this.

My end goal is to have perfectly clean data that I can do NLP research on, without these \'s ruining it.

Thanks

Kevin
  • You should extract the element (i.e. the actual str object) from the tuple before working with it. What you are seeing is the [repr](https://stackoverflow.com/questions/7784148/understanding-repr-function-in-python) format of a tuple containing a single str, which is why the quotes appear that way. – metatoaster Apr 21 '17 at 03:53
  • I'm thinking that would help. I will do some research on that one! Thanks! – Kevin Apr 21 '17 at 03:57
  • Thanks man, your simple comment made me run a test on my code where I just did a fetchall and wrote the results to a file, and the escape characters are no longer there. The real reason they were there must have been that, in my haste to "clean" the entries, I was manually using a regex to strip the tuple parentheses off each entry rather than just extracting the text. I did more work just to muddy my data. Frustrating, haha. Still have a lot to learn. Thanks again! – Kevin Apr 21 '17 at 04:10
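
For anyone landing here later, the point made in the comments can be sketched roughly as follows. This is only an illustration under assumptions: the in-memory database, the `pages` table, the `body` column, and the sample sentence are all made up for the demo and are not from the original code. It shows that the backslashes appear only when the tuple itself is printed (its repr), not in the stored string.

```python
import sqlite3

# Hypothetical in-memory table standing in for the scraped data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (body TEXT)")
text = 'She said "hello" and it\'s fine'
conn.execute("INSERT INTO pages (body) VALUES (?)", (text,))

rows = conn.execute("SELECT body FROM pages").fetchall()

# Each row is a 1-tuple. Printing the tuple shows the repr of the str inside,
# which is where the surrounding quotes and the \' escapes come from.
print(rows[0])

# Unpack the single column to get the actual str objects instead.
texts = [body for (body,) in rows]

# Printing the str itself shows the text exactly as stored, with no backslashes.
print(texts[0])

conn.close()
```

Converting a row to a string (for example with str(row)) and then trimming the parentheses with a regex bakes the repr escapes into the data, whereas indexing with row[0] or unpacking as above leaves the text untouched.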

0 Answers