I'm preparing text for a word cloud, but I'm stuck.
I need to remove all digits and all signs such as . , - ? = / ! @ etc., but I don't know how. I don't want to call replace() again and again for every sign. Is there a method for that?
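One way to avoid chaining `replace()` calls is `str.translate` with a table built by `str.maketrans`, whose third argument lists characters to delete. Below is a minimal sketch. It assumes ASCII digits (`string.digits`) and ASCII punctuation (`string.punctuation`) are the only signs to drop. Unicode dashes or other non-ASCII symbols would need to be added to the list. The name `strip_signs` is hypothetical.

```python
import string

# Translation table that maps every ASCII digit and punctuation sign to None (i.e. deletes it).
DELETE_TABLE = str.maketrans('', '', string.digits + string.punctuation)

def strip_signs(text: str) -> str:
    """Remove all ASCII digits and punctuation from text in a single pass."""
    return text.translate(DELETE_TABLE)

print(strip_signs("19 to 25 nucleotides, e.g. miR-BART15-3p!"))
```

Deleting signs glues hyphenated terms together (`miR-BART15-3p` becomes `miRBARTp`). To keep the pieces as separate words, the signs can be mapped to spaces instead, e.g. `str.maketrans(chars, ' ' * len(chars))` with two strings of equal length.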
Here is my plan and what I still have to do:
- Concatenate the texts into one string
- Convert the characters to lowercase <--- I'm here
- Delete the specific signs and split the text into words (a list)
- Calculate the frequency of each word
- Then run the stopwords script...
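The remaining steps (deleting signs, splitting into words, counting, and filtering stopwords) could be sketched as follows. This assumes the text is already lowercase. It uses `collections.Counter` for the frequencies and a tiny hand-written `STOPWORDS` set as a placeholder for a real stopword list. The name `word_frequencies` is hypothetical.

```python
import string
from collections import Counter

# Placeholder stopword list; swap in a real one (e.g. from NLTK) for actual use.
STOPWORDS = {"a", "an", "and", "are", "at", "but", "by", "in", "of", "the", "to"}

# Delete ASCII digits and punctuation in one pass.
DELETE_TABLE = str.maketrans('', '', string.digits + string.punctuation)

def word_frequencies(lower_text: str) -> Counter:
    """Strip signs, split into words, drop stopwords, and count what remains."""
    cleaned = lower_text.translate(DELETE_TABLE)
    words = cleaned.split()  # split() with no argument splits on any run of whitespace
    return Counter(word for word in words if word not in STOPWORDS)

freqs = word_frequencies("the functions of most of these mirnas have not yet been identified. "
                         "mirnas are a class of noncoding rna molecules.")
print(freqs.most_common(3))
```

A `Counter` maps each word to its count, which is the shape a word cloud generator typically needs. Filtering stopwords before counting keeps them out of the table entirely.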
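```python
# Read all abstracts from the file 'new' (assumed to be UTF-8, since the text contains characters like κ)
with open('new', 'r', encoding='utf-8') as abstracts_file:
    abstracts = abstracts_file.readlines()

# Concatenate the texts into one string
allab = ''.join(abstracts)

# Convert the characters to lowercase
lower = allab.lower()
```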
Example text:
MicroRNAs (miRNAs) are a class of noncoding RNA molecules approximately 19 to 25 nucleotides in length that downregulate the expression of target genes at the post-transcriptional level by binding to the 3'-untranslated region (3'-UTR). Epstein-Barr virus (EBV) generates at least 44 miRNAs, but the functions of most of these miRNAs have not yet been identified. Previously, we reported BRUCE as a target of miR-BART15-3p, a miRNA produced by EBV, but our data suggested that there might be other apoptosis-associated target genes of miR-BART15-3p. Thus, in this study, we searched for new target genes of miR-BART15-3p using in silico analyses. We found a possible seed match site in the 3'-UTR of Tax1-binding protein 1 (TAX1BP1). The luciferase activity of a reporter vector including the 3'-UTR of TAX1BP1 was decreased by miR-BART15-3p. MiR-BART15-3p downregulated the expression of TAX1BP1 mRNA and protein in AGS cells, while an inhibitor against miR-BART15-3p upregulated the expression of TAX1BP1 mRNA and protein in AGS-EBV cells. Mir-BART15-3p modulated NF-κB activity in gastric cancer cell lines. Moreover, miR-BART15-3p strongly promoted chemosensitivity to 5-fluorouracil (5-FU). Our results suggest that miR-BART15-3p targets the anti-apoptotic TAX1BP1 gene in cancer cells, causing increased apoptosis and chemosensitivity to 5-FU.
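The example text contains many hyphenated and alphanumeric terms (`Epstein-Barr`, `3'-UTR`, `miR-BART15-3p`, `NF-κB`). If removing the signs should not glue those pieces together, one alternative is to extract runs of letters with a regular expression. This is a different technique from `str.translate`. The sketch below assumes that "a word" means a run of Unicode letters. The pattern `[^\W\d_]+` matches word characters that are neither digits nor underscores, so non-ASCII letters such as `κ` are kept. The name `tokenize` is hypothetical.

```python
import re

# A run of letters: word characters (\w) minus digits (\d) and underscore.
# For str patterns this is Unicode-aware, so letters such as "κ" are kept.
WORD_RE = re.compile(r"[^\W\d_]+")

def tokenize(text: str) -> list[str]:
    """Lowercase text and return its letter-only words; digits and signs act as separators."""
    return WORD_RE.findall(text.lower())

sample = ("Epstein-Barr virus (EBV) generates at least 44 miRNAs. "
          "Mir-BART15-3p modulated NF-κB activity in gastric cancer cell lines.")
words = tokenize(sample)
print(words)
```

The trade-off is that `miR-BART15-3p` now yields the fragments `mir`, `bart` and `p`. Dropping very short tokens (for example, those shorter than three characters) before counting may be worth considering.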