2

I am trying to exclude strings from a list of strings if they contain certain words.

For example, if a string contains the word "cinnamon", "fruit", or "eat", I want to exclude it from the list.

['RT @haussera: Access to Apple Pay customer data, no, but another way? everybody wins - MarketWatch http://t.co/Fm3LE2iTkY', "Landed in the US, tired w horrible migrane. The only thing helping- Connie's new song on repeat. #SoGood #Nashville https://t.co/AscR4VUkMP", 'I wish jacob would be my cinnamon apple', "I've collected 9,112 gold coins! http://t.co/T62o8NoP09 #iphone, #iphonegames, #gameinsight", 'HAHAHA THEY USED THE SAME ARTICLE AS INDEPENDENT http://t.co/mC7nfnhqSw', '@hot1079atl Let me know what you think of the new single "Mirage "\nhttps://t.co/k8DJ7oxkyg', 'RT @SWNProductions: Hey All so we have a new iTunes listing due to our old one getting messed up please resubscribe via the following https…', 'Shawty go them apple bottoms jeans and the boots with the furrrr with furrrr the whole club is looking at her', 'I highly recommend you use MyMedia - a powerfull download manager for the iPhone/iPad.  http://t.co/TWmYhgKwBH', 'Alusckが失われた時間の異常を解消しました http://t.co/peYgajYvQY http://t.co/sN3jAJnd1I', 'Театр радует туземцев! Теперь мой остров стал еще круче! http://t.co/EApBrIGghO #iphone, #iphonegames, #gameinsight', 'RT @AppIeOfficiel: Our iPhone 7    http://t.co/d2vCOCOTqt', 'Я выполнил задание "Подключаем резервы"! Заходите ко мне в гости! http://t.co/ZReExwwbxh #iphone #iphonegames #gameinsight', "RT @Louis_Tomlinson: @JennSelby Google 'original apple logo' and you will see the one printed on my shirt that you reported on. Trying to l…", "I've collected 4,100 gold coins! http://t.co/JZLQJdRtLG #iphone, #iphonegames, #gameinsight", "I've collected 28,800 gold coins! http://t.co/r3qXNHwUdp #iphone, #iphonegames, #gameinsight", 'RT @AppIeOfficiel: Our iPhone 7    http://t.co/d2vCOCOTqt']

keywordFilter=['eat','cinnamon','fruit']
for sent in list:
    for word in keywordFilter:
        if word in sent:
            list.remove(sent)

But it does not filter out the strings that contain the keywords; I get the original list back. Does anyone know why?

1st Edit:

import json
from json import *

tweets=[]
tweetsF=[]  # holds the 'text' field of each tweet

for line in open('apple.json'):
    try:
        tweets.append(json.loads(line))
    except:
        pass

keywordFilter=set(['pie','juice','cinnamon'])

for tweet in tweets:
    for key, value in tweet.items():
        if key=='text':
            tweetsF.append(value)

print(type(tweetsF))
print(len(tweetsF))

tweetsFBK=[sent for sent in tweetsF if not any(word in sent for word in keywordFilter)]
print(type(tweetsFBK))    
print(len(tweetsFBK))

Above is the code I have so far. Up to tweetsF, the strings are stored correctly, and I then try to exclude the strings that contain the words in keywordFilter.

However, len(tweetsFBK) is 0 (the filtered list is empty). Does anyone know why?

Arnold Chung
  • stack overflow think I am the robot hahahahaha – Arnold Chung Feb 17 '15 at 16:16
  • You just want to remove that keyword from the string? Or if the string has a keyword, remove the whole string? – Cory Kramer Feb 17 '15 at 16:17
  • @Cyber the string that has the keyword. – Arnold Chung Feb 17 '15 at 16:18
  • @Cyber I have bunch of strings and I hope to exclude strings that have certain keywords that I want to exclude. – Arnold Chung Feb 17 '15 at 16:18
  • @ArnoldChung you should avoid naming your variables 'list' since that is a [built-in](https://docs.python.org/2/library/functions.html) function. Also, you should avoid modifying a list while iterating it. See [here](http://stackoverflow.com/questions/1207406/remove-items-from-a-list-while-iterating-in-python). – nullstellensatz Feb 17 '15 at 16:22
  • All of this seems rather underspecified to me: What is a "word"? A space-separated sequence of letters only? Do you count `Connie's` as one or two words? Are your words case sensitive or not? Is `US` the same word as `us`? How long are both the original list and the list of keywords (i.e.: is an O(n²) solution acceptable)? ... – Sylvain Leroux Feb 17 '15 at 16:37
  • @nullstellensatz Yes, in my Python file I used a different name (lst); I only changed it for this post. Thanks for your comment! – Arnold Chung Feb 17 '15 at 18:04
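
The pitfall raised in the comments (removing items from a list while iterating over it) can be illustrated with a small sketch. The sample strings and variable names here are hypothetical, not taken from the question; the point is only that removing an element shifts the remaining ones left, so the loop skips the item that follows each removal.

```python
# Hypothetical sample data, chosen so two matching strings are adjacent.
sentences = ["eat an apple", "eat a pear", "new iphone"]
keyword = "eat"

# Problematic: the loop mutates the list it is iterating over.
# After removing index 0, "eat a pear" moves to index 0 and is never visited.
buggy = list(sentences)
for sent in buggy:
    if keyword in sent:
        buggy.remove(sent)

# Safer: build a new list instead of mutating the one being iterated.
fixed = [sent for sent in sentences if keyword not in sent]

print(buggy)  # one matching string survives
print(fixed)  # only the non-matching string remains
```

Iterating over a copy (for example `for sent in sentences[:]`) would also avoid the skipping, but building a new list is usually easier to read.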

3 Answers

8

One solution is the following:

list = [sent for sent in list 
    if not any(word in sent for word in keywordFilter)]

It will remove all strings that contain one of the words in the list keywordFilter as a substring. For instance, it will remove the second string, since it contains the word repeat (and eat is a substring of repeat).

If you want to avoid this, you can do the following:

list = [sent for sent in list 
    if not any(word in sent.split(' ') for word in keywordFilter)]

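It will remove only strings that contain one of the words in keywordFilter as a whole word (i.e. delimited by spaces in the sentence). Note that a keyword attached to punctuation (such as "eat," or "eat.") or written in a different case will not match.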
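
If splitting on spaces is too strict (it misses a keyword followed by punctuation or written in upper case), one possible alternative is a regular expression with word boundaries. This is only a sketch under assumptions: the sample strings are made up to resemble the question's data, and treating matches as case-insensitive is a choice the question does not specify.

```python
import re

keyword_filter = ["eat", "cinnamon", "fruit"]

# Hypothetical sample strings modelled on the question's data.
sentences = [
    "new song on repeat. #SoGood",
    "I wish jacob would be my cinnamon apple",
    "Time to EAT, finally!",
    "I've collected 9,112 gold coins!",
]

# \b matches at a word boundary, so "eat" does not match inside "repeat",
# but it does match "EAT," because the comma is not a word character.
# re.escape guards against keywords that contain regex metacharacters.
pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in keyword_filter) + r")\b",
    re.IGNORECASE,  # assumption: case should be ignored
)

kept = [s for s in sentences if not pattern.search(s)]
print(kept)
```

Here the "repeat" string is kept, while the "cinnamon" and "EAT," strings are dropped. Drop `re.IGNORECASE` if the match should be case sensitive.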

Tom Cornebize
3

You can use any() in a list comprehension to do the filtering for you:

original_list = ['RT @haussera: Access to Apple Pay customer data, no, but another way? everybody wins - MarketWatch http://t.co/Fm3LE2iTkY', "Landed in the US, tired w horrible migrane. The only thing helping- Connie's new song on repeat. #SoGood #Nashville https://t.co/AscR4VUkMP", 'I wish jacob would be my cinnamon apple', "I've collected 9,112 gold coins! http://t.co/T62o8NoP09 #iphone, #iphonegames, #gameinsight", 'HAHAHA THEY USED THE SAME ARTICLE AS INDEPENDENT http://t.co/mC7nfnhqSw', '@hot1079atl Let me know what you think of the new single "Mirage "\nhttps://t.co/k8DJ7oxkyg', 'RT @SWNProductions: Hey All so we have a new iTunes listing due to our old one getting messed up please resubscribe via the following https…', 'Shawty go them apple bottoms jeans and the boots with the furrrr with furrrr the whole club is looking at her', 'I highly recommend you use MyMedia - a powerfull download manager for the iPhone/iPad.  http://t.co/TWmYhgKwBH', 'Alusckが失われた時間の異常を解消しました http://t.co/peYgajYvQY http://t.co/sN3jAJnd1I', 'Театр радует туземцев! Теперь мой остров стал еще круче! http://t.co/EApBrIGghO #iphone, #iphonegames, #gameinsight', 'RT @AppIeOfficiel: Our iPhone 7    http://t.co/d2vCOCOTqt', 'Я выполнил задание "Подключаем резервы"! Заходите ко мне в гости! http://t.co/ZReExwwbxh #iphone #iphonegames #gameinsight', "RT @Louis_Tomlinson: @JennSelby Google 'original apple logo' and you will see the one printed on my shirt that you reported on. Trying to l…", "I've collected 4,100 gold coins! http://t.co/JZLQJdRtLG #iphone, #iphonegames, #gameinsight", "I've collected 28,800 gold coins! http://t.co/r3qXNHwUdp #iphone, #iphonegames, #gameinsight", 'RT @AppIeOfficiel: Our iPhone 7    http://t.co/d2vCOCOTqt']

keywordFilter = set(['eat','cinnamon','fruit'])

# Keep only the strings that contain none of the keywords (substring match).
filtered_list = [s for s in original_list if not any(word in s for word in keywordFilter)]
Cory Kramer
  • Thanks for your code. I have a quick question. In the Python 3 interpreter, your code excludes the strings containing the keywords perfectly. However, running it from the terminal does not work. In other words, say I saved this code as test.py, opened the terminal, and ran "python3 test.py". It returns [ ], an empty list. Do you know why? – Arnold Chung Feb 17 '15 at 18:08
0

Simply complicated :)

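# Note: this strips the matching words out of each string
# rather than dropping the whole string.
final_list = []
for sentence in original_list:
    kept_words = []
    for token in sentence.split(" "):
        # keep the token only if no keyword occurs in it as a substring
        if not any(word in token for word in keywordFilter):
            kept_words.append(token)
    final_list.append(" ".join(kept_words))
print(final_list)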
itzMEonTV