I have a list of strings and I want to use regex to filter the list down to certain strings.
For example, here is the original list:
quoteTitle = ['\r\n ', ' ', '\r\n ', '\r\n ', '\r\n ', '\r\n ', '\r\n ', '30. Loyalty', '29. Speed Scale', '28. Security', '27. Every Position', '26. Superior Brain Power', '25. A Long Line of Fighters', '24. Dwight Surveillance', '23. Friends ', '22. Pull the Plug ', '21. Second Life', '20. Accidentally vs. On Purpose', '19. Menstruation Wishes ', '18. Ideal Choice', '17. Healthcare in the Wild', '16. Superior Cousins', '15. Regular Ideas', '14. Immunity Logic', '13. The Person You Least, Medium and Most Suspect', '12. Real Heroes ', '11. Water Cooler Gossip', '10. Stress', '9. All These People!', '8. The “R” Sound', '7. A Woman’s Defects ', '6.Werewolf Hunting Experience ', '5. An Ideal World ', '4. Attention', '3. The Thing About Bear Attacks ', '2. Resume Critiquing', '1. Yeast Infections ', 'Tags', 'Recently in TV', '5/8/2018 3:45:00 PM', '5/3/2018 2:00:00 PM', '5/3/2018 12:00:00 PM', 'Most Popular', '5/3/2018 11:00:00 AM', '5/3/2018 12:00:00 PM', '5/8/2018 3:45:00 PM', '4/13/2018 5:00:00 PM', '4/18/2018 8:22:31 PM', '5/3/2018 2:00:00 PM', '5/8/2018 6:02:54 PM', '5/7/2018 4:52:04 PM', '5/7/2018 2:57:00 PM', '5/3/2018 5:04:43 PM', '5/3/2018 1:06:18 PM', 'Music', '5/3/2018 12:00:00 AM', 'Music', '5/3/2018 12:00:00 AM', 'Music', '5/2/2018 12:00:00 AM', '11/28/2017 8:00:00 AM', '11/30/2017 11:00:00 AM', '5/3/2018 11:00:00 AM', '5/3/2018 2:00:00 PM', '5/3/2018 12:00:00 PM', '12/11/2017 1:00:00 PM', '12/15/2017 8:00:00 AM', '1/9/2018 3:00:00 PM', '1/4/2017 12:30:00 PM', '4/13/2013 12:13:00 PM', 'TV', '4/3/2018 10:00:00 AM', 'TV', '4/3/2018 9:25:00 AM', 'Comedy', '3/22/2018 1:00:28 PM', 'TV', '3/15/2018 10:00:00 AM', 'Comedy', '3/13/2018 2:00:00 PM', 'TV', '3/10/2018 10:00:00 AM', 'TV', '3/2/2018 11:00:00 AM', 'TV', '2/25/2018 10:30:00 PM', 'TV', '2/23/2018 1:00:00 PM', '5/3/2018 11:00:00 AM', '5/3/2018 12:00:00 PM', '5/8/2018 3:45:00 PM', '4/13/2018 5:00:00 PM', '4/18/2018 8:22:31 PM', '5/3/2018 2:00:00 PM', '5/3/2018 10:00:00 AM', '5/7/2018 10:00:00 AM', '4/26/2018 
2:00:00 PM', '5/6/2018 10:00:00 PM', '5/8/2018 6:02:54 PM', '5/7/2018 4:52:04 PM', '5/7/2018 2:57:00 PM', '5/3/2018 5:04:43 PM', '5/3/2018 1:06:18 PM', '5/3/2018 12:00:00 AM', '5/3/2018 12:00:00 AM', '5/2/2018 12:00:00 AM', '5/2/2018 12:00:00 AM', '5/2/2018 12:00:00 AM', '11/28/2017 8:00:00 AM', '11/30/2017 11:00:00 AM', '5/3/2018 11:00:00 AM', '5/3/2018 2:00:00 PM', '5/3/2018 12:00:00 PM', '12/11/2017 1:00:00 PM', '12/15/2017 8:00:00 AM', '1/9/2018 3:00:00 PM', '1/4/2017 12:30:00 PM', '4/13/2013 12:13:00 PM', '4/3/2018 10:00:00 AM', '4/3/2018 9:25:00 AM', '3/22/2018 1:00:28 PM', '3/15/2018 10:00:00 AM', '3/13/2018 2:00:00 PM']
I want only the numbered items, 30 down to 1, together with the text that follows each number. I can successfully filter out anything that doesn't start with a number using:
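import re
# First pass: keep strings that start with a word character (drops the whitespace-only entries)
p = re.compile(r'\w')
q = filter(p.match, quoteTitle)
# Second pass: keep strings that start with one or more digits
p = re.compile(r'^\d+')
q = filter(p.match, q)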
This gets me to
print(list(q)) --> ['30. Loyalty', '29. Speed Scale', '28. Security', '27. Every Position', '26. Superior Brain Power', '25. A Long Line of Fighters', '24. Dwight Surveillance', '23. Friends ', '22. Pull the Plug ', '21. Second Life', '20. Accidentally vs. On Purpose', '19. Menstruation Wishes ', '18. Ideal Choice', '17. Healthcare in the Wild', '16. Superior Cousins', '15. Regular Ideas', '14. Immunity Logic', '13. The Person You Least, Medium and Most Suspect', '12. Real Heroes ', '11. Water Cooler Gossip', '10. Stress', '9. All These People!', '8. The “R” Sound', '7. A Woman’s Defects ', '6.Werewolf Hunting Experience ', '5. An Ideal World ', '4. Attention', '3. The Thing About Bear Attacks ', '2. Resume Critiquing', '1. Yeast Infections ', 'Tags', 'Recently in TV', '5/8/2018 3:45:00 PM', '5/3/2018 2:00:00 PM', '5/3/2018 12:00:00 PM', 'Most Popular', '5/3/2018 11:00:00 AM', '5/3/2018 12:00:00 PM', '5/8/2018 3:45:00 PM', '4/13/2018 5:00:00 PM', '4/18/2018 8:22:31 PM', '5/3/2018 2:00:00 PM', '5/8/2018 6:02:54 PM', '5/7/2018 4:52:04 PM', '5/7/2018 2:57:00 PM', '5/3/2018 5:04:43 PM', '5/3/2018 1:06:18 PM', 'Music', '5/3/2018 12:00:00 AM', 'Music', '5/3/2018 12:00:00 AM', 'Music', '5/2/2018 12:00:00 AM', '11/28/2017 8:00:00 AM', '11/30/2017 11:00:00 AM', '5/3/2018 11:00:00 AM', '5/3/2018 2:00:00 PM', '5/3/2018 12:00:00 PM', '12/11/2017 1:00:00 PM', '12/15/2017 8:00:00 AM', '1/9/2018 3:00:00 PM', '1/4/2017 12:30:00 PM', '4/13/2013 12:13:00 PM', 'TV', '4/3/2018 10:00:00 AM', 'TV', '4/3/2018 9:25:00 AM', 'Comedy', '3/22/2018 1:00:28 PM', 'TV', '3/15/2018 10:00:00 AM', 'Comedy', '3/13/2018 2:00:00 PM', 'TV', '3/10/2018 10:00:00 AM', 'TV', '3/2/2018 11:00:00 AM', 'TV', '2/25/2018 10:30:00 PM', 'TV', '2/23/2018 1:00:00 PM', '5/3/2018 11:00:00 AM', '5/3/2018 12:00:00 PM', '5/8/2018 3:45:00 PM', '4/13/2018 5:00:00 PM', '4/18/2018 8:22:31 PM', '5/3/2018 2:00:00 PM', '5/3/2018 10:00:00 AM', '5/7/2018 10:00:00 AM', '4/26/2018 2:00:00 PM', '5/6/2018 10:00:00 PM', '5/8/2018 
6:02:54 PM', '5/7/2018 4:52:04 PM', '5/7/2018 2:57:00 PM', '5/3/2018 5:04:43 PM', '5/3/2018 1:06:18 PM', '5/3/2018 12:00:00 AM', '5/3/2018 12:00:00 AM', '5/2/2018 12:00:00 AM', '5/2/2018 12:00:00 AM', '5/2/2018 12:00:00 AM', '11/28/2017 8:00:00 AM', '11/30/2017 11:00:00 AM', '5/3/2018 11:00:00 AM', '5/3/2018 2:00:00 PM', '5/3/2018 12:00:00 PM', '12/11/2017 1:00:00 PM', '12/15/2017 8:00:00 AM', '1/9/2018 3:00:00 PM', '1/4/2017 12:30:00 PM', '4/13/2013 12:13:00 PM', '4/3/2018 10:00:00 AM', '4/3/2018 9:25:00 AM', '3/22/2018 1:00:28 PM', '3/15/2018 10:00:00 AM', '3/13/2018 2:00:00 PM']
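As a side note, the "starts with a number" step can be sketched as a single pass. This assumes every wanted item has the form number, dot, text (which holds for the entries shown, including '6.Werewolf Hunting Experience ' with no space after the dot). The names `sample`, `numbered`, and `titles` are made up for illustration, and `sample` is a shortened stand-in for the real list.

```python
import re

# Shortened, hypothetical stand-in for quoteTitle
sample = ['\r\n ', ' ', '30. Loyalty', '6.Werewolf Hunting Experience ',
          'Tags', '5/8/2018 3:45:00 PM']

# match() anchors at the start of the string: one or more digits, then a literal dot
numbered = re.compile(r'\d+\.')

# Keep only the strings for which match() returns a match object
titles = list(filter(numbered.match, sample))
print(titles)  # ['30. Loyalty', '6.Werewolf Hunting Experience ']
```

Because the pattern requires a dot right after the digits, an entry such as '5/8/2018 3:45:00 PM' does not match in this sketch.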
Now I want to remove the dates from the list.
I've tried many variations of the following, but I think I'm missing or misunderstanding something. My thinking is to keep every string in the list that does not follow the format of the date entries:
p = re.compile(r'[^'\d+/]')
q = filter(p.match, q)
They start with an apostrophe because each one is a quoted string, and I think that might be my problem. Other than that, the format goes:
apostrophe, number (between 1-12 so \d+), /
That should be enough to filter out the date entries, as long as I get it working correctly.
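To make the intent concrete, here is a minimal sketch of the "drop anything shaped like a date" idea. It assumes the apostrophes are only the quote marks Python prints around each string and are not characters inside the strings, so they are left out of the pattern. The names `sample`, `date_start`, and `no_dates` are hypothetical, and `sample` is a shortened stand-in for the list.

```python
import re

# Shortened, hypothetical stand-in for the list after the first filtering step
sample = ['30. Loyalty', '6.Werewolf Hunting Experience ', '1. Yeast Infections ',
          '5/8/2018 3:45:00 PM', '11/28/2017 8:00:00 AM']

# A date entry starts with 1-2 digits, a slash, 1-2 digits, a slash, and a 4-digit year
date_start = re.compile(r'\d{1,2}/\d{1,2}/\d{4}')

# Keep only the strings that do NOT start with a date
no_dates = [s for s in sample if not date_start.match(s)]
print(no_dates)  # ['30. Loyalty', '6.Werewolf Hunting Experience ', '1. Yeast Infections ']
```

The key difference from a negated character class is that the pattern describes the date itself and the exclusion happens in Python with `not`.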
Update: I even tried this to search for elements of the list that have an AM or PM in them, and still no luck:
p = re.compile(r'[^(AM|PM)]')
q = filter(p.search, q)
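For what it's worth, square brackets define a character class, so `[^(AM|PM)]` matches any single character other than `(`, `A`, `M`, `|`, `P`, or `)`, and `search` finds such a character in almost every string. Below is a minimal sketch of the AM/PM idea that matches the suffix with a group and inverts the test using `itertools.filterfalse`. The names `sample`, `am_pm`, and `without_times` are hypothetical, and it assumes every date entry ends in " AM" or " PM" as in the list shown.

```python
import re
from itertools import filterfalse

# Shortened, hypothetical stand-in for the list
sample = ['10. Stress', '2. Resume Critiquing', 'Music',
          '5/3/2018 12:00:00 AM', '4/13/2013 12:13:00 PM']

# A non-capturing group (not a character class): whitespace, then AM or PM, at the end
am_pm = re.compile(r'\s(?:AM|PM)$')

# filterfalse keeps the items for which search() finds nothing
without_times = list(filterfalse(am_pm.search, sample))
print(without_times)  # ['10. Stress', '2. Resume Critiquing', 'Music']
```

Note that this only removes the timestamp entries; labels such as 'Music' would still need the "starts with a number" filter.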