5

I have a table which stores information of a lot of twitter tweets including the tweet text and the screen name of the user who tweeted the tweet. The tweets contain hashtags (starting with #), I want to count the number of hashtags that a specific user has tweeted:

tweet_id |                       tweet_text                           | screen_name    |
--------------------------------------------------------------------------------------------
       1 | #hashtag1 #otherhashtag2 #hashtag3 some more text          | tweeter_user_1 |
       2 | some text #hashtag1 #hashtag4 more text                    | tweeter_user_2 |
       3 | #hashtag5 #hashtag1 @not a hashtag some#nothashtag         | tweeter_user_1 |
       4 | #hashtag1 with more text                                   | tweeter_user_3 |
       5 | #otherhashtag2 #hashtag3,#hashtag4 more text               | tweeter_user_1 |

If I were to count the hashtags of tweeter_user_1, the result i expect is 8, if i wanted the hashtags of tweeter_user_3 it should return 1. How can I do it assuming that my table name is tweets.

I tried this: SELECT COUNT( * ) FROM tweets WHERE( LENGTH( REPLACE( tweet_text, '#%', '@') = 0 ) ) AND screen_name = 'tweeter_user_1' but it didn't work

I would be happy if the result of tweeter_user_1 was 9 too :D

adamdehaven
  • 5,890
  • 10
  • 61
  • 84
Tales
  • 1,829
  • 3
  • 33
  • 50
  • 2
    If it's just the `number` of hashtags you want, then something like: `SELECT SUM(LENGTH(tweet_text) - LENGTH(REPLACE(tweet_text, '#', ''))) FROM tweets WHERE screen_name = 'tweeter_user_1'` should work. – dash Oct 04 '12 at 22:01
  • In fact, you are going to have a difficult time not matching some#nothashtag - if possible @JYelton's suggestion is a good one - precalculate on storage - it's never going to change. – dash Oct 04 '12 at 22:10
  • Yeah, that's why i updated the post with "It's ok if i get a count of 9", JYelton idea is very good, but the table is already huge and i only needed this script to calculate the number of hashtags of one person that we detected abusing the system. Plus the application will be over in 3 or 4 weeks. – Tales Oct 05 '12 at 05:29
  • Possible duplicate of [Count the number of occurences of a string in a VARCHAR field?](http://stackoverflow.com/questions/12344795/count-the-number-of-occurences-of-a-string-in-a-varchar-field) – Ciro Santilli OurBigBook.com Nov 24 '15 at 21:26

2 Answers2

8

This should give you a list of screen_names and the total count of all hashtags they use.

SELECT  foo.screen_name, SUM(foo.counts) FROM 
  (
    SELECT screen_name, 
           LENGTH( tweet_text) - LENGTH(REPLACE(tweet_text, '#', '')) AS counts 
    FROM tweet_table 
  ) as foo 
GROUP BY  foo.screen_name

But.... it's a nasty query if the table is huge. I might specify a specific users in the inner select if you just need counts for a single user. Like this:

SELECT  foo.screen_name, SUM(foo.counts) FROM 
 (
    SELECT screen_name, 
         LENGTH( tweet_text) - LENGTH(REPLACE(tweet_text, '#', '')) AS counts 
    FROM tweet_table WHERE  screen_name = 'tweeter_user_1' 
 ) as foo 
GROUP BY  foo.screen_name
Ray
  • 40,256
  • 21
  • 101
  • 138
  • I think it has some kind of mistake, i keep getting an error message when I run this script: #1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'WHERE screen_name = 'LozanoXiomii') AS counts FROM tweet_table ) as foo GROUP – Tales Oct 05 '12 at 14:55
  • @Tales yup, put the where in the wrong place... updated query in exampl – Ray Oct 05 '12 at 15:44
2

Depending on how often you need to run the query, you could be causing MySQL to spend a lot of CPU time parsing and reparsing the tweet_text column. I would strongly recommend adding a hashtag_qty column (or similar) and store the count of hashtag elements there when you populate the row to begin with.

JYelton
  • 35,664
  • 27
  • 132
  • 191
  • Thanks for the suggestion, it is a very good idea and i will keep it for future reference as we probably will start a few more contests using tweeter hashtags, and our magazine has a high number of people following us. – Tales Oct 05 '12 at 05:32