I want to write a function that takes a string (a tweet) and creates a two-column dataframe with hashtags in one column and @ mentions in the other.

I am using split right now, but would like something like this (pseudocode):

string.split("@"||"#", if "#" assign to column1 else column2)

I know that in R you can do something similar with "which", but I don't know how to do it here.

Thanks

PS. I have the dataset with all the tweets downloaded in a txt file.

Til
  • How do you want to handle multiple hashtags or @s? – mfitzp Feb 24 '19 at 19:31
  • Welcome to Stack Overflow! Could you edit into your question some example data (as text, not images), the expected output that would result from that example data, and your attempts so far? That would really help readers experiment toward solutions. – Peter Leimbigler Feb 24 '19 at 19:36
  • Why has this been marked as a duplicate? The supposed duplicate question doesn't really answer this question. – Toby Petty Feb 24 '19 at 21:31

1 Answer

The simplest approach is to do two tests in a loop:

hashtags = []
users = []

# Split the tweet on whitespace into words, then test each word's first character
for word in tweet.split():
    if word.startswith('#'):
        hashtags.append(word)
    elif word.startswith('@'):
        users.append(word)

(Replace the lists and appends with dataframe manipulations as desired.)
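If you want to go straight to the two-column dataframe from the question, here is a minimal sketch of one way it could look. The helper names `extract_tags` and `tags_to_dataframe`, the column names, and the regular expression are my own assumptions, not something from the question, and the pattern only treats letters, digits and underscores as part of a tag. It assumes pandas is installed.

```python
import re

# Assumed pattern: "#" or "@" followed by word characters.
# (?<!\w) skips symbols that directly follow a word character,
# such as the "@" inside an email address.
TAG_PATTERN = re.compile(r"(?<!\w)[#@]\w+")


def extract_tags(tweet):
    """Return (hashtags, mentions) found in a single tweet string."""
    hashtags, mentions = [], []
    for tag in TAG_PATTERN.findall(tweet):
        if tag.startswith("#"):
            hashtags.append(tag)
        else:
            mentions.append(tag)
    return hashtags, mentions


def tags_to_dataframe(tweet):
    """Build a two-column DataFrame; the shorter column is padded with NaN."""
    import pandas as pd  # imported here so extract_tags works without pandas

    hashtags, mentions = extract_tags(tweet)
    # Wrapping each list in a Series lets the columns have different lengths.
    return pd.DataFrame({
        "hashtags": pd.Series(hashtags, dtype="object"),
        "mentions": pd.Series(mentions, dtype="object"),
    })
```

Calling `tags_to_dataframe("Loving #pandas and #python with @alice!")` should give two rows, with the second `mentions` cell empty (NaN). Unlike the `startswith` loop, the regex drops trailing punctuation, so `@alice!` is stored as `@alice`. For a file of tweets you could call `extract_tags` on each line and collect the results.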

match