
I'm trying to parse tweets that are stored in a column called "text" in a .csv file. I want to use regex, TweetTokenizer, etc., but as far as I understand, those all require the text to be in string form.

I saw this post:

Parsing a tweet inside a csv column in Python

but for my purposes, that code is too specific to finding hashtags. I do want to do that, but does anyone know how to turn the text in that "text" column into strings more generally, so I can parse it?

Thanks, punpun

gnpunpun
  • You should be able to extract the "text" column from the dataframe, save it as a list and parse the elements of the list. Unless I'm missing the point here. – fulaphex Apr 01 '19 at 23:50
  • @fulaphex do you know how to parse all the elements of the list at once? For example, running re.findall(r"#(\w+)", tweetlist) returns TypeError: expected string or bytes-like object. Basically, I want to make one big string of all the tweets and parse through that. – gnpunpun Apr 03 '19 at 01:08
  • https://stackoverflow.com/a/34011944/11295826 This worked for me – gnpunpun Apr 03 '19 at 01:24

1 Answer


Text columns should be imported as strings when you read the csv file:

import pandas as pd

df = pd.read_csv('tweet.csv')
print(df)

Output:

            user                                               text
0  scotthamilton  is upset that he can't update his Facebook by ...
1       mattycus  @Kenichan I dived many times for the ball. Man...
2        ElleCTF     my whole body feels itchy and like its on fire
3         Karoli  @nationwideclass no, it's not behaving at all....
4       joy_wolf                       @Kwesidei not the whole crew
5        mybirch                                         Need a hug

print(df.dtypes)

Output:

user    object
text    object
dtype: object

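The pandas object dtype can hold arbitrary Python objects, and it is what pandas uses for text here. For a text column read from a CSV, each non-missing value is an ordinary Python str, so it can be passed to regex or tokenizer functions as-is.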
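To check that the cells really are ordinary strings, a minimal sketch along these lines may help. The inline sample data and the hashtag pattern are assumptions for illustration, not the asker's actual file:

```python
import io
import re

import pandas as pd

# Hypothetical sample data standing in for the real tweet.csv
csv_data = io.StringIO(
    "user,text\n"
    "alice,Need a hug #sad\n"
    "bob,@alice not the whole crew #team #work\n"
)
df = pd.read_csv(csv_data)

# Each non-missing cell is already a plain Python str
first_tweet = df["text"].iloc[0]
print(type(first_tweet))

# Apply a regex to a single cell with the re module
print(re.findall(r"#(\w+)", first_tweet))

# Or apply the same pattern to every row with the .str accessor
hashtags = df["text"].str.findall(r"#(\w+)")
print(hashtags.tolist())
```

The .str accessor keeps one result per tweet, which is usually easier to work with than a single merged string when the results need to be tied back to a row.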

If you do need to convert the column type to str, you can use the following:

df.text = df.text.astype(str)
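
The TypeError mentioned in the comments comes from passing a list to re.findall, which expects a single string. One possible way around it, sketched here with a hypothetical tweetlist standing in for something like df["text"].tolist(), is to join the tweets first or to search them one at a time:

```python
import re

# Hypothetical stand-in for a list built with df["text"].tolist()
tweetlist = [
    "Need a hug #sad",
    "@Kwesidei not the whole crew #team",
]

# re.findall(r"#(\w+)", tweetlist) raises TypeError because tweetlist
# is a list, not a str. Join the tweets into one string first.
all_text = " ".join(tweetlist)
all_tags = re.findall(r"#(\w+)", all_text)
print(all_tags)

# Alternatively, keep the tweets separate and search each one
per_tweet = [re.findall(r"#(\w+)", tweet) for tweet in tweetlist]
print(per_tweet)
```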
Nathaniel