
I have a pandas DataFrame with a column of Twitter profile descriptions. Some of these descriptions contain strings like 'insta: profile_name'.

How can I write a line of code that searches for a string (e.g. 'insta:' or 'instagram:') and returns the rest of the string that follows it?

  1252: 'lad who loves to cook  • insta: xxx',
  1254: 'founder and head chef | insta: xxx |',
  1992: ' |bakery instagram - xxx',
  2291: 'insta: @xxx for enquiries',
  2336: 'self taught baker. ig:// xxxx ',
Jimmy K
  • can you please provide a sample of your dataset? – sophocles Apr 25 '21 at 15:05
  • Umm how do I do that! haha sorry. To give you an idea, the column contains cells which have something like 'Jimmy K. insta: @twittername find me on YouTube too'. – Jimmy K Apr 25 '21 at 15:09
  • If you have the data loaded in a pandas dataframe, you can use ```df.to_dict()``` and paste the output in your question so that we can replicate. You can find information on how to give good reproducible examples [here](https://stackoverflow.com/questions/63163251/pandas-how-to-easily-share-a-sample-dataframe-using-df-to-dict) – sophocles Apr 25 '21 at 15:10
  • Thanks, I have included a subset of anonymised data. My intention is to create a matching expression of 'insta:' and then using this logic create one for 'instagram:' and other terms I find. – Jimmy K Apr 25 '21 at 15:27
  • df['name'].str.extract(pat = r'(insta:|ig:)(.*)')[1].str.strip('\',') – Nk03 Apr 25 '21 at 15:33
  • I think the answer by Nk03 is what you need – sophocles Apr 25 '21 at 15:47
  • Thank you everyone! Would it be possible to walk me through that code @Nk03? – Jimmy K Apr 26 '21 at 08:01

2 Answers

You can use a regular expression (regex) to match each of the keywords, such as 'insta'.

The code should be something like this:

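import re

the_string_to_find_from = 'lad who loves to cook  • insta: xxx'  # example description
container = list()
for word in ["insta", "instagram"]:  # list of keywords
    # Example pattern (an assumption): keyword, ':', optional spaces and '@', then the handle
    _tag = re.findall(re.escape(word) + r':\s*@?(\w+)', the_string_to_find_from)
    container.append([word, _tag])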

Then you can unpack the resulting container variable when you want the results. I can help you write the regex syntax, but I need more information on how the information you need is wrapped in the text.
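As a rough sketch of how that loop could be applied to the sample descriptions from the question and then unpacked: the helper name `find_handles`, the keyword list, and the separator pattern (a colon optionally followed by slashes, or a hyphen) are assumptions made for illustration, not something the question specifies.

```python
import re

# Sample descriptions taken from the question.
descriptions = [
    "lad who loves to cook  • insta: xxx",
    "founder and head chef | insta: xxx |",
    " |bakery instagram - xxx",
    "insta: @xxx for enquiries",
    "self taught baker. ig:// xxxx ",
]

# Assumed keyword list; extend it as new variants turn up in the data.
keywords = ["instagram", "insta", "ig"]


def find_handles(text, keywords):
    """Return [keyword, matches] pairs for one description (hypothetical helper)."""
    container = []
    for word in keywords:
        # keyword, then ':' (optionally followed by slashes) or '-',
        # then optional spaces and '@', then the handle itself
        pattern = r"\b" + re.escape(word) + r"\s*(?::/*|-)\s*@?(\w+)"
        tags = re.findall(pattern, text, flags=re.IGNORECASE)
        container.append([word, tags])
    return container


# Unpack the container: each entry is a [keyword, list_of_matches] pair.
for text in descriptions:
    for word, tags in find_handles(text, keywords):
        if tags:
            print(f"{word!r} -> {tags}")
```

Each keyword is tried separately, so a description only reports a handle under the keyword that actually precedes it.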

Minh Quân

Answer provided by Nk03 in the comments:

df['name'].str.extract(pat = r'(insta:|ig:)(.*)')[1].str.strip('\',') 
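Since a walkthrough was requested in the comments, here is the one-liner above split into steps on the sample rows from the question. The final `handle_pattern` is a hypothetical extension (not part of Nk03's comment) that assumes handles consist of word characters and that the separator is a colon, optionally followed by slashes, or a hyphen.

```python
import pandas as pd

# Sample descriptions copied from the question; the column name 'name'
# follows the one-liner.
df = pd.DataFrame({
    "name": [
        "lad who loves to cook  • insta: xxx",
        "founder and head chef | insta: xxx |",
        " |bakery instagram - xxx",
        "insta: @xxx for enquiries",
        "self taught baker. ig:// xxxx ",
    ]
})

# Step 1: str.extract returns one column per capture group.
#   column 0 -> the keyword that matched ('insta:' or 'ig:')
#   column 1 -> everything after the keyword, up to the end of the string
parts = df["name"].str.extract(r"(insta:|ig:)(.*)")

# Step 2: keep only the text after the keyword; rows without a match are NaN.
rest = parts[1]

# Step 3: strip leading/trailing apostrophes and commas (not whitespace).
cleaned = rest.str.strip("',")
print(cleaned.tolist())

# Hypothetical variant: also accept 'instagram', a '-' separator and an
# optional '@', and capture only the handle itself.
handle_pattern = r"(?i)\b(?:instagram|insta|ig)\s*(?::/*|-)\s*@?(\w+)"
handles = df["name"].str.extract(handle_pattern)[0]
print(handles.tolist())
```

Note that the original pattern leaves the ' |bakery instagram - xxx' row as NaN, because that description contains neither 'insta:' nor 'ig:'. It also keeps any trailing text such as ' for enquiries'.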
Jimmy K