
I have a long string named data that holds a YouTube video transcript. I also have a CSV of videos whose transcripts are stored in a column named "subtitle". TL;DR: I want to find the row (video) that contains this string.

My code:

if dfSubtitles['subtitle'].str.contains(data,regex=False).any():
    print('exist')
else:
    print('not exist')

Currently, my code only checks whether the string exists in the data frame, and it correctly reports that it does. However, every attempt at retrieving the row has failed, because the expression above only yields a boolean value. How can I retrieve the row where my string exists?

No, this solution does not work:

print(dfSubtitles[dfSubtitles['subtitle'].str.contains(data)])

The following outputs:

Empty DataFrame
Columns: [Unnamed: 0, categoryName, categoryId, channel, videoId, subtitle]
Index: []
exist

Why is it that my code finds the instance of my string but outputs an empty dataframe?
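One possible explanation, sketched under assumptions: the check above passes regex=False, while the print call does not, so the same string is treated as a regular expression there. A transcript usually contains characters such as ? or . that mean something else in a pattern. The frame and the search string below are hypothetical stand-ins, since the real CSV is not shown.

```python
import pandas as pd

# Hypothetical stand-in for the CSV; the real transcripts are not shown.
dfSubtitles = pd.DataFrame({
    "videoId": ["a1", "b2", "c3"],
    "subtitle": [
        "Welcome back to the channel.",
        "What is dark matter? Nobody knows. Let's find out.",
        "Today we bake bread.",
    ],
})

# Hypothetical search string; note the regex metacharacters "?" and ".".
data = "What is dark matter? Nobody knows."

# Default (regex=True): "r?" means "an optional r", so the literal "?" in
# the transcript is never matched and the mask is all False.
regex_mask = dfSubtitles["subtitle"].str.contains(data, na=False)

# regex=False: plain substring search. na=False keeps missing transcripts
# from putting NaN into the mask.
literal_mask = dfSubtitles["subtitle"].str.contains(data, regex=False, na=False)

# Index with the mask itself (not with .any()) to get the matching rows.
matches = dfSubtitles[literal_mask]

print(dfSubtitles[regex_mask])  # empty frame
print(matches)                  # the row whose transcript contains data
```

The .any() call collapses the mask to a single True or False, which is why the if statement can only report existence; the mask itself is what selects rows.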

Alan Tram
  • @jezrael Your link does not solve the problem I'm having. Is it ok if you open my question again please? – Alan Tram Feb 24 '20 at 06:54
  • Maybe you forgot `,regex=False`: `print(dfSubtitles[dfSubtitles['subtitle'].str.contains(data, regex=False)])` – jezrael Feb 24 '20 at 06:55
  • Because if `if dfSubtitles['subtitle'].str.contains(data,regex=False).any(): print('exist')` works (it matches at least one value), then `print(dfSubtitles[dfSubtitles['subtitle'].str.contains(data, regex=False)])` has to work too. – jezrael Feb 24 '20 at 06:56
  • @jezrael That worked to a degree. It outputs: Unnamed: 0 ...subtitle 407 407 ... The universe is bustling with matter and energy. Why is it not printing a normal row? – Alan Tram Feb 24 '20 at 07:07
  • It seems to be a data-related problem; hard to know without the data. – jezrael Feb 24 '20 at 07:10
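
The output quoted in the comments (`Unnamed: 0 ... subtitle`) looks like pandas shortening a wide frame for display rather than a broken row, though that is a guess without the data. A minimal sketch of ways to see the whole row follows; every value in the one-row frame is a hypothetical placeholder, and only the column names come from the question.

```python
import pandas as pd

# Hypothetical one-row result; only the column names match the question.
long_text = "The universe is bustling with matter and energy. " * 10
row = pd.DataFrame(
    {
        "Unnamed: 0": [407],
        "categoryName": ["Education"],   # placeholder value
        "categoryId": [27],              # placeholder value
        "channel": ["example channel"],  # placeholder value
        "videoId": ["abc123"],           # placeholder value
        "subtitle": [long_text],
    },
    index=[407],
)

# Option 1: lift the display limits for one print call only.
with pd.option_context(
    "display.max_columns", None,   # do not hide middle columns
    "display.max_colwidth", None,  # do not shorten long cell text
):
    print(row)

# Option 2: render the frame to a string that includes every column.
full_text = row.to_string()

# Option 3: take the single row as a Series and read its fields directly.
record = row.iloc[0].to_dict()
print(record["videoId"], record["subtitle"][:48])
```

Reading the fields directly (option 3) sidesteps the display settings entirely, which may be the simplest route when only one row is expected.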
