Here's the regex that I'm using for hashtag extraction:

def extract_hashtags
  hashtag_regex = /\B#(\w+)/i
  text_hashtags = content.scan(hashtag_regex)
  text_hashtags.each do |tag|
    hashtags.create hashtags: tag
  end
end

Using /\B#(\w+)/i leaves extra characters at the front of the saved data.

For example, the extracted value should be "abcd", but it is saved as "--- - abcd".

What should the regex be changed to in order to extract just the #abcd?

If the post content (from which the hashtag is extracted) is something like "Hello stackoverflow #stackoverflow", it gets saved into the database as "--- - stackoverflow".

user2159586

2 Answers

test = "Hello stackoverflow #stackoverflow"
test.scan( /\B#(\w+)/i )
 => [["stackoverflow"]]

I suspect you are storing the array ["stackoverflow"] rather than the string inside it. Judging from the resulting string, is your storage layer using YAML to handle structured data?
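
To make the nesting visible, here is a minimal standalone sketch (plain Ruby, no Rails; the sample `content` string is made up for illustration). When the pattern contains a capture group, `String#scan` returns one inner array per match, so each `tag` yielded by `each` is an array rather than a string:

```ruby
# Sample text, invented for this illustration.
content = "Hello stackoverflow #stackoverflow #ruby"

# With a capture group, scan returns an array of arrays:
# one inner array per match, one entry per group.
matches = content.scan(/\B#(\w+)/i)

matches.each do |tag|
  # tag is an Array here, e.g. ["stackoverflow"], not a String
  puts "#{tag.class}: #{tag.inspect}"
end

# The captured text itself is the first element of each inner array.
first_name = matches.first[0]
puts first_name
```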

I think you just want to alter the create line:

text_hashtags.each do |tag|
  hashtags.create hashtags: tag[0]
end
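
As an alternative to indexing with tag[0], the nested result can be flattened once before the loop. This is only a sketch: `hashtag_names` is a hypothetical helper name, and the model's `hashtags.create` call is left out so the snippet runs on its own:

```ruby
# Hypothetical helper: returns the hashtag names as plain strings.
def hashtag_names(content)
  # scan yields [["a"], ["b"]]; flatten turns that into ["a", "b"]
  content.scan(/\B#(\w+)/i).flatten
end

names = hashtag_names("Hello stackoverflow #stackoverflow #ruby")
names.each do |name|
  # name is a String here, so it could be passed to create directly
  puts name
end
```

With strings in hand, the loop body in the model would become something like `hashtags.create hashtags: name`.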
Neil Slater

The "--- -" is prepended by the database layer of Rails when it converts the array to YAML. "---" is the YAML document start marker, and "-" marks the first element of the array.
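
The same output can be reproduced outside Rails with Ruby's standard YAML library. This is a small sketch, not the code Rails itself runs; the variable names are illustrative:

```ruby
require 'yaml'

# A one-element array, like each item yielded from the scan result.
tag = ["abcd"]

# Dumping the array produces the document marker, a newline,
# then one "- item" line per element.
dumped = tag.to_yaml
puts dumped.inspect

# Shown with the newlines collapsed to spaces, this reads "--- - abcd".
puts dumped.split("\n").join(" ")
```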

When you read it back from the database, Rails will do the inverse transformation: it will rebuild the original array, and remove the dashes.
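
The round trip can be sketched with the standard YAML library as well. The `stored` value below is an assumed example of what the column might contain. Note that, as far as I know, Rails only performs this conversion automatically for attributes declared as serialized; for a plain string column you may get the raw YAML text back instead.

```ruby
require 'yaml'

# Assumed column contents after saving the array ["stackoverflow"].
stored = "---\n- stackoverflow\n"

# Parsing the YAML text rebuilds the original array.
restored = YAML.load(stored)
puts restored.inspect

# The hashtag itself is still the first element of that array.
puts restored.first
```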

pts