How can I change the definitions of words in MeCab on a Mac? I'm analyzing text data in R, but some characters (such as \, ", and parentheses) are classified as nouns rather than as symbols. I also can't exclude them by value, because some of them are not treated as ordinary characters. For example, I want to run code like the following:

df <- df %>%
 dplyr::filter(! TERM %in% c("\", """)) # TERM is the variable name.

but this doesn't work, since I can't wrap these characters in double quotes.

So I need to change how these symbol characters are defined in the MeCab dictionary, but I don't know how to do it. This is probably a very elementary problem, but I'm afraid I don't know how to open and edit files from the Mac terminal.

P.S. Replication data

# code
dput(pilot_data[1:10, "IMAGE_total"])

# output
structure(list(IMAGE_total = c("遠い 難しい 不安", 
"国民を動かす討論 世の中を平和に維持する大切なもの 選挙するもの", 
"苛立ちの対象だ。 不快なものだ。 悲しいものだ。", 
"身近ではない 必要ない 茶番劇である", "難しい物 遠い存在 高みの見物的な物", 
"汚いもの 興味深いもの 信用できないもの", 
"ダーティーな行為だ うさんくさい世界だ できればかかわりたくないことだ", 
"意味がない 敵である 興味がないもの", 
"生活に影響してくるもの。 きまりごとをつくるところ 縁のない世界", 
"国会議員のもの。 くだらない世界。 金にまみれた世界。"
)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))

pilot_data is the data frame, and IMAGE_total holds the answers to an open-ended survey question. I apologize if the Japanese text is difficult to work with.

Ashu
  • If you're trying to filter this way, I assume you only have this in the field? Either way, you can try something like `filter(!str_detect(TERM, "\\("))` where the double backslash escapes the symbolic use. If that doesn't work, adding a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) is the best way to get great answers quickly. – Kat Aug 21 '21 at 20:51
  • Thank you for your advice, Kat, but it was only partially effective. I mainly need to exclude a single backslash and a double quotation mark, but the `str_detect()` function doesn't recognize these as characters either, so I will add replication data. – Ashu Aug 22 '21 at 01:37
  • I don't know about MeCab, but if you're filtering strings that contain only a double quote or backslash the pattern vector would be `c("\\", "\"")`. R will throw an error with the pattern in your example code. If you're filtering all strings containing either of these characters or parentheses, use something like @Kat suggested with a regex class, e.g., `filter(!str_detect(TERM, '[\\\\"()]'))`. – ngwalton Aug 22 '21 at 03:39
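
To make the difference between the two suggestions in the comments concrete, here is a minimal base R sketch. The `terms` vector is a made-up stand-in for the TERM column, and `grepl()` is used in place of `str_detect()` only so that the example needs no packages.

```r
# Toy TERM values (hypothetical, for illustration only)
terms <- c("\\", "\"", "(", ")", "不安", "a\\b")

# In R source, a backslash and a double quote must be escaped inside "...",
# but each of these literals is still a single character once parsed
nchar("\\")  # 1
nchar("\"")  # 1

# Exact match: drops only elements that are exactly a backslash or a double quote
keep_exact <- terms[!terms %in% c("\\", "\"")]

# Regex class: drops any element that contains a backslash, a double quote,
# or a parenthesis anywhere in the string
keep_regex <- terms[!grepl('[\\\\"()]', terms)]

print(keep_exact)
print(keep_regex)
```

The exact-match form leaves the parentheses and `a\b` in place, while the regex form removes every element that contains one of the listed characters.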

1 Answer

With the code below, I can run the analysis.

df <- df %>%
  # drop rows whose TERM contains a backslash, a double quote, or a parenthesis
  dplyr::filter(!stringr::str_detect(TERM, '[\\\\"()]'))

Thank you for your help.
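
For anyone who wants to try this without the original data, the following is a small sketch on a made-up token table. The data frame contents and the FREQ column are assumptions for illustration; only the TERM column name and the filter itself come from the question and answer above.

```r
library(dplyr)
library(stringr)

# Hypothetical token table; TERM mirrors the column name used in the question
df <- data.frame(
  TERM = c("遠い", "\\", "\"", "(", ")", "不安"),
  FREQ = c(2L, 1L, 1L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Keep only rows whose TERM contains none of: backslash, double quote, ( or )
cleaned <- df %>%
  filter(!str_detect(TERM, '[\\\\"()]'))

print(cleaned)
```

Note that this removes the unwanted tokens after tokenization; it does not change how MeCab itself classifies them.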

Ashu