How do I match the corpus file with second column in stem and return the first column?
corpus.txt
this
is
broken
testing
as
told
Only the fist 2 columns are important in this file:
stem.csv
"test";"tested";"test";"Suffix";"A";"7673";"321: 0 xxx"
"test";"testing";"test";"Suffix";"A";"7673";"322: 0 xxx"
"test";"tests";"test";"Suffix";"b";"5942";"001: 0 xxx"
"break";"broke";"break";"Suffix";"b";"5942";"002: 0 xxx"
"break";"broken";"break";"Suffix";"b";"5942";"003: 0 xxx"
"break";"breaks";"break";"Suffix";"c";"5778";"001: 0 xxx"
"tell";"told";"tell";"Suffix";"c";"5778";"002: 0 xx"
If the word is missing in the stem file, it should be replaced with XXX
expected.txt
XXX
XXX
break
test
XXX
tell
It can be done using SQL queries like this...
CREATE TABLE `stem` (
`column1` varchar(100) DEFAULT NULL,
`column2` varchar(100) DEFAULT NULL
) ;
INSERT INTO `stem` VALUES ('break','broken'),('break','breaks'),('test','tests');
CREATE TABLE `corpus` (
`column1` varchar(100) DEFAULT NULL
)
INSERT INTO `corpus` VALUES ('tests'),('xyz');
_____
mysql> select ifnull(b.column1, 'XXX') as result from corpus as a left join stem as b on a.column1 = b.column2;
+--------+
| result |
+--------+
| test |
| XXX |
+--------+
But I am looking for a way to process text files directly so that I do not need to import them in mysql.