Special characters problem in variable/file. Same string, different format from different sources

Question

I have an encoding problem I can't wrap my head around. I'm also fairly new to Linux and Bash, so bear with me.

Context/Example:

cat file1.txt
Foo ヅ

#file -i file1.txt: text/plain; charset=utf-8
#Source: website curl

cat file2.txt
Foo ãƒ…

#file -i file2.txt: text/plain; charset=utf-8
#Source: mysql database query (result is the import of file1.txt)

If I insert file1.txt into my database, it shows "Foo ãƒ…". I've tried all kinds of conversions, collations, etc. It never shows the correct characters in MySQL, but I'm fine with that.

The problem: I need to check if these strings are the same with an if statement:

var1=$(cat file1.txt)
var2=$(cat file2.txt)

if [ "$var1" != "$var2" ]; then
    #stuff is done
fi

I can't even remember everything I've tried with iconv to convert either var1 or var2 so that they match and my if statement works as intended. The only workaround I have is to import file1.txt into another table in my DB and extract it again, but I'm working with a limited number of DB connections.

Any tips on an easier way to solve this are greatly appreciated!
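
For comparing the two strings without another database round trip, here is a minimal sketch. It assumes the text in file2.txt is the UTF-8 bytes of the original that were read as Windows-1252 (roughly what MySQL calls latin1) and then encoded to UTF-8 a second time. That is consistent with "ヅ" turning into "ãƒ…", but the post does not confirm it. fix_mojibake is a made-up helper name, and the strings are inlined here instead of being read from the files.

```shell
#!/usr/bin/env bash
# Sketch: undo one layer of "UTF-8 read as Windows-1252" double encoding.
# Assumption: the mangled text was produced exactly that way.

# fix_mojibake is a hypothetical helper name, not part of any tool.
fix_mojibake() {
    # Map every character back to its single CP1252 byte; the
    # resulting byte stream is the original UTF-8 text.
    iconv -f UTF-8 -t CP1252
}

var1='Foo ヅ'                                      # as in file1.txt
var2=$(printf '%s' 'Foo ãƒ…' | fix_mojibake)       # as in file2.txt, repaired

if [ "$var1" != "$var2" ]; then
    echo "different"
else
    echo "same"
fi
```

With the real files, var2 could be filled with var2=$(fix_mojibake < file2.txt) instead. If iconv exits with a conversion error, the string contains a character that has no CP1252 byte, and the assumption above probably does not hold for that row.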

`Source: mysql database query` Please show the MySQL table definition. See https://stackoverflow.com/questions/202205/how-to-make-mysql-handle-utf-8-properly `I need to check if these strings` Create a temporary MySQL table with the same settings, put the string in it, read it back, and then compare. `but i'm working with a limited amount of DB connections` Start a local mysqld with a separate database. — KamilCuk, Aug 03 '21 at 13:39

score 0 · Answer 1 · answered Aug 03 '21 at 22:33

Thanks KamilCuk! The problem was the collation on the database itself (I didn't even know that was a thing).

Setting both the database collation and the table collation to utf8mb4_unicode_ci fixed the encoding on import, which solved the whole problem.
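
For anyone landing here later, this is roughly what that change could look like. It is a sketch, not the exact statements that were run: collation_sql is a hypothetical helper, and mydb and mytable are placeholder names. It only prints the SQL, so it can be reviewed before being sent to the server.

```shell
#!/usr/bin/env bash
# Sketch: print the SQL for switching a database and one table to
# utf8mb4 / utf8mb4_unicode_ci. Nothing is executed against a server.

# collation_sql is a hypothetical helper; mydb/mytable are placeholders.
collation_sql() {
    local db=$1 table=$2
    # Default character set and collation for newly created tables.
    printf 'ALTER DATABASE `%s` CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;\n' "$db"
    # Convert the existing table and its text columns.
    printf 'ALTER TABLE `%s`.`%s` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;\n' "$db" "$table"
}

collation_sql mydb mytable
```

The output can be piped into the client, for example collation_sql mydb mytable | mysql --default-character-set=utf8mb4. Rows that were already stored in mangled form are not repaired by this, so the data most likely has to be imported again afterwards.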
