I want to read a file line by line in a Unix shell script. A line in the file can contain pretty much any kind and any number of characters. So far I've tried a simple read script:
while read line
do
echo $line
done < datafile
But when a line had trailing whitespace, this script output parts of the lines concatenated to each other or even duplicated. So I modified it to:
while IFS= read -r line; do
echo $line
done < datafile
This fixed the problem, and after that it worked fine. But when I encountered lines containing special characters (French or German special characters, Chinese, Cyrillic, etc.), the script again ended up concatenating and/or duplicating them.
For example:
A file containing the names of 4 PDFs (it could be anything else), as shown in the console with the cat command:
????????? ???.pdf AR_CLAIMS_BUBBLES.pdf leur_compte__-_re??u_le.pdf blomberg_RG62540.pdf
The output of the script for that file is:
????????? ???.pdf AR_CLAIMS_BUBBLES.pdf blomberg_RG62540.pdf AR_CLAIMS_BUBBLES.pdf leur_compte__-_re??u_le.pdf blomberg_RG62540.pdf blomberg_RG62540.pdf leur_compte__-_re??u_le.pdf
I don't understand how or why this happens, but it seems to be highly dependent on these special characters. The script only malfunctions when handling lines with such characters (visible as '?' in the console).
In that case, how can I accurately read the individual lines?
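For reference, here is a minimal sketch of a more defensive version of the loop, assuming a POSIX shell. Compared with the loops shown earlier, the only differences are that the variable is quoted and printf is used instead of echo, so that characters such as ? or * in a line cannot be split into words or expanded as filename patterns. The function name print_lines is hypothetical, and whether this resolves the duplication is an assumption, not something confirmed on the system in question.

```shell
# Hypothetical helper: print every line of the file named in $1 exactly as stored.
print_lines() {
    # IFS= keeps leading and trailing whitespace; -r keeps backslashes literal.
    # The "|| [ -n "$line" ]" part also handles a last line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # Quoting "$line" prevents word splitting and glob expansion of ? * [ ].
        printf '%s\n' "$line"
    done < "$1"
}
```

It would be called as print_lines datafile.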
Note: unfortunately, giving the actual content of the files is not possible, as I only have console access to the Unix system.
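Since the console only shows '?' for these characters, a byte dump might reveal what is actually stored in the file, including any carriage returns at the ends of lines. The following is a rough sketch, assuming the standard od utility is available on the system; the function name show_bytes is hypothetical.

```shell
# Hypothetical helper: dump the file named in $1 byte by byte.
show_bytes() {
    # -c prints printable characters as-is and other bytes as escapes
    # (for example \r, \n) or octal values. LC_ALL=C forces a byte-wise view.
    LC_ALL=C od -c "$1"
}
```

It would be called as show_bytes datafile.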