I'm using tail to watch for errors in the log, like this:

tail -f syschecklog.log | grep "ERROR processEvent: /mnt/docs/"

and this gives results like:

01.lnxp.com 2019-03-13 07:10:24, 345 ERROR processEvent: /mnt/docs/003217899/cfo paid ¿ inv -inc 1234321

So what I do manually is to change the path using cd:

cd /mnt/docs/003217899/

Is there a script that can change directory automatically? I currently run another script by hand to rename the files contained in /003217899/. Directories like /003217899/ show up in these errors many times a day, and they differ each time, so I need a script that catches the errors automatically, changes to the path, and then runs the file-renaming script.

In addition, the log line sometimes contains another subfolder before the failing file name, like /mnt/docs/003217899/attch/fees ¿ to be paid. How can we cd to that directory?

After Altering [Update]

 grep "ERROR processEvent: /mnt/docs/"  syschecklog.log | sed 's#.*ERROR processEvent: /mnt/docs/ \(/.*\)/.*#\1#' | while read -r DIR
   do
BASEDIR=${DIR%/*}
if [ "$BASEDIR" != /mnt/docs/ ]
then
    ( cd "$BASEDIR" && find  -type f  -exec touch {} + | python -c 'import os, re; [os.rename(i, re.sub(r"\?", "¿", i)) for i in os.listdir(".")]' )
fi
# end of code for additional requirement
( cd "$DIR" && find  -type f  -exec touch {} + | python -c 'import os, re; 
    [os.rename(i, re.sub(r"\?", "¿", i)) for i in os.listdir(".")]' )
    done


3rd script updated for renameFiles();

$ renameFiles()
> {
>     # The next line is copied unchanged from the question. This could be improved.
>     find  -type f  -exec touch {} + | python -c 'import os, re; [os.rename(i, re.sub(r"\?", "¿", i)) for i in os.listdir(".")]'
> }
$
$ # Two possible variants because the question was modified.
$ #
$ # To process the complete input file as it is now
$ #  grep "ERROR processEvent: /mnt/docs/"  syschecklog.log | ...
$ #
$ # To continuously follow the file
$ # tail -f /mnt/docs/syschecklog.log | grep "ERROR processEvent: /mnt/docs/" | ...
$
$ grep "ERROR processEvent: /mnt/docs/"  syschecklog.log | sed 's#.*ERROR processEvent: \(/.*\)/.*#\1#' | while read -r DIR
> do
>     # additional requirement from comment: if DIR is /mnt/docs/003217899/attch
>     # the script should be run both in .../003217899 and .../attch
>     BASEDIR=${DIR%/*}
>     if [ "$BASEDIR" != /mnt/docs/ ]
>     then
>         ( cd "$BASEDIR" && renameFiles)
>     fi
>     # end of code for additional requirement
>     ( cd "$DIR" && renameFiles)
> done
-bash: cd: /mnt/docs/001234579/Exp8888861¿_Applicant_Case_Conference_l (No such file or directory): No such file or directory
-bash: cd: /mnt/docs/001888579/¿_SENIOR_RESOLUTION_MANAGER_i(No such file or directory): No such file or directory
-bash: cd: /mnt/docs/001234579/Exp2222276¿18 from all and Treatments Inc. February 27_ 20199999(No such file or directory): No such file or directory

3rd results: the same cd errors as shown at the end of the session above.

grep results, as requested:

    grep "ERROR processEvent: /mnt/docs/"  syschecklog.log
        01.lnxp.com 3    2019-03-14 07:04:30,446 ERROR processEvent: /mnt/docs/001111224/Exposure2178861/Email_from_LAT__18_009945_AABS¿__Summary_not_received12128050 (No such file or directory)
    01.lnxp.com 3    2019-03-14 07:05:13,137 ERROR processEvent: /mnt/docs/001567890/Coop_subro_question__TO__ZED_LANDERS_¿_SENIOR__Basse12130781 (No such file or directory)
    01.lnxp.com 3    2019-03-14 07:05:19,914 ERROR processEvent: /mnt/docs/001323289/Exposure2622276/OCF¿18 from All and                              Treatments Inc. February 27_ 201912129762 (No such file or directory)
    

Results of Locale

$ locale
LANG=en_CA.UTF-8
LC_CTYPE="en_CA.UTF-8"
LC_NUMERIC="en_CA.UTF-8"
LC_TIME="en_CA.UTF-8"
LC_COLLATE="en_CA.UTF-8"
LC_MONETARY="en_CA.UTF-8"
LC_MESSAGES="en_CA.UTF-8"
LC_PAPER="en_CA.UTF-8"
LC_NAME="en_CA.UTF-8"
LC_ADDRESS="en_CA.UTF-8"
LC_TELEPHONE="en_CA.UTF-8"
LC_MEASUREMENT="en_CA.UTF-8"
LC_IDENTIFICATION="en_CA.UTF-8"
LC_ALL=

Results of fgrep python yourscript | od -c -tx1

$ fgrep python invert.sh | od -c -tx1
0000000                   f   i   n   d           -   t   y   p   e
         20  20  20  20  66  69  6e  64  20  20  2d  74  79  70  65  20
0000020   f           -   e   x   e   c       t   o   u   c   h       {
         66  20  20  2d  65  78  65  63  20  74  6f  75  63  68  20  7b
0000040   }       +       |       p   y   t   h   o   n       -   c
         7d  20  2b  20  7c  20  70  79  74  68  6f  6e  20  2d  63  20
0000060   '   i   m   p   o   r   t       o   s   ,       r   e   ;
         27  69  6d  70  6f  72  74  20  6f  73  2c  20  72  65  3b  20
0000100   [   o   s   .   r   e   n   a   m   e   (   i   ,       r   e
         5b  6f  73  2e  72  65  6e  61  6d  65  28  69  2c  20  72  65
0000120   .   s   u   b   (   r   "   \   ?   "   ,       " 302 277   "
         2e  73  75  62  28  72  22  5c  3f  22  2c  20  22  c2  bf  22
0000140   ,       i   )   )       f   o   r       i       i   n       o
         2c  20  69  29  29  20  66  6f  72  20  69  20  69  6e  20  6f
0000160   s   .   l   i   s   t   d   i   r   (   "   .   "   )   ]   '
         73  2e  6c  69  73  74  64  69  72  28  22  2e  22  29  5d  27
0000200  \n
         0a
0000201

I need to change each '?' in the file name to '¿'. The system creates the files with '?' in the name while it shows them as '¿', so the names have to be changed to '¿' for the server to understand them.

Using cat, I found that a capital A with a circumflex (Â) gets added by the system on its own:

cat invert.sh
#!/bin/bash

renameFiles()
{
    find  -type f  -exec touch {} + | python -c 'import os, re; [os.rename(i, re.sub(r"\?", "¿", i)) for i in os.listdir(".")]'
}
 grep "ERROR processEvent: /mnt/docs/"  syschecklog.log | sed 's#.*ERROR processEvent: /mnt/docs/ \(/.*\)/.*#\1#' | while read -r DIR
do

    BASEDIR=${DIR%/*}
    if [ "$BASEDIR" != /mnt/cc-docs ]
    then
        ( cd "$BASEDIR" && renameFiles)
    fi

    ( cd "$DIR" && renameFiles)

Results of od -c -tx1 on the error file:

echo *|od -c -tx1
0000000   O   C   F   -   2   1       I   n   v       2   0   8   3   5
         4f  43  46  2d  32  31  20  49  6e  76  20  32  30  38  33  35
0000020   9   9       A   s   s   e   s   s   M   e   d       $   6   2
         39  39  20  41  73  73  65  73  73  4d  65  64  20  24  36  32
0000040   1   .   5   0       (   H   a   n   g       Q   )       ?
         31  2e  35  30  20  28  48  61  6e  67  20  51  29  20  3f  20
0000060   d   t   d       F   e   b       2   7   _       2   0   1   9
         64  74  64  20  46  65  62  20  32  37  5f  20  32  30  31  39
0000100   1   2   1   7   4   5   8   3  \n
         31  32  31  37  34  35  38  33  0a
0000111

I checked the system by running echo with the hex encoding of ¿; it attaches an Â to it, as shown below:

$echo -e '\xc2\xbf'
¿
HKLM
  • @mohit6up I see you described a way in https://stackoverflow.com/questions/10358547/how-to-grep-for-contents-after-pattern, but what I want is to run a script on the path that was found. I tried to PM you, but I couldn't. Is there a way to contact you? Would you be able to advise on this? – HKLM Mar 13 '19 at 15:12
  • As there seems to be a space in your directory name, How do you find the end of the directory? Everything from the first `/` to the last `/` after `ERROR processEvent:`? – Bodo Mar 13 '19 at 15:22
  • @Bodo Sorry, I removed the space, but yes, there is a space between "ERROR processEvent:" and the path /mnt/...., as it's a log line. I need a script that changes directory to the result of that, as I mentioned, so for my example I want to change the directory to /mnt/docs/003217899/ – HKLM Mar 13 '19 at 15:32
  • Please show the data **as text** formatted as a code block instead of a screen shot. This allows to use copy/paste for testing proposed scripts. – Bodo Mar 14 '19 at 13:21
  • When you run your commands in `/mnt/ProdCluster/cc-docs/123456789` the `find` command will also find files in subdirectories like `/mnt/ProdCluster/cc-docs/123456789/something`. Your renaming command could be modified so that a second run in the subdirectory is not necessary. The `find` with `-exec touch ...` modifies the modification time of all files recursively in all subdirectories. For me it is not fully clear how you want to rename the files. Do you want to replace the combination of `\?` with the special character `¿`? Or vice versa? Is your encoding UTF-8? – Bodo Mar 14 '19 at 15:24
  • **Don't use screenshots!** Please provide the input file or the output of `grep "ERROR processDocumentAddedEvent - storeDocument : /mnt/ProdCluster/cc-docs/" /var/log/claimcenter/cclog.log` **as text** formatted in a code block (using the `{}` tool of the editor field). Without the data as text I cannot reproduce the errors you get. – Bodo Mar 14 '19 at 15:46
  • @Bodo that's my bad, I'm just new to this, to everything to be honest. I have now updated the question and put the results in as text. Hopefully that works for you. Thanks! – HKLM Mar 14 '19 at 17:49
  • results for the grep you requested also added. – HKLM Mar 14 '19 at 19:10
  • @Bodo I think I only saw those requests for the input as text later. They were located in between other lines, and comments are not visible until you expand all of them. I did what you requested yesterday, and you downvoted me even though I did not mean to ignore it... – HKLM Mar 15 '19 at 10:58
  • There is a strange gap/space in the `grep` output, but I was able to use it for testing anyway because the space is in the file name that is cut off by the `sed` command. I will add my result to the answer. – Bodo Mar 15 '19 at 11:01
  • @Bodo I will try the new script soon, but I think this ¿ is our problem and the reason those error files are being created: users submit files from Windows, and their system's encoding might be producing this inverted symbol when the file arrives on the Linux server. I will double check soon and let you know how the script run goes. Thanks! – HKLM Mar 15 '19 at 13:51
  • I don't know if the `¿` remains the same when you paste the data into the question. If my script doesn't work, please, add the output of `grep -m 1 "ERROR processDocumentAddedEvent - storeDocument : /mnt/ProdCluster/cc-docs/" /var/log/claimcenter/cclog.log | od -c -tx1`. This shows the encoding for the first matching line (`-m 1`). Then I can compare this with my test data. Or if possible run `grep "ERROR processDocumentAddedEvent - storeDocument : /mnt/ProdCluster/cc-docs/" /var/log/claimcenter/cclog.log > filtered.log`, upload the resulting `filtered.log` somewhere and provide a link to it. – Bodo Mar 15 '19 at 14:11
  • You should add the output of `grep -m 1 "ERROR processDocumentAddedEvent - storeDocument : /mnt/ProdCluster/cc-docs/" /var/log/claimcenter/cclog.log | od -c -tx1` to the question as a code block. I found that `sed` has a problem with non-UTF-8 characters and added a fix to the script. See updated answer. – Bodo Mar 17 '19 at 19:25
  • @Bodo I still have a problem: it gives me ¿ after running the script, so I have to run the same script again after changing this piece to `re.sub(r"\¿", "?", i)`, and then re-run the normal first script. Any idea? – HKLM Mar 20 '19 at 12:18
  • This looks like an encoding problem. Maybe there is a difference between the script that renames the file and the directory listing. What editor do you use to create the script? Does it have a setting for the encoding? Which encoding does the server expect for the file name? UTF-8? Please `cd` into a directory that contains a file with a wrong character and show the output of `echo *|od -c -tx1` (assuming there are not many files in there) – Bodo Mar 20 '19 at 14:30
  • @Bodo I did as requested `echo *|od -c -tx1 0000000 i n v e r t . s h \n 69 6e 76 65 72 74 2e 73 68 0a 0000012 ` I'm using MTPuTTY (Multi-Tabbed PuTTY) – HKLM Mar 20 '19 at 14:39
  • @Bodo and another thing I checked the settings for Putty, Checking Translation, its on ISO-8859-1:1998 (Latin-1, West Europe), and when I changed to UTF-8, and checking that script with CAt, it doesnt show that as ¿ in the rename function, its only in ¿, any advise ? – HKLM Mar 20 '19 at 15:01
  • You didn't answer all my questions. Please add all information **to your question**. The log file seems to contain characters with single byte encoding and a value `0xBF` but I'm not sure it is the same with the actual file name. I didn't mean to run `echo *|od`... in the directory where your script is but where a file is with a `?` or `¿` in its name, e.g. `/mnt/docs/001137775/Expo2178861` **before running the script**. It is normal that you see more than one character when you display a multi-byte UTF-8 character with an encoding like ISO-8859-1. `ls` may show `?` for any "unknown" character. – Bodo Mar 20 '19 at 17:52
  • @Bodo, I added the results of echo | od to the question as requested. I checked this web site https://www.charset.org/utf8-to-latin-converter and entered my rename function: `find -type f -exec touch {} + | python -c 'import os, re; [os.rename(i, re.sub(r"\?", "¿", i)) for i in os.listdir(".")]' ` to convert it. The output was ` find -type f -exec touch {} + | python -c 'import os, re; [os.rename(i, re.sub(r"\?", "Â¿", i)) for i in os.listdir(".")]' `, so it's adding this `Â` as a general issue. Is there a way to use hex mode instead of the real string `¿`? – HKLM Mar 20 '19 at 18:18
  • I can propose a simpler shell script without the Python stuff. Are there any other files in `/mnt/docs/001137775/Expo2178861` and `/mnt/docs/001137775`? Can you simply rename all files in `/mnt/docs/001137775` and all subdirectories (which includes `Expo2178861`)? – Bodo Mar 20 '19 at 18:33
  • @Bodo What do you mean by rename? No, I can't simply rename them, as the system is not reading those files. We are doing this manually by changing each `?` to `¿`, using other rename options like `mv`, or by hand, as those files are already attached to the system and can't be renamed manually. – HKLM Mar 20 '19 at 18:39
  • So the only option is to replace only ? with ¿, and to the inverted one the system attaches this Â. – HKLM Mar 20 '19 at 18:41
  • With "Can you simply rename..." I meant "Can you use a script that simply renames all files in this directory and all subdirectory or do you have to rename specifically in the two directories `/mnt/docs/001137775/Expo2178861` and `/mnt/docs/001137775`?" I'm not sure, but I think your Python script renames all files in the current directory only, supposed to change all `?` to `¿`, while the `find ... touch` changes the modification time for all files in the current directory and all subdirectories. When there are no other files which must not be renamed, I can simplify the script. – Bodo Mar 20 '19 at 18:51
  • Or I could create a script that renames all files that contain a `?` in its name in `/mnt/docs/001137775` including all subdirectories. – Bodo Mar 20 '19 at 18:58
  • @Bodo I haven't checked whether the rename works on the listed subfolders, but I will try when they appear, as I'm waiting for new events to test the script. But did you notice what I found on the website charset.org/utf8-to-latin-converter? If you test the rename or any script that includes ¿, it adds Â. So my problem is the same, and I think it's general, but there should be a way to fix it. As I mentioned, the script described in the solution works when I run it directly on the command line, but when I save it as a bash script to be run from crontab, it gets that Â added in there. – HKLM Mar 21 '19 at 14:05
  • I can propose a modified script that hopefully will not have the problem, but it is difficult to get clear information about your requirements. – Bodo Mar 21 '19 at 14:31
  • @Bodo so what is your proposed script then? let have a try! – HKLM Mar 21 '19 at 15:47
  • Apparently the `sed` command has not correctly processed the error message line. The script works when I run it on my computer with the provided error message line from the log file. I will stop doing your work here. Run the commands one by one or start with the first and add more step by step to find out what's going wrong. Read the documentation of the commands. – Bodo Mar 22 '19 at 13:25

1 Answer

Script modified again for additional requirements.

(As I did not get answers to all questions I modified the script based on the incomplete information.)

Instead of processing two directories separately, the script now runs find in the parent directory (or the only directory) and renames and touches all files that contain a '?' in the name (-name '*\?*').

#! /bin/bash

# Two possible variants because the question was modified.
#
# To process the complete input file as it is now
#  fgrep "ERROR processEvent: /mnt/docs/"  syschecklog.log | ...
#
# To continuously follow the file
# tail -f syschecklog.log| fgrep "ERROR processEvent: /mnt/docs/" | ... 
# The "LANG=C sed ..." avoids problems with invalid UTF-8 characters that do not match '.' in sed's pattern

fgrep "ERROR processEvent: /mnt/docs/"  syschecklog.log | LANG=C sed 's#.*ERROR processEvent: \(/mnt/docs/[^/]*\)/.*#\1#' | while IFS= read -r DIR
do
    find "$DIR" -name '*\?*' | while IFS= read -r FILE
    do
        NEW=$(echo "$FILE"| tr '?' $'\xBF')
        mv "$FILE" "$NEW"
        touch "$NEW"
    done
done
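The inner rename loop can also be written without `echo` and `tr`. The following is only a sketch under assumptions: the function name `rename_question_marks` and its optional second argument are hypothetical, and `find -print0` with `read -d ''` is swapped in so that unusual file names (including names with newlines) survive the pipe. Because the thread never settles which byte sequence the server expects for `¿`, the replacement is a parameter that defaults to the single byte 0xBF used in the script above.

```shell
#!/bin/bash
# Sketch: rename every regular file below a directory whose name contains '?'.
# Assumption: the replacement defaults to the single byte 0xBF, as in the
# script above; pass a second argument to use a different replacement.

rename_question_marks() {
    local dir=$1 repl=$2 file base new
    [ -n "$repl" ] || repl=$'\xBF'

    # NUL-separated output keeps spaces and newlines in names intact.
    find "$dir" -type f -name '*\?*' -print0 |
        while IFS= read -r -d '' file; do
            base=${file##*/}                    # file name without its directory
            new=${file%/*}/${base//\?/$repl}    # replace '?' in the name only
            mv -- "$file" "$new" && touch -- "$new"
        done
}

# Hypothetical usage inside the outer loop of the script above:
#   rename_question_marks "$DIR"
```

Only the last path component is rewritten, so a `?` in a parent directory name is left alone and `mv` never targets a directory that does not exist.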

Note that grep and sed will switch to buffered output when used in a pipeline. This will delay the processing of the extracted lines. You might have to disable buffering for the commands in the pipeline, see http://mywiki.wooledge.org/BashFAQ/009
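As a minimal sketch of the buffering fix, assuming GNU grep and GNU sed (their `--line-buffered` and `-u` options are the ones described in the BashFAQ entry), the extraction stage could be wrapped like this. The function name `extract_dirs` is hypothetical; the patterns are the ones from the script above.

```shell
#!/bin/bash
# Sketch: line-buffered extraction of the directory from matching log lines.
# Assumes GNU grep (--line-buffered) and GNU sed (-u).

extract_dirs() {
    grep --line-buffered -F "ERROR processEvent: /mnt/docs/" |
        LANG=C sed -u 's#.*ERROR processEvent: \(/mnt/docs/[^/]*\)/.*#\1#'
}

# Hypothetical usage, following the log as it grows:
#   tail -f syschecklog.log | extract_dirs | while IFS= read -r DIR; do ...; done
```

With both commands flushing after every line, each matching log line should reach the `read` loop as soon as it is written instead of waiting for a buffer to fill.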

2nd major update

There was a problem with invalid characters. In a UTF-8 environment sed behaves strangely when the input contains bytes that are not valid UTF-8 characters. The pattern . does not match these invalid characters. (The example file contains a byte with the value 0xBF. See http://www.linuxproblem.org/art_21.html.) Setting LANG=C for the sed command fixes this problem.
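The effect can be reproduced with a single sample line. This is a sketch, not the exact test that was run: the sample line is made up to resemble the log lines in the question, and `LC_ALL=C` is used instead of `LANG=C` because `LC_ALL` takes precedence over the other locale variables if any of them is set.

```shell
#!/bin/bash
# Sketch: extract the directory from a log line that contains the byte 0xBF,
# which is not a valid UTF-8 sequence on its own.

extract_dir() {
    # In the C locale '.' matches any single byte, valid UTF-8 or not.
    LC_ALL=C sed 's#.*ERROR processEvent: \(/.*\)/.*#\1#'
}

# \277 is the octal escape for the byte 0xBF; the line itself is a made-up sample.
printf 'host 3 ERROR processEvent: /mnt/docs/001111224/Exposure2178861/AABS\277__Summary (No such file or directory)\n' |
    extract_dir
# prints /mnt/docs/001111224/Exposure2178861
```

Without the locale override, in a UTF-8 locale the trailing `.*` cannot consume the 0xBF byte, so part of the file name and the "(No such file or directory)" text stay in the result, which matches the `cd` errors shown in the question.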

I tested my script with the grep output added to the question. I wrote this into a file somelog.log. I modified my script to use grep pattern somelog.log | ... with a local file instead of using a log file with a full path which does not exist on my test system.

After adding LANG=C to the sed command the script ran successfully with the raw input file provided as an external link.

The output is

$ grep "ERROR processEvent: /mnt/docs/"  syschecklog.log | sed 's#.*ERROR processEvent: \(/.*\)/.*#\1#' | while read -r DIR; do     BASEDIR=${DIR%/*};     if [ "$BASEDIR" != /mnt/docs/ ];     then         ( cd "$BASEDIR" && renameFiles);     fi;     ( cd "$DIR" && renameFiles); done
bash: cd: /mnt/docs/001234567: No such file or directory
bash: cd: /mnt/docs/001234567/Subdir9876543: No such file or directory
bash: cd: /mnt/docs/002345678: No such file or directory
bash: cd: /mnt/docs/003456789: No such file or directory
bash: cd: /mnt/docs/003456789/Subdir8765432: No such file or directory
... (more similar lines removed)

You can see that it tried to cd into the directories from the log messages. It does not show parts of the file name. In my case it simply failed because the directories don't exist. I think the script should work.
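One detail of the command above is worth checking: `${DIR%/*}` removes the last `/` together with everything after it, so `BASEDIR` never ends in a slash and the comparison against `/mnt/docs/` is always "not equal". A small sketch of the intended check, with the hypothetical function name `parent_if_below_root` and the root `/mnt/docs` taken from the question:

```shell
#!/bin/bash
# Sketch: print the parent directory of a path unless that parent is the
# document root itself. The root is compared without a trailing slash
# because ${dir%/*} never leaves one.

parent_if_below_root() {
    local root=/mnt/docs dir=$1 base
    base=${dir%/*}
    if [ "$base" != "$root" ]; then
        printf '%s\n' "$base"
    fi
}

parent_if_below_root /mnt/docs/003217899/attch   # prints /mnt/docs/003217899
parent_if_below_root /mnt/docs/003217899         # prints nothing
```

With the trailing slash in the comparison, the second case would also run the rename in `/mnt/docs` itself.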

After replacing the two cd and renameFiles commands with find... the output with my test is

find: ‘/mnt/docs/001234567’: No such file or directory
find: ‘/mnt/docs/002345678’: No such file or directory
find: ‘/mnt/docs/003456789’: No such file or directory
...
Bodo
  • Might want to amend this to deal with buffering issues; unless there's a lot of content written to the file, could be a significant delay between a matching line being written to the file and the `read` picking it up. (For those curious what I'm talking about, see [BashFAQ #9](http://mywiki.wooledge.org/BashFAQ/009)). – Charles Duffy Mar 13 '19 at 15:42
  • @Bodo I will try that, but sometimes the log line has another subfolder that contains the error file name, like "/mnt/docs/003217899/attch/fees ¿ to be paid". How do I cd into that subfolder? It depends on the file that errors out, so I need two cases, checking the file under /003217899/ or under /attch/, because sometimes the line doesn't have this subfolder! – HKLM Mar 13 '19 at 15:57
  • @Bodo I edited the question, is your answer updated too ? – HKLM Mar 13 '19 at 18:41
  • Hi @Bodo ! can I PM you? – HKLM Mar 13 '19 at 19:40
  • @Bodo yeah, I didn't know it's not possible, but the code does not work for me, and I'm not sure if something is missing. As for the buffering issue, I can't disable it, as my permissions are limited. – HKLM Mar 14 '19 at 11:54
  • @HKLM What exactly does "the code is not works for me" mean? What happens? In your question you should show some more real lines from your file `syschecklog.log` both with and without `attch` to allow better testing. For the buffering issue you probably only have to add options to the `grep` and `sed` commands, see [BashFAQ#9](http://mywiki.wooledge.org/BashFAQ/009) below the headline "Your command may already support unbuffered output" – Bodo Mar 14 '19 at 12:07
  • @Bodo see in the question I added the real command altered **After Altering [Update]**, and the results as screenshot. Thanks – HKLM Mar 14 '19 at 12:15
  • @Bodo did you see my updated question, see after **After Altering [Update]**, I have attached the screenshot of the results, I put the script in an executable shell named as invert.sh , so I think it only missing a part, would you able to relook on that please! Thanks – HKLM Mar 14 '19 at 15:12
  • @Bodo see the results running the last updated one, and by the way syschecklog was only as example, I have updated to the real one. I couldnt paste it here as it was long characters, Will attached to the questions. – HKLM Mar 14 '19 at 15:31
  • @Bodo I have attached the 3rd results, after updating and removing tail -f. I don't know why the last (/) is replaced with (¿) after running the script! – HKLM Mar 14 '19 at 17:14
  • @Bodo the script did work for me; the `LANG=C` fixed the UTF-8 encoding issue. I replaced the real inputs and outputs that I had posted with ones from a testing environment, as they contained real, sensitive information that I needed to hide for privacy reasons. Please update yours as well, I would appreciate it. – HKLM Mar 18 '19 at 14:10
  • @Bodo, there is another bug happen when running the script; _[root@01.lnxp.com 002345411]$ ls FW__Salvage?_Not_Released_?_STEPHANIE_ALVIAR___Stock#10918563__C#001561411__12157244_ and after running the script `[root@01.lnxp.com 002345411]$ ls FW__Salvage¿_Not_Released_¿_STEPHANIE_ALVIAR___Stock#10918563__C#001561411__12157244` – HKLM Mar 18 '19 at 20:23
  • @Bodo, I don't know why this `Â` is being added as a prefix, and it doesn't happen every time. – HKLM Mar 18 '19 at 20:27
  • @HKLM This might be a problem of your renaming operation which I put unchanged into function `renameFiles`. I already asked on Mar 14 at 15:24 in my comment to the question how exactly you want to rename the files and about the encoding you are using. Please show in your question the output of the command `locale`, the output of `fgrep python YourScript | od -c -tx1` (assuming your `python` command is in one line). Please state how exactly you want to replace the "strange" characters in the file names. Probably this can be improved in the script. – Bodo Mar 19 '19 at 10:02
  • @Bodo I have added what you last requested. I need to change those '?' symbols to '¿'. I'm not sure whether the problem is with the rename command. Do you have any other way to change that? Thanks – HKLM Mar 19 '19 at 12:24