
I have a set of dictionaries, each made of two files named index with the extensions {aff,dic}, like

dictionaries/dictionaries/bg_BG/index.dic
dictionaries/dictionaries/ca_ES/index.dic
dictionaries/dictionaries/cs_CZ/index.dic
dictionaries/dictionaries/da_DK/index.dic
...
dictionaries/dictionaries/bg_BG/index.aff
dictionaries/dictionaries/ca_ES/index.aff
dictionaries/dictionaries/cs_CZ/index.aff
dictionaries/dictionaries/da_DK/index.aff

and I want to copy them into a different folder, naming each of them after its subdirectory (e.g. it_IT), in order to have

myDicts/it_IT.dic
myDicts/it_IT.aff

I came up with this one-liner

for file in dictionaries/dictionaries/**/*.{dic,aff}; do echo ${file}; done

which lists the files in these folders; on each iteration the loop variable $file holds a path such as dictionaries/dictionaries/da_DK/index.aff.
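A note on the glob, as a minimal bash sketch: brace expansion happens before filename expansion, so the pattern becomes one glob for the .dic files followed by one for the .aff files, which is why all .dic paths are listed first. In bash, `**` only descends into nested directories when `shopt -s globstar` is set; otherwise it behaves like `*`, which is enough here because every index file sits exactly one directory below dictionaries/dictionaries. The function name `list_dicts` is hypothetical, and `nullglob` is an added assumption so that an empty or missing directory yields no output instead of the literal pattern.

```shell
# Hypothetical helper: list index.dic / index.aff files one directory below ROOT.
# The body runs in a subshell ( ... ), so the shopt setting does not leak out.
list_dicts() (
  shopt -s nullglob            # unmatched globs expand to nothing
  # Brace expansion first yields ".../*/index.dic" and ".../*/index.aff",
  # then each of those is globbed (sorted), so all .dic come before all .aff.
  for f in "$1"/*/index.{dic,aff}; do
    printf '%s\n' "$f"
  done
)

list_dicts dictionaries/dictionaries
```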

Using sed, I was able to match (and delete) those locale patterns with

sed 's:[a-z][a-z][_-][A-Z][A-Z]::';

so the loop becomes

for file in dictionaries/dictionaries/**/*.{dic,aff}; do echo ${file} | sed 's:[a-z][a-z][_-][A-Z][A-Z]::'; done

which now prints

dictionaries/dictionaries//index.dic
dictionaries/dictionaries//index.dic
dictionaries/dictionaries//index.dic
...
dictionaries/dictionaries//index.aff
dictionaries/dictionaries//index.aff
dictionaries/dictionaries//index.aff

As I understand it, to print only the captured group, the sed pattern has to match both the captured group and the non-captured parts around it, with the replacement referencing the group (see here).

But I could not figure out how to do this so that, at the end, I have in $file

bg_BG.aff
ca_ES.aff
da_DK.aff
...
bg_BG.dic
ca_ES.dic
da_DK.dic

with the extension (.aff or .dic) appended as well. For scripting reasons I need to run this as a one-liner.
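A minimal sketch of the capture-group idea, assuming every locale directory has the `xx_YY` (or `xx-YY`) shape used above: the regex spans the whole path, and the replacement keeps only group 1 (the locale) and group 2 (the extension). The helper name `to_target` is hypothetical, and the pattern sticks to POSIX basic regular expressions, so it should behave the same in GNU and BSD sed.

```shell
# Hypothetical helper: print "<locale>.<ext>" for one dictionary path.
# ".*/" and "/index" are matched but not captured, so they are dropped;
# \1 is the locale directory, \2 is the extension including its dot.
to_target() {
  printf '%s\n' "$1" |
    sed 's:.*/\([a-z][a-z][_-][A-Z][A-Z]\)/index\(\..*\)$:\1\2:'
}

to_target dictionaries/dictionaries/da_DK/index.aff   # da_DK.aff
to_target dictionaries/dictionaries/bg_BG/index.dic   # bg_BG.dic
```

A path that does not fit the pattern, such as dictionaries/dictionaries/ca_ES-valencia/index.aff, is printed unchanged, because sed only rewrites lines where the whole pattern matches.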

[UPDATE] Thanks to the answer below, I came up with this solution:

for file in dictionaries/dictionaries/**/*.{dic,aff}; do echo $file | sed 's:.*\([a-z][a-z][_-][A-Z][A-Z]\)/index\(.*\):cp & myDicts/\1\2:' | sh; done

which does the job:

$ ls myDicts/
bg_BG.aff cs_CZ.aff de_AT.aff de_DE.aff en_AU.aff en_GB.aff en_ZA.aff eu_ES.aff gl_ES.aff it_IT.aff mn_MN.aff nl_NL.aff pl_PL.aff pt_PT.aff ru_RU.aff sl_SI.aff sv_SE.aff uk_UA.aff
bg_BG.dic cs_CZ.dic de_AT.dic de_DE.dic en_AU.dic en_GB.dic en_ZA.dic eu_ES.dic gl_ES.dic it_IT.dic mn_MN.dic nl_NL.dic pl_PL.dic pt_PT.dic ru_RU.dic sl_SI.dic sv_SE.dic uk_UA.dic
ca_ES.aff da_DK.aff de_CH.aff el_GR.aff en_CA.aff en_US.aff es_ES.aff fr_FR.aff hr_HR.aff lb_LU.aff nb_NO.aff nn_NO.aff pt_BR.aff ro_RO.aff sk_SK.aff sr_RS.aff tr-TR.aff vi_VN.aff
ca_ES.dic da_DK.dic de_CH.dic el_GR.dic en_CA.dic en_US.dic es_ES.dic fr_FR.dic hr_HR.dic lb_LU.dic nb_NO.dic nn_NO.dic pt_BR.dic ro_RO.dic sk_SK.dic sr_RS.dic tr-TR.dic vi_VN.dic
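Dropping the final `| sh` turns the loop into a dry run that only prints the commands. The sketch below, with a hypothetical helper `show_cmd`, shows the line generated for one path; in the replacement, `&` stands for the entire matched text, which here is the full source path because the pattern starts and ends with `.*`.

```shell
# Hypothetical helper: print (but do not run) the cp command that the
# update's sed script generates for a single path.
show_cmd() {
  printf '%s\n' "$1" |
    sed 's:.*\([a-z][a-z][_-][A-Z][A-Z]\)/index\(.*\):cp & myDicts/\1\2:'
}

show_cmd dictionaries/dictionaries/da_DK/index.aff
# cp dictionaries/dictionaries/da_DK/index.aff myDicts/da_DK.aff
```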

The only pitfall is that it does not match these paths:

dictionaries/dictionaries/ca_ES-valencia/
dictionaries/dictionaries/sr_RS-Latn/
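One way to cover those directories, sketched under the assumption that every directory name still starts with an `xx_YY` locale: let the captured group continue with any run of non-slash characters, so suffixes like `-valencia` or `-Latn` become part of the name. The helper `locale_target` is hypothetical.

```shell
# Hypothetical helper: like the earlier pattern, but [^/]* lets the
# captured directory name carry an optional suffix such as "-valencia".
locale_target() {
  printf '%s\n' "$1" |
    sed 's:.*/\([a-z][a-z][_-][A-Z][A-Z][^/]*\)/index\(\..*\)$:\1\2:'
}

locale_target dictionaries/dictionaries/ca_ES-valencia/index.aff   # ca_ES-valencia.aff
locale_target dictionaries/dictionaries/sr_RS-Latn/index.dic       # sr_RS-Latn.dic
```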
loretoparisi

1 Answer


Here's a way:

echo dictionaries/dictionaries/da_DK/index.aff |
  sed 's:.*/\([^/]\+\)/index\(\..*\):\1\2:'

output:

da_DK.aff
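Shell parameter expansion can build the same name without sed: strip the file name to get the directory, keep the last path component, and append the extension. This is a swapped-in technique rather than part of the answer, and `target_name` is a hypothetical helper. Because it does not look for a locale pattern at all, directory names such as ca_ES-valencia work too.

```shell
# Hypothetical helper: derive "<dir>.<ext>" from a path using only
# parameter expansion (no external processes).
target_name() {
  local file=$1
  local dir=${file%/*}      # dictionaries/dictionaries/da_DK
  local locale=${dir##*/}   # da_DK
  local ext=${file##*.}     # aff
  printf '%s.%s\n' "$locale" "$ext"
}

target_name dictionaries/dictionaries/da_DK/index.aff            # da_DK.aff
target_name dictionaries/dictionaries/ca_ES-valencia/index.aff   # ca_ES-valencia.aff
```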

However, there's a faster way than a for loop, since find, sed and sh each run only once instead of once per file:

find dictionaries/dictionaries -name "index.dic" -or -name "index.aff" |
  sed 's:dictionaries/dictionaries/\([^/]\+\)/index\(\..*\):cp & myDicts/\1\2:'

If that prints the commands you want, pipe it to sh:

mkdir -p myDicts
find dictionaries/dictionaries -name "index.dic" -or -name "index.aff" |
  sed 's:dictionaries/dictionaries/\([^/]\+\)/index\(\..*\):cp & myDicts/\1\2:' |
  sh
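A variant that avoids generating shell code for sh to re-parse, sketched with find's `-exec ... {} +` and the parameter-expansion idea: find passes the matching paths as arguments to a small inline sh script, which copies each file into DEST, named after its directory plus the original extension. The function name `copy_dicts` and its argument order are hypothetical; it uses cp, so the original files stay in place, and `-o` is the POSIX spelling of GNU find's `-or`.

```shell
# Hypothetical helper: copy every index.dic / index.aff under SRC into DEST.
copy_dicts() {
  local src=$1 dest=$2
  mkdir -p "$dest" || return
  # find appends the paths after "$dest"; inside the script $1 is DEST and
  # the remaining positional parameters are the files to copy.
  find "$src" -type f \( -name index.dic -o -name index.aff \) -exec sh -c '
    dest=$1; shift
    for f; do
      dir=${f%/*}                                # .../da_DK
      cp "$f" "$dest/${dir##*/}.${f##*.}"        # DEST/da_DK.aff
    done
  ' sh "$dest" {} +
}
```

For the layout in the question this would be called as `copy_dicts dictionaries/dictionaries myDicts`. Since the paths never pass through a generated command line, names containing spaces or quotes are copied correctly.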
webb
  • Thanks, it works! I missed some patterns like `dictionaries/dictionaries/ca_ES-valencia/index.aff`, `dictionaries/dictionaries/sr_RS-Latn/index.dic`, `dictionaries/dictionaries/sr_RS-Latn/index.aff`, etc. How can I add this group as well? – loretoparisi Nov 30 '16 at 09:51
  • btw, the first command works, but for some reason I get a `sh: line 72: dictionaries/dictionaries/tr-TR/index.dic: Permission denied` in the `find` pipe to `sh`. – loretoparisi Nov 30 '16 at 10:03
  • Tried this way: `for file in dictionaries/dictionaries/**/*.{dic,aff}; do $file | sed 's:.*\([a-z][a-z][_-][A-Z][A-Z]\)/index\(.*\):cp & myDicts/\1\2:'; done` but now `permission denied` as well. – loretoparisi Nov 30 '16 at 16:27
  • sorry, there was a typo: 4 `....` instead of 5 `.....`. i've fixed it to handle e.g. `ca_ES-valencia` as well. – webb Dec 01 '16 at 23:15