
I would like to rename a Linux file to a filename that is legal on Windows: it should not be longer than Windows allows and should not contain characters that Windows forbids. Sometimes I copy the title of a paper into a filename, and titles can contain special characters such as , ®, or ?
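To pin down what "legal on Windows" means: Windows rejects the characters < > : " / \ | ? * and control characters (codes 0 to 31) in file names, names should not end in a space or a period, and NTFS allows at most 255 characters per name. Below is a minimal bash sketch of a check for those rules. The function name win_name_ok is hypothetical; it counts length in the current locale (bytes under LC_ALL=C), it also flags DEL and anything else the locale classifies as a control character, and it does not check reserved device names such as CON or NUL.

```bash
# Hypothetical helper: exit status 0 if $1 passes the rules above as a
# Windows file name component, 1 otherwise.
# (Reserved device names such as CON or NUL are not checked.)
win_name_ok() {
  local name=$1
  # Length: 1 to 255 characters (counted in the current locale).
  if [ "${#name}" -lt 1 ] || [ "${#name}" -gt 255 ]; then
    return 1
  fi
  case $name in
    *[[:cntrl:]]*) return 1 ;;        # newline, tab, CR, ...
    *['<>:"/\|?*']*) return 1 ;;      # characters Windows forbids
    *' ' | *.) return 1 ;;            # trailing space or period
  esac
  return 0
}

win_name_ok 'A synthetic review.pdf' && echo legal      # prints: legal
win_name_ok 'TNF-alpha induced?.pdf' || echo illegal    # prints: illegal
```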

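Also, titles copied and pasted from a PDF sometimes contain line breaks and unusual characters that are hard to spot. You can see them with sed -n 'l', which prints non-printable characters (here including every non-ASCII byte) as octal escapes, splits long lines with a trailing \, and marks the end of each line with $: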

echo 'Estrogen receptor agonists and estrogen attenuate TNF-α induced
α
apoptosis in VSC4.1 motoneurons.pdf' | sed -n 'l'
Estrogen receptor agonists and estrogen attenuate TNF-\316\261 induce\
d$
\316\261$
apoptosis in VSC4.1 motoneurons.pdf$

or

echo 'A synthetic review of the ﬁve molecular Sorlie’s subtypes in
breast cancer' | sed -n 'l' 
A synthetic review of the \357\254\201ve molecular Sorlie\342\200\231\
s subtypes in$
breast cancer$
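sed -n 'l' is good for eyeballing a name. To test a name inside a script instead, one option is to count the bytes that fall outside printable ASCII: in the examples above each α accounts for two such bytes (\316\261), the ligature and the ’ for three each, and every line break for one. A sketch, with the function name odd_byte_count being hypothetical:

```bash
# Hypothetical helper: print how many bytes of $1 lie outside printable
# ASCII (space through '~'); 0 means the name is plain printable ASCII.
odd_byte_count() {
  local n
  # Under LC_ALL=C, tr works byte by byte, so ' -~' is the range 0x20-0x7E.
  n=$(printf '%s' "$1" | LC_ALL=C tr -d ' -~' | wc -c)
  echo $(( n ))   # arithmetic strips any padding wc adds
}

odd_byte_count 'TNF-α induced'    # prints: 2
```

A nonzero result is a cheap signal that a file needs renaming before it is handed to a Windows program.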

I have started a script, but it is inelegant and incomplete. Has someone already done something like this, or is there a fast, elegant way to do it?

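#!/bin/bash
# Rename the file given as $1 so that its name is legal on Windows
# (for example, to open it through wine): forbidden and control characters
# become "_", some special characters are spelled out, and the result is
# cut to at most 255 characters while keeping the extension.
fn2win="$1"
# Not used yet: tables for stripping accents with sed's y command.
#SPEC_CHAR="ÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞàáâãäåçèéêëìíîïðñòóôõöøùúûüýþÿ"
#NORM_CHAR="AAAAAACEEEEIIIIDNOOOOOOUUUUYPaaaaaaceeeeiiiionoooooouuuuyby"
#SPEC_LOW_CHAR="aàáâãäåāăąbḃcćçčĉċdḑďḋđeèéěêëēĕęėfḟƒgǵģǧĝğġǥhĥħiìíîĩïīĭįıjĵkḱķǩlĺļľłmṁnńņňñoòóôõöōŏøpṗqrŕŗřsśşšŝṡſtţťṫŧuùúûũüůūŭųvwẁẃŵẅxyỳýŷÿzźžż"
#NORM_LOW_CHAR="aaaaaaaaaabbccccccdddddeeeeeeeeeefffgggggggghhhiiiiiiiiiijjkkkklllllmmnnnnnoooooooooppqrrrrssssssstttttuuuuuuuuuuvwwwwwxyyyyyzzzz"
#SPEC_CAP_CHAR="AÀÁÂÃÄÅĀĂĄBḂCĆÇČĈĊDḐĎḊĐEÈÉĚÊËĒĔĘĖFḞGǴĢǦĜĞĠǤHĤĦIÌÍÎĨÏĪĬĮİJĴKḰĶǨĸLĹĻĽŁMṀNŃŅŇÑOÒÓÔÕÖŌŎØPṖQRŔŖŘSŚŞŠŜṠTŢŤṪŦUÙÚÛŨÜŮŪŬŲVWẀẂŴẄXYỲÝŶŸZŹŽŻ"
#NORM_CAP_CHAR="AAAAAAAAAABBCCCCCCDDDDDEEEEEEEEEEFFGGGGGGGGHHHIIIIIIIIIIJJKKKKKLLLLLMMNNNNNOOOOOOOOOPPQRRRRSSSSSSTTTTTUUUUUUUUUUVWWWWWXYYYYYZZZZ"
#sed -e "y/'$SPEC_CHAR'/'$NORM_CHAR'/"

# Join lines pasted from a PDF into one (dropping DOS carriage returns),
# then replace or spell out problematic characters. Inside the bracket
# expression "]" must come first to be literal.
newLinFn=$(printf '%s' "$fn2win" | tr -d '\r' | tr '\n' ' ' |
  sed -e "
    s/[][?()=+<>:;©®”,*|]/_/g
    s/\\\\/_/g
    s/\"/_/g
    s/"$'\t'"/ /g
    s/–/-/g
    s/’/'/g
    s/α/alpha/g
    s/β/beta/g
    s/µ/micro/g
    s/Æ/AE/g
    s/Ǽ/AE/g
    s/æ/ae/g
    s/ǽ/ae/g
    s/Ǳ/DZ/g
    s/Ǆ/DZ/g
    s/ǅ/Dz/g
    s/ǲ/Dz/g
    s/ǳ/dz/g
    s/ǆ/dz/g
    s/ﬀ/ff/g
    s/ﬁ/fi/g
    s/ﬂ/fl/g
    s/ﬃ/ffi/g
    s/ﬄ/ffl/g
    s/ﬅ/st/g
    s/Ĳ/IJ/g
    s/ĳ/ij/g
    s/Ǉ/LJ/g
    s/ǈ/Lj/g
    s/ǉ/lj/g
    s/Ǌ/NJ/g
    s/ǋ/Nj/g
    s/ǌ/nj/g
    s/Œ/OE/g
    s/œ/oe/g
    s/ß/ss/g
    s/[[:cntrl:]]/_/g
    s/ *$//
  ")

# Keep the extension and shorten the rest so the name fits Windows' limit
# of 255 characters per name. fold may count bytes rather than characters,
# which only errs on the short side; -s makes it break at a space if it can.
ext=""
[[ $newLinFn == ?*.* ]] && ext=".${newLinFn##*.}"
stem=${newLinFn%"$ext"}
stem=$(printf '%s\n' "$stem" | fold -s -w $(( 255 - ${#ext} )) | head -n 1)
newLinFn="${stem%" "}$ext"

if [ "$fn2win" != "$newLinFn" ]; then
  # -n: never overwrite an existing file of the same name
  mv -n -- "$fn2win" "$newLinFn"
fi
# Windows-style path for wine, which maps drive Z: to / by default
# (so this assumes the file name was given as an absolute path).
winFn=$(printf '%s\n' "z:$newLinFn" | sed 's/\//\\/g')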
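Character substitution does not cover one more Windows rule: CON, PRN, AUX, NUL, COM1 to COM9 and LPT1 to LPT9 are reserved device names in any letter case, even when followed by an extension (NUL.pdf is off limits too), and names should not end in a space or a period. A possible post-processing step, sketched here with a hypothetical function win_fix_reserved, trims trailing spaces and periods and appends an underscore to a reserved stem.

```bash
# Hypothetical helper: print $1 adjusted for two Windows rules that
# character substitution does not catch.
win_fix_reserved() {
  local name=$1 stem upper
  # Windows does not keep trailing spaces or periods, so drop them.
  while [[ $name == *[' .'] ]]; do
    name=${name%?}
  done
  # The part before the first period decides whether a name is reserved.
  stem=${name%%.*}
  upper=$(printf '%s' "$stem" | tr '[:lower:]' '[:upper:]')
  case $upper in
    CON | PRN | AUX | NUL | COM[1-9] | LPT[1-9])
      name="${stem}_${name#"$stem"}" ;;
  esac
  printf '%s\n' "$name"
}

win_fix_reserved 'nul.pdf'    # prints: nul_.pdf
```

Paper titles will rarely hit these cases, but the check is cheap enough to run on every name.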
D W
  • I don't think this is off-topic; I'm not sure why there was a close vote for this. – D W Dec 10 '10 at 21:26
  • Someone probably felt that stringing together a bunch of sed operations looks more like a usage question than a programming one. It's a debatable position. – Chris Stratton Dec 10 '10 at 21:43
  • Thank you for the explanation. I think this is a useful thing to be able to do. I need to run PDF-XChange Viewer through wine for its highlighting capabilities, and this would be useful for that. I look at and highlight hundreds of papers, as I'm sure other researchers do, so someone must have run into this problem. Where is an appropriate place to ask this question? – D W Dec 10 '10 at 21:47
  • http://www.ibm.com/developerworks/linux/library/l-sed3.html seems useful to organize the sed commands – D W Dec 10 '10 at 22:21
  • Related to http://stackoverflow.com/questions/620605/how-to-make-a-valid-windows-filename-from-an-arbitrary-string ? – blueberryfields Dec 11 '10 at 00:04
  • @blueberryfields: That solution is specific to .Net, and just replaces the characters with dashes. – OliJG Dec 12 '10 at 02:09
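The sed article linked above is about structuring sed scripts. Applied here, one way to keep a long transliteration table manageable is to move it into its own commented file and load it with sed -f, so that no shell quoting is involved. The sketch below writes a few sample rules to a temporary file; in practice the file would live next to the script, and both that file and the function translit are hypothetical.

```bash
# Write a few sample rules to a temporary sed script. In a real setup this
# would be a permanent file; comments and quotes need no shell escaping here.
rules=$(mktemp)
trap 'rm -f "$rules"' EXIT
cat > "$rules" <<'EOF'
# Ligatures
s/ﬁ/fi/g
s/ﬂ/fl/g
# Typographic punctuation
s/–/-/g
s/’/'/g
# Characters Windows does not allow in file names
s/[<>:"|?*]/_/g
EOF

# Hypothetical helper: apply the rule file to one name.
translit() { printf '%s\n' "$1" | sed -f "$rules"; }

translit 'A synthetic review of the ﬁve molecular Sorlie’s subtypes?'
# prints: A synthetic review of the five molecular Sorlie's subtypes_
```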

1 Answer


This looks like it should do it: http://pwet.fr/man/linux/commandes/konwert

OliJG
  • I'm not sure this would work for what I am trying to do, it seems to convert encodings but I don't see an obvious conversion for my purpose. – D W Dec 13 '10 at 23:35
  • Converting to ascii would take you most of the way there, limiting the number of hardcoded conversions you have left to handle. – OliJG Dec 14 '10 at 10:43
  • +1 `konwert utf8-ascii` is helpful and at least gets rid of international characters and converts ligatures into their separate characters. `konwert utf8-tex` is also interesting because it converts greek symbols such as α into \alpha. – D W Dec 15 '10 at 17:30
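konwert utf8-tex writes α as \alpha, but the backslash is itself forbidden in Windows names. If spelled-out Greek letters are wanted, one option is to generate the sed substitutions from a small table and leave the rest to an ASCII conversion such as konwert utf8-ascii. The sketch below (hypothetical function greek_to_names) covers only the lowercase Greek alphabet.

```bash
# Hypothetical helper: spell out lowercase Greek letters in $1
# (e.g. "TNF-α" -> "TNF-alpha"); other characters pass through unchanged.
greek_to_names() {
  local pairs=(
    α:alpha β:beta γ:gamma δ:delta ε:epsilon ζ:zeta η:eta θ:theta
    ι:iota κ:kappa λ:lambda μ:mu ν:nu ξ:xi ο:omicron π:pi ρ:rho
    σ:sigma ς:sigma τ:tau υ:upsilon φ:phi χ:chi ψ:psi ω:omega
  )
  local script="" p
  for p in "${pairs[@]}"; do
    # Each entry "letter:name" becomes "s/letter/name/g;".
    script+="s/${p%%:*}/${p#*:}/g;"
  done
  printf '%s\n' "$1" | sed -e "$script"
}

greek_to_names 'Estrogen attenuate TNF-α induced apoptosis'
# prints: Estrogen attenuate TNF-alpha induced apoptosis
```

Running this before konwert utf8-ascii keeps the Greek letters readable instead of leaving them to whatever the ASCII conversion does with them.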