
I'm reading an Excel (.xlsx) file with the Spreadsheet::XLSX module and getting values like: Iron/ Steel

Problem: the extra characters at the end of the value are not visible in the Excel file. The rightmost one looks like whitespace, but it isn't: I tried the regex `/\s+$/` and it didn't remove it.
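To find out what the invisible characters actually are, it can help to dump each character with its code point. The sample value below is hypothetical: it assumes the trailing junk is the UTF-8 bytes of a no-break space (0xC2 0xA0), which is only one possible cause. Under Perl's default rules, `\s` matches neither of those bytes in an undecoded string, which would be consistent with `/\s+$/` failing. The helper name `describe_chars` is made up for illustration.

```perl
use strict;
use warnings;

# Return one line per character: the character itself (or '?' if it is not
# printable ASCII) followed by its code point, so invisible characters show up.
sub describe_chars {
    my ($string) = @_;
    my @lines;
    for my $char (split //, $string) {
        my $shown = $char =~ /\A[\x20-\x7E]\z/ ? $char : '?';
        push @lines, sprintf "'%s' U+%04X", $shown, ord $char;
    }
    return @lines;
}

# Hypothetical cell value: "Iron/ Steel" plus the UTF-8 bytes of a no-break space.
my $value = "Iron/ Steel\xC2\xA0";
print "$_\n" for describe_chars($value);
```

Running this on a real cell value (instead of the hypothetical one) shows exactly which code points need to be stripped.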

Please help me clean this string. I want to keep only the characters found on a standard English keyboard, i.e. A-Z, 0-9, ~!@#$%^&*()_+=- ` ,./';[]\|}{:"?>< etc.
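One way to express "only keyboard characters" is a whitelist of printable ASCII, 0x20 (space) through 0x7E (`~`), which covers the letters, digits and punctuation listed above. A minimal sketch using `tr///` with the `c` (complement) and `d` (delete) flags; the helper name `keyboard_only` is hypothetical:

```perl
use strict;
use warnings;

# Delete every character outside printable ASCII (0x20-0x7E).
# tr///cd complements the range and deletes whatever matches the complement.
sub keyboard_only {
    my ($string) = @_;
    (my $clean = $string) =~ tr/\x20-\x7E//cd;
    return $clean;
}

print keyboard_only("Iron/ Steel\xC2\xA0"), "\n";   # prints "Iron/ Steel"
```

Note that this also removes tabs, newlines and any legitimate accented letters, so it only fits if the data really is plain English text.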

Tim B
GrSrv
  • Isn't it a problem of wrong [encoding](http://p3rl.org/Encode)? – choroba Sep 03 '14 at 08:43
  • That's what I thought. I tried the solution given here: http://stackoverflow.com/a/14509489. It didn't work for me. – GrSrv Sep 03 '14 at 10:12
  • What does the cell actually contain in Excel? – choroba Sep 03 '14 at 10:15
  • I'm not sure what you're asking. I don't know who made these Excel files or how: whether all the data was typed in or copied and pasted. They only contain names like `Iron/ Steel` and `computers`, plus some numbers. I only see this issue in 3-4 cells. – GrSrv Sep 03 '14 at 10:31
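Following choroba's encoding suggestion, a minimal sketch that assumes (not confirmed here) the cell text arrives as undecoded UTF-8 bytes. Decoding with the core Encode module turns 0xC2 0xA0 into the single character U+00A0, and `\p{Space}` always applies Unicode rules, so it matches that no-break space where a plain `\s` on the raw bytes would not. The helper name `decode_and_trim` is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: $bytes holds raw UTF-8 bytes from the spreadsheet cell.
sub decode_and_trim {
    my ($bytes) = @_;
    my $text = decode('UTF-8', $bytes);      # bytes -> Perl characters
    # \p{Space} uses Unicode rules, so it also matches U+00A0 (no-break space).
    $text =~ s/\A\p{Space}+|\p{Space}+\z//g;
    return $text;
}

print decode_and_trim("Iron/ Steel\xC2\xA0"), "\n";   # prints "Iron/ Steel"
```

Decode only once: calling `decode` on text that is already decoded can corrupt it or die on wide characters.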

2 Answers


You can remove all non-ASCII characters:

$string =~ s/[^[:ascii:]]//g;
mpapec
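A minimal sketch of mpapec's one-liner wrapped in a helper (the name `ascii_clean` is hypothetical). The trim step is an added assumption, in case ordinary spaces are left at the ends once the non-ASCII characters are gone. It works the same on raw bytes and on decoded text, since both `\xC2\xA0` and `\x{A0}` fall outside ASCII.

```perl
use strict;
use warnings;

# Remove every non-ASCII character, then trim whitespace at both ends.
# Note: [:ascii:] still keeps ASCII control characters (0x00-0x1F, 0x7F).
sub ascii_clean {
    my ($string) = @_;
    $string =~ s/[^[:ascii:]]//g;
    $string =~ s/^\s+|\s+$//g;
    return $string;
}

for my $cell ("Iron/ Steel\xC2\xA0", "computers") {
    print ascii_clean($cell), "\n";
}
```

The trade-off is that genuine non-ASCII letters (for example accented names) are dropped too, not just the stray characters.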

If the characters are always in the same position, `substr($string, 0, -3)`, which drops the last three characters, can help.

  • Neat, but it doesn't actually solve my problem: the characters I mentioned in the question appear in only some of the strings, not all of them. – GrSrv Sep 03 '14 at 10:11
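The objection above is that `substr` cuts unconditionally, so clean values lose real letters. A sketch contrasting it with an end-anchored substitution (a swapped-in technique, not part of the answer; the helper name `strip_trailing_junk` is hypothetical) that removes a trailing run of characters outside printable ASCII only when such a run is present:

```perl
use strict;
use warnings;

# Remove a run of non-printable-ASCII characters only if it sits at the very
# end of the string; strings without such a run are returned unchanged.
sub strip_trailing_junk {
    my ($string) = @_;
    $string =~ s/[^\x20-\x7E]+\z//;
    return $string;
}

print substr("computers", 0, -3), "\n";                 # prints "comput"
print strip_trailing_junk("computers"), "\n";           # prints "computers"
print strip_trailing_junk("Iron/ Steel\xC2\xA0"), "\n"; # prints "Iron/ Steel"
```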