Related to Converting Height Measurements from Imperial to Metric units but with a different approach.

I've got a data set where there are a number of values with mixed units, particularly measuring height in both Imperial and Metric units. Additionally, for whatever reason, said Imperial measurements contain an excess of quotation marks.

I had thought that, instead of performing a conversion, a mass find-replace might work, as the heights of people are relatively constrained. For example, someone who is 5' 3" is 160.02cm so it's possible to just find all of one and replace it with another.

Consider this row in the data set test.csv

"Female","Hispanic",25,"5' 11"""

I'd like to replace "5' 11""" with "180.34".

I've tried sed -i 's#"5' 1"""#"180.34"#g' test.csv but the only resulting output is a > character in my terminal window, which suggests I've done something wrong, though I can't quite figure out what. Is there something I'm missing to deal with the excess quotation marks?

Fomite
  • Possible duplicate of [How do I escape double and single quotes in sed?](https://stackoverflow.com/questions/7517632/how-do-i-escape-double-and-single-quotes-in-sed) – Wiktor Stribiżew Oct 16 '18 at 09:24

4 Answers

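The problem is the single quote inside your sed expression (the one in 5' 1). It ends the shell's single-quoted string early, so the shell finds an unbalanced quote further along and shows the > continuation prompt while it waits for you to close it.

Use \x27 (the hex escape for a single quote, a GNU sed extension) in place of the literal single quote, and match 11 rather than 1 so the pattern fits the sample row:

sed -i 's#"5\x27 11"""#"180.34"#g' test.csv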
oliv

Regarding "Additionally, for whatever reason, said Imperial measurements contain an excess of quotation marks": when you write measurements in feet and inches, the ' represents feet and the " represents inches, so 5 feet 11 inches is written 5' 11". A CSV that has quoted fields like "foo" needs some way to include a literal " inside a field, and one way of doing that in some CSV formats (e.g. as exported from Excel) is to double the " to escape it. So foo"bar in a quoted field becomes "foo""bar".

Now let's come back to 5' 11": the same logic applies, and in a quoted field you'd write it as "5' 11""", where the "" before the last " is the escaped representation of a single ". See What's the most robust way to efficiently parse CSV using awk? for references to applicable CSV "standards" and more info on parsing CSVs with a standard UNIX tool, awk.
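The doubling rule described above can be sketched in a few lines of shell. This is only an illustration, not part of the original answer: `csv_quote` is a hypothetical helper name, and it assumes the simple convention where the only character needing escape is the double quote.

```shell
# Hypothetical helper: wrap a value in double quotes, doubling any embedded ones.
csv_quote() {
  printf '"%s"\n' "$(printf '%s' "$1" | sed 's/"/""/g')"
}

csv_quote 'foo"bar'     # prints "foo""bar"
csv_quote "5' 11\""     # prints "5' 11"""
```

The second call shows where the three trailing quotes in the data come from: the inches mark is doubled, and then the field's own closing quote follows.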

Regarding your specific question: rather than converting one value at a time with hard-coded numbers, just convert them all at once algorithmically. Using GNU awk for FPAT:

$ cat tst.awk
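BEGIN {
    # A field is either a run of non-commas or a double-quoted string
    FPAT = "([^,]*)|(\"[^\"]+\")"
    OFS  = ","
}
{
   # $4 looks like "5' 11"""; splitting on non-digits gives "", feet, inches, ""
   split($4,feetinches,/[^0-9]+/)
   ft = feetinches[2] + (feetinches[3] / 12)
   # 1 ft = 30.48 cm exactly; %.2f keeps trailing zeros such as 139.70
   $4 = sprintf("\"%.2f\"", ft * 30.48)
   print
}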

$ cat file
"Female","Hispanic",25,"5' 11"""
"Male","Scottish",54,"6' 1"""
"Female","English",12,"4' 7"""
"TBD","Martian",935,"8' 5"""

$ awk -f tst.awk file
"Female","Hispanic",25,"180.34"
"Male","Scottish",54,"185.42"
"Female","English",12,"139.70"
"TBD","Martian",935,"256.54"
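The script above assumes every row holds an imperial height. Since the question mentions mixed units, a guarded variant may be safer. The sketch below is not part of the original answer: `convert_heights` is a hypothetical name, it uses plain awk rather than FPAT, and it assumes the height is the last field on the line.

```shell
# Hypothetical helper: convert only trailing fields shaped like "5' 11""".
convert_heights() {
  awk '
  BEGIN {
    # \047 is the octal escape for a single quote, which cannot be typed
    # literally inside this single-quoted awk program
    imperial = "\"[0-9]+\047 *[0-9]+\"\"\"$"
  }
  {
    if (match($0, imperial)) {
      # Splitting on non-digits yields "", feet, inches, ""
      split(substr($0, RSTART, RLENGTH), part, /[^0-9]+/)
      cm = (part[2] * 12 + part[3]) * 2.54
      $0 = substr($0, 1, RSTART - 1) sprintf("\"%.2f\"", cm)
    }
    print
  }' "$@"
}

# The second row is a made-up metric example; it should pass through untouched.
printf '%s\n' '"Female","Hispanic",25,"5'\'' 11"""' '"Male","Dutch",40,"183"' |
  convert_heights
```

Rows whose last field does not match the imperial pattern are printed unchanged, so values already in centimetres are left alone.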
Ed Morton

The easiest fix is to double-quote your sed script and escape the double quotes inside it (with 11 rather than 1, so the pattern matches the sample row):

sed -i "s|\"5' 11\"\"\"|\"180.34\"|g" test.csv
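Because the difficulty lies in the shell's quoting rather than in sed itself, another option is to keep the sed script in a file and pass it with -f, so the shell never parses the pattern. This is a sketch of a different technique from the one in this answer; `replace_height` is a hypothetical name, and it writes to standard output instead of editing in place.

```shell
# Hypothetical helper: apply the replacement with no shell quoting of the pattern.
replace_height() {
  script=$(mktemp) || return 1
  # Quoting the delimiter ('EOF') makes the here-document fully literal
  cat > "$script" <<'EOF'
s#"5' 11"""#"180.34"#g
EOF
  sed -f "$script" "$@"
  rm -f "$script"
}
```

Calling `replace_height test.csv` would print the converted rows; with no file argument it reads standard input.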
kvantour

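Another way is to close the single-quoted string, add the single quote inside double quotes, and then reopen the single-quoted string: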

sed -i 's/5'"'"' 11""/180.34/g' test.csv
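To check what sed actually receives from a command like the one above, the script can be printed instead of executed. This is just a sketch for inspection; the variable name `script` is an assumption made for the example.

```shell
# Three adjacent pieces are joined by the shell into one word:
#   's/5'   "'"   ' 11""/180.34/g'
script='s/5'"'"' 11""/180.34/g'
printf '%s\n' "$script"    # prints s/5' 11""/180.34/g

# The backslash form '\'' builds the same string.
printf '%s\n' 's/5'\'' 11""/180.34/g'
```

The pattern covers only the inner 5' 11"" and leaves the field's outer quotes in place, which is why the result still reads "180.34" in the CSV.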
ctac_