
I am reading some text line by line in my mapper code (R code) from HDFS. The text looks like:

15059773^A3872^A\N^A2015-09-05^A\N^A2015-09-01^A3^A0^A0^A\N^Ashirts adult male^Axl^A\N^A5183656^Ac1 13 me ult tee c^Ablue^Awatersport blue^A\N^A\N^A\N^A0^A\N^A3^Amn^A45.05273^A-93.365555^A100^A131^A27.0^A13.0^A8.0^A85.0^A57.0^A21.0^A1012.0^A0^A0^A1^A0^A43^A3^A4
15432724^A7720^A\N^A2015-09-05^A\N^A2015-09-01^A3^A0^A0^A\N^Ashirts adult male^Al^A\N^A5183656^Ac1 13 me ult tee c^Ablue^Ablue foil^A\N^A\N^A\N^A0^A\N^A3^Amn^A45.05273^A-93.365555^A100^A131^A27.0^A13.0^A8.0^A85.0^A57.0^A21.0^A1012.0^A0^A0^A1^A0^A43^A3^A4

and the code I use to read it in a loop is:

input <- file("stdin", "r")

while(length(line <- readLines(input, n=1, warn=FALSE)) > 0)
{
}
close(input)

In the text above, ^A is my field separator and \N marks a missing value (R's NA). I was able to split on ^A using \001, although I am not sure why that works. However, I am having trouble replacing \N. I have tried the suggestions in "remove all line breaks (enter symbols) from the string using R" and a few others, but nothing works. I also tried \\N, but that does not work either.

Since I am processing the input line by line, my expected output for the first line is:

"15059773" "3872" NA "2015-09-05" NA "2015-09-01" "3" "0" "0" NA "shirts adult male" "xl" NA "5183656" "c1 13 me ult tee c" "blue" "watersport blue" NA NA NA "0" NA "3" "mn" "45.05273" "-93.365555" "100" "131" "27.0" "13.0" "8.0" "85.0" "57.0" "21.0" "1012.0" "0" "0" "1" "0" "43" "3" "4"
abhiieor

1 Answer


This seems to work:

ifelse(strsplit(string, "\\^A")[[1]] == "\\N", NA, strsplit(string, "\\^A")[[1]])
 [1] "15059773"           "3872"               NA                   "2015-09-05"         NA                  
 [6] "2015-09-01"         "3"                  "0"                  "0"                  NA                  
[11] "shirts adult male"  "xl"                 NA                   "5183656"            "c1 13 me ult tee c"
[16] "blue"               "watersport blue"    NA                   NA                   NA                  
[21] "0"                  NA                   "3"                  "mn"                 "45.05273"          
[26] "-93.365555"         "100"                "131"                "27.0"               "13.0"              
[31] "8.0"                "85.0"               "57.0"               "21.0"               "1012.0"            
[36] "0"                  "0"                  "1"                  "0"                  "43"                
[41] "3"                  "4"     
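
A slightly tidier variant of the same idea, as a sketch: split once, store the result, and set the matching elements to NA by logical indexing. It assumes, as in the data shown in this answer, that the line contains the two literal characters ^A as the separator. The variable name fields and the shortened sample line are illustrative only.

```r
# Shortened sample in the same shape as the question's line: literal "^A" as
# the separator and literal "\N" (backslash followed by N) for a missing value.
string <- "15059773^A3872^A\\N^A2015-09-05^A\\N^Ashirts adult male"

# fixed = TRUE treats "^A" as plain text, so the caret needs no regex escaping.
fields <- strsplit(string, "^A", fixed = TRUE)[[1]]

# In R source code, "\\N" is the two-character string backslash + N.
# The comparison with == is a plain string comparison, not a regex match.
fields[fields == "\\N"] <- NA

print(fields)
```

Note that writing "\N" with a single backslash in R source is an error, because \N is not a recognized escape sequence; the doubled backslash is what produces the literal backslash that appears in the data.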

Data:

cat(string)
15059773^A3872^A\N^A2015-09-05^A\N^A2015-09-01^A3^A0^A0^A\N^Ashirts adult male^Axl^A\N^A5183656^Ac1 13 me ult tee c^Ablue^Awatersport blue^A\N^A\N^A\N^A0^A\N^A3^Amn^A45.05273^A-93.365555^A100^A131^A27.0^A13.0^A8.0^A85.0^A57.0^A21.0^A1012.0^A0^A0^A1^A0^A43^A3^A4
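
When the lines come straight from HDFS rather than from pasted text, the ^A shown on screen is typically the single control character with code 1 (Ctrl-A), which would explain why "\001" (an octal escape in an R string) worked in the question. The sketch below assumes that is the case. parse_line and run_mapper are hypothetical helper names, not part of the original code, and a textConnection stands in for stdin so the example runs on its own.

```r
# Hypothetical helper: split one line on the Ctrl-A control character and
# turn the literal marker \N into NA.
parse_line <- function(line, sep = "\001") {
  fields <- strsplit(line, sep, fixed = TRUE)[[1]]
  fields[fields == "\\N"] <- NA   # "\\N" is backslash + N
  fields
}

# Hypothetical helper: the same read loop as in the question, with a body.
run_mapper <- function(con) {
  while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
    print(parse_line(line))
  }
}

# Demo line built with a real Ctrl-A separator (assumed input format).
demo <- paste("15059773", "3872", "\\N", "2015-09-05", sep = "\001")

con <- textConnection(demo)
run_mapper(con)
close(con)

# In the actual mapper the connection would be stdin instead:
#   input <- file("stdin", "r"); run_mapper(input); close(input)
```

If the separator in the real input turns out to be the literal characters ^A after all, passing sep = "^A" to parse_line covers that case without further changes.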
Psidom