
I have a dataset that looks like this:

[image: screenshot of the dataset]

I would like to split the Cloud column into two columns: one for the letters and another for only the numbers of each code. The problem is that some rows contain a combination of two or three codes (OVC32 is one code, for example). Any help on how I can split this into just two columns is much appreciated.

thelatemail
  • The expected output is not clear. Could you show it? Perhaps `library(stringr); df1$numbers <- sapply(str_extract_all(df1$Cloud, "\\d+"), paste, collapse=' '); df1$letters <- sapply(str_extract_all(df1$Cloud, "\\D+"), paste, collapse=' ')` – akrun Sep 18 '17 at 03:46
  • The output I'm after is a data frame exactly like the one in the picture, but with one extra column containing only the numeric part of the codes in the Cloud column of my current data frame. Many thanks – Paulo Felipe Lagos Sep 18 '17 at 03:55
  • So what would this one column be for row 2, with "few clouds at 9000, scatters 12000, and broken 15000"? – r2evans Sep 18 '17 at 04:08
  • The second row will be only for the height of each cloud layer, but I'm not sure how I can keep only one code in the rows where there is more than one. This dataset is made up of hourly observations, so I'm not interested in changes within the hour, but I still don't want to delete the entire row, only the second and third codes in the rows where there is more than one code. – Paulo Felipe Lagos Sep 18 '17 at 04:17

1 Answer


You can separate the numbers and the letters in "Cloud" like this:

library(stringr)  # provides str_extract_all(), used below

Cloud <- c("BKN130", "FEW090 SCT120 BKN150", "FEW200", "BKN140", "BKN120 BKN190")

Cloud_Letter <- gsub("[[:digit:]]","",Cloud)
Cloud_Letter
[1] "BKN"         "FEW SCT BKN" "FEW"         "BKN"         "BKN BKN"

Cloud_Number <- str_extract_all(Cloud, "\\d+")
Cloud_Number
[[1]]
[1] "130"

[[2]]
[1] "090" "120" "150"

[[3]]
[1] "200"

[[4]]
[1] "140"

[[5]]
[1] "120" "190"
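
Going by the comments on the question, only the first code of each row is needed. Here is a minimal sketch of that using base R only. The data frame name `df1` and the new column names `Cloud_Letter` and `Cloud_Number` are assumptions for illustration, and the sample values are the ones used above.

```r
# Example data copied from the vector above; `df1` is a hypothetical name.
df1 <- data.frame(
  Cloud = c("BKN130", "FEW090 SCT120 BKN150", "FEW200", "BKN140", "BKN120 BKN190"),
  stringsAsFactors = FALSE
)

# Keep only the first code in each row: drop everything from the first space on.
first_code <- sub(" .*$", "", trimws(df1$Cloud))

# Letter part: remove every digit from the first code.
df1$Cloud_Letter <- gsub("[[:digit:]]", "", first_code)

# Numeric part: remove every non-digit, then convert to a number.
# Note that the conversion drops leading zeros ("090" becomes 90).
df1$Cloud_Number <- as.numeric(gsub("[^[:digit:]]", "", first_code))

df1
```

If the leading zeros matter, skip `as.numeric()` and keep `Cloud_Number` as a character column.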
Santosh M.