Removal of data after first occurrence of string in R

Question

I have data in the columns of a data frame, like this:

ROMANIA ~ ROMANIA ~ ROMANIA ~ 0 ~ 0 ~ 0 ~ 0 ~ 0 ~ 0 ~ 0
SWITZERLAND ~ RUSSIAN FEDERATION ~ 0 ~ 0 ~ 0 ~ 0 ~ 0 ~ 0 ~ 0 ~ 0  
INDIA ~ 0 ~ 0~ 0 ~ 0 ~ 0 ~ 0 ~ 0 ~ 0 ~ 0

and many more rows.

I want to remove everything after the first occurrence of zero, so the final output looks like:

ROMANIA ~ ROMANIA ~ ROMANIA
SWITZERLAND ~ RUSSIAN FEDERATION
INDIA

Are `~` characters actually in the dataframe column? Or are you trying to show different columns? Is it one column or 10 columns? — zx8754, Jul 11 '18 at 12:29
Please make your input data [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). — zx8754, Jul 11 '18 at 12:30
@Romil I misinterpreted your data, assuming it was strings. What do you want as a replacement for the 0? If NA, then just use: `df[df == 0] <- NA` — Wimpel, Jul 11 '18 at 12:45

score 1 · Answer 1 · edited Jul 11 '18 at 12:41

Use gsub to replace everything from the first occurrence of " ~ 0" onward (including that " ~ 0") with "" (nothing).

v <- c("ROMANIA ~ ROMANIA ~ ROMANIA ~ 0 ~ 0 ~ 0 ~ 0 ~ 0 ~ 0 ~ 0",
       "SWITZERLAND ~ RUSSIAN FEDERATION ~ 0 ~ 0 ~ 0 ~ 0 ~ 0 ~ 0 ~ 0 ~ 0",
       "INDIA ~ 0 ~ 0~ 0 ~ 0 ~ 0 ~ 0 ~ 0 ~ 0 ~ 0" )

gsub(" ~ 0.*", "", v)

#[1] "ROMANIA ~ ROMANIA ~ ROMANIA"      "SWITZERLAND ~ RUSSIAN FEDERATION" "INDIA"
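
Since `.*` already consumes the rest of the string, a single substitution is enough, so `sub` should behave the same as `gsub` here. The sketch below is a variation, not part of the answer above: it assumes the spacing around `~` may be irregular (as in `0~ 0`) and that a zero field is exactly `0`. The name `pattern` and the shortened sample strings are illustrative only.

```r
v <- c("ROMANIA ~ ROMANIA ~ ROMANIA ~ 0 ~ 0 ~ 0",
       "SWITZERLAND ~ RUSSIAN FEDERATION~0 ~ 0",
       "INDIA ~ 0 ~ 0~ 0")

# \\s*~\\s*   : the separator, with any amount of surrounding whitespace
# 0\\s*       : a field that is exactly "0" (so "007" is not treated as zero)
# (~.*)?$     : optionally everything after it, up to the end of the string
pattern <- "\\s*~\\s*0\\s*(~.*)?$"

cleaned <- sub(pattern, "", v)
print(cleaned)
```

A string that starts with `0` (no preceding `~`) is not handled by this pattern and would need a separate check.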

score 1 · Answer 2 · answered Jul 11 '18 at 13:09

data:

library(magrittr)
df <- data.table::fread("
ROMANIA ~ ROMANIA ~ ROMANIA ~ 0 ~ 0 ~ 0 ~ 0 ~ 0 ~ 0 ~ 0
SWITZERLAND ~ RUSSIAN FEDERATION ~ 0 ~ 0 ~ 0 ~ 0 ~ 0 ~ 0 ~ 0 ~ 0  
                  INDIA ~ 0 ~ 0~ 0 ~ 0 ~ 0 ~ 0 ~ 0 ~ 0 ~ 0",header=F,sep="~") %>% as.data.frame
#            V1                 V2      V3 V4 V5 V6 V7 V8 V9 V10
# 1     ROMANIA            ROMANIA ROMANIA  0  0  0  0  0  0   0
# 2 SWITZERLAND RUSSIAN FEDERATION       0  0  0  0  0  0  0   0
# 3       INDIA                  0       0  0  0  0  0  0  0   0

code:

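# keep only the columns that are not entirely zero
# (as.numeric() turns text such as "ROMANIA" into NA, with a coercion warning)
df[, sapply(df, function(x) as.numeric(x) %>% {sum(. == 0, na.rm = TRUE) != length(x)})]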

result:

#           V1                 V2      V3
#1     ROMANIA            ROMANIA ROMANIA
#2 SWITZERLAND RUSSIAN FEDERATION       0
#3       INDIA                  0       0
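
The result above still shows `0` in V2 and V3 because whole columns are dropped, not individual cells. If the values really do sit in separate columns, one possible sketch is to cut each row at its first zero and paste the remainder back together. The data frame `wide` and the helper `keep_before_first_zero` are hypothetical names, and the zeros are assumed to be stored as the text `"0"`.

```r
wide <- data.frame(
  V1 = c("ROMANIA", "SWITZERLAND", "INDIA"),
  V2 = c("ROMANIA", "RUSSIAN FEDERATION", "0"),
  V3 = c("ROMANIA", "0", "0"),
  V4 = c("0", "0", "0"),
  stringsAsFactors = FALSE
)

# Return the values of one row up to (not including) its first "0"
keep_before_first_zero <- function(x) {
  first_zero <- match("0", x)  # NA when the row contains no zero
  if (is.na(first_zero)) x else x[seq_len(first_zero - 1)]
}

# Process row by row and collapse the kept values into one string per row
collapsed <- vapply(seq_len(nrow(wide)), function(i) {
  row_values <- unlist(wide[i, ], use.names = FALSE)
  paste(keep_before_first_zero(row_values), collapse = " ~ ")
}, character(1))

print(collapsed)
```

A row whose first value is already `"0"` collapses to an empty string under this approach.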

score 0 · Answer 3 · answered Jul 11 '18 at 12:48

Since you haven't provided reproducible sample data, I couldn't fully test this, but try the following:

as.data.frame(lapply(df, function(y) gsub("~ 0.*", "", y)))
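
A minimal way to try this out, assuming the strings sit in a single character column (the data frame `df1` and the column name `country` are made up for illustration). Because the pattern starts at `~`, the space in front of it is left behind, so this sketch also trims it with `trimws`.

```r
df1 <- data.frame(
  country = c("ROMANIA ~ ROMANIA ~ ROMANIA ~ 0 ~ 0",
              "SWITZERLAND ~ RUSSIAN FEDERATION ~ 0 ~ 0",
              "INDIA ~ 0 ~ 0~ 0"),
  stringsAsFactors = FALSE
)

# Assigning into df1[] replaces every column but keeps the data.frame structure;
# trimws() drops the space left in front of the removed "~ 0..." tail
df1[] <- lapply(df1, function(y) trimws(gsub("~ 0.*", "", y)))

print(df1)
```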

RavinderSingh13
