System: Windows 10, R 3.6.2

I imported the data from an Excel file into a data.frame. One variable has values like this:

sample data

What I want is to extract the part of each value before the first "\" and store it in a new variable. I tried split, str.split, str_extract, and gsub, and none of them worked. I think the main problem is the separator character, but I still don't know how to work around it. I would really appreciate it if anyone could help me with this.

Ryan
  • Try `sub("\\\\.*", "", df$account)` – Ronak Shah Feb 28 '20 at 06:19
  • Another option is `sub("[\\].*", "", df$account)`. – Rui Barradas Feb 28 '20 at 06:36
  • Thank you both very much, it does work. But why are there 4 "\"? As far as I could check, I should write "\\" to stand for a single "\". – Ryan Feb 28 '20 at 07:07
  • The regex needs 2 backslashes to match one literal backslash, and inside an R string each of them has to be written as `"\\"`, so you end up with 4. – Rui Barradas Feb 28 '20 at 07:23
  • Why 4? You do need to escape the backslash. Try `nchar('\\')` and see how many characters you have. Now you need to escape it so that it is literal, i.e. you need 2 backslashes, as you said. Can you think of a way to make `nchar('\\')` give you 2 instead of 1? – Onyambu Feb 28 '20 at 08:03
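The escaping discussed in these comments happens in two layers, and `nchar()` makes that visible. The following is a minimal sketch; the value in `x` is hypothetical and is not taken from the original data.

```r
# Layer 1: the R parser. In source code, "\\" is an escape sequence
# that produces a string holding ONE backslash character.
one <- "\\"
nchar(one)       # 1

# Layer 2: the regex engine. It also treats a backslash as an escape,
# so the pattern it receives must hold TWO backslashes to mean one
# literal backslash. Written as R source, that takes four.
pattern <- "\\\\"
nchar(pattern)   # 2

# cat() prints the characters actually stored, not the escaped form
cat(one, pattern, "\n")

# Hypothetical value: four characters, a backslash, four characters
x <- "ab12\\cd34"
nchar(x)         # 9

# With fixed = TRUE the regex layer is skipped, so the single stored
# backslash is enough. This removes only the backslash itself.
no_slash <- sub("\\", "", x, fixed = TRUE)
no_slash         # "ab12cd34"
```

Printing a string at the console (as opposed to using `cat()`) shows the escaped form again, which is often what makes the number of backslashes look wrong.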

2 Answers

1

Since you want to extract the first four characters of the string, which come before the "\" sign, one solution is to load the stringr library and extract the substring. Note that this assumes the part before the backslash is always four characters long.

library(stringr)
str_sub(string, 1, 4)

Hope it helps!
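To see where a fixed-width substring and a backslash-based match diverge, here is a small sketch. It uses base R `substr()` as a stand-in for `str_sub(string, 1, 4)` so that it runs without stringr, and the values in `x` are hypothetical, chosen to have prefixes of different lengths.

```r
# Hypothetical values with prefixes of length 4, 3 and 6
x <- c("ab12\\cd", "abc\\de", "abcdef\\g")

# Fixed width: always the first four characters
fixed_width <- substr(x, 1, 4)
fixed_width      # second element keeps the backslash, third is cut short

# Pattern based: everything before the first backslash
by_backslash <- sub("\\\\.*", "", x)
by_backslash     # "ab12" "abc" "abcdef"
```

If every value in the real column has exactly four characters before the backslash, both approaches give the same result.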

0

You could use `sub` to remove the first backslash and everything after it.

sub("\\\\.*", "", df$account)

Another option is to capture everything before the first backslash.

sub("(.*?)\\\\.*", "\\1", df$account)

Regarding why you need 4 "\", read "How to escape backslashes in R string".
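Put together, creating the new variable could look like the sketch below. The data frame contents and the column names `prefix` and `prefix2` are assumptions made for illustration, since the question does not show the actual values. The `strsplit()` variant is an additional alternative that is not part of the answer above; it uses `fixed = TRUE` to avoid regex escaping altogether.

```r
# Hypothetical data; only the column name `account` comes from the thread
df <- data.frame(
  account = c("ab12\\cd34", "xyz\\001\\z", "nobackslash"),
  stringsAsFactors = FALSE
)

# Regex approach: drop the first backslash and everything after it.
# Values without a backslash are returned unchanged.
df$prefix <- sub("\\\\.*", "", df$account)

# Alternative: split on a literal backslash and keep the first piece.
pieces <- strsplit(df$account, "\\", fixed = TRUE)
df$prefix2 <- vapply(pieces, function(p) p[1], character(1))

df$prefix    # "ab12" "xyz" "nobackslash"
df$prefix2   # "ab12" "xyz" "nobackslash"
```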

Ronak Shah
  • Thank you very much. What I mean is that, as far as I could check, "\\" stands for "\", so what I entered was "\\\*", where "\\" stands for "\" and "*" is meant to include everything after the "\". But it doesn't work. Why does there have to be "\\."? Would you please help me? – Ryan Mar 02 '20 at 00:56
  • @Ryan To include everything after "\" you need to use `.*` and not only `*`. – Ronak Shah Mar 02 '20 at 01:02
  • Thank you very much, you helped me a lot! – Ryan Mar 02 '20 at 01:04
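The difference between `*` and `.*` in the last comments comes from how regular expressions read `*`: it is a quantifier meaning "zero or more of the preceding item", not a wildcard on its own, while `.` is the item that matches any character. A minimal sketch on a hypothetical value follows. The pattern `"\\\\*"` is an assumption about what was intended, since `"\\\*"` as typed in the comment contains `\*`, which R rejects as an unrecognized escape.

```r
x <- "ab12\\cd34"   # hypothetical value

# Regex \\* means "zero or more backslashes". The leftmost match is
# the empty string at the start of x, so nothing is removed.
star_only <- sub("\\\\*", "", x)
identical(star_only, x)   # TRUE

# Regex \\.* means "one backslash, then zero or more of any character",
# which matches from the first backslash to the end of the string.
dot_star <- sub("\\\\.*", "", x)
dot_star                  # "ab12"
```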