
I am reading a column of the data.csv file and trying to use a regular expression to extract the text before the last forward slash from each string in that column. My column data looks like:

class:

org/apache/flume/api/virtual/loeadBalancing.java
org/apache/flume/file/Channel/testing/test2.java
org/apache/flume/recoverable/memory/test1.java
org/apache/flume/source/scribe/LogEntry.java
org/apache/flume/source/jms/TestJMSMessageConsumer.java

My desired output is:

org/apache/flume/api/virtual
org/apache/flume/file/Channel/testing
org/apache/flume/recoverable/memory
org/apache/flume/source/scribe
org/apache/flume/source/jms

So, basically, I am trying to extract a substring from the class column that excludes the last forward slash and the text after it. My current code is:

dfkg<- gsub( "\\.[^/]*$", "", data$class) 

Can someone correct my regular expression so that it generates the desired output?

Analyzer

1 Answer


We can match the / followed by one or more characters that are not a / ([^/]+) until the end of the string ($) and replace it with an empty string ("").

sub("/[^/]+$", "", data$class)
#[1] "org/apache/flume/api/virtual"          "org/apache/flume/file/Channel/testing" "org/apache/flume/recoverable/memory"  
#[4] "org/apache/flume/source/scribe"        "org/apache/flume/source/jms"      
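As a quick check, the sketch below rebuilds the sample column in a hypothetical `data` data frame (the real one would come from data.csv) and compares the `sub()` call above with base R's `dirname()`, which also drops the last `/`-separated component. `dirname()` is an alternative, not what this answer uses, and it differs on a string with no `/` at all: it returns `"."`, whereas `sub()` leaves the string unchanged.

```r
# Hypothetical reconstruction of the sample column from the question
data <- data.frame(
  class = c(
    "org/apache/flume/api/virtual/loeadBalancing.java",
    "org/apache/flume/file/Channel/testing/test2.java",
    "org/apache/flume/recoverable/memory/test1.java",
    "org/apache/flume/source/scribe/LogEntry.java",
    "org/apache/flume/source/jms/TestJMSMessageConsumer.java"
  ),
  stringsAsFactors = FALSE
)

# Regex approach from the answer: drop the last "/" and everything after it
by_regex <- sub("/[^/]+$", "", data$class)

# Alternative: dirname() returns the path up to (not including) the last "/"
by_dirname <- dirname(data$class)

print(by_regex)
print(identical(by_regex, by_dirname))
```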

In the OP's code

gsub( "\\.[^/]*$", "", data$class) 

it is matching a dot (\\.) followed by zero or more characters that are not a / ([^/]*) until the end of the string ($). The only . in each path is the one in .java, and java contains no /, so the match is .java and it gets replaced with "". In other words, the pattern strips the file extension instead of the last /-separated component.
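To make the difference concrete, here is a small sketch that runs the OP's pattern and the corrected one on two of the sample paths (the vector `paths` is just a hypothetical stand-in for `data$class`):

```r
paths <- c(
  "org/apache/flume/api/virtual/loeadBalancing.java",
  "org/apache/flume/source/jms/TestJMSMessageConsumer.java"
)

# OP's pattern: a "." followed by non-"/" characters up to the end.
# The only "." in each path is the one in ".java", so just the
# extension is removed.
op_result <- gsub("\\.[^/]*$", "", paths)

# Corrected pattern: the last "/" followed by non-"/" characters.
fixed_result <- sub("/[^/]+$", "", paths)

print(op_result)
print(fixed_result)
```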


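Based on the OP's comments, if the strings are dot-separated (e.g. 'org.apache.flume.api.virtualloeadBalancing.java') and the last two dot-separated parts should be dropped: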

sub("\\.[^.]+\\.[^.]+$", "", 'org.apache.flume.api.virtualloeadBalancing.java' )
#[1] "org.apache.flume.api"
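For the dot-separated case, the sketch below contrasts the OP's original pattern with the one above on the same string (the variable name `y` is hypothetical). Because that string contains no `/`, the OP's `[^/]*$` can run from the very first `.` to the end, so almost everything is removed.

```r
y <- "org.apache.flume.api.virtualloeadBalancing.java"

# OP's pattern: the first "." (right after "org") can already reach the
# end of the string without crossing a "/", so everything from it is removed.
op_dots <- gsub("\\.[^/]*$", "", y)

# Pattern from the answer: remove the last two "."-separated components.
answer_dots <- sub("\\.[^.]+\\.[^.]+$", "", y)

print(op_dots)      # "org"
print(answer_dots)  # "org.apache.flume.api"
```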
akrun
  • It works. Can you explain the problem in my code to me? – Analyzer Nov 27 '16 at 04:53
  • Yes. It does work now. – Analyzer Nov 27 '16 at 04:56
  • Can you explain the trouble in my code? One more correction is required: after getting this, how can I replace the forward slash with a "." (dot)? – Analyzer Nov 27 '16 at 04:56
  • @Analyzer Did you meant to have `"org.apache.flume.api.virtual"` In that case `gsub("[/]", ".", sub("\\/[^/]+$", "", data$class))` – akrun Nov 27 '16 at 04:58
  • Thanks. It works perfectly fine. However, I need to check my regular expression and understand the difference between sub and gsub. `sub` is probably used to substring the text, while `gsub` is more specific to replacing the text. – Analyzer Nov 27 '16 at 05:12
  • @Analyzer I updated the post about the problem in your code. `gsub` is for global substitution: it replaces every match, while `sub` replaces only the first one. The initial problem can be solved with `sub`, as it requires only a single replacement – akrun Nov 27 '16 at 05:14
  • My code works fine to extract everything before the last '.' in text like 'org.apache.flume.api.virtualloeadBalancing.java' when executed with `gsub( "\\.[^/]*$", "", data$class)`, doesn't it? – Analyzer Nov 27 '16 at 05:19
  • @Analyzer It will extract `org/apache/flume/api/virtual/loeadBalancing` from your original `org/apache/flume/api/virtual/loeadBalancing.java` – akrun Nov 27 '16 at 05:20
  • I agree, but I mean the case where my text is in the format 'org.apache.flume.api.virtualloeadBalancing.java': it should be able to extract 'org.apache.flume.api', right? – Analyzer Nov 27 '16 at 05:23
  • @Analyzer It will match the `.` at the first instance, i.e. `org.`, and all the other characters in `[^/]*$`, so you will get `org`. I am not sure why you are using `/`, as it is not in the new string – akrun Nov 27 '16 at 05:25
  • @Analyzer if you need the substring you wanted, `sub("\\.[^.]+\\.[^.]+$", "", 'org.apache.flume.api.virtualloeadBalancing.java ' ) #[1] "org.apache.flume.api"` – akrun Nov 27 '16 at 05:27
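The `sub`/`gsub` difference and the slash-to-dot conversion discussed in the comments can be checked with a short sketch on one of the sample paths (`x`, `first_only`, `all_matches` and `pkg` are hypothetical names). It uses `fixed = TRUE` to treat `/` as a literal string; that is an equivalent variation on the comment's `gsub("[/]", ...)`, not a required change.

```r
x <- "org/apache/flume/api/virtual/loeadBalancing.java"

# sub() replaces only the first match, gsub() replaces every match
first_only  <- sub("/", ".", x)
all_matches <- gsub("/", ".", x)

# Drop the last "/"-separated component, then turn every remaining "/"
# into "." (fixed = TRUE matches "/" literally instead of as a regex)
pkg <- gsub("/", ".", sub("/[^/]+$", "", x), fixed = TRUE)

print(first_only)
print(all_matches)
print(pkg)
```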