The following strings appear in an HTML file; they are a subset of the strings I have to work with:

content/css/dashboard.css
content/pages/icon-apache.png
content/js/dashboard-commons.js
sbadmin2-1.0.7/bower_components/jquery/dist/jquery.min.js

I'm trying to strip the directory path and keep only the file name, so the result would look like this:

dashboard.css
icon-apache.png
dashboard-commons.js
jquery.min.js

I'm looking for a generic way to do this, rather than handling each case one by one with a separate sed replacement.

In short:

  • A regex to find the pattern (multi-level directory path) in the html file and remove it

Edit: I'm looking for a solution that works on Linux, preferably one that doesn't involve scripting or installing tools.

Edit 2: this question partially answers mine. With the answer provided there, I can now get the last part of the path, but I'm still looking for a regex pattern to extract the list of strings from the HTML file.

Edit 3: As requested, here are a few examples:

<link href="sbadmin2-1.0.7/dist/css/sb-admin-2.css" rel="stylesheet">
<link href="content/css/dashboard.css" rel="stylesheet">
<link href="content/css/theme.blue.css" rel="stylesheet">
<script src="sbadmin2-1.0.7/bower_components/bootstrap/dist/js/bootstrap.min.js"></script>
<script src="sbadmin2-1.0.7/bower_components/flot/excanvas.min.js"></script>
<script src="sbadmin2-1.0.7/bower_components/flot/jquery.flot.js"></script>
luizfzs
  • Possible duplicate of [Get last field using awk substr](https://stackoverflow.com/questions/17921544/get-last-field-using-awk-substr) – kvantour Sep 28 '18 at 14:21
  • For the HTML question, you have to provide us with an example so we know where these strings come from. Are they part of or where do they come from. – kvantour Sep 28 '18 at 14:27
  • Why not think about removing what is not needed with an RE? For example with sed: `sed 's:.*/::'` – Thor Sep 28 '18 at 14:27
  • Also, you ask for a regex to parse your HTML. [**Never** parse HTML or XML with a regex](https://stackoverflow.com/a/1732454/8344060) you might meet the pony. – kvantour Sep 28 '18 at 14:29
  • @Thor that was my intention when asking the question. But I'm not familiar with sed/awk/grep to come up with the most appropriate regex for the job. – luizfzs Sep 28 '18 at 14:33

2 Answers

From the full path:

$ awk -F/ '{print $NF}' file

dashboard.css
icon-apache.png
dashboard-commons.js
jquery.min.js

From the HTML:

$ awk -F'"' '/<(link|script)/{n=split($2,a,"/"); print a[n]}' file.html

sb-admin-2.css
dashboard.css
theme.blue.css
bootstrap.min.js
excanvas.min.js
jquery.flot.js

This assumes one link/script tag per line, with the href or src value as the first quoted attribute.
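
If several tags share a line, or the path is not the first quoted attribute, one possible variation is to pull out every `href`/`src` value first and strip the directories afterwards. This is only a sketch: the function name `list_asset_names` is made up for illustration, and it relies on `grep -o` and `sed -E`, which GNU and BSD tools provide but strict POSIX does not. It still treats the HTML as plain text, so it assumes double-quoted attribute values with no query string or trailing slash.

```shell
# Hypothetical helper: read HTML on stdin and print the file name
# of every double-quoted href/src value, one per line.
list_asset_names() {
  grep -oE '(href|src)="[^"]*"' |   # emit each matching attribute on its own line
    sed -E 's/^[^"]*"//; s/"$//; s:.*/::'   # drop attr=" prefix, closing quote, then directories
}

# Example run on two of the lines from the question.
list_asset_names <<'EOF'
<link href="content/css/dashboard.css" rel="stylesheet">
<script src="sbadmin2-1.0.7/bower_components/flot/excanvas.min.js"></script>
EOF
```

To run it against a real file, redirect the file into the function, e.g. `list_asset_names < file.html`.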

karakfa

You should use basename for that

basename content/css/dashboard.css

gives

dashboard.css
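
For a whole list of paths rather than a single one, `basename` can be called once per input line instead of once per hand-written command. The following is a minimal sketch, assuming one path per line on stdin; the function name `strip_dirs` is hypothetical.

```shell
# Hypothetical helper: print the base name of each path read from stdin.
strip_dirs() {
  while IFS= read -r path; do
    # Skip blank lines so basename is never called with an empty argument.
    if [ -n "$path" ]; then
      basename -- "$path"
    fi
  done
}

# Example run on two of the paths from the question.
printf '%s\n' content/css/dashboard.css content/pages/icon-apache.png | strip_dirs
```

This starts one `basename` process per line, so for large inputs a single `awk` or `sed` pass is likely to be faster.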
J.F.
  • Sorry, but I cannot see how that answers my question. – luizfzs Sep 28 '18 at 14:14
  • `basename content/css/dashboard.css` gives you what you want: `dashboard.css` – J.F. Sep 28 '18 at 14:15
  • Suppose I have a list of 100 strings like this, and the base names do not repeat. Is your suggestion to run 100 replace commands, one for each base name? If so, I stated that this is not what I'm looking for. – luizfzs Sep 28 '18 at 14:18
  • You can also pipe the data through `rev | cut -d/ -f1 | rev`. – Florian Weimer Sep 28 '18 at 16:06