When I search for the keyword "data", I get the abstract of a paper from a digital library:

Many organizations often underutilize their existing <span class='snippet'>data</span> warehouses. In this paper, we suggest a way of acquiring more information from corporate <span class='snippet'>data</span> warehouses without the complications and drawbacks of deploying additional software systems. Association-rule mining, which captures co-occurrence patterns within <span class='snippet'>data</span>, has attracted considerable efforts from <span class='snippet'>data</span> warehousing researchers and practitioners alike. Unfortunately, most <span class='snippet'>data</span> mining tools are loosely coupled, at best, with the <span class='snippet'>data</span> warehouse repository. Furthermore, these tools can often find association rules only within the main fact table of the <span class='snippet'>data</span> warehouse (thus ignoring the information-rich dimensions of the star schema) and are not easily applied on non-transaction level <span class='snippet'>data</span> often found in <span class='snippet'>data</span> warehouses

How can I remove all the <span class='snippet'>..</span> tags but still keep the keyword "data", so that the abstract looks like this:

Many organizations often underutilize their existing data warehouses. In this paper, we suggest a way of acquiring more information from corporate data warehouses without the complications and drawbacks of deploying additional software systems. Association-rule mining, which captures co-occurrence patterns within data, has attracted considerable efforts from data warehousing researchers and practitioners alike. Unfortunately, most data mining tools are loosely coupled, at best, with the data warehouse repository. Furthermore, these tools can often find association rules only within the main fact table of the data warehouse (thus ignoring the information-rich dimensions of the star schema) and are not easily applied on non-transaction level data often found in data warehouses

Trufa
tiendv
  • Is it always going to be ``? You can use a simple string replace or regex. – Marko Oct 20 '10 at 03:45
  • If any kind of HTML can be present, I would suggest you use a parser instead of a regex. Check out this wiki if you want a good parser: http://stackoverflow.com/questions/773340/can-you-provide-an-example-of-parsing-html-with-your-favorite-parser – Jagmag Oct 20 '10 at 03:50
  • re: regex and HTML... Thar be Dragons. – Tony Ennis Oct 20 '10 at 03:53
  • @Marko: No, it is just a sample; it differs depending on the search keyword. – tiendv Oct 20 '10 at 04:01
  • Yes, yes, I know regex and HTML don't make a great couple. At least one question gets asked on SO about stripping tags in HTML using C#, and the answer is always the HtmlAgilityPack. This is why I asked if it's a single occurrence, where a `String.Replace` would've been enough. – Marko Oct 20 '10 at 04:06
  • Possible duplicate: http://stackoverflow.com/questions/240546/removing-html-from-a-java-string – oksayt Oct 20 '10 at 04:24
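
The plain string replace suggested in the comments can be sketched as follows. This is only a minimal sketch: it assumes the highlight markup is always exactly `<span class='snippet'>` and `</span>`, and that no other `</span>` in the text has to survive. The class and method names are hypothetical.

```java
public class SnippetReplace {
    // Assumption: the digital library always emits exactly this opening tag.
    private static final String OPEN = "<span class='snippet'>";
    private static final String CLOSE = "</span>";

    // Removes the highlight markup and keeps the text between the tags.
    public static String removeSnippetTags(String text) {
        return text.replace(OPEN, "").replace(CLOSE, "");
    }

    public static void main(String[] args) {
        String in = "existing <span class='snippet'>data</span> warehouses";
        System.out.println(removeSnippetTags(in)); // existing data warehouses
    }
}
```

Because `String.replace` works on literal text, this breaks as soon as the attribute quoting or the class name changes; in that case a regex or a parser is the safer choice.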

1 Answer

strip_tags() is your friend. Code kindly copied from here.

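  import java.util.Arrays;
  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  // Removes every tag from text except those named in allowedTags
  // (comma-separated, e.g. "b,i"). Each removed tag is replaced by one space.
  public static String strip_tags(String text, String allowedTags) {
      String[] tag_list = allowedTags.split(",");
      Arrays.sort(tag_list); // binarySearch below needs a sorted array

      // Group 1 captures the tag name; "\\s" in Java source is the regex \s.
      final Pattern p = Pattern.compile("<[/!]?([^\\s>]*)\\s*[^>]*>",
              Pattern.CASE_INSENSITIVE);
      Matcher m = p.matcher(text);

      StringBuilder out = new StringBuilder();
      int lastPos = 0;
      while (m.find()) {
          String tag = m.group(1);
          if (Arrays.binarySearch(tag_list, tag) < 0) {
              // tag not allowed: copy the text before it and drop the tag
              out.append(text.substring(lastPos, m.start())).append(" ");
          } else {
              // tag allowed: copy the text before it together with the tag
              out.append(text.substring(lastPos, m.end()));
          }
          lastPos = m.end();
      }
      if (lastPos > 0) {
          out.append(text.substring(lastPos));
          return out.toString().trim();
      } else {
          return text; // no tags found
      }
  }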
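
For the abstract in the question, a regex that targets only `span` tags may be enough. The following is a minimal sketch, assuming the markup is simple and well formed (for arbitrary HTML, the comments on the question recommend a parser). `SpanStripper` and `stripSpans` are hypothetical names. Unlike `strip_tags`, it inserts nothing where a tag was removed, so the original spacing around the keyword is kept.

```java
import java.util.regex.Pattern;

public class SpanStripper {
    // Matches <span ...> and </span>, case-insensitively;
    // any attributes are skipped up to the closing '>'.
    private static final Pattern SPAN_TAG =
            Pattern.compile("</?span\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Deletes the span tags and leaves the text between them untouched.
    public static String stripSpans(String text) {
        return SPAN_TAG.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String in = "within <span class='snippet'>data</span>, has attracted";
        System.out.println(stripSpans(in)); // within data, has attracted
    }
}
```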
Frankie