
I'm interested in converting data from the page below into a format I could load into a MySQL database using MySQL Workbench.

https://sbir.nasa.gov/SBIR/abstracts/17-1.html

I want to run some R code that will give me the name of the business after each line titled

"SMALL BUSINESS CONCERN: (Firm Name, Mail Address, City/State/ZIP, Phone)"

For example, I'm looking for an output that looks something like this:

Transition45 Technologies, Inc.
ATSP Innovations

and so on, so that I could load the names into a database column.

I hope that makes sense; I'm relatively new to this. Thanks.

ebilk
Please _edit_ your question and show us a minimal sample of what you are trying to do. Your source file is messy and I'm not sure your current logic would work. Also, I probably wouldn't use R for this, I would use Java or maybe something like Perl. – Tim Biegeleisen Apr 25 '17 at 01:02

1 Answer


Your question is not entirely clear.

If I understand correctly, you want to extract the address details that follow each line reading "SMALL BUSINESS CONCERN: (Firm Name, Mail Address, City/State/ZIP, Phone)". If so, try:

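url <- "https://sbir.nasa.gov/SBIR/abstracts/17-1.html"

# Read the page line by line, then strip HTML tags and tab characters
abstracts_page <- readLines(url)
abstracts_page <- gsub("<.*?>", "", abstracts_page)
abstracts_page <- gsub("\\t+", "", abstracts_page)

# Indices of the lines that introduce each firm's address block
address_header_index <- grep("SMALL BUSINESS CONCERN:", abstracts_page)

# On this page, the five address fields sit 2 to 6 lines below each header
address_list <- lapply(address_header_index, function(i) {
  abstracts_page[(i + 2):(i + 6)]
})

# Bind the five-element vectors into a data frame, one row per firm
address_list <- data.frame(do.call("rbind", address_list))

head(address_list)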

#                                          X1                                   X2                   X3
# 1          Transition45 Technologies, Inc.                1739 North Case Street      Orange,&nbsp;CA
# 2                         ATSP Innovations                    60 Hazelwood Drive   Champaign,&nbsp;IL
# 3         Cornerstone Research Group, Inc.               2750 Indian Ripple Road      Dayton,&nbsp;OH
# 4 Interdisciplinary Consulting Corporation      5745 Southwest 75th Street, #364 Gainesville,&nbsp;FL
# 5                 CFD Research Corporation  701 McMillian Way Northwest, Suite D  Huntsville,&nbsp;AL
# 6           LaunchPoint Technologies, Inc.        5735 Hollister Avenue, Suite B      Goleta,&nbsp;CA

#            X4             X5
# 1 92865-4211  (714) 283-2118
# 2 61820-7460  (217) 417-2374
# 3 45440-3638  (937) 320-1877
# 4 32608-5504  (352) 283-8110
# 5 35806-2923  (256) 726-4800
# 6 93117-6410  (805) 683-9659
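
The output above still contains `&nbsp;` entities, and the question only asked for the firm names. A minimal sketch of a follow-up step is shown here. It assumes a data frame shaped like `address_list`; to run on its own, it rebuilds two rows copied from the output above. The column names and the `sbir_firms.csv` file name are hypothetical, so adjust them to your table.

```r
# Two rows copied from the head(address_list) output above, standing in for
# the full data frame so this sketch runs without fetching the page.
address_list <- data.frame(
  X1 = c("Transition45 Technologies, Inc.", "ATSP Innovations"),
  X2 = c("1739 North Case Street", "60 Hazelwood Drive"),
  X3 = c("Orange,&nbsp;CA", "Champaign,&nbsp;IL"),
  X4 = c("92865-4211", "61820-7460"),
  X5 = c("(714) 283-2118", "(217) 417-2374"),
  stringsAsFactors = FALSE
)

# Replace the leftover HTML entity with a space and trim stray whitespace
# in every column, keeping the data frame structure.
address_list[] <- lapply(address_list, function(col) {
  trimws(gsub("&nbsp;", " ", col, fixed = TRUE))
})

# Hypothetical column names; rename them to match your database table.
names(address_list) <- c("firm_name", "street", "city_state", "zip", "phone")

# The firm names alone, as a character vector.
firm_names <- address_list$firm_name

# Write a CSV file (hypothetical name) that can then be imported into MySQL.
out_file <- file.path(tempdir(), "sbir_firms.csv")
write.csv(address_list, out_file, row.names = FALSE)
```

Here `firm_names` holds only the business names, one per element, and the CSV file keeps the remaining address columns in case you want them in the same table.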
nurandi