
I have a text file with the following contents:

"\n\n\n\n\n\n\n\n\t\n\t\t\t\n\t\t\t\t\t\n\t\t\t\t\n\t\t\n\n\n\t\n\t\t\n\t\t\t\t
Hotline: +49 40-300 51 701\n\t\n\t\n\t
Languages\n\t\n\t\n\t\t\n\t\t\n\t\t
Travel plan \n\t\n\t\n\n\n\n\t\t\n\n\t\t\n\t\t\t\n\n\n\n\n\n\n\n\n\n\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\n\n\t\t\n\n\t\t\n\t\t\t\t
Book\t
Packages from € 59\n
\tAccommodation and arrival\n
\tMusical packages\n
\tMaritime packages\n\t
Hamburg for Families\n\t
Experience Hamburg & Culture\n\n\n\n\n\t
Hotels from € 24\n\t
Book online now!\n\t
Theme hotels\n\t
Hotels by location\n\t
Special Offers\n\t
Hotels from A-Z\n\t
Other accommodation\n\n\n\n\n\t
Tickets from € 8\n\tBook online now!\n\t
Musicals Hamburg\n\tHamburg maritime\n\t
Sightseeing tours & city walks\n\tMuseums & Exhibitions\n\tHamburg for Families\n\n\n\n\n\t
Hamburg CARD\n\tBook online now!\n\tAll benefits at a glance\n\tFrequently asked questions\n\n\n\n\n\t
Group trips\n\tBooking request\n\tHamburg Guides and theme walks\n\n\n\n\n\n\n\t\n\t\tOffer\n\n\t\t\n\n\t\t\n\n\t\t
Hamburg CARD\n\t\tFree travel by bus, rail and ferry (HVV) and up to 50% discount on more than 150 tourist...\n\n\t\n\t\n\t\t\n\t\t\t\n\t\t\t\t
from 10,50 EUR\n\t\t\t\n\t\t\n\n\t\n\n\n\n\n\n\n\tAttractions\tBest of Hamburg\n\t
Town Hall\n\tThe \"Michel\"\n\tSt. Pauli & Reeperbahn\n\t
Elbphilharmonie\n\tJungfernstieg\n\tMiniatur Wunderland\n\tTierpark Hagenbeck\n\t
All about the Alster\n\tBlankenese\n\n\n\n\n\tHamburg Maritime\n\t
Urbanshore Hamburg\n\tPort of Hamburg\n\tLandungsbrücken\n\tFish Market\n\tSpeicherstadt\n\tOn the Elbe\n\tHafenCity\n\tWillkomm-Höft\n\tÖvelgönne\n\n\n\n\n\tHistoric Hamburg\n\tThe Old Elbe Tunnel\n\t"

I want to split it on the \n. I tried:

string.split("\n")
string.split('\n') 
string.split("""\n""") 
string.split("\\n")

None of these seems to work. How do I get it done in Scala?

Let's try
Rasika

3 Answers


Split by \n, then \t, flatten, then remove empty strings.

import scala.io.Source

val lines = Source.fromFile("/Users/rasika/Documents/example.txt").getLines.mkString

val result = lines.split("\\\\n").flatMap(_.split("\\\\t")).filter(_.nonEmpty).toList

Result

Hotline: +49 40-300 51 701
Languages
Travel plan
Book
Packages from € 59
Accommodation and arrival
Musical packages
Maritime packages
Hamburg for Families
Experience Hamburg & Culture
Hotels from € 24
Book online now!
Theme hotels
Hotels by location
Special Offers
Hotels from A-Z
Other accommodation
Tickets from € 8
Book online now!
Musicals Hamburg
Hamburg maritime
Sightseeing tours & city walks
Museums & Exhibitions
Hamburg for Families
Hamburg CARD
Book online now!
All benefits at a glance
Frequently asked questions
Group trips
Booking request
Hamburg Guides and theme walks
Offer
Hamburg CARD
Free travel by bus, rail and ferry (HVV) and up to 50% discount on more than 150 tourist...
from 10,50 EUR
Output exceeds cutoff limit.

Sebastian Celestino
  • Instead of splitting, could use linesIterator. The following would have the same end result: `lines.replace("\t", "\n").linesIterator.filterNot(_.trim.isEmpty).toList` – Oswaldo Jun 04 '20 at 14:15

If you want to split on literal \n in your text (i.e. literal text, and not just a newline), then try this:

string.split("\\\\n")

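Because String.split treats its argument as a regular expression, matching one literal backslash takes four backslashes in a Java/Scala string literal: the compiler reduces them to two, and the regex engine reads those two as a single escaped backslash.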
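
To make the escaping concrete, here is a minimal sketch comparing a split on a real newline with a split on the two-character text \n. The object name and the sample string are made up for illustration; Pattern.quote is shown as an alternative that avoids hand-written regex escaping.

```scala
import java.util.regex.Pattern

object LiteralBackslashSplit {
  // Hypothetical sample: contains the characters '\' and 'n', not real newlines.
  val sample: String = "Hotline\\n\\tLanguages\\nTravel plan"

  // "\n" is a real newline character, which the sample does not contain,
  // so the whole string comes back as a single element.
  val byNewline: Array[String] = sample.split("\n")

  // "\\\\n" is the regex \\n, which matches a backslash followed by 'n'.
  val byLiteral: Array[String] = sample.split("\\\\n")

  // Pattern.quote builds the same literal match without manual escaping.
  val byQuoted: Array[String] = sample.split(Pattern.quote("\\n"))

  def main(args: Array[String]): Unit = {
    println(byNewline.length)           // 1
    println(byLiteral.mkString(" | "))
    println(byQuoted.mkString(" | "))
  }
}
```

Note that the literal \t sequences are left in place here; they would need a second split or replace, as in the answer above.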

Tim Biegeleisen
  • If I do this, I get all the text as a single line. I want every line of the text as an element in an array. – Rasika Aug 13 '18 at 17:02

Since you're splitting on newlines, and io.Source.fromFile.getLines separates on newlines, you'll need to read the whole file in one go instead, with

val string = io.Source.fromFile(filepath).mkString

as per this answer. Then your attempts should work, e.g.

string.split('\n')
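
A minimal sketch of that approach, assuming the file really contains newline and tab characters (not the two-character text \n). The object name, the helper names, and the temp file are hypothetical; splitting on tabs as well and dropping blank entries is borrowed from the first answer rather than part of this one.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import scala.io.Source

object ReadWholeFile {
  // Reads the entire file into one string and closes the Source afterwards.
  def readAll(path: String): String = {
    val source = Source.fromFile(path, "UTF-8")
    try source.mkString finally source.close()
  }

  // Splits on real newline and tab characters, trims, and drops blank entries.
  def nonEmptyFields(text: String): List[String] =
    text.split("[\n\t]").map(_.trim).filter(_.nonEmpty).toList

  def main(args: Array[String]): Unit = {
    // Hypothetical temp file standing in for the asker's example.txt.
    val tmp = Files.createTempFile("example", ".txt")
    val content = "Book\t\nPackages from € 59\n\tMusical packages\n"
    Files.write(tmp, content.getBytes(StandardCharsets.UTF_8))
    println(nonEmptyFields(readAll(tmp.toString)))
    Files.delete(tmp)
  }
}
```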
joel