10

I have a large text file with content set up like this:

---
title: Lorim Ipsum Dolar
---
Lorim ipsum content
---
title: Excelvier whatever 
---
Lorim ipsum content goes here.

I'm trying to split up this file into individual files using csplit.

The individual files would have content formatted like this:

---
title: Lorim Ipsum Dolar
---
Lorim ipsum content

I was hoping to be able to regex the ---, newline & title like so ---\ntitle

But I'm not able to select it with…

csplit -k products.txt '/---[^\n]title/' {99}

I've tried lots of variations to no avail. I keep getting "no match".

Philip Meissner
  • I don't know about `csplit`, but have you tried `/---[\r\n]+title/` ? (`[^ ...]` is a negated class and sometimes, there are carriage returns together with newlines). – Jerry Aug 21 '13 at 18:01

5 Answers

7

You could use a regular expression that matches until the end of the line ($)

What do you think about:

csplit -k products.txt '/^title:/' {99}
inthenite
6

csplit reads the input file one line at a time and applies the regex to each line. It is therefore not possible to match a regex across multiple lines.

One way around this is to massage the input file first, replacing ---\ntitle: with a single line pattern that csplit can match. For example, using sed:

sed 'N;s/---\ntitle: /===\n/' products.txt | csplit -k - '/===/' '{*}'
sed 'N;s/===\n/---\ntitle: /' -i xx*

This replaces ---\ntitle: with a single line ===, then has csplit split when it sees that pattern. Passing - as a file name tells csplit to read from stdin. The second sed command reverses the change.
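A quick way to check both points is to run them against a small sample. This is only a sketch, assuming GNU csplit and GNU sed; the file name sample.txt and the temporary directory are made up for the demonstration, and -z is added here just to drop the empty first piece.

```shell
#!/bin/sh
# Demo only: sample.txt and the temp directory are hypothetical, not from the question.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' '---' 'title: A' '---' 'body A' \
              '---' 'title: B' '---' 'body B' > sample.txt

# csplit tests one line at a time, so a pattern that needs two lines never matches.
if csplit -s sample.txt '/---.title/' 2>/dev/null; then
    cross_line=matched
else
    cross_line=nomatch
fi

# Collapse the two-line marker into a one-line marker, split, then restore it.
# -z (GNU) drops the empty piece that precedes the first marker.
sed 'N;s/---\ntitle: /===\n/' sample.txt | csplit -s -z - '/^===$/' '{*}'
sed -i 'N;s/===\n/---\ntitle: /' xx*

echo "cross-line pattern: $cross_line"
head xx*
```

Note that `N` joins lines in pairs, so this only works when each `---` marker that opens a record falls on an odd line number, as it does in the sample.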

John Kugelman
2

Try using {*} instead of {99} to fix the "match not found" problem.

Aleks-Daniel Jakimenko-A.
  • I believe the `{99}` just tells `csplit` how many times to repeat the process. Anyway, in my version of BSD, `{*}` would not work. See http://stackoverflow.com/questions/4323703/looking-for-correct-regular-expression-for-csplit#comment-25322197 – Philip Meissner Aug 21 '13 at 18:13
  • @PhilipMeissner This is very interesting. Under Debian, csplit will always try to find the specified number of matches; if it cannot find 99 matches, an error is thrown. `csplit --version` says `csplit (GNU coreutils) 8.21` – Aleks-Daniel Jakimenko-A. Aug 21 '13 at 18:17
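To compare a fixed repeat count with {*}, something like the following can be run. It is a rough sketch that assumes GNU csplit (BSD versions may behave differently, as noted above); sample.txt and the temporary directory are invented for the test.

```shell
#!/bin/sh
# Demo only: sample.txt and the temp directory are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' '---' 'title: A' '---' 'body A' \
              '---' 'title: B' '---' 'body B' > sample.txt

# {99} repeats the pattern 99 more times; only two title lines exist, so GNU
# csplit reports an error. -k keeps the pieces written before the failure.
csplit -s -k sample.txt '/^title:/' '{99}' 2>/dev/null
fixed_status=$?
fixed_pieces=$(ls xx* | wc -l | tr -d ' ')
rm -f xx*

# {*} (a GNU extension) repeats until the input runs out, without an error.
csplit -s sample.txt '/^title:/' '{*}'
star_status=$?
star_pieces=$(ls xx* | wc -l | tr -d ' ')

echo "{99}: exit $fixed_status, $fixed_pieces pieces"
echo "{*}:  exit $star_status, $star_pieces pieces"
```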
1

This might work for you:

csplit -z products.txt '/^title/-1' '{*}'
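To see what the -1 offset does, here is a small sketch on made-up input. It assumes GNU csplit, since -z and {*} are GNU extensions; sample.txt and the temporary directory are hypothetical.

```shell
#!/bin/sh
# Demo only: sample.txt and the temp directory are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' '---' 'title: A' '---' 'body A' \
              '---' 'title: B' '---' 'body B' > sample.txt

# /^title/-1 splits one line before each line starting with "title",
# that is, at the "---" that opens each record.
# -z removes the empty piece that would otherwise precede the first record.
csplit -s -z sample.txt '/^title/-1' '{*}'

pieces=$(ls xx* | wc -l | tr -d ' ')
echo "pieces: $pieces"
head xx*
```

The pattern itself still matches a single line; the offset is what moves the split point up to the `---` line.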
potong
0

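For me, the answer was: don't use csplit, use awk.

awk '
# Start a new output file at each title line and print its name.
/^title:/ {++count; file="file"count".txt"; print file}
# Write the previous line to the current file, once one exists.
file {print line > file}
# Hold the current line until the next line has been seen.
{line=$0}
# Flush the last held line, which the rules above never write.
END {if (file) print line > file}
' products.txt

The first rule starts a new file when title: is encountered. The second rule writes the preceding line to file if file has been set. The third rule stores the current line in a variable, so each line is written one step late and the --- before a title lands in the new file. The END rule writes the final stored line, which would otherwise be lost.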
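To try the awk approach on a small input, a sketch like this can help. The sample.txt file and temporary directory are made up, and the END rule is included on the assumption that the last input line should be kept, since that line is otherwise only stored and never written.

```shell
#!/bin/sh
# Demo only: sample.txt and the temp directory are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' '---' 'title: A' '---' 'body A' \
              '---' 'title: B' '---' 'body B' > sample.txt

awk '
# Each line is written one step late, so the "---" before a title
# goes to the file that the title line has just opened.
/^title:/ {++count; file="file"count".txt"}
file {print line > file}
{line=$0}
# Write the final held line so the last record is complete.
END {if (file) print line > file}
' sample.txt

head file*.txt
```

For inputs with very many records, calling close() on the previous file before switching to a new one may be needed to stay under the open-file limit.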

Luke Davis