
I have a file like this:

apple

ae-pal

noun.

a fruit

ball

b'al

noun.

playing material
round shaped

And so on. Each entry starts with the word, then a blank line, then the pronunciation (I know the ones above are silly :P), then the part of speech and the meaning, with a blank line after each of these fields. The meaning can span more than one line. What I ultimately want is a recursive call (or a loop) that picks the first word and places it in a table in a database (MySQL, maybe), then puts the second field into the corresponding row of the same table, and so on.
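Since every field is followed by a blank line and a multi-line meaning contains no blank lines, the format can also be parsed without numbering anything, simply by splitting on blank lines and taking the fields four at a time. A minimal Python sketch, assuming the whole file fits in memory; the `Entry` type and the `parse_entries` function are hypothetical names:

```python
import re
from typing import List, NamedTuple


class Entry(NamedTuple):
    word: str
    pronunciation: str
    part_of_speech: str
    meaning: str  # may contain newlines when the meaning spans several lines


def parse_entries(text: str) -> List[Entry]:
    # Fields are separated by one or more blank (or whitespace-only) lines.
    fields = [f.strip() for f in re.split(r"\n\s*\n", text.strip())]
    # Every four consecutive fields form one entry; an incomplete
    # trailing group is ignored.
    return [Entry(*fields[i:i + 4]) for i in range(0, len(fields) - 3, 4)]


sample = """apple

ae-pal

noun.

a fruit

ball

b'al

noun.

playing material
round shaped
"""

for entry in parse_entries(sample):
    print(entry)
```

With a real file, `parse_entries(f.read())` on the opened file would do the same.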

First I wanted to number these blank lines 1, 2, 3, 4 and so on, so that I could put everything just before blanks 1, 5, 9, ... (that is, 4x+1) in one place, everything before 4x+2 in another, and so on. That way I would reach my goal: I could push them into a database and finally have my dictionary.
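The numbering idea works out as follows: with four fields per entry and a blank line after each field, blank line n (counting from 1) follows field (n - 1) mod 4 of entry (n - 1) // 4, so the words sit just before blanks 1, 5, 9, ..., i.e. 4x + 1. A small sketch of that mapping, assuming every field (including the last meaning) is followed by exactly one blank line; `FIELDS` and `field_before_blank` are hypothetical names:

```python
FIELDS = ("word", "pronunciation", "part of speech", "meaning")


def field_before_blank(n):
    """Return (entry index, field name) for the field just before blank line n (1-based)."""
    entry, field = divmod(n - 1, 4)
    return entry, FIELDS[field]


# Blank lines 1, 5, 9, ... (4x + 1) are the ones that follow a word.
for n in range(1, 9):
    print(n, field_before_blank(n))
```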

I found a way to replace the empty lines with a number, but not how to make that number increase. How could this be implemented with sed, awk, or even Python? I expect a regex will be involved.

Pseudocode:

x = 1
for each line until EOF:
   is the line empty?
      yes: replace it with x, then increase x by 1
      no: go to the next line
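The pseudocode maps directly onto a plain loop with a counter; no regex is needed for this part. A minimal Python sketch (the function name `number_blank_lines` is hypothetical) that works on any iterable of lines, including an open file:

```python
def number_blank_lines(lines):
    """Replace each empty line with an increasing number, starting at 1."""
    x = 1
    result = []
    for line in lines:
        if line.strip() == "":  # the line is empty (or whitespace only)
            result.append(str(x))
            x += 1  # the next blank line gets the next number
        else:
            result.append(line.rstrip("\n"))  # keep the text, drop the newline
    return result


text = "apple\n\nae-pal\n\nnoun.\n\na fruit\n"
for line in number_blank_lines(text.splitlines()):
    print(line)
```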

I hope I am clear enough!

Chandan Gupta
  • To answer your question, use `enumerate(line for line in open(...) if line.strip())`, although there are better ways to split up the file into batches. – Katriel Aug 15 '12 at 05:36

4 Answers


This might work for you:

awk '/^$/{print ++c;next};1' file

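or GNU sed (this needs bash as `/bin/sh`, since the `e` flag runs the command through the shell and `addone` is an exported bash function):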

touch /tmp/c
addone () { c=$(</tmp/c); ((c+=1)); echo $c | tee /tmp/c; }
export -f addone
sed '/^$/s//addone/e' file
rm /tmp/c

An alternative might be to turn all blank lines into tabs and every fourth tab into a newline.

sed ':a;$!{N;ba};s/\n\n/\t/g;y/\n/ /;' file | sed 's/\t/\n/4;P;D'
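The same tab-separated layout can be produced in Python, which may be handy because MySQL's `LOAD DATA INFILE` expects tab-separated fields and newline-terminated rows by default. A sketch, assuming fields are separated by blank lines and that, like the `y/\n/ /` step above, a multi-line meaning should be joined with spaces; `to_tsv` is a hypothetical name:

```python
import re


def to_tsv(text: str) -> str:
    """Turn blank-line-separated fields into one tab-separated row per entry."""
    # Split on blank lines; newlines inside a field (a multi-line meaning)
    # become spaces, like the y/\n/ / step in the sed pipeline.
    fields = [f.strip().replace("\n", " ")
              for f in re.split(r"\n\s*\n", text.strip())]
    # Four fields per entry; an incomplete final entry becomes a shorter row.
    rows = ["\t".join(fields[i:i + 4]) for i in range(0, len(fields), 4)]
    return "\n".join(rows) + "\n"


sample = ("apple\n\nae-pal\n\nnoun.\n\na fruit\n\n"
          "ball\n\nb'al\n\nnoun.\n\nplaying material\nround shaped\n")
print(to_tsv(sample), end="")
```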
potong
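(line for line in open(...) if line.strip())

is an iterable over the non-empty lines of the file (`line.strip()` is needed because every line read from a file still ends with `\n`, so a plain `if line` would keep the blank lines too). Use the `grouper` recipe from the `itertools` documentation to iterate over it in fours: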

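from itertools import zip_longest  # named izip_longest in Python 2

def grouper(iterable, n, fillvalue=None):
    # collect items into fixed-length chunks, padding the last one with fillvalue
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

nonempty_lines = (line for line in open(...) if line.strip())
for entry in grouper(nonempty_lines, 4):
    print(entry)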
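One caveat: grouping non-empty lines in fours assumes every field is a single line. With the sample data, the two-line meaning of "ball" ("playing material" / "round shaped") would push "round shaped" into a group of its own. A sketch of a variant that first collects each run of non-empty lines into one field with `itertools.groupby` and only then applies `grouper`; `fields_from_lines` is a hypothetical helper:

```python
from itertools import groupby, zip_longest


def grouper(iterable, n, fillvalue=None):
    # itertools recipe: collect items into fixed-length chunks
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


def fields_from_lines(lines):
    """Yield each run of consecutive non-empty lines as one field."""
    for is_blank, run in groupby(lines, key=lambda line: not line.strip()):
        if not is_blank:
            yield "\n".join(line.rstrip("\n") for line in run)


lines = ["apple\n", "\n", "ae-pal\n", "\n", "noun.\n", "\n", "a fruit\n", "\n",
         "ball\n", "\n", "b'al\n", "\n", "noun.\n", "\n",
         "playing material\n", "round shaped\n"]
for entry in grouper(fields_from_lines(lines), 4):
    print(entry)
```

A file object can be passed in place of the `lines` list, since iterating over it yields one line at a time.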
Katriel

You can use an iterator, since it only yields its next value when `next()` is called:

with open('data.txt') as f:
    lines = [x.strip() for x in f]
    spaces = lines.count('')  # number of empty lines

    # an iterator over 1..spaces; each next(it) hands out the next number
    it = iter(range(1, spaces + 1))
    # replace every empty line with the next number; `it` holds exactly
    # `spaces` numbers, so it cannot run out here
    lines = [x if x != '' else next(it) for x in lines]
    for x in lines:
        print(x)

    # blank lines 1, 5, 9, ... (4x+1) each come right after a word
    fil = [4*x + 1 for x in range(0, spaces + 1) if 4*x + 1 < spaces]
    print(fil)
    row = [lines[lines.index(x) - 1] for x in fil]  # line just before each of those blanks
    print(row)

    # blank lines 1, 3, 5, ... (2x+1); note that `< spaces` leaves out a blank
    # whose number equals `spaces` (7 in the sample), which is why it is missing below
    fil = [2*x + 1 for x in range(0, spaces + 1) if 2*x + 1 < spaces]
    print(fil)
    row = [lines[lines.index(x) - 1] for x in fil]
    print(row)

output:

apple
1
ae-pal
2
noun.
3
a fruit
4
ball
5
b'al
6
noun.
7
playing material
round shaped
[1, 5]
['apple', 'ball']
[1, 3, 5]
['apple', 'noun.', 'ball']
Ashwini Chaudhary
  • I am now thinking about how to insert them into a database. Each instance should go to one row; for example, "apple" and "ball" here should go in one row, that is, the elements at the 4*x+1 positions. Any suggestions? – Chandan Gupta Aug 15 '12 at 06:05
  • @CandyGupta I edited my solution and added `2x+1` and `4x+1`, I guess this is what you wanted. – Ashwini Chaudhary Aug 15 '12 at 06:19
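For the database step, a common layout is one row per entry with one column per field, filled with a parameterized insert. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL (MySQL drivers such as MySQLdb and mysql-connector-python use `%s` placeholders instead of `?`); the table name `dictionary` and its column names are assumptions:

```python
import sqlite3

# One tuple per entry: (word, pronunciation, part of speech, meaning),
# e.g. as produced by grouping the blank-line-separated fields in fours.
entries = [
    ("apple", "ae-pal", "noun.", "a fruit"),
    ("ball", "b'al", "noun.", "playing material\nround shaped"),
]

conn = sqlite3.connect(":memory:")  # pass a file name instead to keep the data
conn.execute(
    "CREATE TABLE dictionary ("
    " id INTEGER PRIMARY KEY,"
    " word TEXT, pronunciation TEXT, part_of_speech TEXT, meaning TEXT)"
)
# Placeholders let the driver do the quoting, so "b'al" needs no escaping.
conn.executemany(
    "INSERT INTO dictionary (word, pronunciation, part_of_speech, meaning)"
    " VALUES (?, ?, ?, ?)",
    entries,
)
conn.commit()

for row in conn.execute("SELECT id, word FROM dictionary ORDER BY id"):
    print(row)
```

A surrogate `id` key is used rather than making `word` the primary key, since a dictionary can list the same word more than once.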

Why don't you just run a loop that counts the blank lines and then insert into the database? Is a regex really important here?

Here you go, a quick and dirty implementation in PHP:

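<?php

$filename = $argv[1];
$items = array();   // defined up front so var_dump() works even if the file can't be read

if (file_exists($filename) && is_readable($filename)) {

    $fh = fopen($filename, "r");
    $count = 0;   // index of the current field within an entry (0-3)
    $el = 0;      // index of the current entry
    // Note: with while (!feof()), the final fgets() returns false at end of file,
    // and that false is what ends up as the empty last entry in the output below;
    // looping on while (($line = fgets($fh)) !== false) avoids it.
    while (!feof($fh)) {
        $line = fgets($fh);
        if ($line == "\n")
        {
            // a blank line ends the current field
            $count++;
            if ($count == 4)
            {
                // four fields read: move on to the next entry
                $el++;
                $count = 0;
            }
            continue;
        }
        // initialise the slot first to avoid "undefined index" notices
        if (!isset($items[$el][$count])) {
            $items[$el][$count] = "";
        }
        // lines of a multi-line meaning are appended to the same field
        $items[$el][$count] .= $line;
    }
    fclose($fh);
}
var_dump($items);

?>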

Run it from the command line as `php script.php filename`. This is what I got (my test file had a third entry, "pink", in addition to the two above):

array(4) {
  [0] =>
  array(4) {
    [0] =>
    string(6) "apple\n"
    [1] =>
    string(7) "ae-pal\n"
    [2] =>
    string(6) "noun.\n"
    [3] =>
    string(8) "a fruit\n"
  }
  [1] =>
  array(4) {
    [0] =>
    string(5) "ball\n"
    [1] =>
    string(5) "b'al\n"
    [2] =>
    string(6) "noun.\n"
    [3] =>
    string(30) "playing material\nround shaped\n"
  }
  [2] =>
  array(4) {
    [0] =>
    string(5) "pink\n"
    [1] =>
    string(7) "pe-ank\n"
    [2] =>
    string(6) "color\n"
    [3] =>
    string(14) "girlish\ncolor\n"
  }
  [3] =>
  array(1) {
    [0] =>
    string(0) ""
  }
}
avk
  • The meaning of the word, which is the fourth field here, is not just one line: it spans multiple lines that are not separated by blank lines, and those need to be put into the database as one entry. I am not sure how to get there. – Chandan Gupta Aug 15 '12 at 06:04
  • Well, after the multi-line fourth field you do have an empty line, right? And you can check whether a line is empty or not. – avk Aug 15 '12 at 06:07
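avk's point can be turned into code by buffering lines until the next blank line and only then treating the buffer as one field, so a multi-line meaning stays together. A Python sketch in the same spirit as the PHP loop above (read line by line, count fields, start a new entry after four); `read_entries` is a hypothetical name, and it assumes the last field may or may not be followed by a blank line:

```python
def read_entries(lines):
    """Group blank-line-separated fields into entries of four, reading line by line."""
    entries, entry, buffer = [], [], []
    for line in lines:
        if line.strip():
            buffer.append(line.rstrip("\n"))  # still inside the current field
            continue
        if buffer:  # a blank line closes the field collected so far
            entry.append("\n".join(buffer))
            buffer = []
            if len(entry) == 4:  # word, pronunciation, part of speech, meaning
                entries.append(entry)
                entry = []
    if buffer:  # the file may not end with a blank line
        entry.append("\n".join(buffer))
    if entry:  # keep a final entry even if it is incomplete
        entries.append(entry)
    return entries


sample = ["ball\n", "\n", "b'al\n", "\n", "noun.\n", "\n",
          "playing material\n", "round shaped\n", "\n"]
print(read_entries(sample))
```

An open file object works as the `lines` argument too, since iterating over it yields one line at a time, so the whole file never has to be loaded at once.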