
The transcriptions of the COSINE language corpus look as follows:

File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0 
xmax = 3931.56874994773
tiers? <exists> 
size = 8
item []:
    item [1]:
        class = "IntervalTier"
        name = "Phrases"
        xmin = 0
        xmax = 3931.56874994773
        intervals: size = 1938
        intervals [1]:
            xmin = 0
            xmax = 3.59246613841739
            text = "Good morning"
        intervals [2]:
            xmin = 3.59246613841739
            xmax = 3.77632771424237
            text = "the dog likes me"
        intervals [3]:
            xmin = 3.77632771424237
            xmax = 8.15464058223137
            text = "fish swim"
        intervals [4]:
            xmin = 8.15464058223137
            xmax = 8.53678424963039
            text = "Sure."
        intervals [5]:
            xmin = 8.53678424963039
            xmax = 9.54622035219737
            text = "Just keep swimming"

The files are in .TextGrid format. How can one extract the variables xmin, xmax and text for each of the intervals?

EDIT:

The file can be treated as a normal text file and read line by line, which was my solution to the problem. It would still be interesting to know if there is a special way to extract information from this type of file. Thank you for the responses.

ishido

2 Answers


I haven't worked with TextGrid files before, but see if this helps you. If it doesn't, it is very easy to write your own function to do this. Looking at the TextGrid file and the sample file here, it appears there is a set format for these files.

• Lines 1 and 2 -> file information

• Line 3 -> blank, a separator

• Lines 4 - 7 -> some other information (xmin, xmax, whether tiers exist, size)

Line 7 also indicates the size, i.e. the number of items (tiers) in your file.

We can reconstruct this data into a variable as follows:
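A minimal sketch of such a variable in Python, assuming a list of tiers in which each tier is a dictionary and its 'intervals' entry is itself a list of dictionaries. The name `items` is only an illustration; the values are copied from the sample in the question.

```python
# Hypothetical in-memory layout for the sample file: a list of tiers,
# each tier a dict, and each tier's 'intervals' a list of dicts.
items = [
    {
        'class': 'IntervalTier',
        'name': 'Phrases',
        'xmin': 0.0,
        'xmax': 3931.56874994773,
        'intervals': [
            {'xmin': 0.0, 'xmax': 3.59246613841739, 'text': 'Good morning'},
            {'xmin': 3.59246613841739, 'xmax': 3.77632771424237, 'text': 'the dog likes me'},
        ],
    },
    # ... one dict per remaining tier would follow here
]

print(items[0]['name'])                  # name of the first tier
print(items[0]['intervals'][1]['text'])  # text of its second interval
```

Python lists are indexed from 0, so the first tier is `items[0]` and interval number n is `items[0]['intervals'][n - 1]`.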


See this for more about combining dictionaries and lists.

I suggest you do the following:

Read the file line by line. Do as desired with the information in the first 7 lines. At the 8th line, create the item array; then check each following line for 'item [x]', 'class', 'name', 'xmin', 'xmax', 'intervals: size' and 'intervals [x]', and assign the values to the relevant place in the list/dict. See this link, which describes data structures well, if you are not very familiar with them.

Then you can retrieve the values as

list[itemNumber]['class']

or

list[itemNumber]['intervals'][intervalNumber-1]['xmin'] #index starts from 0

and so on.
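The line-by-line approach described above could be sketched roughly as follows. This is only an illustration under some assumptions: the file is in the long TextGrid format shown in the question, only interval tiers are handled, and each text value fits on one line. The names `parse_textgrid`, `sample` and `items` are made up for this example.

```python
import re

def parse_textgrid(lines):
    """Parse long-format TextGrid lines into a list of tier dicts (sketch)."""
    items = []
    for raw in lines:
        line = raw.strip()
        if re.match(r'item \[\d+\]:', line):
            # A new tier starts here.
            items.append({'intervals': []})
        elif re.match(r'intervals \[\d+\]:', line) and items:
            # A new interval starts inside the current tier.
            items[-1]['intervals'].append({})
        elif '=' in line and items:
            key, value = (part.strip() for part in line.split('=', 1))
            if key == 'intervals: size':
                continue  # the count is implied by the length of the list
            # Fields seen after an 'intervals [n]:' line belong to that
            # interval; earlier ones belong to the tier itself.
            tier = items[-1]
            target = tier['intervals'][-1] if tier['intervals'] else tier
            if value.startswith('"'):
                target[key] = value.strip('"')  # quoted string value
            else:
                target[key] = float(value)      # numeric value
    return items

sample = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 3931.56874994773
tiers? <exists>
size = 1
item []:
    item [1]:
        class = "IntervalTier"
        name = "Phrases"
        xmin = 0
        xmax = 3931.56874994773
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 3.59246613841739
            text = "Good morning"
        intervals [2]:
            xmin = 3.59246613841739
            xmax = 3.77632771424237
            text = "the dog likes me"
'''

items = parse_textgrid(sample.splitlines())
for interval in items[0]['intervals']:
    print(interval['xmin'], interval['xmax'], interval['text'])
```

For a real file, the lines can come from `open(path).readlines()` instead of the inline sample. Header lines are skipped here simply because no tier has been opened yet when they are read.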

Hope this helps. Please feel free to comment if you need any further help.

Kaveen Perera
  • Hi thanks for the response. I went ahead and saved it as a normal text file, manually deleting parts I don't need and it worked out through use of a bunch of for loops :) – ishido Jan 17 '17 at 20:38
  • Great. If that is just a one time job that's quicker. – Kaveen Perera Jan 17 '17 at 20:42

You can write a Python script to do it. What I did was:

import re

with open('file.Textgrid', 'r') as f:
  data = f.readlines()  # a list of lines, so that it can be sliced by line
#print(data)  # use this to view how the file looks after the program has opened it
txttext = ''
for line in data[9:]:  # skip the file header and the first 'item [1]:' line
  line = re.sub('\n', '', line)  # as there's a \n at the end of every line
  line = re.sub('^ *', '', line)  # to remove the leading indentation
  linepair = line.split('=', 1)  # split on the first '=' only, in case the text contains one
  if len(linepair) == 2:
    key = linepair[0].strip()  # 'xmin ' -> 'xmin'
    value = linepair[1].strip()
    if key == 'xmin':
      xmin = value
    if key == 'xmax':
      xmax = value
    if key == 'text':
      if value.startswith('"') and value.endswith('"'):
        text = value[1:-1]
        txttext += text + '\n'

And yeah, save txttext into a .txt file using the write() function and you're good.
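If xmin and xmax are wanted alongside each text, a variation on the same idea might look like the sketch below. It assumes that every interval lists xmin, then xmax, then a single-line text, as in the sample from the question. The names `extract_intervals`, `write_rows` and the output path are hypothetical.

```python
def extract_intervals(lines):
    """Return one (xmin, xmax, text) tuple per interval (sketch)."""
    rows = []
    xmin = xmax = None
    for raw in lines:
        key, sep, value = raw.strip().partition('=')
        if not sep:
            continue  # no '=' on this line, nothing to extract
        key, value = key.strip(), value.strip()
        if key == 'xmin':
            xmin = float(value)
        elif key == 'xmax':
            xmax = float(value)
        elif key == 'text':
            # A 'text' line closes an interval: emit it together with the
            # most recently seen xmin and xmax.
            rows.append((xmin, xmax, value.strip('"')))
    return rows

def write_rows(rows, path):
    """Write the tuples to a tab-separated text file."""
    with open(path, 'w', encoding='utf-8') as out:
        for xmin, xmax, text in rows:
            out.write(f'{xmin}\t{xmax}\t{text}\n')

sample_lines = [
    '        xmin = 0',
    '        xmax = 3931.56874994773',
    '        intervals: size = 2',
    '        intervals [1]:',
    '            xmin = 0',
    '            xmax = 3.59246613841739',
    '            text = "Good morning"',
    '        intervals [2]:',
    '            xmin = 3.59246613841739',
    '            xmax = 3.77632771424237',
    '            text = "the dog likes me"',
]

rows = extract_intervals(sample_lines)
print(rows)
```

Calling `write_rows(rows, 'intervals.txt')` would then save one interval per line. The tier-level xmin and xmax are overwritten by the interval-level values before the first text line is reached, so they do not end up in the output.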

stranger