
Possible Duplicate:
Reading Microsoft word document in iphone

I'm wondering if there is an Objective-C library that I could use for parsing/reading Word documents, so I can convert them to .txt files for further data processing.

n3rfd

1 Answer


If all you need from a Word document is the plain text, that's pretty easy.

Assume you have an NSData filled with data from a Word .doc...

Read a UInt32 from the data, at byte index 536. This number, plus 512, is the byte index where the text starts. (It usually starts at 2048, but not always.)

Read another UInt32 from byte index 588 in the data. This number is how many characters are in the text.

Make a range out of those two UInt32s and then read the text from that range in the data.

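// Offsets assume the document stream starts at byte 512 of the file.
UInt32 fcMin;   // where the text starts, relative to byte 512
[data getBytes:&fcMin range:NSMakeRange(536, sizeof(UInt32))];
UInt32 ccpText; // number of characters in the text
[data getBytes:&ccpText range:NSMakeRange(588, sizeof(UInt32))];
// ccpText counts characters, and UTF-16 uses 2 bytes per character,
// so the byte length is ccpText * sizeof(unichar). This assumes the text is stored as UTF-16.
NSData *textData = [data subdataWithRange:NSMakeRange(fcMin + 512, ccpText * sizeof(unichar))];
NSString *textContent = [[NSString alloc] initWithData:textData encoding:NSUTF16LittleEndianStringEncoding];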
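The snippet above trusts the file: if the data is shorter than 592 bytes, or the two values point past the end, getBytes:range: and subdataWithRange: will raise a range exception. As a rough sketch only (plain C, which also compiles inside an Objective-C file), the same offsets can be read with bounds checks first. The helper names doc_read_u32le and doc_text_range are hypothetical, and the code assumes, like the answer, that the document stream starts at byte 512 and that the text is UTF-16LE.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets taken from the answer; they assume the document stream
   begins at byte 512 of the .doc file (an assumption, not a guarantee). */
enum {
    DOC_STREAM_BASE    = 512,
    DOC_FCMIN_OFFSET   = 536,
    DOC_CCPTEXT_OFFSET = 588
};

/* Read a little-endian 32-bit value byte by byte, so the result does not
   depend on the host CPU's byte order. Returns false if off is out of range. */
static bool doc_read_u32le(const uint8_t *buf, size_t len, size_t off, uint32_t *out)
{
    if (len < 4 || off > len - 4) {
        return false;
    }
    *out = (uint32_t)buf[off]
         | ((uint32_t)buf[off + 1] << 8)
         | ((uint32_t)buf[off + 2] << 16)
         | ((uint32_t)buf[off + 3] << 24);
    return true;
}

/* Hypothetical helper: compute the byte range of the main text, assuming
   2 bytes per character (UTF-16LE). Returns false if the header fields
   cannot be read or the range does not fit inside the buffer. */
static bool doc_text_range(const uint8_t *buf, size_t len, size_t *start, size_t *nbytes)
{
    uint32_t fcMin, ccpText;
    if (!doc_read_u32le(buf, len, DOC_FCMIN_OFFSET, &fcMin)) return false;
    if (!doc_read_u32le(buf, len, DOC_CCPTEXT_OFFSET, &ccpText)) return false;

    uint64_t s = (uint64_t)fcMin + DOC_STREAM_BASE; /* text start in the file */
    uint64_t n = (uint64_t)ccpText * 2;             /* characters -> bytes */
    if (s > len || n > len - s) {
        return false;
    }
    *start  = (size_t)s;
    *nbytes = (size_t)n;
    return true;
}
```

From Objective-C this could be called as doc_text_range(data.bytes, data.length, &start, &nbytes), and the resulting values passed to NSMakeRange only when it returns true.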
Marty Cullen
iii
  • I'm getting an error at the getBytes:range: call saying "unrecognized selector sent to instance..." – n3rfd Jan 15 '12 at 15:23
  • @pneftali Are you sure you're calling that on an NSData instance? And all the arguments are the same types I used? – iii Jan 16 '12 at 08:04
  • Is there any difference between the DOC and DOCX formats? – Raptor Mar 13 '12 at 09:45
  • @ShivanRaptor Yes, the DOC and DOCX formats are completely different. DOCX is actually a zip file containing a series of XML files. – iii Mar 13 '12 at 22:30
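
Since the two formats use different containers, the byte offsets in the answer only make sense for a binary .doc file. One way to tell them apart before parsing is to look at the first few bytes: a binary .doc is stored in an OLE compound file, which begins with the signature D0 CF 11 E0 A1 B1 1A E1, while a .docx, being a zip archive, normally begins with "PK" followed by 03 04. The following is a minimal sketch in plain C (usable from Objective-C); the type and function names are hypothetical. A match only identifies the container, because other Office formats and ordinary zip files share the same signatures.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result type: which container the leading bytes suggest. */
typedef enum {
    WORD_CONTAINER_UNKNOWN,
    WORD_CONTAINER_OLE,  /* binary .doc (also used by .xls, .ppt) */
    WORD_CONTAINER_ZIP   /* .docx (also any other zip archive) */
} word_container;

/* Compare the start of the buffer against the two known signatures. */
static word_container sniff_word_container(const uint8_t *buf, size_t len)
{
    static const uint8_t ole_magic[8] = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    static const uint8_t zip_magic[4] = {'P', 'K', 0x03, 0x04};

    if (len >= sizeof ole_magic && memcmp(buf, ole_magic, sizeof ole_magic) == 0) {
        return WORD_CONTAINER_OLE;
    }
    if (len >= sizeof zip_magic && memcmp(buf, zip_magic, sizeof zip_magic) == 0) {
        return WORD_CONTAINER_ZIP;
    }
    return WORD_CONTAINER_UNKNOWN;
}
```

With an NSData this could be called as sniff_word_container(data.bytes, data.length), and the offset-based text extraction attempted only for WORD_CONTAINER_OLE.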