Labelling text using Notepad++ or any other tool

Question

I have several .dat files containing hotel reviews in the following format:

```
<Author> simmotours
<Content> review......goes here
<Date>Nov 18, 2008
<No. Reader>-1
<No. Helpful>-1
<Overall>4
<Value>4
<Rooms>3
<Location>4
<Cleanliness>4
<Check in / front desk>4
<Service>4
<Business service>-1
```

I want to classify the reviews into two classes, pos and neg, i.e. have two folders, pos and neg, where files with reviews rated above 3 are classified as positive and those rated below 3 as negative.

How can I quickly and efficiently automate this process?

Does the file you have look like what you indicate or is it a proper XML file? — z--, Jul 05 '14 at 07:09

score 0 · Accepted Answer · edited May 23 '17 at 10:26


You could write a Python script to read the overall score. Loop over the lines of each file (for example with `readline()`), find the "Overall" score with some simple string parsing, and then move the file into the right directory. These are all very simple things to do in Python; just break the task down into steps and search for answers to each step.
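A minimal sketch of those steps, assuming the `.dat` files sit in one directory and each file holds a single review with an `<Overall>` line as in the question. The function names `read_overall` and `classify_reviews` are hypothetical. The question does not say what to do with a rating of exactly 3, so this sketch leaves such files in place; it also assumes that `-1` marks a missing rating (as `<Business service>-1` suggests in the sample) and skips negative scores.

```python
import re
import shutil
from pathlib import Path

# Matches a line such as "<Overall>4" and captures the (possibly negative) number.
OVERALL_RE = re.compile(r"^<Overall>\s*(-?\d+)")


def read_overall(path):
    """Return the <Overall> score of a review file, or None if there is none."""
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:  # iterate line by line, like repeated readline() calls
            m = OVERALL_RE.match(line.strip())
            if m:
                return int(m.group(1))
    return None


def classify_reviews(src_dir):
    """Move each .dat file in src_dir into src_dir/pos or src_dir/neg.

    Scores above 3 go to pos, scores from 0 up to (but not including) 3 go to neg.
    Files scoring exactly 3, scoring below 0 (assumed to mean "not rated"),
    or lacking an <Overall> line are left where they are.
    """
    src = Path(src_dir)
    (src / "pos").mkdir(exist_ok=True)
    (src / "neg").mkdir(exist_ok=True)
    moved = {"pos": [], "neg": []}
    # sorted() materializes the file list before any file is moved.
    for path in sorted(src.glob("*.dat")):
        score = read_overall(path)
        if score is None or score < 0 or score == 3:
            continue
        label = "pos" if score > 3 else "neg"
        shutil.move(str(path), str(src / label / path.name))
        moved[label].append(path.name)
    return moved
```

Calling `classify_reviews("reviews")` would then sort every `.dat` file in a `reviews` directory into `reviews/pos` and `reviews/neg` and return the names of the moved files.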


answered Jul 03 '14 at 11:56

blsmit5728


I was thinking of converting the above format to XML by adding closing tags such as `</Author>` and then parsing it with an XML parser. But I am stuck on how to append them, i.e. search for `<Author>*` and replace it with `<Author>* </Author>`. – user3801185 Jul 04 '14 at 14:04
@user3801185 A simple search/replace of `^<(\w+)>(.*)$` with `<\1>\2</\1>` does this, assuming the lines are as in the example and have no embedded `<` or `>`. But tag names containing non-alphanumeric characters (such as `<No. Reader>` or `<Check in / front desk>`) would first need to be changed to valid tag names. – AdrianHHH Jul 07 '14 at 15:19
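A minimal Python variant of that search/replace, assuming one `<Tag>value` pair per line as in the question. Because XML element names cannot contain spaces, `.` or `/`, this sketch first rewrites such names with underscores (`No. Reader` becomes `No_Reader`), and it escapes `&`, `<` and `>` in the values so the result stays well-formed. The underscore scheme, the wrapping `<Review>` root element and the function name `dat_to_xml` are illustrative choices, not part of the comment above.

```python
import re
from xml.sax.saxutils import escape

# One "<Tag>value" pair per line; the tag name may contain spaces, '.' or '/'.
LINE_RE = re.compile(r"^<([^<>]+)>(.*)$", re.MULTILINE)


def _to_element(match):
    # Collapse runs of non-word characters into "_" to get a valid XML name.
    # (Names that would start with a digit are not handled in this sketch.)
    name = re.sub(r"\W+", "_", match.group(1).strip()).strip("_")
    value = escape(match.group(2).strip())
    return f"<{name}>{value}</{name}>"


def dat_to_xml(text):
    """Convert one review in the .dat format into an XML document string."""
    body = LINE_RE.sub(_to_element, text)
    return "<Review>\n" + body.strip() + "\n</Review>"
```

For example, the line `<Check in / front desk>4` becomes `<Check_in_front_desk>4</Check_in_front_desk>`.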

z-- · Answer 2 · 2014-07-07T12:52:33.517


Notepad++ can do replacements with regular expressions and also lets you define macros. Use them to convert the file to an XML file; the Notepad++ help file explains both features.

Then you can read it with any scripting language and do what you want.
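For instance, in Python the standard `xml.etree.ElementTree` module can read the converted file. A minimal sketch, assuming the converted review has an `<Overall>` element somewhere under its root element (the exact layout depends on the replacements you run in Notepad++); `label_from_xml` is a hypothetical name. Ratings of exactly 3, and negative values (assumed to mean "not rated"), get no label because the question does not cover them.

```python
import xml.etree.ElementTree as ET


def label_from_xml(xml_text):
    """Return "pos", "neg", or None for one review converted to XML."""
    root = ET.fromstring(xml_text)
    overall = root.findtext(".//Overall")  # first <Overall> anywhere below root
    if overall is None:
        return None
    score = int(overall.strip())
    if score > 3:
        return "pos"
    if 0 <= score < 3:  # negative values such as -1 are treated as "not rated"
        return "neg"
    return None
```

The returned label can then be used as the name of the folder to move the original file into.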

Alternatively, you could convert the files into a form that can be loaded into Excel and do the analysis there.
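One such form is a CSV file, which Excel opens directly. A minimal sketch, assuming every file uses the `<Tag>value` layout shown in the question with each field on a single line: it writes one row per review and one column per tag. The function names `parse_dat` and `dat_files_to_csv` and the extra `file` column are hypothetical.

```python
import csv
import re
from pathlib import Path

# One "<Tag>value" pair per line; multi-line fields are not handled here.
FIELD_RE = re.compile(r"^<([^<>]+)>(.*)$")


def parse_dat(text):
    """Return a dict mapping each tag name to its stripped value."""
    fields = {}
    for line in text.splitlines():
        m = FIELD_RE.match(line.strip())
        if m:
            fields[m.group(1).strip()] = m.group(2).strip()
    return fields


def dat_files_to_csv(src_dir, csv_path):
    """Write one CSV row per .dat file in src_dir and return the row count."""
    rows = []
    for path in sorted(Path(src_dir).glob("*.dat")):
        row = parse_dat(path.read_text(encoding="utf-8", errors="replace"))
        row["file"] = path.name
        rows.append(row)
    # Union of all tag names in first-seen order, so every field gets a column.
    columns = ["file"]
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns)  # missing fields -> ""
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

In Excel, the `Overall` column can then be sorted or filtered to separate the positive and negative reviews.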

edited Jul 07 '14 at 12:52

answered Jul 07 '14 at 12:29

z--

