I am currently working on the mathematics of machine learning (NLP, to be precise), and while doing so I have run into a problem. I want to print out the lines containing any of the following strings:

1) fbchat

2) fb_timeline

3) Facebook Wall Post

into separate text files, one for each string mentioned above.

Then, in each of the resulting text files, I would like to sort the lines by the thread ID field of the database, which is mentioned in the very first line of messages.dmp. I am a theoretical person with very less programming experience.

The download link to the database dump is given below

messages.dmp

Update:

This is the script I tried to write:

import re
from sys import argv

scrip, file_name = argv

dfile = open(file_name, 'r')

for line in dfile:
    if re.match("fbchat", line):
        print line

But the script prints nothing.

  • I understand that you are `a theoretical person with very less programming experience`, but please refer to the [help](http://stackoverflow.com/tour). You can't ask `questions you haven't tried to find an answer for`; you need to show your work. – Kobi K Aug 17 '14 at 13:28
  • @KobiK I have updated my question, please take a look. – spaceman_spiff Aug 17 '14 at 13:49

1 Answer

Given the following text file, file.txt:

text1
fbchat !
text2
Facebook Wall Post line

You can use the following code:

# open the file
with open('c:\\file.txt') as f:
    # read all lines into a list
    lines = f.readlines()
# iterate over the list
for line in lines:
    # print the line if it contains any of the strings (any() with a generator expression)
    if any(s in line for s in ['fbchat', 'fb_timeline', 'Facebook Wall Post']):
        # strip the trailing newline first; print(...) works in Python 2 and 3
        print(line.strip("\n"))

Output:

fbchat !
Facebook Wall Post line

You can read more about Python's `with` statement, list comprehensions, and `strip()`.
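The code above only prints the matching lines, while the question asks for one output file per string. A minimal sketch of that step follows; the output file names and the helper names `group_lines` and `write_groups` are made up for illustration and are not part of the original answer.

```python
import os

# Hypothetical output file names, one per search string from the question.
OUTPUT_FILES = {
    "fbchat": "fbchat.txt",
    "fb_timeline": "fb_timeline.txt",
    "Facebook Wall Post": "facebook_wall_post.txt",
}


def group_lines(lines, keywords):
    """Map each keyword to the list of lines that contain it."""
    groups = {keyword: [] for keyword in keywords}
    for line in lines:
        for keyword in keywords:
            if keyword in line:
                groups[keyword].append(line)
    return groups


def write_groups(groups, out_dir, output_files=OUTPUT_FILES):
    """Write each keyword's lines to its own file inside out_dir."""
    for keyword, lines in groups.items():
        with open(os.path.join(out_dir, output_files[keyword]), "w") as out:
            out.writelines(lines)


# Same sample content as file.txt above.
sample = ["text1\n", "fbchat !\n", "text2\n", "Facebook Wall Post line\n"]
groups = group_lines(sample, OUTPUT_FILES)
```

For the real dump, something like `with open("messages.dmp") as f: write_groups(group_lines(f, OUTPUT_FILES), ".")` should produce the three files in the current directory. Note that this sketch keeps all matching lines in memory, which may matter for a very large dump.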

Kobi K
  • Thanks, this worked, but could you please point out what mistakes there were in my script? That would be helpful, as I intend to learn. Thanks also for sharing the additional resources; it is very hard for a newbie to filter the enormous amount of material that Google churns up for any reference query. – spaceman_spiff Aug 17 '14 at 15:53
  • Glad it helped. Your problem was with understanding `re.match()`. You can read this [tutorial](http://www.tutorialspoint.com/python/python_reg_expressions.htm) about regex and Python; it's short and easy. You can also try [this](http://stackoverflow.com/q/13423624/1982962) post. – Kobi K Aug 17 '14 at 16:20
  • But I am unable to come up with a solution to the second part: if I get all the lines containing **fbchat** into a chat file, how will I be able to sort these lines based on the **thread ID** field in them? Please take a look, thank you. – spaceman_spiff Aug 20 '14 at 15:48
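A rough sketch covering the two points raised in these comments. On the first: `re.match` only tests for a match at the beginning of the string, so the script in the question prints nothing unless a line starts with `fbchat`, whereas `re.search` scans the whole line. On the second: the layout of messages.dmp is not shown in this thread, so the code below simply assumes tab-separated fields with the thread ID in the second column. Both the delimiter and the column index are guesses and need to be adjusted to the header line of the real dump; the helper names are hypothetical as well.

```python
import re

# Assumptions (not taken from messages.dmp): fields are tab-separated and
# the thread ID is the second field. Adjust both to the real header line.
DELIMITER = "\t"
THREAD_ID_COLUMN = 1


def matching_lines(lines, pattern):
    """Return the lines that contain the pattern anywhere."""
    # re.search scans the whole line; re.match would only test its start.
    return [line for line in lines if re.search(pattern, line)]


def thread_id_key(line):
    """Sort key: numeric thread IDs first (in numeric order), others as text."""
    fields = line.rstrip("\n").split(DELIMITER)
    field = fields[THREAD_ID_COLUMN] if len(fields) > THREAD_ID_COLUMN else ""
    if field.isdigit():
        return (0, int(field), "")
    return (1, 0, field)


# Made-up sample lines in the assumed layout: id, thread ID, source, text.
sample = [
    "17\t42\tfbchat\thello\n",
    "18\t7\tfbchat\thi\n",
    "19\t7\tfb_timeline\tpost\n",
]
chat_lines = sorted(matching_lines(sample, "fbchat"), key=thread_id_key)
```

Here `chat_lines` holds the two `fbchat` lines with thread 7 before thread 42. The same `key=thread_id_key` argument can be passed to `sorted()` for the lines of each output file before writing them.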