Find specific words in a text file and print the output

Question

I need help with some Python code.

I have a text file containing about 10 lines of text. Below is an example line:

"406%2C318482214%2C318484497%2C318486317%2C318484676%2C318483611%2C318484802%2C318487059%2C318489974%2C318482672%2C318475417&tag_for_child_directed_treatment=0&_c_csdk_npa_o=false&_c_req_tfua=false&_c_req_npa=true&npa=1&tfua=0&guci=0.0.0.0.0.0.0.8&rbv=1&u_w=839&u_h=424&msid=com.ea.gp.test&_package_name=com.ea.gp.test&an=30178.android.com.ea.gp.test&net=wi&u_audio=3&u_so=l&preqs_in_session=0&support_transparent_background=true"

How can I write Python code to find the values assigned to the specific keys "npa" and "tfua"? Assume they are the ones highlighted in the text above. These keys may also appear multiple times.

Try using re: https://docs.python.org/3/library/re.html – May 12 '20 at 17:47

mattyx17 · Answer 1 · 2020-05-27T23:07:22.883

1

Suppose your string is myStr

myStr = "406%2C318482214%2C318484497%2C318486317%2C318484676%2C318483611%2C318484802%2C318487059%2C318489974%2C318482672%2C318475417&tag_for_child_directed_treatment=0&_c_csdk_npa_o=false&_c_req_tfua=false&_c_req_npa=true&npa=1&tfua=0&guci=0.0.0.0.0.0.0.8&rbv=1&u_w=839&u_h=424&msid=com.ea.gp.test&_package_name=com.ea.gp.test&an=30178.android.com.ea.gp.test&net=wi&u_audio=3&u_so=l&preqs_in_session=0&support_transparent_background=true"
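# Keep only "key=value" fragments; the leading fragment has no "=" and would otherwise raise IndexError.
myLst = [lst.split("=", 1) for lst in myStr.split("&") if "=" in lst]
myDict = {lst[0]: lst[1] for lst in myLst}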

Now myDict["npa"] gives the desired value, and likewise myDict["tfua"]. If a key appears more than once, the dict keeps only its last value.

If you are sure you only want those two, you can restrict the dict to only those keys:

myLst = [lst.split("=", 1) for lst in myStr.split("&") if "=" in lst]
myDict = {lst[0]: lst[1] for lst in myLst if lst[0] in ["npa", "tfua"]}
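
Since the question notes that a key may occur more than once, a dict comprehension will silently drop all but the last value. As a minimal sketch, assuming the text is a URL-style query string, the standard library's `urllib.parse.parse_qs` collects every value per key into a list. The shortened `sample` string is illustrative, not the asker's full data.

```python
from urllib.parse import parse_qs

# Shortened, illustrative sample in the same key=value&key=value shape.
sample = "406%2C318482214&_c_req_tfua=false&_c_req_npa=true&npa=1&tfua=0&npa=0"

# parse_qs maps each key to a list of all its values; fragments without "="
# are skipped by default (strict_parsing=False).
params = parse_qs(sample)

print(params["npa"])   # ['1', '0']
print(params["tfua"])  # ['0']
```

Longer keys such as `_c_req_npa` stay separate entries, so they do not mix with `npa`.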

edited May 27 '20 at 23:07

answered May 12 '20 at 17:52

mattyx17


I tried the one you posted: `f = open('1.txt', 'r')` `read = f.readlines()` `myLst = [lst.split('=') for lst in myStr.split("&")]` `myDict = {lst[0] : lst[1] for lst in myLst}` `print(myLst)` `print(myDict)` and got this error: `Traceback (most recent call last): File "D:/1.py", line 3, in <module> myLst = [lst.split('=') for lst in myStr.split("&")] NameError: name 'myStr' is not defined` – Smart Techie May 16 '20 at 10:55
@SmartTechie yes that is because I was assuming your string is called `myStr`. If you change `read = f.readlines()` to `myStr = f.readline()` then it should work. – mattyx17 May 16 '20 at 20:27
Hi. Thanks for looking into this. I tried it like this: `f = open('3.txt','r')` `myStr = f.readlines()` `myLst = [lst.split("=") for lst in myStr.split("&")]` `myDict = {lst[0] : lst[1] for lst in myLst if lst[0] in ["npa", "tfua"]}` and I'm getting this error: `'list' object has no attribute 'split'`. The problem is that there are multiple lines, not a single line. – Smart Techie May 27 '20 at 18:09
This handles just one line. If you want it for every line, you would have to do it in a loop over the lines. – mattyx17 May 27 '20 at 23:08
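
The loop suggested in the last comment could be sketched as follows. The helper name `extract_params` and the two sample lines are hypothetical stand-ins; with a real file, the open file object can be passed directly, because iterating over it yields one line at a time.

```python
def extract_params(lines, wanted=("npa", "tfua")):
    """Return one {key: value} dict per line, restricted to the wanted keys."""
    results = []
    for line in lines:
        # Split each line into key=value fragments, skipping any without "=".
        pairs = (item.split("=", 1) for item in line.strip().split("&") if "=" in item)
        results.append({key: value for key, value in pairs if key in wanted})
    return results


# Stand-in for the lines of the file; with a real file, something like:
#     with open("3.txt") as f:
#         rows = extract_params(f)
lines = [
    "a=1&_c_req_npa=true&npa=1&tfua=0&net=wi\n",
    "b=2&npa=0&tfua=1\n",
]
rows = extract_params(lines)
print(rows)  # [{'npa': '1', 'tfua': '0'}, {'npa': '0', 'tfua': '1'}]
```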

score 1 · Answer 2 · answered May 12 '20 at 17:57

1

Using the re module

import re
x = "406%2C318482214%2C318484497%2C318486317%2C318484676%2C318483611%2C318484802%2C318487059%2C318489974%2C318482672%2C318475417&tag_for_child_directed_treatment=0&_c_csdk_npa_o=false&_c_req_tfua=false&_c_req_npa=true&npa=1&tfua=0&guci=0.0.0.0.0.0.0.8&rbv=1&u_w=839&u_h=424&msid=com.ea.gp.test&_package_name=com.ea.gp.test&an=30178.android.com.ea.gp.test&net=wi&u_audio=3&u_so=l&preqs_in_session=0&support_transparent_background=true"

npa = re.findall(r'(?<=npa=)(.*?)(?=&)', x)
tfua = re.findall(r'(?<=tfua=)(.*?)(?=&)', x)

print(npa)
# ['true', '1']

print(tfua)
# ['false', '0']
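
The lookbehind `(?<=npa=)` also matches the tail of longer keys such as `_c_req_npa=`, which is why a boolean value shows up next to `1`. If only the exact keys are wanted, one option is to require the key to sit at the start of the string or right after an `&`. This is a sketch of an alternative pattern, not the answer's original approach; the helper name `find_values` and the shortened string are illustrative assumptions.

```python
import re


def find_values(key, text):
    """Return every value assigned to exactly `key` in a key=value&... string."""
    # (?:^|&) requires the key to start the string or follow "&";
    # [^&]* captures up to the next "&" or the end of the string.
    pattern = r"(?:^|&)" + re.escape(key) + r"=([^&]*)"
    return re.findall(pattern, text)


x = "_c_csdk_npa_o=false&_c_req_tfua=false&_c_req_npa=true&npa=1&tfua=0&guci=0.0"

print(find_values("npa", x))   # ['1']
print(find_values("tfua", x))  # ['0']
```

Unlike the lookahead `(?=&)`, this pattern also captures a value that ends the string without a trailing `&`.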

answered May 12 '20 at 17:57

Antscloud


Thanks for the code. I have a question: my text file contains more than 500 lines. How can I use the text file in the above code? I tried `f = open('1.txt', 'r')` `read = f.readlines()` and it throws an error: `Traceback (most recent call last): File "D:/1.py", line 6, in <module> npa=re.findall('(?<=npa=)(.*?)(?=&)', read) return _compile(pattern, flags).findall(string) TypeError: expected string or bytes-like object` – Smart Techie May 16 '20 at 11:00
I only want the values 1 and 0 in the result. – Smart Techie May 16 '20 at 11:03
Try passing `str(read)` to the `findall` method. And if you only want 0 or 1, you can use `int(npa[0])`, because calling `int()` on a boolean returns 0 or 1 depending on the value (false or true). – Antscloud May 18 '20 at 08:00
Thanks, that worked. How can I get the output as a string? For example, if the lines above contain `&cust_params=excl_cat%26optout%3Dno%26coppa%3Dno&`, I want the text between "=" and "&". The string is not the same every time, so we can't count the letters. – Smart Techie May 27 '20 at 17:54
If you just want all the different contents between `=` and `&`, you can use `re.findall('(?<==)(.*?)(?=&)', x)`. However, if you want the content between a specific word followed by `=` and the next `&`, use `re.findall('(?<=yourword=)(.*?)(?=&)', x)`, where `yourword` is the word you're searching for. – Antscloud May 27 '20 at 21:26
Thanks very much. I'm able to find the required fields. – Smart Techie May 31 '20 at 14:42
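
The comments above raise two follow-ups: searching a multi-line file and pulling out a free-form value such as `cust_params`. `re.findall` needs a single string, so reading the file with `f.read()` (rather than `readlines()`, which returns a list) is one way to avoid the `TypeError`. The matches are strings, so `int("true")` would raise a `ValueError`; only digit values convert cleanly. The sketch below uses an in-memory string in place of the file contents and `urllib.parse.unquote` to decode the percent-escapes. The sample text is made up for illustration.

```python
import re
from urllib.parse import unquote

# Stand-in for `text = open("1.txt").read()`; two illustrative lines.
text = (
    "u_w=839&cust_params=excl_cat%26optout%3Dno%26coppa%3Dno&npa=1&tfua=0&x=y\n"
    "u_w=840&_c_req_npa=true&npa=0&tfua=1&x=z\n"
)

# Same lookbehind/lookahead pattern as in the answer, applied to the whole text.
raw = re.findall(r"(?<=cust_params=)(.*?)(?=&)", text)
print([unquote(value) for value in raw])  # ['excl_cat&optout=no&coppa=no']

# Keep only numeric matches before converting, since "true"/"false" are not ints.
npa = [int(v) for v in re.findall(r"(?<=npa=)(.*?)(?=&)", text) if v.isdigit()]
print(npa)  # [1, 0]
```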
