I need to extract the values inside the first pair of parentheses on each line of a txt.gz file while reading it with the pandas `read_csv` method.

The records inside the txt.gz file look like this:

2022-02-20 00:00:10.061 INFO  [140191547254528] - ocm_input [14] sending Requestcom.com.sdcm.RequestScore("6016293021","JKT","ID","AP","SUB","ID","AP","FOIGTW",1,"","EWF","ID","national",11163.568125276915,.5,.24,0,0,0,"JKT","SUB","","","OISS INDONESIA","N",0,1,"FOCIDIGTW","","","jms:WebSphere_MQ-default-sender")..............context(1,"main",true)

I'm looking for something like this example, but for a txt.gz file. In that example the values are extracted from a string using StringIO, but I need to do the same from a .gz file. I'm also open to a better approach, if there is one.

What I want is to get the values below into a pandas DataFrame:

("6016293021","JKT","ID","AP","SUB","ID","AP","FOIGTW",1,"","EWF","ID","national",11163.568125276915,.5,.24,0,0,0,"JKT","SUB","","","OISS INDONESIA","N",0,1,"FOCIDIGTW","","","jms:WebSphere_MQ-default-sender")

This is the kind of call I'm looking for:

pd.read_csv(gzfilepath, compression='gzip',header=None, sep='\s*\(', quotechar='"', names=column_names,nrows=1000)

But I'm unable to read it that way. All I want is to extract the values inside the parentheses while reading the .gz file.

Pyd
  • What code do you have so far? It looks like you already have read the file and the problem is just extracting the part between the parentheses. You can use the `re` module for that. – Jan Wilamowski Mar 02 '22 at 04:46
  • Please be clearer: do you only want to manipulate the file name, or the content of `txt.gz`? Try to give some examples. – Lei Yang Mar 02 '22 at 04:56
  • @LeiYang, question edited. I want the content of the file, i.e. the values inside the parentheses, read into a pandas DataFrame. – Pyd Mar 02 '22 at 05:29
  • @JanWilamowski, I want to do that while reading the file in `read_csv` using `BytesIO` or `StringIO` – Pyd Mar 02 '22 at 05:33
  • I would advise against using `read_csv` to read something that clearly isn't a CSV file. Fix the input, then read it in normally. – Jan Wilamowski Mar 02 '22 at 05:36
  • @JanWilamowski, I can read `.gz` using `read_csv`, but I'm looking for any direct options. – Pyd Mar 02 '22 at 05:39
  • You mean like described here? https://stackoverflow.com/questions/10566558/python-read-lines-from-compressed-text-files – Jan Wilamowski Mar 02 '22 at 05:46
  • @JanWilamowski, yes, I will go that way if there is no direct option in `read_csv`. – Pyd Mar 02 '22 at 05:55
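
For reference, the approach suggested in the comments (read the compressed lines directly, pull out the parenthesized part with `re`, then parse it) could be sketched roughly as below. This is only a sketch under some assumptions, not a confirmed solution: the helper names `extract_first_parens`, `read_rows` and `to_dataframe` are made up for illustration, the file is assumed to be UTF-8 text, and the regex assumes that no value inside the first parentheses contains a `)` character.

```python
import csv
import gzip
import itertools
import re

# Captures the text inside the first "(...)" on a line.
# Assumption: the quoted values themselves never contain ")".
FIRST_PARENS = re.compile(r"\(([^)]*)\)")


def extract_first_parens(lines):
    """Yield the text inside the first parenthesis pair of each line that has one."""
    for line in lines:
        match = FIRST_PARENS.search(line)
        if match:
            yield match.group(1)


def read_rows(gz_path, nrows=None):
    """Return the parenthesized values of each line as a list of string fields."""
    # "rt" makes gzip decompress and decode to text on the fly.
    with gzip.open(gz_path, "rt", encoding="utf-8") as handle:
        # csv.reader handles the quoted, comma-separated values.
        reader = csv.reader(extract_first_parens(handle), quotechar='"')
        # islice with nrows=None reads everything; otherwise stops after nrows rows.
        return list(itertools.islice(reader, nrows))


def to_dataframe(rows, column_names=None):
    """Wrap the parsed rows in a DataFrame (hypothetical helper)."""
    # Imported here so the parsing helpers above work without pandas installed.
    import pandas as pd

    return pd.DataFrame(rows, columns=column_names)
```

Usage would be something like `df = to_dataframe(read_rows(gzfilepath, nrows=1000), column_names)`. Every field comes back as a string, so numeric columns would still need converting afterwards, for example with `pd.to_numeric`. Lines without any parentheses are skipped silently.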
