What is the correct way to read txt file using command line in Pandas

Question

I am new to Python and I am getting an error with my code, which scans a list of IP addresses and shows only the malware IPs:

import os
import sys
from datetime import datetime, date, timedelta
import subprocess
import pyjq
import pandas as pd

# Initializes the variables for the directories
HomeDir = "/Users/mani/Downloads/"
ScriptDir = HomeDir + "/panpython"
ResultDir = "/Users/mani/Desktop/result"

# Create the dates
ToDay = datetime.now().strftime('%Y%m%d')
# checkDATE = (date.today() - timedelta(1)).strftime('%Y%m%d')
ResultFile = "Test"
CheckDATE = "2015-10-01"
NOWDATE = "2015-10-02"

secretkey = 'secret key'

progToRun = 'python ' + ScriptDir + '/bin/panafapi.py -K ' + secretkey + ' --samples -j -r "{\\"query\\":{\\"operator\\":\\"all\\",\\"children\\":[{\\"field\\":\\"alias.ip_address\\",\\"operator\\":\\"contains\\",\\"value\\":\\"' + ResultFile + '\\"},{\\"operator\\":\\"any\\",\\"children\\":[{\\"field\\":\\"sample.update_date\\",\\"operator\\":\\"is in the range\\",\\"value\\":[\\"' + CheckDATE + 'T00:00:00\\",\\"' + NOWDATE + 'T23:59:59\\"]},{\\"field\\":\\"sample.create_date\\",\\"operator\\":\\"is in the range\\",\\"value\\":[\\"' + CheckDATE + 'T00:00:00\\",\\"' + NOWDATE + 'T23:59:59\\"]},{\\"operator\\":\\"any\\",\\"children\\":[{\\"field\\":\\"sample.malware\\",\\"operator\\":\\"is\\",\\"value\\":1},{\\"field\\":\\"sample.malware\\",\\"operator\\":\\"is\\",\\"value\\":4}]}]}]},\\"scope\\":\\"global\\",\\"size\\":1,\\"from\\":0,\\"sort\\":{\\"create_date\\":{\\"order\\":\\"desc\\"}}}" > ' + ResultDir + 'srciplist-' + ToDay + '.json'

# Run panafapi
subprocess.check_output(progToRun, shell=True)

# Using pyjq to filter
filteredResultData = pyjq.all('.hits[]._source | .create_date + "," + .sha256')
file_to_open=sys.argv[1]
df=pd.read_csv(file_to_open)
df.to_csv(ResultDir + "/srciplist-" + ToDay + ".csv", sep=',')

I ran this from the command line; it scans and then shows this error:

python finalauto.py xyz.txt
samples_search: 200 OK 339 0%
.......
samples_results: 200 OK 100% hits=1 total=674415 time=0:08:57.082 "complete"

error:

Traceback (most recent call last):
  File "finalauto.py", line 36, in <module>
    df=pd.read_csv(file_to_open)
  File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 709, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 455, in _read
    data = parser.read(nrows)
  File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 1069, in read
    ret = self._engine.read(nrows)
  File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 1839, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 902, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 924, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 978, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 965, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2208, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 778, saw 3

Is `pd.read_csv` what you want? – pe-perry Jan 26 '18 at 06:22

score 0 · Accepted Answer · answered Jan 26 '18 at 06:42

You can use `sys.argv[i]` to read the command-line argument at index i. Try this:

import sys
import pandas as pd

# sys.argv[0] is the name of the Python script; sys.argv[1] is the first command-line argument
file_to_open = sys.argv[1]

df = pd.read_csv(file_to_open, sep=',')
# do whatever you want with the file

`df.to_csv()` (a method of the DataFrame, not of the `pd` module) is used to save a pandas DataFrame as a CSV file.

If you run `python finalauto.py xyz.txt` on the command line, `sys.argv[0]` will give you `finalauto.py` and `sys.argv[1]` will give you `xyz.txt`.
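
If the script is started without a file name, `sys.argv[1]` fails with `IndexError: list index out of range`. Here is a minimal sketch of guarding against that. The helper name `get_input_path` is made up for illustration and is not part of the original script:

```python
import sys


def get_input_path(argv):
    """Return the first command-line argument, or exit with a usage message."""
    # argv[0] is the script name, so a file name was passed only if len(argv) >= 2
    if len(argv) < 2:
        sys.exit("usage: python finalauto.py <input file>")
    return argv[1]
```

In the script you would call it as `file_to_open = get_input_path(sys.argv)` before handing the path to `pd.read_csv`.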

answered Jan 26 '18 at 06:42

Pratik Kumar


I changed the code as you said, but I got an error like this: `file_to_open=sys.argv[1]` `IndexError: list index out of range` – Katarina Alves Jan 26 '18 at 07:10
Is this correct? `file_to_open=sys.argv[1]` `df=pd.read_csv(file_to_open)` `pd.to_csv(ResultDir + "/srciplist-" + ToDay + ".csv", sep=',')` – Katarina Alves Jan 26 '18 at 07:21
Did you type `python finalauto.py xyz.txt` to run the program? – Pratik Kumar Jan 26 '18 at 07:23
Change `pd.to_csv(ResultDir + "/srciplist-" + ToDay + ".csv", sep=',')` to `df.to_csv(...)`, because `pd` is the pandas module itself, while `df` is your DataFrame instance and `to_csv` is a DataFrame method. – Pratik Kumar Jan 26 '18 at 07:29
I did as you said but it still displays an error, and I updated the question with the error. Can you check it, please? – Katarina Alves Jan 26 '18 at 08:26
What I suspect from the error log is that your data has an unequal number of column entries. For example, one row may have 3 comma-separated values and another may have 4 or 2. – Pratik Kumar Jan 26 '18 at 08:57
For such a problem, refer to [this](https://stackoverflow.com/questions/15242746/handling-variable-number-of-columns-with-pandas-python) link – Pratik Kumar Jan 26 '18 at 09:03
In the end, let's say you want to drop columns **A** and **B**: do it with `df.drop(['A','B'],axis=1,inplace=True)` – Pratik Kumar Jan 26 '18 at 09:12
Actually my file is not a log file. It is a txt file full of IP addresses, one per line, so there are no columns; I just want to scan the IP addresses – Katarina Alves Jan 26 '18 at 11:20
As indicated by the last line of your error log, `Expected 1 fields in line 778, saw 3`, check the data at row **778** or the rows nearby in your text file. It may be in the format `number1,number2,number3`, i.e. 3 values separated by commas. That is causing the problem. Also check for such conditions elsewhere in the dataset – Pratik Kumar Jan 26 '18 at 12:03
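
To locate the rows the parser complains about, one option is to scan the text file yourself before giving it to pandas. The sketch below is only an illustration: the function name `find_irregular_lines` is hypothetical, it splits naively on the separator (it ignores CSV quoting), and its line numbers may not match the pandas message exactly if pandas skipped blank lines.

```python
def find_irregular_lines(path, expected_fields=1, sep=","):
    """Return (line_number, field_count, text) for lines with an unexpected field count."""
    irregular = []
    with open(path, "r") as f:
        # Count from 1 so the numbers are comparable to the ones in the error message
        for line_number, line in enumerate(f, start=1):
            text = line.rstrip("\n")
            # Naive split: quoted fields containing the separator are not handled
            field_count = len(text.split(sep))
            if field_count != expected_fields:
                irregular.append((line_number, field_count, text))
    return irregular
```

For a file that should hold one IP address per line, `find_irregular_lines("xyz.txt")` lists every line that contains a comma, which are the lines `pd.read_csv` would see as having more than one field.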

Tynan J · Answer 2 · 2018-01-26T08:15:19.093

0

1) There is a module in the standard library called `csv`. It is probably better to use that when creating CSV files. It is used like this:

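import csv

# ResultDir and ToDay are the variables defined in the question's script
with open("file.csv", 'w', newline='') as f:
    writer = csv.writer(f, delimiter=',')
    # writerow() takes a sequence of fields; a bare string would be split
    # into one field per character, so wrap a single value in a list
    writer.writerow([ResultDir + "/srciplist-" + ToDay + ".csv"])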

2) Here is some code for opening a file whose name is passed on the command line:

import sys
with open(sys.argv[1], 'r') as f:
    contents = f.read()

# Continue code below
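
Putting both points together for a text file with one IP address per line, here is a rough sketch that reads the file and writes one CSV row per address. The function name `ip_list_to_csv` and the `ip` header are assumptions for illustration only; they do not come from the original script.

```python
import csv


def ip_list_to_csv(txt_path, csv_path):
    """Copy a one-IP-per-line text file into a single-column CSV; return the row count."""
    with open(txt_path, "r") as src:
        # Strip surrounding whitespace and skip blank lines
        ips = [line.strip() for line in src if line.strip()]

    # newline="" is what the csv module documentation recommends for csv.writer files
    with open(csv_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter=",")
        writer.writerow(["ip"])  # header row (hypothetical column name)
        for ip in ips:
            # Each row is a list of fields; a bare string would be split into characters
            writer.writerow([ip])
    return len(ips)
```

In the question's script this could be called as `ip_list_to_csv(sys.argv[1], ResultDir + "/srciplist-" + ToDay + ".csv")`.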

edited Jan 26 '18 at 08:15

answered Jan 26 '18 at 06:43

Tynan J


I changed the code as you said, but I got an error like this: `TypeError: argument 1 must have a "write" method` – Katarina Alves Jan 26 '18 at 07:12
Changed the code for **1** a bit. See if that helps – Tynan J Jan 26 '18 at 08:15
No error now, but when I open the file as CSV it shows "/ U s e r s / m a n i / D e" and there is no result – Katarina Alves Jan 26 '18 at 09:01
