I want to add a new column to a dataframe. The new column's values depend on some rules.

This is my code:

#!/usr/bin/python3.6
# coding=utf-8

import sys
import pandas as pd
import numpy as np
import io
import csv


df = pd.read_csv(sys.stdin,sep=',',encoding='utf-8',engine="python")

col_0 = check
df['df_cal'] = df.groupby(col_0)[col_0].transform('count') 
df['status'] = np.where(
                    df['df_cal'] > 1,'change',
                    'New')

df = df.drop_duplicates(
        subset=df.columns.difference(['keep']),keep = False)
df = df[(df.keep == '2')]
df.drop(['keep','df_cal'],axis = 1,inplace = True)

# print(sys.stdin)
df.to_csv(sys.stdout,encoding='utf-8',index = None)

Sample CSV:

VIP_number,keep
ab1,1
ab1,2
ab2,2
ab3,1

When I try to run this code, I use this command:

python3.6 nifi_python.py < test.csv check = VIP_number

and I get the error:

name 'check' is not defined

This still does not work, because I don't know how to pass the column name into col_0 via stdin. col_0 should be 'VIP_number'. I don't want to hardcode the column name, because the script will be reused later with different columns.

How can I add a new column in the dataframe by stdin? Any help would be very much appreciated.

hang
  • Well, yes, of course, because `check` is not defined anywhere before you try to use it here: `col_0 = check`. Why did you think *it would be defined*? – juanpa.arrivillaga Jul 26 '21 at 05:12
  • You basically seem to be asking how to accept command line arguments to your python script? In which case, this question has nothing really to do with pandas or standard input... – juanpa.arrivillaga Jul 26 '21 at 05:15
  • Were you trying to pass "check" as a command line parameter? I can show you how to do that. – Tim Roberts Jul 26 '21 at 05:16
  • @TimRoberts I believe that is probably the case. But in that case, it is certainly a duplicate – juanpa.arrivillaga Jul 26 '21 at 05:16
  • `https://stackoverflow.com/questions/16048237/pass-variable-between-python-scripts/16048264`: please check. I think this is helpful to you. – Forest 1 Jul 26 '21 at 05:19

1 Answer

#!/usr/bin/python3.6
# coding=utf-8

import sys
import pandas as pd
import numpy as np
import io
import csv

if len(sys.argv) < 2:
    print("Usage:  nifi_python.py check=<column>")
    sys.exit(1)

df = pd.read_csv(sys.stdin,sep=',',encoding='utf-8',engine="python")

col_0 = sys.argv[1].split('=')[1]

...
python nifi_python.py check=VIP_number < test.csv
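
Note that the shell splits arguments on whitespace, so `check = VIP_number` (with spaces, as in the question) reaches the script as three separate arguments, and `sys.argv[1].split('=')[1]` would then raise an IndexError. A slightly more defensive version of the parsing step might look like the sketch below. The helper name `parse_column_arg` is hypothetical and not part of the code above; it only assumes the `check=<column>` convention used in this answer.

```python
import sys


def parse_column_arg(argv, key="check"):
    """Return the value of a `key=value` argument, or None if it is absent.

    `parse_column_arg` is an illustrative helper, not an existing API.
    """
    prefix = key + "="
    for arg in argv[1:]:  # argv[0] is the script name
        if arg.startswith(prefix):
            # Split on the first '=' only, so values containing '=' survive.
            return arg.split("=", 1)[1]
    return None


# Intended use inside the script (shown as comments so this block has no side effects):
#   col_0 = parse_column_arg(sys.argv)
#   if not col_0:
#       print("Usage:  nifi_python.py check=<column>", file=sys.stderr)
#       sys.exit(1)
```

With this helper, `check=VIP_number` yields `'VIP_number'`, while the spaced form `check = VIP_number` yields `None`, so the script can print the usage message instead of crashing.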
Tim Roberts