1

I am using Python 3.6 with pandas and numpy. I have two CSV files, both not containing any titles (so indexing is builtin). One is a one column list with computernames: PC001 PC002 PC003 ...

The other file is an import-file for a system. It is a csv-file. And the pc name is the third column: addprinter,terminal,PC001,something,something addprinter,terminal,PC002,something,something addprinter,terminal,PC003,something,something ...

Now the import-file contains thousands of entries and i only need the line copied to a new csv (name it to-be-imported.csv) which contains the pc names from the let's call it hostnames.csv

I came "close" by using this here:

np.intersect1d(df_main[2],df_key[0])

Unfortunately it will only list then the pc names which were found in the huge csv but not listing the line which contains the name (so it could be easily written in a new csv).

I know that is to advance for me, but i am also sure i will learn much. So hopefully there is a kind soul out there understanding what i would like to do and share some guidance

MattR
  • 4,887
  • 9
  • 40
  • 67
Mitch
  • 23
  • 4
  • Just to check I'm understanding correctly - you have two .csv files, and you want to get only the records from your large import-file where the PC name is in your hostnames.csv file, correct? – Ben Feb 06 '18 at 14:35
  • Possible duplicate of [Select rows from a DataFrame based on values in a column in pandas](https://stackoverflow.com/questions/17071871/select-rows-from-a-dataframe-based-on-values-in-a-column-in-pandas) – nnnmmm Feb 06 '18 at 14:41
  • @Ben Yes, that is correct. So in the script i do something like this: Load df1 hugedatafile.csv Load df2 hostnames.csv Check if 1st entry (which is in 1st column) from hostnames.csv matches anything in the huge data file (huge.csv - hostname is in the third column). If so, copy/write that complete line (addprinter,terminal,PC001,...,...) to the new csv (to-be-imported.csv). Proceed with comparing and copy (if it matches) until end of file. Then check the 2nd entry in hostnames.csv (1st column, 2nd row) and so on until last entry in hostnames.csv is checked if it matches. – Mitch Feb 06 '18 at 14:52

1 Answers1

2

As answered here, you could use isin:

hosts = pd.read_csv('hostnames.csv', header=None, names=['hosts'], squeeze=True)
df = pd.read_csv('import.csv', header=None, names=['a', 'b', 'host', 'c'])
result = df.loc[df['host'].isin(hosts)]

and then write the result to a CSV file with to_csv.

nnnmmm
  • 7,964
  • 4
  • 22
  • 41
  • Thank you so much! Important to notice: for the "df" line the letters must match the amount of actual columns. In my file i have 6 columns, so i had to add 'd','e' ... then it worked like a charm. Now i will proceed in output (write to csv) and then it is done. Thank you :) – Mitch Feb 06 '18 at 15:15