Python: Using Pandas, how do I choose the columns in my output?

Question

I am running my whole Active Directory against user accounts, trying to find what doesn't belong. With my code, the output gives me the user names that occur only once in the User Name column. Even though I am analyzing only one column of data, I want to keep all of the other columns that go with it.

import pandas as pd

f1 = pd.read_csv('lastlogonuser.txt', sep='\t', encoding='latin1')
f2 = pd.read_csv('UserAccounts.csv', sep=',', encoding='latin1')
f2 = f2.rename(columns={'Shortname': 'User Name'})
f = pd.concat([f1, f2])
counts = f['User Name'].value_counts()
f = counts[counts == 1]
f
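To see why the other columns disappear: `value_counts()` returns a Series whose index holds the user names and whose values hold the counts, and filtering it with `counts == 1` keeps that shape. A minimal sketch, using a small hypothetical frame (one repeated name added on purpose) in place of the real files:

```python
import pandas as pd

# Hypothetical stand-in for the concatenated frame; MBJ14 is repeated on purpose
f = pd.DataFrame({
    "User Name": ["MBJ29", "MBJ14", "MBJ14", "MBJ07"],
    "Description": ["Journal Mailbox managed by"] * 4,
})

counts = f["User Name"].value_counts()
unique_counts = counts[counts == 1]

# unique_counts is a Series: the index is the user names, the values are the
# counts. Nothing from the Description column survives this step.
print(type(unique_counts).__name__)
print(sorted(unique_counts.index))
```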

I get something like this when I run my code:

sample534         1
sample987         1
sample342         1
sample321         1
sample123         1

I would like ALL of the data from the txt files to come out in my output, but I still want only the User Name column analyzed. How do I keep the data in all columns, or do I have to use a different kind of count to include all columns of data?

I would like something like:

   User Name    Description
1  sample534    Journal Mailbox managed by         
1  sample987    Journal Mailbox managed by    
1  sample342    Journal Mailbox managed by   
1  sample321    Journal Mailbox managed by 
1  sample123    Journal Mailbox managed by

Sample of data I am using:

Account User Name User CN                       Description
ENABLED MBJ29     CN=MBJ29,CN=Users             Journal Mailbox managed by  
ENABLED MBJ14     CN=MBJ14,CN=Users             Journal Mailbox managed by
ENABLED MBJ08     CN=MBJ30,CN=Users             Journal Mailbox managed by   
ENABLED MBJ07     CN=MBJ07,CN=Users             Journal Mailbox managed by
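In the spirit of a reproducible example, the sample rows above can be typed straight into a DataFrame. This is a sketch: the column names come from the sample header, the values are copied as shown (including the Description text that appears cut off after "managed by"), and no file reading is involved.

```python
import pandas as pd

# The four sample rows from the question, entered by hand
sample = pd.DataFrame({
    "Account": ["ENABLED"] * 4,
    "User Name": ["MBJ29", "MBJ14", "MBJ08", "MBJ07"],
    "User CN": ["CN=MBJ29,CN=Users", "CN=MBJ14,CN=Users",
                "CN=MBJ30,CN=Users", "CN=MBJ07,CN=Users"],
    "Description": ["Journal Mailbox managed by"] * 4,
})

# Same counting step as in the question, applied to the sample
counts = sample["User Name"].value_counts()
print(counts[counts == 1])
```

In this small sample every user name occurs once, so all four rows would be kept by a filter on the counts.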

Please, don't *describe* your data. Include, in your post, a *sample* of your *actual* data. — BrenBarn, Jun 08 '16 at 18:41
[how to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — MaxU - stand with Ukraine, Jun 08 '16 at 18:42

Andreas Hsieh · Accepted Answer · 2016-06-08T19:17:07.860


Based on your description, I guess you want to take the user names that occur only once (the index of the filtered counts) and use them to select rows from your DataFrame, which keeps all of its columns. Maybe you can try this:

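# f is the combined DataFrame from pd.concat([f1, f2]),
# before it is overwritten by the filtered counts.
counts = f['User Name'].value_counts()
unique_names = counts[counts == 1].index   # user names that occur exactly once
# Keep every column, but only the rows whose user name is in unique_names
df2 = f[f['User Name'].isin(unique_names)]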
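Two other pandas idioms select the same rows without building the list of names first. This is a sketch on a small hypothetical frame; `Series.duplicated(keep=False)` and `groupby(...).transform("size")` are standard pandas methods, swapped in here as alternatives rather than taken from the answer.

```python
import pandas as pd

# Hypothetical stand-in for the combined frame f, with one repeated user name
f = pd.DataFrame({
    "Account": ["ENABLED"] * 4,
    "User Name": ["MBJ29", "MBJ14", "MBJ14", "MBJ07"],
    "Description": ["Journal Mailbox managed by"] * 4,
})

# Option 1: keep=False flags every occurrence of a repeated name,
# so negating the mask leaves only names that appear exactly once.
df2 = f[~f["User Name"].duplicated(keep=False)]

# Option 2: give each row the size of its User Name group, keep size == 1.
sizes = f.groupby("User Name")["User Name"].transform("size")
df3 = f[sizes == 1]

print(df2)
```

Both are boolean masks on `f`, so they keep every column and preserve the original row order.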

edited Jun 08 '16 at 19:17

answered Jun 08 '16 at 18:47

Andreas Hsieh


Updated the answer to correctly select the index of the unique elements. – Andreas Hsieh Jun 08 '16 at 19:17
