
When I use the following:

import pandas as pd
data = pd.read_csv('C:/Users/Z/OneDrive/Python/Exploratory Data/Aramark/ARMK.csv')
x = data.iloc[:,2]
y = pd.unique(x)
y.to_csv('yah.csv')

I get the following error:

AttributeError: 'numpy.ndarray' object has no attribute 'to_csv'
Fabio Lamanna
ZJAY
  • `to_csv` is a [pandas.DataFrame method](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html), not a numpy array method. Turn your data into a pandas DataFrame or use the appropriate numpy method. Maybe [this'll be helpful](https://stackoverflow.com/questions/6081008/dump-a-numpy-array-into-a-csv-file) – jDo Mar 19 '16 at 21:14
  • By using pd.read_csv, isn't this a pandas.DataFrame? – ZJAY Mar 19 '16 at 21:16
  • Shucks, you beat me to it, @TadhgMcDonald-Jensen! :P – jDo Mar 19 '16 at 21:16
  • `pd.unique()` returns a `numpy.ndarray`, as the error very obviously states. `print(type(y))` – Tadhg McDonald-Jensen Mar 19 '16 at 21:17
  • @ZJAY apparently not (I trust the interpreter). You can always call `type(your_variable)` at various points in your script to see the datatypes – jDo Mar 19 '16 at 21:22
  • @TadhgMcDonald-Jensen Ok, I'll just shut up now :D – jDo Mar 19 '16 at 21:22
  • What is the pandas equivalent of pd.unique()? I'm basically trying to dump the unique values of a column into an Excel file. What is the solution? – ZJAY Mar 19 '16 at 21:31
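
To make the point from the comments concrete, here is a minimal sketch of the type check they suggest. The sample values are made up for illustration and stand in for the column read from the CSV file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.iloc[:, 2] from the question
x = pd.Series([1, 2, 1, 3, 2])

# pd.unique on a numeric Series hands back a plain NumPy array
y = pd.unique(x)
is_ndarray = isinstance(y, np.ndarray)
has_to_csv = hasattr(y, "to_csv")  # ndarray has no to_csv method

# Wrapping the array in a Series brings to_csv back
wrapped = pd.Series(y)

print(type(y), has_to_csv, hasattr(wrapped, "to_csv"))
```

Printing `type(...)` at each step, as suggested above, shows where the object stops being a pandas type.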

2 Answers


If I understand correctly, starting from a dataframe:

df = pd.DataFrame({'a':[1,2,3,4,5,6],'b':['a','a','b','c','c','b']})

you can get the unique values of a column with:

g = df['b'].unique()

which returns an array:

array(['a', 'b', 'c'], dtype=object)

To save it to a .csv file, I would convert it into a Series s:

In [22]: s = pd.Series(g)

In [23]: s
Out[23]: 
0    a
1    b
2    c
dtype: object

So you can easily save it:

In [24]: s.to_csv('file.csv')
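
By default `to_csv` also writes the index as an extra column. If only the unique values are wanted in the file, a sketch along these lines may help. It writes to an in-memory buffer instead of 'file.csv' so the result can be inspected, and the Series name `'b'` is an assumption added here to give the column a header.

```python
import io
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                   'b': ['a', 'a', 'b', 'c', 'c', 'b']})

# Unique values in order of first appearance, wrapped back into a Series
s = pd.Series(df['b'].unique(), name='b')

# index=False leaves out the 0, 1, 2 index column
buf = io.StringIO()
s.to_csv(buf, index=False)
csv_text = buf.getvalue()

print(csv_text)
```

Passing a file path such as `'file.csv'` in place of `buf` writes the same content to disk.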

Hope that helps.

Fabio Lamanna

The pandas counterpart of np.unique is the drop_duplicates method. Unlike np.unique, it keeps the values in order of first appearance instead of sorting them.

In [42]: x = pd.Series([1,2,1,3,2])

In [43]: y = x.drop_duplicates()

In [46]: y
Out[46]: 
0    1
1    2
3    3
dtype: int64

Notice that drop_duplicates returns a Series, so you can call its to_csv method:

import pandas as pd
data = pd.read_csv('C:/Users/Z/OneDrive/Python/Exploratory Data/Aramark/ARMK.csv')
x = data.iloc[:,2]
y = x.drop_duplicates()
y.to_csv('yah.csv')
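
One detail visible in the output above is that drop_duplicates keeps the original index labels (0, 1, 3). The following sketch, using the same sample values and an assumed Series name `"value"`, shows one way to renumber them or leave the index out of the CSV altogether. It writes to an in-memory buffer rather than 'yah.csv' so the output can be checked.

```python
import io
import pandas as pd

x = pd.Series([1, 2, 1, 3, 2], name="value")  # "value" is a made-up name

# drop_duplicates keeps the first occurrence and its original label
y = x.drop_duplicates()
kept_labels = y.index.tolist()

# Renumber from 0 if the old labels are not meaningful
y_clean = y.reset_index(drop=True)

# index=False leaves the index out of the file entirely
buf = io.StringIO()
y_clean.to_csv(buf, index=False)
csv_text = buf.getvalue()

print(kept_labels)
print(csv_text)
```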
unutbu