I have a CSV file with the following columns of interest:

fields = ['column_0', 'column_1', 'column_2', 'column_3', 'column_4', 'column_5', 'column_6', 'column_7', 'column_8', 'column_9']

For each of these columns, there are 153 lines of data, containing only two values: -1 or +1.

For each column, I would like to save the frequencies of the values -1 and +1 as a single comma-separated line in a CSV file. I run into two problems. The first shows up when I do the following:

>>> df = pd.read_csv('data.csv', skipinitialspace=True, usecols=fields)
>>> print(df['column_2'].value_counts())
 1    148
-1      5
>>> df['column_2'].value_counts().to_csv('result.txt', index=False)

Then, when I open result.txt, here is what I find:

148

5

This is obviously not what I want: I want the values on the same line of the text file, separated by a comma (e.g., 148, 5).

The second problem occurs when one of the frequencies is zero:

>>> print(df['column_9'].value_counts())
 1    153
>>> df['column_9'].value_counts().to_csv('result.txt', index=False)

Then, when I open result.txt, here is what I find:

153

I don't want that behavior either; I would like to see 153, 0.

In summary, I would like to know how to do the following with pandas:

  1. Given one column, save the frequencies of its different values on the same line of a CSV file, separated by commas. For example:

148,5

  2. If there is a value with frequency 0, put that in the CSV as well. For example:

153,0

  3. Append these frequency values as different lines of the same CSV file. For example:

148,5

153,0

Can I do that with pandas, or should I move to another Python library?

mad

3 Answers

Example with some dummy data:

import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 1, -1, -1, -1],
                   'col2': [1, 1, 1, 1, 1, 1],
                   'col3': [-1, 1, -1, 1, -1, -1]})

counts = df.apply(pd.Series.value_counts).fillna(0).T

print(counts)

Output:

       -1    1
col1  3.0  3.0
col2  0.0  6.0
col3  4.0  2.0

You can then export this table to CSV.

See this answer for ref: How to get value counts for multiple columns at once in Pandas DataFrame?
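
The export step is not shown above, so here is a minimal sketch of one way it might look. It assumes the asker wants the +1 count first and no header or row labels in the file; the `reindex` call, the integer cast, and the in-memory `buffer` (a stand-in for a path such as 'result.txt') are illustrative choices, not part of the original answer.

```python
import io
import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 1, -1, -1, -1],
                   'col2': [1, 1, 1, 1, 1, 1],
                   'col3': [-1, 1, -1, 1, -1, -1]})

counts = df.apply(pd.Series.value_counts).fillna(0).T

# Put the +1 count first, make sure both value columns exist,
# and turn the float counts produced by fillna back into integers.
counts = counts.reindex(columns=[1, -1], fill_value=0).astype(int)

# Assumption: an in-memory buffer stands in for the output file.
buffer = io.StringIO()
counts.to_csv(buffer, header=False, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

With `header=False` and `index=False`, each input column contributes exactly one line holding its two counts.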

Dan

I believe you could do what you want like this:

import io
import pandas as pd

df = pd.DataFrame({'column_1': [1,-1,1], 'column_2': [1,1,1]})

with io.StringIO() as stream:
    # it's easier to transpose a dataframe so that the number of rows become columns
    # .to_frame to DataFrame and .T to transpose
    df['column_1'].value_counts().to_frame().T.to_csv(stream, index=False)

    print(stream.getvalue()) # check the csv data

But I would suggest something like this instead, since otherwise you would have to specify explicitly that one of the expected values is missing:

with io.StringIO() as stream:
    # count the values of every column at once; a value absent from a column becomes NaN
    counts = df[['column_1', 'column_2']].apply(lambda column: column.value_counts())
    # replace the NaN of a missing value with a count of 0
    counts = counts.fillna(0)
    # .T to transpose so that each original column becomes one row of the csv
    counts.T.to_csv(stream, index=False)

    print(stream.getvalue()) # check the csv data
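
For the "append one line per column to the same file" part of the question, a possible sketch follows. It assumes the only expected values are 1 and -1 and uses `reindex` with `fill_value=0` to state that explicitly for a single column; the temporary `path` is a hypothetical output location chosen for illustration.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({'column_1': [1, -1, 1], 'column_2': [1, 1, 1]})

# Hypothetical output location; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), 'result.txt')

for name in df.columns:
    # reindex forces both expected values to appear, filling a missing one with 0
    row = df[name].value_counts().reindex([1, -1], fill_value=0)
    # mode='a' appends, so each column adds one more line to the same file
    row.to_frame().T.to_csv(path, mode='a', header=False, index=False)

with open(path) as f:
    lines = f.read().splitlines()
print(lines)
```

Because the file is opened in append mode, running the loop twice would add the same lines again, so the file may need to be removed first when rerunning.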

Buckeye14Guy

Here is an example with three columns c1, c2, c3 and a data frame d, which is defined before the function is invoked.

import collections
import pandas as pd

def wcsv(d):
    # Count the values in each column; a Counter returns 0 for a key it has not seen.
    dc = [collections.Counter(d[i]) for i in d.columns]
    # Look the counts up by key so they always line up with the '1' and '-1' labels.
    w = pd.DataFrame([[c[1], c[-1]] for c in dc], columns=['1', '-1'], index=d.columns)
    w.to_csv("t.csv")

d = pd.DataFrame([[1,1,-1],[-1,1,1],[1,1,-1],[1,1,-1]], columns=['c1','c2','c3'])
wcsv(d)
user17144