I converted a pd.Series into a DataFrame. After the conversion, one of the DataFrame's columns has no name and the other is named "0". I need to give names to both columns.

I tried using df.columns = ["A","B"] and rename, but neither helps.

import pandas as pd
import nltk
from nltk.corpus import stopwords       #for removing stopwords
import re                               #for removing numbers, special characters
#Import CSV into dataframe
filepath = "C:/a/Python/Clustering/LabeledRawDatav2.csv"
df = pd.read_csv(filepath,encoding='windows-1252')
print(df.head(2))

freq = pd.DataFrame(columns=["Word","Count"])

freq = pd.Series(' '.join(df["Notes"]).split()).value_counts()[:]
freq = pd.Series.to_frame(freq)

freq.rename(columns = {"0":"Freq"},inplace=True)

print(freq)

Expected result would be

Word                  freq
-                     206
the                    65
for                    62
1                      62
DAYS                   56

Actual result is

                        0
-                     206
the                    65
for                    62
1                      62
DAYS                   56
Serge Ballesta
Anoop Mahajan
  • are you sure that it's a "0" and not 0 (as an integer)? What do you get when you look at `freq.columns` ? – Magellan88 Jul 10 '19 at 12:27
  • It is the integer 0 and not the letter O, if that's what you are asking. – Anoop Mahajan Jul 10 '19 at 12:31
  • Possible duplicate of [Renaming columns in pandas](https://stackoverflow.com/questions/11346283/renaming-columns-in-pandas) – Ronnie Jul 10 '19 at 12:43
  • So does `freq.rename(columns = {0:"Freq"},inplace=True)` work? (Without the " " around the 0, so that the key is the integer 0 and not a string containing 0.) I mean `0` {integer} vs `"0"` {string}. My guess is that you are renaming the string "0", which does not exist, when the actual label is the integer 0. That's why I was wondering what `freq.columns` gives: there you can see it. – Magellan88 Jul 10 '19 at 12:49

2 Answers

I usually do it like this:

freq = df["Notes"].str.split(expand = True).stack().value_counts().rename_axis('word').reset_index(name = 'count')

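This avoids the column named 0: rename_axis('word') names the index before it is turned into a column, and reset_index(name = 'count') names the counts column explicitly.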
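To see the chain at work without the original CSV, here is a minimal sketch. The two-row df is hypothetical sample data standing in for the "Notes" column; the method chain itself is the one shown above, just split over several lines.

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV's "Notes" column
df = pd.DataFrame({"Notes": ["the cat - the dog", "the - bird"]})

freq = (
    df["Notes"]
    .str.split(expand=True)     # one word per column, rows padded with missing values
    .stack()                    # reshape into one long Series of words
    .value_counts()             # count per word; missing padding is not counted
    .rename_axis("word")        # name the index so it becomes a "word" column
    .reset_index(name="count")  # name the counts column explicitly
)

print(freq)
```

The result has two named columns, word and count, with the words as ordinary data instead of the index.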

Credit goes to the original author, jezrael: I took this from one of his answers, but I cannot find the original link.

Ankur Sinha
You initially have an unnamed Series built from value_counts() that you convert into a DataFrame with to_frame.

That means that the DataFrame has the words (-, the, for, ...) as its index, and a single column named 0: the integer value 0, not the string `"0"`.
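The integer-versus-string distinction can be checked directly. This is a small sketch with hypothetical counts; the Series is built by hand and left unnamed so that to_frame() produces the integer label 0 regardless of how value_counts() names its result.

```python
import pandas as pd

# Hypothetical counts standing in for the unnamed value_counts() result
counts = pd.Series([3, 2, 1], index=["the", "-", "cat"])
freq = counts.to_frame()

print(freq.columns.tolist())   # [0]  (the integer 0, not the string "0")

# Renaming the string "0" matches no label, so nothing changes (and no error is raised)
unchanged = freq.rename(columns={"0": "Freq"})
print(unchanged.columns.tolist())

# Renaming the integer 0 matches the actual label
renamed = freq.rename(columns={0: "Freq"})
print(renamed.columns.tolist())
```

Note that even after this rename the words are still the index, not a column, which is why a reset_index() step is needed as well.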

What you want is:

# give a name to the original Series: freq
freq = pd.Series(' '.join(df["Notes"]).split(), name='freq').value_counts()

# give a name to the index and convert to a dataframe
freq = freq.rename_axis('Word').to_frame().reset_index()
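One caveat, stated as far as I know rather than as something the answer claims: in pandas 2.0 and later, value_counts() names its result count and moves the original Series name onto the index, so name='freq' in the constructor may not end up as the column name there. A variant that sets the name after counting should behave the same across versions. The sample df below is hypothetical.

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV's "Notes" column
df = pd.DataFrame({"Notes": ["the cat - the dog", "the - bird"]})

freq = (
    pd.Series(" ".join(df["Notes"]).split())
    .value_counts()
    .rename("freq")        # name the counts after value_counts(), not before
    .rename_axis("Word")   # name the index
    .reset_index()         # the index becomes the "Word" column
)

print(freq)
```

Passing a scalar to Series.rename sets the Series name, so the counts column is called freq no matter what name value_counts() assigned.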
Serge Ballesta