Removing space in dataframe python

Question

I am getting an error in my code because I tried to make a dataframe by calling an element from a csv. I have two columns I call from a file: CompanyName and QualityIssue. There are three types of Quality issues: Equipment Quality, User, and Neither. I run into problems trying to make a dataframe df.Equipment Quality, which obviously doesn't work because there is a space there. I want to take Equipment Quality from the original file and replace the space with an underscore.

input:

Top Calling Customers,         Equipment Quality,    User,    Neither,
Customer 3,                      2,           2,        0,
Customer 1,                      0,           2,        1,
Customer 2,                      0,           1,        0,
Customer 4,                      0,           1,        0,

Here is my code:

import numpy as np
import pandas as pd
import pandas.util.testing as tm; tm.N = 3

# Get the data.
data = pd.DataFrame.from_csv('MYDATA.csv')   
# Group the data by calling CompanyName and QualityIssue columns.
byqualityissue = data.groupby(["CompanyName", "QualityIssue"]).size() 
# Make a pandas dataframe of the grouped data.
df = pd.DataFrame(byqualityissue) 
# Change the formatting of the data to match what I want SpiderPlot to read.
formatted = df.unstack(level=-1)[0]  
# Replace NaN values with zero.
formatted[np.isnan(formatted)] = 0 
includingtotals = pd.concat([formatted,pd.DataFrame(formatted.sum(axis=1), 
                             columns=['Total'])], axis=1)
sortedtotal = includingtotals.sort_index(by=['Total'], ascending=[False])
sortedtotal.to_csv('byqualityissue.csv')

This seems to be a frequently asked question and I tried lots of the solutions but they didn't seem to work. Here is what I tried:

with open('byqualityissue.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    return [[x.strip() for x in row] for row in reader]
    sentence.replace(" ", "_")

And

sortedtotal['QualityIssue'] = sortedtotal['QualityIssue'].map(lambda x: x.rstrip(' '))

And what I thought was the most promising from here http://pandas.pydata.org/pandas-docs/stable/text.html:

formatted.columns = formatted.columns.str.strip().str.replace(' ', '_')

but I got this error: AttributeError: 'Index' object has no attribute 'str'

Thanks for your help in advance!

You can do `df.columns = pd.Series(df.columns).str.replace(' ','_')` as an aside why are you not just doing `df['Equipment Quality']` which will always work rather than trying to access the columns as an attribute which won't? — EdChum, Jun 10 '15 at 18:22
@EdChum Yes! Thanks for the aside, because I guess that should have been the real question rather than the work around. I wasn't sure about syntax, but that worked better. — jenryb, Jun 10 '15 at 20:35
While this is the older question, the [answer](https://stackoverflow.com/a/41476181/7758804) to [Removing space from columns in pandas](https://stackoverflow.com/q/41476150/7758804) is more appropriate for pandas, with significantly more views/votes. — Trenton McKinney, Aug 02 '21 at 22:35

score 11 · Accepted Answer · answered Jun 10 '15 at 19:35

11

Try:

formatted.columns = [x.strip().replace(' ', '_') for x in formatted.columns]

answered Jun 10 '15 at 19:35

Alexander

105,104
32
201
196

score 3 · Answer 2 · answered Nov 22 '15 at 01:13

As I understand your question, the following should work (test it out with inplace=False to see how it looks first if you want to be careful):

sortedtotal.rename(columns=lambda x: x.replace(" ", "_"), inplace=True)

And if you have white space surrounding the column names, like: "This example "

sortedtotal.rename(columns=lambda x: x.strip().replace(" ", "_"), inplace=True)

which strips leading/trailing whitespace, then converts internal spaces to "_".

Removing space in dataframe python

2 Answers2