
My plan is to print each row of a DataFrame, read from an arbitrary CSV file, as an OrderedDict, with excess spaces removed (leading and trailing spaces, and double or triple spaces between words).

These are the problems I encountered while trying to get rid of excessive spaces:

  1. str.strip() requires the column names to be specified, so it only works for one specific CSV file.

  2. Specifying sep in pd.read_csv turns some of the items in the CSV file into 'None' in the ordered dictionary.

  3. skipinitialspace=True only skips spaces right after the delimiter; it can't remove other excess spaces.
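
Point 3 can be reproduced with a small in-memory CSV. This is only a sketch: the sample data is made up for illustration and is not from a real file. As far as I can tell, the space after the comma is dropped, while the trailing space and the double space inside the value are kept:

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a real file.
csv_text = "First Name,Last Name\nTim , John  son\n"

df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

first = df.loc[0, "First Name"]
last = df.loc[0, "Last Name"]

# repr() makes the remaining spaces visible.
print(repr(first), repr(last))
```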

Is there a workaround for this code?

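from collections import OrderedDict

import pandas as pd

file = input("Input any csv file or file path \nYour input: ")

df = pd.read_csv(file)

# Print each row as an OrderedDict mapping column name to value
for i, row in df.iterrows():
    d = OrderedDict(zip(row.index.tolist(), row.tolist()))
    print(d)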

Edit:

Additional example:

df = pd.DataFrame([['Ka te', ' Rose '], [' Tim ', 'John  son'], ['James  ', 'House  ']], columns=['First Name', 'Last Name'])

for i, row in df.iterrows():
    d = OrderedDict(zip(row.index.tolist(), row.tolist()))
    print(d)

Output:

OrderedDict([('First Name', 'Ka te'), ('Last Name', ' Rose ')])
OrderedDict([('First Name', ' Tim '), ('Last Name', 'John  son')])
OrderedDict([('First Name', 'James  '), ('Last Name', 'House  ')])

The output that I want:

OrderedDict([('First Name', 'Kate'), ('Last Name', 'Rose')])
OrderedDict([('First Name', 'Tim'), ('Last Name', 'John son')])
OrderedDict([('First Name', 'James'), ('Last Name', 'House')])
PypypieYum
  • Please provide a [reproducible minimal example](https://stackoverflow.com/q/20109391/8107362). Especially, provide some [sample data](https://stackoverflow.com/q/22418895/8107362) and your expected result, e.g. with `print(df.to_dict())`. I understand that the cleaning does not need to happen when loading the data? – mnist Nov 20 '21 at 14:22
  • Thank you for the reply! I have edited the post to add some more information. I am not sure what to do, as I am still a new learner. – PypypieYum Nov 20 '21 at 15:01

1 Answer

Just found a solution: it can be done with str.replace():

from collections import OrderedDict

import pandas as pd

df = pd.DataFrame([['Ka te', ' Rose '], [' Tim ', 'John  son'], ['James  ', 'House  ']], columns=['First Name', 'Last Name'])

# Remove every space from every column (iterating a DataFrame yields column names)
for c in df:
    df[c] = df[c].str.replace(' ', '')

for i, row in df.iterrows():
    d = OrderedDict(zip(row.index.tolist(), row.tolist()))
    print(d)
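
Note that str.replace(' ', '') removes every space, so 'John  son' becomes 'Johnson' rather than the 'John son' shown in the question. If single spaces between words should survive, one possible alternative is to trim each cell and collapse runs of whitespace instead. The sketch below is not from the original post: the helper name clean_cell is hypothetical, and it assumes that only string cells need cleaning.

```python
from collections import OrderedDict

import pandas as pd


def clean_cell(value):
    """Trim a string and collapse inner whitespace runs; pass other values through."""
    if isinstance(value, str):
        # str.split() with no arguments splits on any run of whitespace
        # and discards leading/trailing whitespace.
        return " ".join(value.split())
    return value


df = pd.DataFrame(
    [["Ka te", " Rose "], [" Tim ", "John  son"], ["James  ", "House  "]],
    columns=["First Name", "Last Name"],
)

# Apply the cleaner to every cell, so no column names need to be hard-coded.
cleaned = df.apply(lambda col: col.map(clean_cell))

rows = [
    OrderedDict(zip(row.index.tolist(), row.tolist()))
    for _, row in cleaned.iterrows()
]
for d in rows:
    print(d)
```

With this approach 'Ka te' stays 'Ka te', since a single inner space cannot be told apart from a legitimate one between two words.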
PypypieYum