type(df['Soft_skills'][0])
>>>str

I need output like this:

df['Soft_skills'][0] = Management,Decision Making

and for second row

df['Soft_skills'][1] = None

I don't know how to remove the `"` characters and convert the value into a plain string.

>>> df['Soft_skills']
0                       ["Management", "Decision Making"]
1                                                      []
2                                          ["Management"]
3                                                      []
4       ["Governance", "Management", "Leadership", "Te...
                              ...
1229                                                   []
1230                                                   []
1231                                                   []
1232                   ["Agenda (Meeting)", "Governance"]
1233                                                   []
Name: Soft_skills, Length: 1234, dtype: object

In some cases the data looks like this:

The syllabus for this course will cover the following:, \n, *, The nature and purpose of cost and management accounting, \n, *, Source documents and coding, \n, *, Cost classification and measuring, \n, *, Recording costs, \n, *, Spreadsheets

I tried to clean it up using:

d = {
'Not Mentioned':'',
"\r\n": "\n",
"\r": "\n",
'\u00a0':' ',
': \n, *,  ':'\n * ',
' \n,':'\n',
}
df=df.replace(d.keys(),d.values(),regex=True)

but nothing gets replaced. What is the problem? Is there anything I am missing? I also tried this:

df['Course_content'] = df['Course_content']\
    .str.replace('Not Mentioned','')\
    .str.replace("\r\n", "\n")\
    .str.replace("\r", "\n")\
    .str.replace('\u00a0',' ')\
    .str.replace(', \n, *,  ','\n * ')\
    .str.replace(' \n,','\n')

but that does not work for me either.

1 Answer


Try via strip() and replace():

# Strip the surrounding brackets, drop the double quotes, and turn empty strings into NaN
df['Soft_skills']=(df['Soft_skills'].str.strip("[]")
              .str.replace('"','',regex=False)
              .replace('',float('nan')))
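If an exact `Management,Decision Making` format (no space after the comma) is needed, another option is to parse each cell and join the items. This is only a sketch: it assumes every cell is a valid JSON array string, as in the sample output in the question, and it uses `apply()`, so it may be slower on large frames. The sample frame and the helper name `join_skills` are made up for illustration.

```python
import json

import pandas as pd

# Hypothetical sample mirroring the first rows shown in the question.
df = pd.DataFrame(
    {"Soft_skills": ['["Management", "Decision Making"]', "[]", '["Management"]']}
)


def join_skills(cell):
    """Parse a JSON array string and join its items with commas."""
    skills = json.loads(cell)
    # An empty list becomes None, which pandas treats as a missing value.
    return ",".join(skills) if skills else None


df["Soft_skills"] = df["Soft_skills"].apply(join_skills)
print(df["Soft_skills"][0])
```

Cells that are not valid JSON would raise `json.JSONDecodeError` here, so real data may need a `try`/`except` around `json.loads`.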

Update:

First, create a dictionary:

d={
    'Â':'',
    '’':"'",
    '“':'"',
    '–':'-',
    'â€':'"'
}

Finally, use the replace() method:

df=df.replace(d.keys(),d.values(),regex=True)

Source: I built the dictionary from this answer; it was written for PHP, but it deals with the same encoding problem.
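Characters like `â€“` usually appear when UTF-8 bytes were decoded as Windows-1252, so instead of listing every pair in a dictionary it may be possible to reverse that step. A minimal sketch, assuming the text really went through that exact mis-decoding; the helper name `fix_mojibake` is hypothetical:

```python
def fix_mojibake(text):
    """Undo UTF-8 text that was mistakenly decoded as Windows-1252."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The string does not fit the assumed mis-decoding; leave it alone.
        return text


# "\u00e2\u20ac\u201c" is how an en dash looks after the wrong decoding.
sample = "5\u00e2\u20ac\u201c8 hours per week"
fixed = fix_mojibake(sample)
print(fixed)
```

On a string column this could be applied with `df['col'].map(fix_mojibake)`, where `col` is a placeholder name. If the file can be read again, passing the correct `encoding` to `pd.read_csv` avoids the problem in the first place.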

Anurag Dabas
  • Is there a reason to use this vs. `df['Soft_skills'].apply(','.join)`? Seems unnecessarily complicated, but maybe there are performance reasons. – erip May 25 '21 at 13:22
  • yup due to performance reason I didn't use `apply()`...btw added both solutions – Anurag Dabas May 25 '21 at 13:26
  • The first one is not working for me; it joins every character, like ``` [,",M,a,n,a,g,e,m,e,n,t,",,, ,",D,e,c,i,s,i,o,... 1 [,] 2 [,",M,a,n,a,g,e,m,e,n,t,",] 3 [,] ``` – Viren Ramani May 25 '21 at 14:58
  • it was because column `'Soft_skills'` is of type string so use 2nd method...updated answer kindly have a look.... **:)** – Anurag Dabas May 25 '21 at 15:11
  • My CSV file contains characters like `5–8 hours per week`. Would setting an encoding, or some other approach, be an easy way to handle this? – Viren Ramani May 25 '21 at 16:30
  • What is an easy way to clean and replace those garbled characters? – Viren Ramani May 25 '21 at 16:33
  • Yes, you can replace them with their actual values...updated answer, kindly have a look **:)** – Anurag Dabas May 25 '21 at 17:07
  • I used `df.to_csv('sample.csv',encoding='utf-8-sig',index=False)` to encode the file and it is working for me, but can you look at the updated question? I am trying to replace `\n` and `\r\n` and it is not working. – Viren Ramani May 26 '21 at 02:55
  • `df['Course_content'] = df['Course_content']\ .str.replace('Not Mentioned','')\ .str.replace("\r\n", "\n")\ .str.replace("\r", "\n")\ .str.replace('\u00a0',' ')\ .str.replace(', \n, *, ','\n * ')\ .str.replace(' \n,','\n')` I'd have to replace this character and it's not working for me.. – Viren Ramani May 26 '21 at 03:03
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/232887/discussion-between-viren-ramani-and-anurag-dabas). – Viren Ramani May 26 '21 at 03:06
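On the `Course_content` replacements raised in the last comments: one likely cause is that `*` is a regex quantifier, so a pattern such as `', \n, *, '` does not match a literal asterisk when it is treated as a regular expression (which `regex=True` requests, and which older pandas versions did by default in `.str.replace`). A minimal sketch with literal matching, assuming the cells contain real newline characters rather than a backslash followed by `n`, and exactly one space after each comma; the sample rows are invented for illustration:

```python
import pandas as pd

# Hypothetical sample modelled on the text quoted in the question.
df = pd.DataFrame(
    {
        "Course_content": [
            "The course will cover the following:, \n, *, Recording costs, \n, *, Spreadsheets",
            "Not Mentioned",
            "Line one\r\nLine two\u00a0end",
        ]
    }
)

# Applied in insertion order, so "\r\n" is handled before a lone "\r".
literal_replacements = {
    "Not Mentioned": "",
    "\r\n": "\n",
    "\r": "\n",
    "\u00a0": " ",
    ", \n, *, ": "\n * ",
    " \n,": "\n",
}

cleaned = df["Course_content"]
for old, new in literal_replacements.items():
    # regex=False makes "*" match a literal asterisk.
    cleaned = cleaned.str.replace(old, new, regex=False)
df["Course_content"] = cleaned
```

If the replacements still have no effect, printing `repr()` of one cell shows which whitespace and escape characters are actually stored.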