I'm cleaning up a CSV file and want to convert some of its data to HTML before uploading it to a website.

I'm going through every cell in the column called 'Details' in a pandas dataframe.

If a cell starts with this character combination: \r\r\n \t, then I want to replace it with this: <ul><li>

df2 = df.copy()

def startswith_replace(x, a, b):
    if x.startswith(a):
        x.replace(a, b)

df2['Details'] = df2['Details'].apply(
    lambda x: startswith_replace(x, '\\r\\r\\n \\t', '<ul><li>'))

When I run this, however, every cell in the 'Details' column is replaced with 'None' as its value.

Robin Duong
    Your function doesn't have a `return` statement and therefore returns `None` implicitly, see [return, return None, and no return at all?](https://stackoverflow.com/questions/15300550/return-return-none-and-no-return-at-all) – G. Anderson Mar 18 '21 at 18:52
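
A minimal sketch of the fix this comment points to: the same function with explicit `return` statements, so that `apply` gets a string back for every cell. The sample frame is hypothetical, and the cells are assumed to hold real carriage-return, newline, and tab characters; if the data contains literal backslash text instead, keep the doubled backslashes from the question.

```python
import pandas as pd

def startswith_replace(x, a, b):
    """Return x with a leading `a` swapped for `b`; other strings pass through unchanged."""
    if x.startswith(a):
        # count=1 limits the replacement to the prefix that startswith() just matched
        return x.replace(a, b, 1)
    # Without this second return, non-matching cells would still become None
    return x

# Hypothetical sample data (assumption: real control characters in the cells)
df = pd.DataFrame({'Details': ['\r\r\n \tfirst item', 'no prefix here']})
df2 = df.copy()
df2['Details'] = df2['Details'].apply(
    lambda x: startswith_replace(x, '\r\r\n \t', '<ul><li>')
)
```

Note that `str.replace` returns a new string rather than modifying `x` in place, which is why its result has to be returned.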

1 Answer

This can be done with the built-in Series.str.replace and a small regular expression, without defining your own function.

The ^ anchors the match to the start of the string, and the parentheses optionally make the pattern a capture group. If you decide you want to replace every occurrence instead, both can be omitted and the raw string passed as-is.

df

    A   B   A   Details
0   1   2   3   \r\r\n \t
1   4   5   6   lkjn \r\r\n \t
2   7   8   9   abcdefg

df['Details'] = df['Details'].str.replace(r'^(\r\r\n \t)', '<ul><li>', regex=True)

    A   B   A   Details
0   1   2   3   <ul><li>
1   4   5   6   lkjn \r\r\n \t
2   7   8   9   abcdefg
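
For comparison, here is a self-contained sketch of the anchored and unanchored variants side by side. The sample frame and the names `anchored` and `everywhere` are hypothetical, and the cells are assumed to contain real carriage-return, newline, and tab characters. `regex` is passed explicitly because its default differs between pandas versions.

```python
import pandas as pd

# Hypothetical sample frame (assumption: real control characters in the cells)
df = pd.DataFrame({'Details': ['\r\r\n \titem', 'lkjn \r\r\n \t', 'abcdefg']})

# Anchored: only a match at the very start of the cell is replaced
anchored = df['Details'].str.replace(r'^\r\r\n \t', '<ul><li>', regex=True)

# Unanchored: every occurrence is replaced, wherever it appears in the cell
everywhere = df['Details'].str.replace('\r\r\n \t', '<ul><li>', regex=False)
```

With the anchor, the second row is left alone because its match is not at the start of the string; without it, that row is rewritten as well.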
G. Anderson