I have a DataFrame in which some cells contain newline characters. I want each newline-separated value to be split out into its own row, with the other columns repeated.

Here is a sample of my dataframe:

   col1     col2
    a       123
    b       234
    c\nd\ne 345
    f       456
    g       567

I want to make it like this:

col1 col2
a    123
b    234
c    345
d    345
e    345
f    456
g    567

Can anybody help me?

Alex Waygood
  • 1
    Welcome to Stack Overflow! Please include a _small_ subset of your data as a __copyable__ piece of code that can be used for testing as well as your expected output for the __provided__ data. See [MRE - Minimal, Reproducible, Example](https://stackoverflow.com/help/minimal-reproducible-example), and [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/15497888) for more information. – Henry Ecker Sep 06 '21 at 16:44
  • 1
    Please post your data as code and not as screenshots. For example, assuming you read your excel/csv to a dataframe called `df`, include the output of `df.to_dict()` – not_speshal Sep 06 '21 at 16:44

1 Answer

This code should help: it can split any column of a DataFrame on any character you choose and returns the split DataFrame.

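import pandas as pd

class Seperator:
    def __init__(self, df, column_name, split_char):
        # Keep the list per instance; a class-level list would be shared
        # by every Seperator and accumulate rows from earlier uses.
        self.row_list = []
        df.apply(lambda row: self.seperate(row, column_name, split_char), axis=1)

    def seperate(self, row, column_name, split_char):
        # Emit one copy of the row for each piece of the split value.
        items = row[column_name].split(split_char)
        row_dic = dict(row)
        for item in items:
            row_dic[column_name] = item
            self.row_list.append(dict(row_dic))
        return row

    def dataframe(self):
        # Build the result from the collected row dictionaries.
        return pd.DataFrame(self.row_list)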

Now let's use this class:

df = pd.DataFrame({'col1':['a','b','c\nd\ne','f','g'],'col2':[123,234,345,456,567]})
df
      col1  col2
0        a   123
1        b   234
2  c\nd\ne   345
3        f   456
4        g   567

After that:

seperator = Seperator(df,column_name='col1',split_char='\n')
seperator.dataframe()

  col1  col2
0    a   123
1    b   234
2    c   345
3    d   345
4    e   345
5    f   456
6    g   567
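
For comparison, the same result can probably be had without a helper class by combining pandas' built-in `Series.str.split` with `DataFrame.explode`. This is only a sketch under some assumptions: the function name `split_rows` is made up for illustration, the column is assumed to hold strings, and `ignore_index=True` assumes pandas 1.1 or newer (`explode` itself needs 0.25 or newer).

```python
import pandas as pd


def split_rows(df, column_name, split_char):
    """Return a copy of df with one row per piece of df[column_name].

    Hypothetical helper: split_rows is not part of pandas.
    """
    # Turn each cell into a list of pieces, e.g. 'c\nd\ne' -> ['c', 'd', 'e'].
    pieces = df[column_name].str.split(split_char)
    # explode() emits one row per list element and repeats the other columns;
    # ignore_index=True renumbers the result 0..n-1.
    return df.assign(**{column_name: pieces}).explode(column_name, ignore_index=True)


df = pd.DataFrame({'col1': ['a', 'b', 'c\nd\ne', 'f', 'g'],
                   'col2': [123, 234, 345, 456, 567]})
result = split_rows(df, column_name='col1', split_char='\n')
print(result)
```

The original `df` is left unchanged, since `assign` returns a new DataFrame. On older pandas versions, dropping `ignore_index=True` and calling `.reset_index(drop=True)` afterwards should give the same numbering.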

enjoy.

Ali Crash