
I removed all the null values and numbers from my data, so I now have only a list of lists containing text strings and '|'. I want to loop over my RDD and replace each '|' with '' (or remove it entirely).

I tried using the map function and passing it an external function:

def fun(item):
    newlist=list()
    for i in item:
        if '|' == i or '|' in i:
            j=''
            newlist.append(j)
        else:
            newlist.append(i)
    return newlist

final = original.map(lambda x: fun(x))
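Note that `map` needs a function, so the argument must use the `lambda` keyword (or simply pass `fun` itself, e.g. `rdd.map(fun)`). RDD transformations are also lazy, so the result only appears after an action such as `.collect()`. The row function can be checked locally first. The sketch below uses Python's built-in `map` as a stand-in for `RDD.map`, and the sample rows are made up for illustration, not taken from the question's data.

```python
def fun(item):
    """Replace every element that is or contains '|' with an empty string."""
    newlist = list()
    for i in item:
        if '|' in i:  # also true when i == '|'
            newlist.append('')
        else:
            newlist.append(i)
    return newlist

# Hypothetical sample rows standing in for the RDD's contents.
rows = [['Hello', '|'], ['World', 'a|b', 'ok']]

# Built-in map is lazy as well; list() forces evaluation, much like collect().
final = list(map(lambda x: fun(x), rows))
print(final)  # [['Hello', ''], ['World', '', 'ok']]
```

If this prints the expected rows locally but the RDD output still shows '|', check that you are collecting `final` rather than the original RDD.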

Input: [['Hello','|'],..]

Expected output: [['Hello',''],..]

Actual output: [['Hello','|'],..]
pault

1 Answer


You can use the string method `replace` in Python:

a = "ABCD|EFG"
a = a.replace("|", "")

I changed your code; you can use this:

def fun(item):
    newlist=list()
    for i in item:
        newlist.append(i.replace("|",""))
    return newlist
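The same function can be written more compactly as a list comprehension. Unlike the question's version, it removes only the `|` characters and keeps the rest of each string. The name `strip_pipes` and the sample row are illustrative; with an RDD it would be passed as `rdd.map(strip_pipes)`.

```python
def strip_pipes(item):
    # Remove every '|' character from each string in the row.
    return [s.replace("|", "") for s in item]

row = ['Hello', '|', 'a|b']  # hypothetical sample row
print(strip_pipes(row))  # ['Hello', '', 'ab']
```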

If you want to drop the '|' elements entirely instead of turning them into empty strings, you could also try this:

output = []

for single_list in list_of_lists:
    # Keep every element that is not exactly "|"; compare values with !=, not identity with "is not".
    new_in_list = [i for i in single_list if i != "|"]
    output.append(new_in_list)
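The loop above only drops elements that are exactly "|". A sketch that also handles strings containing "|" strips the pipes first and then discards any strings left empty. The helper name `clean_row` and the sample data are hypothetical.

```python
list_of_lists = [['Hello', '|'], ['a|b', '||', 'ok']]  # hypothetical sample data

def clean_row(row):
    # Strip '|' from each string, then drop strings that end up empty.
    stripped = (s.replace("|", "") for s in row)
    return [s for s in stripped if s]

output = [clean_row(row) for row in list_of_lists]
print(output)  # [['Hello'], ['ab', 'ok']]
```

Strings that become whitespace-only (for example " | " turning into "  ") are kept, because the filter only removes truly empty strings.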

Here is another example:

a = ["hello|||", "he||oagain", "|this is |", "how many ||||||||| ?"]
output = []
for i in a:
    output.append(i.replace("|", ""))
print(output)

The output is:

['hello', 'heoagain', 'this is ', 'how many  ?']
RezaOptic
  • In my data, I have 4 to 5 values that contain " | ". I just need to find them and change them into " " values. Do you have any idea for that? – Rahul Kiran May 25 '19 at 15:36
  • @RahulKiran I added another example for you; you can use `replace` on any value. – RezaOptic May 26 '19 at 05:42
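For the case in the comment above, where values contain " | " with spaces around the pipe, removing only "|" leaves two spaces behind. Replacing the whole " | " sequence with a single space may be closer to what is wanted. A minimal sketch with made-up values:

```python
values = ["a | b", " | ", "no pipe here"]  # hypothetical sample values

# Removing just the pipe keeps both surrounding spaces.
print([v.replace("|", "") for v in values])
# ['a  b', '  ', 'no pipe here']

# Replacing the whole " | " sequence leaves a single space.
fixed = [v.replace(" | ", " ") for v in values]
print(fixed)
# ['a b', ' ', 'no pipe here']
```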