
I have fetched some images and stored them in a dataframe column, df['images'].

Currently the images are fetched in the following format:

['image1.jpg','image2.jpg','','','']

Now I need to remove the brackets and the '' entries from this, as below:

image1.jpg,image2.jpg

I tried the function below, but it is not working:

def clean_images(imagearray):
    for ch in ['[', ']',', ''',', ,']:
        if ch in imagearray:
            imagearray = string.replace(ch, "")
            print(imagearray)
    return imagearray

My dataframe looks like this: [screenshot of the dataframe]

Can anyone share the right way to achieve this?

The following is the output of df.head().to_dict():

{'Available': {0: 33, 1: 22, 2: 12, 3: 12, 4: 11},
 'Images': {0: ['https://example.com/e1e619ab5f11ffe311db03eefad5a2f4.jpg', 'https://example.com/7edc2e3cda8b63591bfacda9e254ad08.jpg', 'https://example.com/7ed2b44335f73cabe0411819820e4d0b.jpg', 'https://example.com/82fed0e56c531cde2fcf5b98f7418a6a.jpg', 'https://example.com/f536c423a97d0c9ab8c488a453818780.jpg', '', '', ''],
            1: ['https://example.com/7d63597ae7a75b8481d9d4318951d6c1.jpg', '', '', '', '', '', '', ''],
            2: ['https://example.com/7476c30281056d6810787c617fb4f30e.jpg', 'https://example.com/d59266704fa3f9750c02ea79956acf1e.jpg', '', '', '', '', '', ''],
            3: ['https://example.com/7476c30281056d6810787c617fb4f30e.jpg', 'https://example.com/af285804c936cd3278cb2982b6f7a089.jpg', '', '', '', '', '', ''],
            4: ['https://example.com/e4b6927a6bf8ad48394534c657ea0994.jpg', 'https://example.com/e630996c631e35013be0fbe0c0113fc5.jpg', '', '', '', '', '', '']},
 'SellerSku': {0: 'SCF285/01', 1: 'Munchkin Multi Forks and Spoons set', 2: 'TR0324-GB01-Fairy', 3: 'TR0323-GB01-Police Car', 4: 'DKLAN 24 -Off White'},
 'ShopSku': {0: '235588426_SGAMZ-361374143', 1: '234623934_SGAMZ-359543733', 2: '235653608_SGAMZ-361464759', 3: '235653608_SGAMZ-361464758', 4: '234907012_SGAMZ-359972591'},
 'SkuId': {0: 361374143, 1: 359543733, 2: 361464759, 3: 361464758, 4: 359972591},
 'Status': {0: 'active', 1: 'active', 2: 'active', 3: 'active', 4: 'active'},
 'Url': {0: 'https://example.com/-i235588426-s361374143.html', 1: 'https://example.com/-i234623934-s359543733.html', 2: 'https://example.com/-i235653608-s361464759.html', 3: 'https://example.com/-i235653608-s361464758.html', 4: 'https://example.com/-i234907012-s359972591.html'},
 '_compatible_variation_': {0: 'SCF285/01', 1: 'Multicolor', 2: 'Fairy', 3: 'Police Car', 4: 'Off White'},
 'color_family': {0: 'SCF285/01', 1: 'Multicolor', 2: 'Fairy', 3: 'Police Car', 4: 'Off White'},
 'color_thumbnail': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
 'package_content': {0: 'Avent 3 in 1 electric steam sterilizer x1', 1: 'Multi Forks and Spoons x1', 2: 'Trunki kid suitcase luggage x1', 3: 'Trunki kid suitcase luggage x1', 4: 'DKLAN 24 Bicycle x1'},
 'package_height': {0: '1', 1: '1', 2: '13', 3: '13', 4: '1'},
 'package_length': {0: '1', 1: '1', 2: '12', 3: '12', 4: '1'},
 'package_weight': {0: '1', 1: '999', 2: '1', 3: '1', 4: '1000'},
 'package_width': {0: '11', 1: '1', 2: '11', 3: '11', 4: '1'},
 'price': {0: 109.0, 1: 8.9, 2: 80.91, 3: 80.91, 4: 178.0},
 'quantity': {0: 33, 1: 22, 2: 12, 3: 12, 4: 11},
 'special_from_date': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
 'special_from_time': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
 'special_price': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0},
 'special_time_format': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
 'special_to_date': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
 'special_to_time': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}}
AnalyticsPy

1 Answer


You can do this fairly cleanly with ast.literal_eval and os.path.basename:

import os
from ast import literal_eval

def formatter(x):
    return ','.join(list(filter(None, map(os.path.basename, x))))

res = s.apply(literal_eval).apply(formatter)

print(res)

0    img1.jpg,img2.jpg
1    img3.jpg,img4.jpg
2    img5.jpg,img6.jpg
dtype: object
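To make the formatter less opaque, here is a minimal sketch of what each step does to a single already-parsed list. The names urls, names, kept and joined are illustrative only, and the sketch assumes plain URLs without query strings:

```python
import os

# One row's worth of values, after it has been parsed into a real list.
urls = ['http://www.test.com/img1.jpg', 'http://www.test.com/img2.jpg', '', '']

# os.path.basename keeps only the part after the last slash; '' stays ''.
names = [os.path.basename(u) for u in urls]

# filter(None, ...) drops falsy items, which here means the empty strings.
kept = list(filter(None, names))

# Join what is left into one comma-separated string.
joined = ','.join(kept)
print(joined)
```

os.path.basename is a file-path helper, so a URL ending in something like ?size=large would keep that suffix in the result.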

Setup

import pandas as pd

s = pd.Series(["['http://www.test.com/img1.jpg','http://www.test.com/img2.jpg','','','']",
               "['http://www.test.com/img3.jpg','http://www.test.com/img4.jpg','','','','']",
               "['http://www.test.com/img5.jpg','http://www.test.com/img6.jpg','','','','','']"])
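Note that the setup above stores each row as the string representation of a list, which is the only case where literal_eval is needed. A small sketch of the difference, using made-up values (as_text, already_list and error are illustrative names):

```python
from ast import literal_eval

# A cell that holds the *text* of a list: literal_eval turns it into a list.
as_text = "['http://www.test.com/img1.jpg','']"
parsed = literal_eval(as_text)

# A cell that already holds a list: literal_eval rejects it.
already_list = ['http://www.test.com/img1.jpg', '']
try:
    literal_eval(already_list)
    error = None
except ValueError as exc:
    error = str(exc)

print(parsed)
print(error)
```

If the column already contains lists, the literal_eval step can simply be dropped and the formatter applied directly.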

Updated Example

import os
import pandas as pd

d = {'Available': {0: 33, 1: 22, 2: 12, 3: 12, 4: 11},
     'Images': {0: ['https://example.com/e1e619ab5f11ffe311db03eefad5a2f4.jpg', 'https://example.com/7edc2e3cda8b63591bfacda9e254ad08.jpg', 'https://example.com/7ed2b44335f73cabe0411819820e4d0b.jpg', 'https://example.com/82fed0e56c531cde2fcf5b98f7418a6a.jpg', 'https://example.com/f536c423a97d0c9ab8c488a453818780.jpg', '', '', ''],
                1: ['https://example.com/7d63597ae7a75b8481d9d4318951d6c1.jpg', '', '', '', '', '', '', ''],
                2: ['https://example.com/7476c30281056d6810787c617fb4f30e.jpg', 'https://example.com/d59266704fa3f9750c02ea79956acf1e.jpg', '', '', '', '', '', ''],
                3: ['https://example.com/7476c30281056d6810787c617fb4f30e.jpg', 'https://example.com/af285804c936cd3278cb2982b6f7a089.jpg', '', '', '', '', '', ''],
                4: ['https://example.com/e4b6927a6bf8ad48394534c657ea0994.jpg', 'https://example.com/e630996c631e35013be0fbe0c0113fc5.jpg', '', '', '', '', '', '']},
     'SellerSku': {0: 'SCF285/01', 1: 'Munchkin Multi Forks and Spoons set', 2: 'TR0324-GB01-Fairy', 3: 'TR0323-GB01-Police Car', 4: 'DKLAN 24 -Off White'},
     'ShopSku': {0: '235588426_SGAMZ-361374143', 1: '234623934_SGAMZ-359543733', 2: '235653608_SGAMZ-361464759', 3: '235653608_SGAMZ-361464758', 4: '234907012_SGAMZ-359972591'},
     'SkuId': {0: 361374143, 1: 359543733, 2: 361464759, 3: 361464758, 4: 359972591},
     'Status': {0: 'active', 1: 'active', 2: 'active', 3: 'active', 4: 'active'},
     'Url': {0: 'https://example.com/-i235588426-s361374143.html', 1: 'https://example.com/-i234623934-s359543733.html', 2: 'https://example.com/-i235653608-s361464759.html', 3: 'https://example.com/-i235653608-s361464758.html', 4: 'https://example.com/-i234907012-s359972591.html'},
     '_compatible_variation_': {0: 'SCF285/01', 1: 'Multicolor', 2: 'Fairy', 3: 'Police Car', 4: 'Off White'},
     'color_family': {0: 'SCF285/01', 1: 'Multicolor', 2: 'Fairy', 3: 'Police Car', 4: 'Off White'},
     'package_content': {0: 'Avent 3 in 1 electric steam sterilizer x1', 1: 'Multi Forks and Spoons x1', 2: 'Trunki kid suitcase luggage x1', 3: 'Trunki kid suitcase luggage x1', 4: 'DKLAN 24 Bicycle x1'},
     'package_height': {0: '1', 1: '1', 2: '13', 3: '13', 4: '1'},
     'package_length': {0: '1', 1: '1', 2: '12', 3: '12', 4: '1'},
     'package_weight': {0: '1', 1: '999', 2: '1', 3: '1', 4: '1000'},
     'package_width': {0: '11', 1: '1', 2: '11', 3: '11', 4: '1'},
     'price': {0: 109.0, 1: 8.9, 2: 80.91, 3: 80.91, 4: 178.0},
     'quantity': {0: 33, 1: 22, 2: 12, 3: 12, 4: 11},
     'special_price': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}}

df = pd.DataFrame.from_dict(d)

def formatter(x):
    return ','.join(list(filter(None, map(os.path.basename, x))))

df['Images'] = df['Images'].apply(formatter)

print(df['Images'])

0    e1e619ab5f11ffe311db03eefad5a2f4.jpg,7edc2e3cd...
1                 7d63597ae7a75b8481d9d4318951d6c1.jpg
2    7476c30281056d6810787c617fb4f30e.jpg,d59266704...
3    7476c30281056d6810787c617fb4f30e.jpg,af285804c...
4    e4b6927a6bf8ad48394534c657ea0994.jpg,e630996c6...
Name: Images, dtype: object
jpp
  • When I used your code it gave me the following error: ValueError: malformed node or string. Besides, according to this, we should not use ast on JSON data: https://stackoverflow.com/questions/32695699/valueerror-malformed-string-using-ast-literal-eval – AnalyticsPy May 19 '18 at 14:19
  • @AnalyticsPy, I'm confused. Where did anyone mention json? I've also provided a complete example in line with the extract you provided (an image, unfortunately). Seems like your actual data is different; it might be helpful if you could [edit](https://stackoverflow.com/posts/50423502/edit) with representative data. – jpp May 19 '18 at 14:39
  • My bad @jpp. Could you please try with the exact values of the Images column as in the screenshot? The data in your setup is different from what I showed in the screenshot of df. – AnalyticsPy May 19 '18 at 15:00
  • @AnalyticsPy, I've updated. But (not surprisingly) it still works. – jpp May 19 '18 at 15:03
  • ValueError: malformed node or string: ['https://www.zbc.com/e1e619ab5f11ffe311db03eefad5a2f4.jpg', 'https://www.zbc.com/7edc2e3cda8b63591bfacda9e254ad08.jpg', 'https://www.zbc.com/7ed2b44335f73cabe0411819820e4d0b.jpg', 'www.zbc.com/82fed0e56c531cde2fcf5b98f7418a6a.jpg', 'https://www.zbc.com/f536c423a97d0c9ab8c488a453818780.jpg', '', '', ''] – AnalyticsPy May 19 '18 at 15:05
  • Sorry, I need more information [edited](https://stackoverflow.com/posts/50423502/edit) into your question to help! Have a look at [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). A screenshot usually is *not* sufficient. – jpp May 19 '18 at 15:07
  • Totally agree with you @jpp. Sometimes the exact dataframe is needed to solve the issue. But I am not sure how I can share the full dataframe with you. – AnalyticsPy May 19 '18 at 15:10
  • @AnalyticsPy, For starters, try my solution with the first 20 rows of your dataframe. If you still get an error, print out `df.head().to_dict()` and edit it into your question *as text*. – jpp May 19 '18 at 15:11
  • I am getting the same error, so I added what you suggested. Please take a look. – AnalyticsPy May 19 '18 at 15:46
  • @AnalyticsPy, see updated example using your data. Note that you don't have strings at all, you have lists. So you don't even need `ast.literal_eval`. – jpp May 19 '18 at 17:06
  • Thanks @jpp, your updated example helped me. One more query: if I need to keep the format as https://example.com/image1.jpg,https://example.com/image2.jpg, what changes are required? – AnalyticsPy May 19 '18 at 17:21
  • Use `','.join(list(filter(None, x)))` instead. – jpp May 19 '18 at 17:23
  • Thanks a lot @jpp for your help. It really solved my issue. – AnalyticsPy May 19 '18 at 17:26
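
Following up on the last comments, here is a minimal sketch of the variant that keeps the full URLs and only drops the empty strings. The two-row dataframe is made-up illustration data (image3.jpg is not from the question), and join_urls is a hypothetical helper name:

```python
import pandas as pd

# Made-up sample: each cell of 'Images' is already a Python list.
df = pd.DataFrame({'Images': [
    ['https://example.com/image1.jpg', 'https://example.com/image2.jpg', '', ''],
    ['https://example.com/image3.jpg', '', '', ''],
]})

def join_urls(x):
    # Drop empty strings, keep each URL unchanged, join with commas.
    return ','.join(filter(None, x))

df['Images'] = df['Images'].apply(join_urls)
print(df['Images'].tolist())
```

str.join accepts any iterable, so the extra list() call around filter is optional.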