How to clean images format in a pandas dataframe?

Question

I have fetched some images and stored them in a dataframe column, df['images'].

Currently the images are stored in the following format:

['image1.jpg','image2.jpg','','','']

Now I need to remove the brackets and the empty '' entries, as below:

image1.jpg,image2.jpg

I tried the function below, but it is not working:

def clean_images(imagearray):
    for ch in ['[', ']',', ''',', ,']:
        if ch in imagearray:
            imagearray = string.replace(ch, "")
            print(imagearray)
    return imagearray

My dataframe looks like the extract below.

Can anyone share the right way to achieve this?

The following is the output of df.head().to_dict():

{'Available': {0: 33, 1: 22, 2: 12, 3: 12, 4: 11}, 'Images': {0: ['https://example.com/e1e619ab5f11ffe311db03eefad5a2f4.jpg', 'https://example.com/7edc2e3cda8b63591bfacda9e254ad08.jpg', 'https://example.com/7ed2b44335f73cabe0411819820e4d0b.jpg', 'https://example.com/82fed0e56c531cde2fcf5b98f7418a6a.jpg', 'https://example.com/f536c423a97d0c9ab8c488a453818780.jpg', '', '', ''], 1: ['https://example.com/7d63597ae7a75b8481d9d4318951d6c1.jpg', '', '', '', '', '', '', ''], 2: ['https://example.com/7476c30281056d6810787c617fb4f30e.jpg', 'https://example.com/d59266704fa3f9750c02ea79956acf1e.jpg', '', '', '', '', '', ''], 3: ['https://example.com/7476c30281056d6810787c617fb4f30e.jpg', 'https://example.com/af285804c936cd3278cb2982b6f7a089.jpg', '', '', '', '', '', ''], 4: ['https://example.com/e4b6927a6bf8ad48394534c657ea0994.jpg', 'https://example.com/e630996c631e35013be0fbe0c0113fc5.jpg', '', '', '', '', '', '']}, 'SellerSku': {0: 'SCF285/01', 1: 'Munchkin Multi Forks and Spoons set', 2: 'TR0324-GB01-Fairy', 3: 'TR0323-GB01-Police Car', 4: 'DKLAN 24 -Off White'}, 'ShopSku': {0: '235588426_SGAMZ-361374143', 1: '234623934_SGAMZ-359543733', 2: '235653608_SGAMZ-361464759', 3: '235653608_SGAMZ-361464758', 4: '234907012_SGAMZ-359972591'}, 'SkuId': {0: 361374143, 1: 359543733, 2: 361464759, 3: 361464758, 4: 359972591}, 'Status': {0: 'active', 1: 'active', 2: 'active', 3: 'active', 4: 'active'}, 'Url': {0: 'https://example.com/-i235588426-s361374143.html', 1: 'https://example.com/-i234623934-s359543733.html', 2: 'https://example.com/-i235653608-s361464759.html', 3: 'https://example.com/-i235653608-s361464758.html', 4: 'https://example.com/-i234907012-s359972591.html'}, '_compatible_variation_': {0: 'SCF285/01', 1: 'Multicolor', 2: 'Fairy', 3: 'Police Car', 4: 'Off White'}, 'color_family': {0: 'SCF285/01', 1: 'Multicolor', 2: 'Fairy', 3: 'Police Car', 4: 'Off White'}, 'color_thumbnail': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, 'package_content': {0: 'Avent 3 in 1 electric steam sterilizer x1', 1: 'Multi Forks and Spoons x1', 2: 'Trunki kid suitcase luggage x1', 3: 'Trunki kid suitcase luggage x1', 4: 'DKLAN 24 Bicycle x1'}, 'package_height': {0: '1', 1: '1', 2: '13', 3: '13', 4: '1'}, 'package_length': {0: '1', 1: '1', 2: '12', 3: '12', 4: '1'}, 'package_weight': {0: '1', 1: '999', 2: '1', 3: '1', 4: '1000'}, 'package_width': {0: '11', 1: '1', 2: '11', 3: '11', 4: '1'}, 'price': {0: 109.0, 1: 8.9, 2: 80.91, 3: 80.91, 4: 178.0}, 'quantity': {0: 33, 1: 22, 2: 12, 3: 12, 4: 11}, 'special_from_date': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, 'special_from_time': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, 'special_price': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}, 'special_time_format': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, 'special_to_date': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, 'special_to_time': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}}

Try this : `df['images'].str.replace("'",'').str.strip('[],')` — Bharath M Shetty, May 19 '18 at 09:15
@Dark it gives me AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas — AnalyticsPy, May 19 '18 at 11:56
@AnalyticsPy casting might help `df['images'].astype(str).str.replace.....` — Bharath M Shetty, May 20 '18 at 02:56

jpp · Accepted Answer · 2018-05-19T17:05:44.723

You can do this fairly cleanly with ast.literal_eval and os.path.basename:

import os
from ast import literal_eval

def formatter(x):
    return ','.join(list(filter(None, map(os.path.basename, x))))

res = s.apply(literal_eval).apply(formatter)

print(res)

0    img1.jpg,img2.jpg
1    img3.jpg,img4.jpg
2    img5.jpg,img6.jpg
dtype: object

Setup

import pandas as pd

s = pd.Series(["['http://www.test.com/img1.jpg','http://www.test.com/img2.jpg','','','']",
               "['http://www.test.com/img3.jpg','http://www.test.com/img4.jpg','','','','']",
               "['http://www.test.com/img5.jpg','http://www.test.com/img6.jpg','','','','','']"])

Updated Example

import os
import pandas as pd

d = {'Available': {0: 33, 1: 22, 2: 12, 3: 12, 4: 11}, 'Images': {0: ['https://example.com/e1e619ab5f11ffe311db03eefad5a2f4.jpg', 'https://example.com/7edc2e3cda8b63591bfacda9e254ad08.jpg', 'https://example.com/7ed2b44335f73cabe0411819820e4d0b.jpg', 'https://example.com/82fed0e56c531cde2fcf5b98f7418a6a.jpg', 'https://example.com/f536c423a97d0c9ab8c488a453818780.jpg', '', '', ''], 1: ['https://example.com/7d63597ae7a75b8481d9d4318951d6c1.jpg', '', '', '', '', '', '', ''], 2: ['https://example.com/7476c30281056d6810787c617fb4f30e.jpg', 'https://example.com/d59266704fa3f9750c02ea79956acf1e.jpg', '', '', '', '', '', ''], 3: ['https://example.com/7476c30281056d6810787c617fb4f30e.jpg', 'https://example.com/af285804c936cd3278cb2982b6f7a089.jpg', '', '', '', '', '', ''], 4: ['https://example.com/e4b6927a6bf8ad48394534c657ea0994.jpg', 'https://example.com/e630996c631e35013be0fbe0c0113fc5.jpg', '', '', '', '', '', '']}, 'SellerSku': {0: 'SCF285/01', 1: 'Munchkin Multi Forks and Spoons set', 2: 'TR0324-GB01-Fairy', 3: 'TR0323-GB01-Police Car', 4: 'DKLAN 24 -Off White'}, 'ShopSku': {0: '235588426_SGAMZ-361374143', 1: '234623934_SGAMZ-359543733', 2: '235653608_SGAMZ-361464759', 3: '235653608_SGAMZ-361464758', 4: '234907012_SGAMZ-359972591'}, 'SkuId': {0: 361374143, 1: 359543733, 2: 361464759, 3: 361464758, 4: 359972591}, 'Status': {0: 'active', 1: 'active', 2: 'active', 3: 'active', 4: 'active'}, 'Url': {0: 'https://example.com/-i235588426-s361374143.html', 1: 'https://example.com/-i234623934-s359543733.html', 2: 'https://example.com/-i235653608-s361464759.html', 3: 'https://example.com/-i235653608-s361464758.html', 4: 'https://example.com/-i234907012-s359972591.html'}, '_compatible_variation_': {0: 'SCF285/01', 1: 'Multicolor', 2: 'Fairy', 3: 'Police Car', 4: 'Off White'}, 'color_family': {0: 'SCF285/01', 1: 'Multicolor', 2: 'Fairy', 3: 'Police Car', 4: 'Off White'}, 'package_content': {0: 'Avent 3 in 1 electric steam sterilizer x1', 1: 'Multi Forks and Spoons x1', 2: 'Trunki kid suitcase luggage x1', 3: 'Trunki kid suitcase luggage x1', 4: 'DKLAN 24 Bicycle x1'}, 'package_height': {0: '1', 1: '1', 2: '13', 3: '13', 4: '1'}, 'package_length': {0: '1', 1: '1', 2: '12', 3: '12', 4: '1'}, 'package_weight': {0: '1', 1: '999', 2: '1', 3: '1', 4: '1000'}, 'package_width': {0: '11', 1: '1', 2: '11', 3: '11', 4: '1'}, 'price': {0: 109.0, 1: 8.9, 2: 80.91, 3: 80.91, 4: 178.0}, 'quantity': {0: 33, 1: 22, 2: 12, 3: 12, 4: 11}, 'special_price': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}}

df = pd.DataFrame.from_dict(d)

def formatter(x):
    return ','.join(list(filter(None, map(os.path.basename, x))))

df['Images'] = df['Images'].apply(formatter)

print(df['Images'])

0    e1e619ab5f11ffe311db03eefad5a2f4.jpg,7edc2e3cd...
1                 7d63597ae7a75b8481d9d4318951d6c1.jpg
2    7476c30281056d6810787c617fb4f30e.jpg,d59266704...
3    7476c30281056d6810787c617fb4f30e.jpg,af285804c...
4    e4b6927a6bf8ad48394534c657ea0994.jpg,e630996c6...
Name: Images, dtype: object
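
The two examples above differ only in what each cell holds: the Setup series stores string representations of lists, while the updated dataframe stores real Python lists, which is why `literal_eval` is needed in one case and not the other. A minimal sketch of a formatter that accepts either form is below; the name `format_images` and the sample data are illustrative assumptions, not part of the original answer.

```python
import os
from ast import literal_eval

import pandas as pd


def format_images(cell):
    """Return a comma-separated string of the non-empty file names in a cell."""
    # Parse only when the cell is a string such as "['a.jpg', '']";
    # real lists are used as they are.
    items = literal_eval(cell) if isinstance(cell, str) else cell
    names = (os.path.basename(item) for item in items)
    return ','.join(name for name in names if name)


# Hypothetical sample: one stringified list and one real list.
mixed = pd.Series([
    "['http://www.test.com/img1.jpg','http://www.test.com/img2.jpg','','']",
    ['https://example.com/img3.jpg', '', ''],
], dtype=object)

result = mixed.apply(format_images)
print(result.tolist())
```

Checking the cell type first avoids the `ValueError: malformed node or string` that `literal_eval` raises when it is handed a list instead of a string.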

When I used your code, it gave me the following error: `ValueError: malformed node or string`. Besides, according to this, we should not use ast on JSON data: https://stackoverflow.com/questions/32695699/valueerror-malformed-string-using-ast-literal-eval — AnalyticsPy, May 19 '18 at 14:19
@AnalyticsPy, I'm confused. Where did anyone mention json? I've also provided a complete example in line with the extract you provided (an image, unfortunately). Seems like your actual data is different; it might be helpful if you could [edit](https://stackoverflow.com/posts/50423502/edit) with representative data. — jpp, May 19 '18 at 14:39
My bad, @jpp. Could you please try with the exact values of the Images column as in the screenshot? The data in your setup is different from what I showed in the screenshot of df. — AnalyticsPy, May 19 '18 at 15:00
@AnalyticsPy, I've updated. But (not surprisingly) it still works. — jpp, May 19 '18 at 15:03
ValueError: malformed node or string: ['https://www.zbc.com/e1e619ab5f11ffe311db03eefad5a2f4.jpg', 'https://www.zbc.com/7edc2e3cda8b63591bfacda9e254ad08.jpg', 'https://www.zbc.com/7ed2b44335f73cabe0411819820e4d0b.jpg', 'www.zbc.com/82fed0e56c531cde2fcf5b98f7418a6a.jpg', 'https://www.zbc.com/f536c423a97d0c9ab8c488a453818780.jpg', '', '', ''] — AnalyticsPy, May 19 '18 at 15:05
Sorry, I need more information [edited](https://stackoverflow.com/posts/50423502/edit) into your question to help! Have a look at [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). A screenshot usually is *not* sufficient. — jpp, May 19 '18 at 15:07
Totally agree with you, @jpp. Sometimes the exact dataframe is needed to solve the issue, but I am not sure how I can share the full dataframe with you. — AnalyticsPy, May 19 '18 at 15:10
@AnalyticsPy, For starters, try my solution with the first 20 rows of your dataframe. If you still get an error, print out `df.head().to_dict()` and edit it into your question *as text*. — jpp, May 19 '18 at 15:11
I am getting the same error, so I added what you suggested. Please take a look. — AnalyticsPy, May 19 '18 at 15:46
@AnalyticsPy, see updated example using your data. Note that you don't have strings at all, you have lists. So you don't even need `ast.literal_eval`. — jpp, May 19 '18 at 17:06
Thanks @jpp, your updated example helped me. One more query: if I need to keep the format as https://example.com/image1.jpg,https://example.com/image2.jpg, what changes are required? — AnalyticsPy, May 19 '18 at 17:21
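
For the follow-up about keeping the full URLs, one possible change is to drop `os.path.basename` and only filter out the empty strings. This is a sketch under that assumption, not a reply from the thread; `join_urls` is a hypothetical name and the two-row frame is a cut-down stand-in for the question's data.

```python
import pandas as pd


def join_urls(items):
    """Join the non-empty entries of a list with commas, keeping full URLs."""
    return ','.join(item for item in items if item)


# Cut-down stand-in for the 'Images' column, which holds real lists.
df = pd.DataFrame({
    'Images': [
        ['https://example.com/image1.jpg', 'https://example.com/image2.jpg', '', ''],
        ['https://example.com/image3.jpg', '', '', ''],
    ]
})

df['Images'] = df['Images'].apply(join_urls)
print(df['Images'].tolist())
```

Because empty strings are falsy in Python, the `if item` test plays the same role as `filter(None, ...)` in the accepted answer.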
