1

I have a list of data similar to that below:

a = ['"105', '424"', '"102', '629"', '"104', '307"']

I want to transform it into a form like this:

a = ['105424', '102629', '104307']

I am unsure how to proceed. I thought about removing all the commas, re-inserting them only where they belong, and then removing the quotation marks, but I am finding this quite challenging.

Jonathan Leffler
arnold
  • 1) Search `', '` and kill. 2) Replace `'"` by `'` and `"'` by `'`. – Kerrek SB Jul 05 '11 at 21:49
  • Are you sure you're nesting the single and double quotes how you want? The first "a" is a list of 6 strings. Did you want it to be a list of 3 strings? – totowtwo Jul 05 '11 at 21:50
  • That's some very strangely formatted data you have there. – Acorn Jul 05 '11 at 21:51
  • Where did this data come from? A CSV file? If so, why aren't you using the `csv` module? – S.Lott Jul 05 '11 at 21:52
  • Thanks a lot everyone. All of your advice was very helpful. For those of you who were interested, the data did come from a csv file where commas were separating both the column entries and the thousands. Thanks –  Jul 06 '11 at 03:06

5 Answers

4

I'm assuming this data was originally in a CSV file in which fields containing commas are quoted ("105,424","102,629","104,307"), and that you are then splitting each line on commas:

>>> '"105,424","102,629","104,307"'.split(',')
['"105', '424"', '"102', '629"', '"104', '307"']

Instead, let the csv module do the work, since it handles the double quotes:

import csv

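# Python 3: open in text mode with newline='' as the csv docs recommend
with open('u:\\foobar.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        # drop the thousands separators from every field
        print([x.replace(',', '') for x in row])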

This prints: ['105424', '102629', '104307']
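
To try the same idea without a file on disk, here is a minimal Python 3 sketch. The sample line and the names `raw` and `rows` are assumptions for illustration; `io.StringIO` simply stands in for the opened file.

```python
import csv
import io

# Assumed sample: one CSV line whose quoted fields use commas as
# thousands separators (not taken from the asker's real file).
raw = '"105,424","102,629","104,307"\n'

# io.StringIO stands in for the file so the sketch runs anywhere.
reader = csv.reader(io.StringIO(raw))

# csv.reader removes the quotes; replace() removes the inner commas.
rows = [[field.replace(',', '') for field in row] for row in reader]

print(rows)  # [['105424', '102629', '104307']]
```

Because the csv module parses the quoting itself, the inner commas never split a field, so no re-joining step is needed.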

Steven Rumbalski
1

If the source data is CSV, you should use @steven's answer.

Regardless, here's how you could process what you pasted.

As @troutwine stated, this will only work if the number parts are always in pairs.

a = ['"105', '424"', '"102', '629"', '"104', '307"']

def pairwise(iterable):
    "s -> (s0,s1), (s2,s3), (s4, s5), ..."
    it = iter(iterable)
    # zip pulls from the same iterator twice, yielding consecutive pairs
    return zip(it, it)

result = []

for x, y in pairwise(a):
    # join the two halves, then strip the surrounding double quotes
    result.append(''.join([x, y]).strip('"'))

print(result)

Gives:

['105424', '102629', '104307']

The pairwise snippet comes from the question "Iterating over every two elements in a list".

Community
Acorn
1

Does your data look something like:

"123", "123,456", "123,456,789"

If so, try this:

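text = '"123", "123,456", "123,456,789"'

import re

# raw string so the \d escapes reach the regex engine unchanged
reg = re.compile(r'"(\d{1,3}(,\d{3})*)"')

# findall returns (whole number, last thousands group) tuples
stringValues = [wholematch.replace(',', '') for wholematch, _endmatch
                in reg.findall(text)]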

This variant of the regex should also work on thousands with decimal places:

re.compile(r'"(\d{1,3}(,\d{3})*(\.\d*)?)"')
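
One thing to watch with the decimal variant: it has three capturing groups, so `findall` returns 3-tuples and a two-name unpacking no longer fits. A minimal Python 3 sketch of one way around that, assuming non-capturing `(?:...)` groups are acceptable; the sample string and the names `text`, `pattern`, and `values` are made up for illustration.

```python
import re

# Assumed sample input, extended with a decimal value for illustration.
text = '"123", "123,456", "1,234.56"'

# (?:...) groups do not capture, so findall returns only the single
# captured group (the number without its quotes) as a plain string.
pattern = re.compile(r'"(\d{1,3}(?:,\d{3})*(?:\.\d*)?)"')

# Strip the thousands separators from each matched number.
values = [match.replace(',', '') for match in pattern.findall(text)]

print(values)  # ['123', '123456', '1234.56']
```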
Dunes
0

If you'll never have an unmatched pair, loop over a range half the size of the input list: mash the element at the current index together with the next one, do a string substitution to drop the quotes, and skip ahead by two.
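
The loop described above might be sketched as follows in Python 3. It assumes, as the answer does, that the list has an even length and every number was split into exactly two pieces; the variable names are illustrative only.

```python
a = ['"105', '424"', '"102', '629"', '"104', '307"']

result = []
# Loop over a range half the size of the list; pair i covers
# indices 2*i and 2*i + 1.
for i in range(len(a) // 2):
    joined = a[2 * i] + a[2 * i + 1]
    # The string substitution: drop the leftover double quotes.
    result.append(joined.replace('"', ''))

print(result)  # ['105424', '102629', '104307']
```

A number with more than one thousands separator (three pieces instead of two) would break the pairing, which is the limitation the answer calls out.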

troutwine
0

Reduce to the rescue:

from functools import reduce  # reduce is no longer a builtin in Python 3

l = ['"105', '424"', '"102', '629"', '"104', '307"', '"123', '456', '789"', '"123"']

# Concatenate everything and split by ", get non-empties
l2 = [num for num in reduce(lambda x, y: x+y, l).split('"') if num != '']

# Output:
# ['105424', '102629', '104307', '123456789', '123']
print(l2)

A few caveats, though: this code can handle numbers beyond the thousands (e.g., 1,457,664), but it also assumes that every whole number was double-quoted.

As others have said though, you should revisit your data retrieval as there are most likely ways to get the values correctly without dealing with the double-quotes. This was a fun little challenge nonetheless.
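
The same concatenate-and-split idea can also be sketched with `str.join` in place of `reduce`. This is an alternative phrasing, not the code from the answer, and it carries the same assumption that every number was wrapped in double quotes; the names `parts`, `joined`, and `numbers` are illustrative.

```python
parts = ['"105', '424"', '"102', '629"', '"104', '307"',
         '"123', '456', '789"', '"123"']

# Glue all pieces together, then cut at every double quote.
joined = ''.join(parts)

# Splitting leaves empty strings between adjacent quotes; drop them.
numbers = [num for num in joined.split('"') if num]

print(numbers)  # ['105424', '102629', '104307', '123456789', '123']
```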

Manny D