Splitting a list inside a list

Question

I have a list of this format:

["['05-Aug-13 10:17', '05-Aug-13 10:17', '05-Aug-13 15:17']"]

I am using:

for d in date:
    print d

This produces:

['05-Aug-13 10:17', '05-Aug-13 10:17', '05-Aug-13 15:17']

I then try and add this to a defaultdict, so underneath the print d I write:

myDict[text].append(date)

Later, I try and iterate through this by using:

for text, date in myDict.iteritems():
    for d in date:
        print d, '\n'

But this doesn't work; it just produces the format shown in the first line of code in this question. The format suggests a list in a list, so I tried using:

for d in date:
    for e in d:
        myDict[text].append(e)

But this included every character of the line, as opposed to each separate date. What am I doing wrong? How can I have a defaultdict with this format

text : ['05-Aug-13 10:17', '06-Aug-13 11:17']

whose values can all be reached?

Why does your input look like that, instead of an actual list? — Joran Beasley, Sep 03 '13 at 01:55
It's the result of defaultdict being written to a csv file and read back in again — Andrew Martin, Sep 03 '13 at 02:09
why not use pickle instead of csv? then you get actual python objects back — Joran Beasley, Sep 03 '13 at 03:27
I was originally trying to do that and got advised to use csv. I couldn't get it to work. This was my previous question: http://stackoverflow.com/questions/18580321/loading-a-defaultdict-in-hadoop-using-pickle-and-sys-stdin — Andrew Martin, Sep 03 '13 at 03:28
I'd prefer to use pickle, but have no idea how to load it into Hadoop — Andrew Martin, Sep 03 '13 at 03:29
Yeah, I don't know much about Hadoop... if it's really large datasets then pickle may not be ideal, but eval-ing it does not seem ideal either — Joran Beasley, Sep 03 '13 at 03:32
With pickle I just couldn't load that file. That's the annoying bit. Outside of Hadoop the map/reduce runs perfectly, but I can't actually load the file with all the text into Hadoop. Have tried so many variations on loading pickle, but to no avail — Andrew Martin, Sep 03 '13 at 03:33
let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/36692/discussion-between-joran-beasley-and-andrew-martin) — Joran Beasley, Sep 03 '13 at 03:34
Although I don't know how to fix it, I'm guessing the problem with the reducer is that the mappers are sending their results to it at different times. The code combines key and values assuming it gets them all in order. If it got fifty keys from one place, then 10 from another, then another 10 from the first place, that second batch of 10 would be extra and wouldn't be combined. Do you reckon that could be it? — Andrew Martin, Sep 03 '13 at 04:51

score 3 · Accepted Answer · answered Sep 03 '13 at 01:44

Your list contains only one element: the string representation of another list. You will have to parse it into an actual list before treating it like one:

import ast

actual_list = ast.literal_eval(your_list[0])
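
To connect this back to the question, here is a minimal sketch that reuses the question's names `date`, `text`, and `myDict`. The key `'some text'` is a made-up placeholder, and the code is written so it also runs on Python 3. It parses the string once and then uses `extend`, so each date becomes its own entry:

```python
import ast
from collections import defaultdict

# The list as read back from the CSV file: one element, and that element is a string.
date = ["['05-Aug-13 10:17', '05-Aug-13 10:17', '05-Aug-13 15:17']"]
text = 'some text'  # hypothetical key, for illustration only

myDict = defaultdict(list)

for d in date:
    # literal_eval safely turns the string into a real list of strings
    parsed = ast.literal_eval(d)
    # extend adds each date separately; append would nest the whole list
    myDict[text].extend(parsed)

for key, dates in myDict.items():  # use iteritems() on Python 2
    for d in dates:
        print(d)
```

Iterating over `d` directly, as in the question's second attempt, walks the string one character at a time, which is why every character ended up in the dictionary.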

Blender

This is perfect, thank you so much. Now I'm just trying to get Hadoop to work properly. Thanks! – Andrew Martin Sep 03 '13 at 02:08

score 0 · Answer 2 · answered Sep 03 '13 at 02:37

As an alternative, you can pull the dates out with a regular expression (though the pattern might need tuning for your use):

import re

# Matches dates such as '05-Aug-13 10:17'
pattern = r"\d{2}-[A-Z][a-z]{2}-\d{1,2} \d{2}:\d{2}"
dates = re.findall(pattern, your_list[0])
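
For a concrete check, here is a small self-contained sketch of the regex approach run against the sample data from the question. The variable name `your_list` follows the accepted answer, and `dates` is just an illustrative name for the result:

```python
import re

# Sample input from the question: a one-element list holding a string.
your_list = ["['05-Aug-13 10:17', '05-Aug-13 10:17', '05-Aug-13 15:17']"]

# Two-digit day, capitalised three-letter month, one- or two-digit year, then HH:MM
pattern = r"\d{2}-[A-Z][a-z]{2}-\d{1,2} \d{2}:\d{2}"

# findall returns every non-overlapping match as a list of strings
dates = re.findall(pattern, your_list[0])
print(dates)
# prints ['05-Aug-13 10:17', '05-Aug-13 10:17', '05-Aug-13 15:17']
```

Unlike `ast.literal_eval`, this only returns text that matches the pattern, so entries in a different date format would be dropped silently.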

colcarroll
