Reading thru CSV file, finding columns with same value and appending their other column values

Question

quick logic question.

If I have a CSV file that has a dictionary value for each line (with columns being ["Location"], ["Movie Title"], ["Date"]), what's the best way for me to combined the Title and Date values of data rows with the same Location value?

snippet of data:

Location    Movie Title Date    
Edgebrook Park, Chicago     A League of Their Own   7-Jun   
Edgebrook Park, Chicago     It's a Mad, Mad, Mad, Mad World 9-Jun

For every row that has the same location (^as in this example), i'd like to make an output like this so that there are no duplicate locations.

Edgebrook Park, Chicago     A League of Their Own   7-Jun    It's a Mad, Mad, Mad, Mad World    9-Jun

What would be the best way to go about this?

UPDATE: I've had to change the data a little bit, so now my columns are looking like:

Location,MovieDate,Formatted_Address,Lat,Lng
"Edgebrook Park, Chicago ",Jun-7 A League of Their Own,"Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
"Gage Park, Chicago ","Jun-9 It's a Mad, Mad, Mad, Mad World","Gage Park, Chicago, IL, USA",41.7954363,-87.6962257
"Jefferson Memorial Park, Chicago ",Jun-12 Monsters University ,"Jefferson Memorial Park, 4822 North Long Avenue, Chicago, IL 60630, USA",41.76083920000001,-87.6294353
"Commercial Club Playground, Chicago ",Jun-12 Despicable Me 2,"Chicago, IL, USA",41.8781136,-87.6297982

and so on. I'm seeing a lot of OrderedDict or defaultdict suggestions around here, but what's the best way to extend or append just the 'MovieDates' column now, instead of the entire rest of the row as the value to the 'Location' column key?

Could you provide the code you've tried? – rafaelc May 23 '15 at 14:46 — rafaelc, May 23 '15 at 14:46

Padraic Cunningham · Accepted Answer · 2015-05-25T19:57:23.487

2

Not sure what you plan on doing with the columns but this will group the elements by location

from collections import OrderedDict

od = OrderedDict()
import csv
with open("in.csv") as f,open("new.csv" ,"w") as out:
    r = csv.reader(f)
    wr= csv.writer(out)
    header = next(r)
    for row in r:
        loc,*rest = row
        od.setdefault(loc, []).extend(rest)
    wr.writerow(header)
    for loc,vals in od.items():
        wr.writerow([loc]+vals)

Presumed input:

Location    Movie Title Date    
"Edgebrook Park, Chicago","A League of Their Own",7-Jun
"Edgebrook Park, Chicago","It's a Mad, Mad, Mad, Mad World", 9-Jun

Output:

Location    Movie Title Date    
"Edgebrook Park, Chicago",A League of Their Own,7-Jun,"It's a Mad, Mad, Mad, Mad World",9-Jun

I presume your csv file is actually structured properly and the columns are actually comma delimited, if not then it will get a lot more complicated.

If your format is actually as posted you will have to split yourself:

from collections import OrderedDict

od = OrderedDict()
import csv
import re
with open("in.csv") as f,open("new.csv", "w") as out:
    header = next(f)
    for line in f:
        loc, rest = re.split("\s{2,}",line.rstrip(),1)
        od.setdefault(loc, []).extend(rest)
    out.write(header)
    for loc, vals in od.items():
        out.write("{} ".format(loc))
        out.write(" ".join(vals))

Input:

Location    Movie Title Date    
Edgebrook Park, Chicago     A League of Their Own   7-Jun   
Edgebrook Park, Chicago     It's a Mad, Mad, Mad, Mad World 9-Jun

Output:

Location    Movie Title Date    
Edgebrook Park, Chicago A League of Their Own   7-Jun It's a Mad, Mad, Mad, Mad World 9-Jun

If your format is a bit screwed up, I would take the opportunity to try to get it into some format that is more easily parsed.

For python 2:

from collections import OrderedDict

od = OrderedDict()
import csv
with open("in.csv") as f,open("new.csv" ,"w") as out:
    r = csv.reader(f)
    wr= csv.writer(out)
    header = next(r)
    for row in r:
        loc,rest = row[0], row[1:]
        od.setdefault(loc, []).extend(rest)
    wr.writerow(header)
    for loc,vals in od.items():
        wr.writerow([loc]+vals)

edited May 25 '15 at 19:57

answered May 23 '15 at 14:55

Padraic Cunningham

176,452
29
245
321

`loc,*rest = row` returns invalid syntax for me. is that correct? – SpicyClubSauce May 25 '15 at 19:51
@SpicyClubSauce. that is python3 syntax, I will add the python2 equivalent – Padraic Cunningham May 25 '15 at 19:52
thanks Padraic. If my columns are actually longer (https://github.com/yongcho822/Movies-in-the-park/blob/master/MovieDictFormatted.csv) but I still only want to group by location and append just the 'MovieDate' column, why does changing the line of code to `loc,rest = row[0], row[1]` not work? – SpicyClubSauce May 25 '15 at 20:07
i have confirmed by writing `print loc, rest` after setting that line of code^ that loc, rest are indeed set to what we want... `Paschen Park, Chicago Jun-29 Stanley Ka Dabba Jackson Park, Chicago Jun-30 Jaws Ridge Park, Chicago Jun-30 Jumanji Millennium Park, Chicago Jul-1 Top Gun Montgomery Ward Park, Chicago Jul-1 Captain Phillip`` etc.. – SpicyClubSauce May 25 '15 at 20:10
Hey @Padraic-Cunningham, i've written out my issue here in a nicer to read format here, would really appreciate just a bit more help. thanks man. http://stackoverflow.com/questions/30445574/python-ordered-dict-issue – SpicyClubSauce May 25 '15 at 20:40
@SpicyClubSauce, is your csv actually comma separated? – Padraic Cunningham May 25 '15 at 20:44
it is. https://github.com/yongcho822/Movies-in-the-park/blob/master/MovieDictFormatted.csv – SpicyClubSauce May 25 '15 at 20:44
I will have a look, if you want to remove the other question I will add the code here – Padraic Cunningham May 25 '15 at 20:46
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/78722/discussion-between-spicyclubsauce-and-padraic-cunningham). – SpicyClubSauce May 25 '15 at 20:54

score 1 · Answer 2 · answered May 23 '15 at 15:04

1

from collections import defaultdict

# rows containing your data
rows = ...

byLocation = defaultdict(list)

for row in rows:
    byLocation[row[0]].append(row[1:])

answered May 23 '15 at 15:04

sirfz

4,097
23
37

There will be no order using a defaultdict which may notbe suitable – Padraic Cunningham May 23 '15 at 15:11
1

The OP didn't mention order as a requirement though – sirfz May 23 '15 at 15:12

score 0 · Answer 3 · edited May 23 '17 at 10:26

This problem would be easy to solve with an OrderedDefaultdict from another answer of mine (shown below). It would the be equally easy to output the values associated with each theater location.

import collections
import csv

class OrderedDefaultdict(collections.OrderedDict):
    def __init__(self, *args, **kwargs):
        if not args:
            self.default_factory = None
        else:
            if not (args[0] is None or callable(args[0])):
                raise TypeError('first argument must be callable or None')
            self.default_factory = args[0]
            args = args[1:]
        super(OrderedDefaultdict, self).__init__(*args, **kwargs)

    def __missing__ (self, key):
        if self.default_factory is None:
            raise KeyError(key)
        self[key] = default = self.default_factory()
        return default

    def __reduce__(self):  # optional, for pickle support
        args = (self.default_factory,) if self.default_factory else ()
        return self.__class__, args, None, None, self.iteritems()

movies = OrderedDefaultdict(list)

with open('movies.csv', 'rb') as f:
    csv_reader = csv.DictReader(f, delimiter='\t')
    for row in csv_reader:
        movies[row['Location']].append(' '.join([row['Movie Title'], row['Date']]))

import json  # just to display dictionary created
print(json.dumps(movies, indent=4))

Output:

{
    "Edgebrook Park, Chicago": [
        "A League of Their Own 7-Jun",
        "It's a Mad, Mad, Mad, Mad World 9-Jun"
    ]
}

score -1 · Answer 4 · edited May 23 '15 at 20:53

-1

Try the following code:

 from collections import defaultdict
 import csv

 ret = defaultdict([])
 f = open("in.csv")
 fread = csv.reader(f)
 for r in fread:
     ret[r[0]].append("{}, {} ".format(r[1], r[2]))
 res = ["{} {}".format(k, "".join(ret[k])) for k in ret]
 print res
 f.close()

edited May 23 '15 at 20:53

kenorb

155,785
88
678
743

answered May 23 '15 at 15:23

Narcisse Doudieu Siewe

629
5
15

Reading thru CSV file, finding columns with same value and appending their other column values

4 Answers4