Reshape data using Python?

Question

Being new to Python, I am looking for some help reshaping this data. I already know how to do this in Excel but want a Python-specific solution.

I want it to be in this format.

The entire dataset is 70k rows with different vc_firm_names; any help would be great.

Does this answer your question? [How to group dataframe rows into list in pandas groupby](https://stackoverflow.com/questions/22219004/how-to-group-dataframe-rows-into-list-in-pandas-groupby). Specifically [this answer](https://stackoverflow.com/a/22221675/2221001). — JNevill, Sep 22 '22 at 15:30
It's a nice start, but from the looks of it, it would be a real pain to manually add a group by for each of the 57k unique entries. — dhruv, Sep 22 '22 at 15:34
You wouldn't need to do that. It would look something like `df = df.groupby('vc_firm_name')['investment_industry'].apply(list)` — JNevill, Sep 22 '22 at 15:38
Or `df = df.groupby('vc_firm_name')['investment_industry'].apply(lambda x: ','.join(list(x))).reset_index(name='investment_industry')` or what-have-you. There are a few ways to skin this cat that are mentioned in that q&a — JNevill, Sep 22 '22 at 15:48

PlainRavioli · Answer 1 · 2022-09-22T15:44:05.050

1

Assuming the original file is "original.csv" and you want to save the result as "new.csv", I would do:

import pandas as pd
pd.read_csv("original.csv").groupby(by=["vc_firm_name"], as_index=False).aggregate(lambda x: ','.join(x)).to_csv("new.csv", index=False)
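
For anyone who wants to try this without a CSV on disk, here is a minimal sketch of the same idea on a small in-memory frame. The sample rows are made up for illustration; only the column names `vc_firm_name` and `investment_industry` come from the question and comments. It selects the industry column explicitly, which is an assumption about what should be joined.

```python
import pandas as pd

# Hypothetical sample rows; the real data would come from pd.read_csv("original.csv").
df = pd.DataFrame(
    {
        "vc_firm_name": ["Firm A", "Firm B", "Firm A", "Firm B", "Firm C"],
        "investment_industry": ["Biotech", "Software", "Energy", "Fintech", "Retail"],
    }
)

# One row per firm, with that firm's industries joined into a comma-separated string.
joined = (
    df.groupby("vc_firm_name")["investment_industry"]
    .agg(",".join)
    .reset_index()
)
print(joined)
```

Note that `",".join` only accepts strings, so if the column has missing values they would probably need to be dropped or converted first.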

answered Sep 22 '22 at 15:38 · edited Sep 22 '22 at 15:44 · PlainRavioli

score 1 · Accepted Answer · answered Sep 22 '22 at 15:51

If you care about performance, then I suggest you take a look at other methods, such as using NumPy or sorting the table first.
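
As a rough illustration of what a sort-plus-NumPy approach could look like, here is a sketch under my own assumptions: the sample rows are hypothetical, and this is one possible reading of "sorting the table", not a method taken from a specific reference. It sorts by firm so each firm's rows are contiguous, finds where the firm name changes, and splits the industry array at those positions. Whether it is actually faster than `groupby` depends on the data, so it is worth timing on the real file.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows standing in for the real 70k-row dataset.
df = pd.DataFrame(
    {
        "vc_firm_name": ["Firm A", "Firm B", "Firm A", "Firm B", "Firm C"],
        "investment_industry": ["Biotech", "Software", "Energy", "Fintech", "Retail"],
    }
)

# Stable sort so rows of the same firm are adjacent and keep their original order.
sorted_df = df.sort_values("vc_firm_name", kind="mergesort")
keys = sorted_df["vc_firm_name"].to_numpy()
values = sorted_df["investment_industry"].to_numpy()

# Positions where the firm name differs from the previous row mark group starts.
boundaries = np.flatnonzero(keys[1:] != keys[:-1]) + 1

# Split the industry array at those positions, one chunk per firm.
chunks = np.split(values, boundaries)
firm_names = keys[np.r_[0, boundaries]]

result = pd.DataFrame(
    {
        "vc_firm_name": firm_names,
        "investment_industry": [chunk.tolist() for chunk in chunks],
    }
)
print(result)
```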

Otherwise, you can do:

import pandas as pd

# load data from csv file
df = pd.read_csv("example.csv")
# aggregate: collect each firm's industries into a list
result = df.groupby("vc_firm_name")["investment_industry"].apply(list)
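
To show what that returns, here is a small sketch on made-up rows, assuming the grouping column is named `vc_firm_name` as in the question. The result is a Series indexed by firm; `reset_index()` turns it back into a two-column table, and `to_dict()` gives a plain lookup.

```python
import pandas as pd

# Hypothetical sample rows; replace with pd.read_csv(...) for the real data.
df = pd.DataFrame(
    {
        "vc_firm_name": ["Firm A", "Firm B", "Firm A", "Firm B", "Firm C"],
        "investment_industry": ["Biotech", "Software", "Energy", "Fintech", "Retail"],
    }
)

# Series indexed by firm name, each value a list of that firm's industries.
industries = df.groupby("vc_firm_name")["investment_industry"].apply(list)

# Back to a regular two-column DataFrame with one row per firm.
table = industries.reset_index()

# Or a plain dict mapping firm name to its list of industries.
lookup = industries.to_dict()
print(table)
```

There is no need to write a separate group by for each firm: the single `groupby` call handles every unique `vc_firm_name` in one pass.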
