
Being new to Python, I am looking for help reshaping this data. I already know how to do it in Excel but want a Python-specific solution.

[Image: screenshot of the current data]

I want it to be in this format.

[Image: screenshot of the desired format]

The entire dataset is 70k rows with different `vc_firm_name` values; any help would be great.

dhruv
  • Does this answer your question? [How to group dataframe rows into list in pandas groupby](https://stackoverflow.com/questions/22219004/how-to-group-dataframe-rows-into-list-in-pandas-groupby). Specifically [this answer](https://stackoverflow.com/a/22221675/2221001). – JNevill Sep 22 '22 at 15:30
  • It's a nice start, but from the looks of it, it would be a real pain to manually add a group by for each of the 57k unique entries – dhruv Sep 22 '22 at 15:34
  • You wouldn't need to do that. It would look something like `df = df.groupby('vc_firm_name')['investment_industry'].apply(list)` – JNevill Sep 22 '22 at 15:38
  • Or `df = df.groupby('vc_firm_name')['investment_industry'].apply(lambda x: ','.join(list(x))).reset_index(name='investment_industry')` or what-have-you. There are a few ways to skin this cat that are mentioned in that Q&A – JNevill Sep 22 '22 at 15:48
  • Oh thanks, it's making more sense now. Thanks for the great help – dhruv Sep 22 '22 at 16:08
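
To illustrate the point made in the comments, here is a minimal sketch showing that a single `groupby` call handles every firm at once, with no per-firm code. The sample rows (firms "Alpha" and "Beta" and their industries) are made up for illustration; only the column names `vc_firm_name` and `investment_industry` come from the discussion above.

```python
import pandas as pd

# Hypothetical sample data; the real dataset is not shown in the question.
df = pd.DataFrame({
    "vc_firm_name": ["Alpha", "Alpha", "Beta", "Alpha", "Beta"],
    "investment_industry": ["Fintech", "Health", "Energy", "Retail", "Fintech"],
})

# Variant 1: one Python list of industries per firm (returns a Series).
as_lists = df.groupby("vc_firm_name")["investment_industry"].apply(list)

# Variant 2: one comma-separated string per firm (returns a DataFrame).
as_strings = (
    df.groupby("vc_firm_name")["investment_industry"]
    .agg(",".join)
    .reset_index()
)

print(as_lists)
print(as_strings)
```

Either variant scales to any number of distinct firms, since the grouping key is taken from the column itself.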

2 Answers


Assuming the original file is "original.csv" and you want to save the result as "new.csv", I would do:

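import pandas as pd

# Join each firm's values into one comma-separated string per column, then save.
# Assumes every non-key column holds strings.
pd.read_csv("original.csv").groupby(by=["vc_firm_name"], as_index=False).aggregate(lambda x: ','.join(x)).to_csv("new.csv", index=False)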
PlainRavioli

If you care about performance, I suggest looking at other methods (such as using NumPy or sorting the table):

  1. https://stackoverflow.com/a/42550516/17323241
  2. https://stackoverflow.com/a/66018377/17323241
  3. https://stackoverflow.com/a/22221675/17323241 (look at second comment)

Otherwise, you can do:

import pandas as pd

# load data from csv file
df = pd.read_csv("example.csv")
# aggregate: one list of industries per firm
result = df.groupby("vc_firm_name")["investment_industry"].apply(list)
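
For the sorting-based route mentioned among the performance-oriented options, a rough sketch might look like the following. It is not taken from the linked answers verbatim; the sample data is hypothetical, and whether this beats `groupby` on a 70k-row table is something to measure rather than assume.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "vc_firm_name": ["Alpha", "Alpha", "Beta", "Alpha", "Beta"],
    "investment_industry": ["Fintech", "Health", "Energy", "Retail", "Fintech"],
})

# Stable sort: rows of the same firm become contiguous and keep their order.
sorted_df = df.sort_values("vc_firm_name", kind="stable")
keys = sorted_df["vc_firm_name"].to_numpy()
values = sorted_df["investment_industry"].to_numpy()

# Positions where the firm name changes mark the start of a new group.
boundaries = np.flatnonzero(keys[1:] != keys[:-1]) + 1
groups = np.split(values, boundaries)

result = pd.DataFrame({
    "vc_firm_name": keys[np.r_[0, boundaries]],
    "investment_industry": [list(g) for g in groups],
})
print(result)
```

The idea is that after sorting, each firm occupies one contiguous slice, so the table can be cut at the change points instead of being grouped row by row.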
ela16