0

I am fairly new to using Python and having come from using SQL I have been using PANDAS to build reports from CSV files with reasonable success. I have been able to answer most of questions thanks mainly to this site, but I dont seem to be able to find an answer to my question:

I have a dataframe which has 2 columns I want to be able to group on the first column and display the lowest and highest alphabetical values from the second column concatenated into a third column. I could do this fairly easy in SQL but as I say I am struggling getting my head around it in Python/Pandas

example:

source data:

LINK_NAME, CITY_NAME

Linka, Citya
Linka, Cityz
Linkb,Cityx
Linkb,Cityc

Desired output:

LINK_NAME,LINKID

Linka, CityaCityz
Linkb,CitycCityx

Andrew C
  • 1
  • 2
  • 1
    Does this answer your question? [Concatenate strings from several rows using Pandas groupby](https://stackoverflow.com/questions/27298178/concatenate-strings-from-several-rows-using-pandas-groupby) – David Foster Feb 02 '20 at 15:18
  • Hi David, I think that will help, although its the capturing of the lowest and highest alphabetical string I am struggling with more so in my head – Andrew C Feb 02 '20 at 15:22
  • Ah I see, can there be more than two cities per link? The answer below will sort all cities within each group. – David Foster Feb 02 '20 at 15:26

1 Answers1

0

Edit:

Sorry for missing part of your question. To sort the strings within each group alphabetically, you could define a function to apply to the grouped items:

def first_and_last_alpha(series):
    sorted_series = series.sort_values()
    return "".join([sorted_series.iloc[0], sorted_series.iloc[-1]])

df.groupby("LINK_NAME")["CITY_NAME"].apply(first_and_last_alpha)

Original:

Your question seems to be a duplicate of this one.

The same effect, with your data, is achieved by:

df.groupby("LINK_NAME")["CITY_NAME"].apply(lambda x: "".join(x))

where df is your pandas.Dataframe object

In future, it's good to provide a reproducible example, including anything you've attempted before posting. For example, the output from df.to_dict() would allow me to recreate your example data instantly.

David Foster
  • 447
  • 4
  • 16
  • David,Thanks Kindly for your response the code you provided sorts my data in the 2 columns perfectly but i seem unable to concatenate the 2 'City Names' into a new column with what you have provided. – Andrew C Feb 02 '20 at 21:28
  • David, My apologies your grouping code worked perfectly - it was my error in how i was implementing – Andrew C Feb 03 '20 at 08:02
  • No problem @AndrewC, glad you got it working – David Foster Feb 03 '20 at 08:53