0

I'm trying to iterate over a number of CSV files and join together each 'sequence' column of each dataframe to the first value. Here is what each file looks like:

ID      Order    Sequence
1773     1        'AAGG'
1773     2        'TTGG'
1773     3        'GGAA'

And I need it to look like this for each CSV:

ID       Sequence
1773   'AAGGTTGGGGAA'

I don't have any need for the 'order' column after this. I've tried many different commands but can't seem to find the right one.

Right now I have:

path = r'C:\Users\CAAVR\Desktop\folder\*.csv'
for fname in glob.glob(path):
    df = pd.read_csv(fname)
    first = df['sequence'].iloc[:1]
    next = df['sequence'].iloc[2:]
    final = first.str.join(next)
    print(final)

I know .join() isn't right but concat and merge don't seem to work either. Keep getting:

AttributeError: 'Series' object has no attribute 'join'

Let me know if you need any other info and thanks for the help!

  • Does this answer your question? [Python Pandas concatenate a Series of strings into one string](https://stackoverflow.com/questions/41400381/python-pandas-concatenate-a-series-of-strings-into-one-string) – AMC Mar 04 '20 at 04:29

2 Answers2

1

You can use join, but what comes before .join is the separator between the strings, here it is ''. You can do join inside apply: I assume you want one row per ID?

df = pd.DataFrame({'ID':[1773,1773,1773],'Order':[1,2,3], 'Sequence':['AAGG','TTGG','GGAA']})
# group df by ID, join all Sequences and convert to dataframe
final = df.groupby('ID')['Sequence'].apply(lambda x: ''.join(x)).to_frame().reset_index()

Output is

    ID      Sequence
 0  1773    AAGGTTGGGGAA
fmarm
  • 4,209
  • 1
  • 17
  • 29
0

You can just use aggregate function to join strings.

df = pd.DataFrame({'ID':[1773,1773,1773,1774,1774,1774],'Order':[1,2,3,4,5,6], 'Sequence':['AAGG','TTGG','GGAA','GGTC','ATCG','AAGG']})
df.groupby("ID").agg({"Sequence":"sum"})

Output:

    Sequence
ID  
1773    AAGGTTGGGGAA
1774    GGTCATCGAAGG
​

Alternatively,you can just apply .join function to your aggregate function.

df.groupby("ID").agg({"Sequence":"".join})

In addition to these @fmarm has provided another interesting way.

I would suggest you to go through these links for more understanding:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.join.html

Concatenate strings from several rows using Pandas groupby

Akhil Sharma
  • 133
  • 1
  • 10