0

I have 3 dataframes with different columns with each row having diferent IDs (for all 3 DF).

I have appended all of those rows id in a Dataframe4 and sorted.

and I'm trying to read each line of the 3 dataframes in the right order, based on the dataframe4 sorted.

but i'm stuck here:


df1 = pd.DataFrame({
    'rowid': ['1', '4'],
    'Column2': ['1100', '1100']
})

df2 = pd.DataFrame({
    'rowid': ['2', '5', '7', '9', '11', '13'],
    'Column3': ['xxr', 'xxv', 'xxw', 'xxt', 'xxe', 'xxz'],
    'Column4': ['wer', 'cad', 'sder', 'dse', 'sdf', 'csd']
})

df3 = pd.DataFrame({
    'rowid': ['3', '6', '8', '10', '12', '14'],
    'Column3': ['xxr', 'xxv', 'xxw', 'xxt', 'xxe', 'xxz'],
    'Column4': ['wer', 'cad', 'sder', 'dse', 'sdf', 'csd'],
    'Column5': ['xxr', 'xxv', 'xxw', 'xxt', 'xxe', 'xxz'],
    'Column6': ['xxr', 'xxv', 'xxw', 'xxt', 'xxe', 'xxz'],
})


df4 = pd.DataFrame({
    'rowid': ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14']
})

for df in df4:
    d = df4["rowid"]
    for i in df1, df2, df3:
        j = d.isin(i['rowid'])
        res = [k for k, val in enumerate(j) if val]
        print(res)

Any ideas?

the simple append and sort won't work because of different columns, i'll read each line as txt

Desired Output:

1, 1100
2, xxr, wer
3, xxr, wer, xxr, xxr
4, 1100
5, xxv, cad
6, xxv, cad, xxv, xxv

... 

14, xxz, csd, xxz, xxz
  • Your df4 `rowid` are currently in ascending order, as are your df1, df2, df3 `rowid` values. The way it currently stands, you should just `concat` df1, df2, df3 and read them in order. Is this an abstraction for a more complex column? And if so what is the complexity that appears to be missing from your quesiton? – Henry Ecker Apr 30 '21 at 02:32
  • Hey Henry, the complexity is that i'm going to read them as text, I just edited, the rows ids are not in order. I need to read each line of txt in order of df4. df5 = row1-df1, row2-df2, row3-df3, row4-df1, row5-df2 .. and goes on – Tanai Goncalves Apr 30 '21 at 02:36
  • What do you mean "read them as text"? Can you edit your question to include a mock of your desired output? – Henry Ecker Apr 30 '21 at 02:52

2 Answers2

1
  1. Concat df1, df2, df3 together.
  2. Build a sorter to order based on the values in df4. (See. sorting by a custom list in pandas)
  3. Apply and use ', '.join on all notnull values in the row.
  4. Join the whole dataframe with '\n':
import pandas as pd

df1 = pd.DataFrame({
    'rowid': ['1', '4'],
    'Column2': ['1100', '1100']
})

df2 = pd.DataFrame({
    'rowid': ['2', '5', '7', '9', '11', '13'],
    'Column3': ['xxr', 'xxv', 'xxw', 'xxt', 'xxe', 'xxz'],
    'Column4': ['wer', 'cad', 'sder', 'dse', 'sdf', 'csd']
})

df3 = pd.DataFrame({
    'rowid': ['3', '6', '8', '10', '12', '14'],
    'Column3': ['xxr', 'xxv', 'xxw', 'xxt', 'xxe', 'xxz'],
    'Column4': ['wer', 'cad', 'sder', 'dse', 'sdf', 'csd'],
    'Column5': ['xxr', 'xxv', 'xxw', 'xxt', 'xxe', 'xxz'],
    'Column6': ['xxr', 'xxv', 'xxw', 'xxt', 'xxe', 'xxz'],
})

df4 = pd.DataFrame({
    'rowid': ['1', '2', '3', '4', '5', '6', '7', '8', '9',
              '10', '11', '12', '13', '14']
})

# Concat DataFrames Together
merged = pd.concat((df1, df2, df3))

# Build Sorter
sorterIndex = dict(zip(df4['rowid'], range(len(df4['rowid']))))
# Apply Rank to each Index
merged['rank'] = merged['rowid'].map(sorterIndex)
# Sort By Rank and Remove Column
merged = merged.sort_values('rank').drop(columns=['rank'])

# Join Not Nulls with ', '
output = merged.apply(lambda s: ', '.join(s[s.notnull()]), axis=1)

# Display Output on New Lines
print('\n'.join(output))

Output:

1, 1100
2, xxr, wer
3, xxr, wer, xxr, xxr
4, 1100
5, xxv, cad
6, xxv, cad, xxv, xxv
7, xxw, sder
8, xxw, sder, xxw, xxw
9, xxt, dse
10, xxt, dse, xxt, xxt
11, xxe, sdf
12, xxe, sdf, xxe, xxe
13, xxz, csd
14, xxz, csd, xxz, xxz
Henry Ecker
  • 34,399
  • 18
  • 41
  • 57
0

I merged the dataframe then made rowid as index. Finally I sorted rowid in ascending. Hope this would work

import pandas as pd

df1 = pd.DataFrame({
    'rowid': ['1', '4'],
    'Column2': ['1100', '1100']})
df2 = pd.DataFrame({
    'rowid': ['2', '5', '7', '9', '11', '13'],
    'Column3': ['xxr', 'xxv', 'xxw', 'xxt', 'xxe', 'xxz'],
    'Column4': ['wer', 'cad', 'sder', 'dse', 'sdf', 'csd']})

df3 = pd.DataFrame({
    'rowid': ['3', '6', '8', '10', '12', '14'],
    'Column3': ['xxr', 'xxv', 'xxw', 'xxt', 'xxe', 'xxz'],
    'Column4': ['wer', 'cad', 'sder', 'dse', 'sdf', 'csd'],
    'Column5': ['xxr', 'xxv', 'xxw', 'xxt', 'xxe', 'xxz'],
    'Column6': ['xxr', 'xxv', 'xxw', 'xxt', 'xxe', 'xxz'],})

merged = pd.concat((df1, df2, df3))

merged

merged = merged.set_index("rowid")

merged = merged.sort_index()

merged
Shashank
  • 93
  • 1
  • 9