Loading in dataframes with columns spanning multiple lines using pd.read_clipboard

Question

    user                             item  \
0  b80344d063b5ccb3212f76538f3d9e43d87dca9e          The Cove - Jack Johnson   
1  b80344d063b5ccb3212f76538f3d9e43d87dca9e  Entre Dos Aguas - Paco De Lucia   
2  b80344d063b5ccb3212f76538f3d9e43d87dca9e            Stronger - Kanye West   
3  b80344d063b5ccb3212f76538f3d9e43d87dca9e    Constellations - Jack Johnson   
4  b80344d063b5ccb3212f76538f3d9e43d87dca9e      Learn To Fly - Foo Fighters   

   rating  
0       1  
1       2  
2       1  
3       1  
4       1

Is there a way to load data like this in its intended format without having to manually move everything onto one line?

I have this question too. It's a bit of hard work to move everything onto one line. — Bharath M Shetty, Aug 25 '17 at 14:20
@coldspeed anything you would like to suggest for my solution? — Bharath M Shetty, Aug 26 '17 at 05:01

Bharath M Shetty · Answer 1 · 2017-08-26T09:04:54.290

One way is to split the clipboard text on blank lines (\n\n), read each piece into a separate dataframe with read_fwf, and then concatenate the pieces side by side, i.e.

# Bit of code from https://stackoverflow.com/questions/45740537/copying-multiindex-dataframes-with-pd-read-clipboard
from io import StringIO

import pandas as pd
from pandas.io.clipboard import clipboard_get


def read_clipboard_split(**kwargs):
    encoding = kwargs.pop('encoding', 'utf-8')

    # only utf-8 is valid for passed value because that's what clipboard
    # supports
    if encoding is not None and encoding.lower().replace('-', '') != 'utf8':
        raise NotImplementedError(
            'reading from clipboard only supports utf-8 encoding')

    data = clipboard_get()
    # a wrapped DataFrame printout separates its column chunks with a blank line
    items = data.split("\n\n")
    # parse every chunk as a fixed-width table
    k = [pd.read_fwf(StringIO(i), **kwargs) for i in items]
    # put the chunks back together side by side
    df = pd.concat(k, axis=1)
    return df

read_clipboard_split()

Sample run:

     user                       \      
0  b80344d063b5ccb3212f76538f3d9e43d87dca9e
1  b80344d063b5ccb3212f76538f3d9e43d87dca9e  
2  b80344d063b5ccb3212f76538f3d9e43d87dca9e   
3  b80344d063b5ccb3212f76538f3d9e43d87dca9e   
4  b80344d063b5ccb3212f76538f3d9e43d87dca9e   

   rating  
0       1  
1       2  
2       1  
3       1  
4       1

Output:

   Unnamed: 0              user                       \  Unnamed: 0  rating
0  0           b80344d063b5ccb3212f76538f3d9e43d87dca9e  0           1     
1  1           b80344d063b5ccb3212f76538f3d9e43d87dca9e  1           2     
2  2           b80344d063b5ccb3212f76538f3d9e43d87dca9e  2           1     
3  3           b80344d063b5ccb3212f76538f3d9e43d87dca9e  3           1     
4  4           b80344d063b5ccb3212f76538f3d9e43d87dca9e  4           1
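Because read_clipboard_split reads straight from the system clipboard, it is awkward to try out in a script or a test. The sketch below applies the same idea (split on blank lines, parse each chunk with read_fwf, pd.concat with axis=1) to a plain string instead. The helper name read_split_text is hypothetical, and the sample rows are made up and shortened (short user ids, single-word items) so that read_fwf's column-width inference has clean gaps between columns.

```python
import io

import pandas as pd


def read_split_text(text, **kwargs):
    """Hypothetical helper: parse wrapped DataFrame text from a string.

    Splits on blank lines, reads each chunk as a fixed-width table and
    concatenates the chunks side by side, like the clipboard version.
    """
    chunks = [c for c in text.split("\n\n") if c.strip()]
    frames = [pd.read_fwf(io.StringIO(c), **kwargs) for c in chunks]
    return pd.concat(frames, axis=1)


# Made-up, shortened sample in the same wrapped layout as the question
sample = "\n".join([
    "  user      item  \\",
    "0   u1   TheCove",
    "1   u1  Stronger",
    "2   u2    Wonder",
    "",
    "   rating",
    "0       1",
    "1       2",
    "2       1",
])

df = read_split_text(sample)
print(list(df.columns))
```

The values come through, but every chunk contributes its own unnamed row-label column, and the "\" continuation marker turns into an all-NaN column of its own, which is the same kind of spurious columns as in the output above.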

It would seem like some spurious columns are generated. — cs95, Aug 26 '17 at 05:17
Yeah, a bit of data cleaning is necessary. — Bharath M Shetty, Aug 26 '17 at 05:18
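One possible way to do that cleanup, sketched under a few assumptions: let read_fwf use the leftmost field of each chunk as the index (index_col=0), so that pd.concat with axis=1 aligns the chunks on their row labels instead of producing repeated "Unnamed: 0" columns, and drop the column that holds the "\" continuation marker. The helper name read_wrapped_frame is hypothetical and it works on a string, so the text returned by clipboard_get() could be passed to it. Converting \r\n to \n first is a precaution, assuming the clipboard text may use Windows line endings, which would keep the "\n\n" split from finding the blank lines. The sample rows are made up.

```python
import io

import pandas as pd


def read_wrapped_frame(text, **kwargs):
    """Hypothetical helper: rebuild a wrapped DataFrame printout."""
    # Normalize Windows line endings so the blank-line split still works
    text = text.replace("\r\n", "\n")
    # Use the leftmost (row label) field of every chunk as the index
    kwargs.setdefault("index_col", 0)
    frames = []
    for chunk in text.split("\n\n"):
        if not chunk.strip():
            continue
        frame = pd.read_fwf(io.StringIO(chunk), **kwargs)
        # Drop the column created by the "\" continuation marker
        frame = frame.drop(columns=[c for c in frame.columns if c == "\\"])
        frames.append(frame)
    # All chunks share the same row labels, so axis=1 lines them up
    return pd.concat(frames, axis=1)


# Made-up, shortened sample in the same wrapped layout as the question
sample = "\n".join([
    "  user      item  \\",
    "0   u1   TheCove",
    "1   u1  Stronger",
    "2   u2    Wonder",
    "",
    "   rating",
    "0       1",
    "1       2",
    "2       1",
])

df = read_wrapped_frame(sample)
print(df)
```

Values that themselves contain spaces (such as "The Cove - Jack Johnson") can still confuse read_fwf's width inference; passing explicit colspecs through **kwargs is one way around that.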
