2

Given this data frame from some other question:

         Constraint Name    TotalSP       Onpeak    Offpeak
Constraint_ID               
77127   aaaaaaaaaaaaaaaaaa  -2174.5     -2027.21    -147.29
98333   bbbbbbbbbbbbbbbbbb  -1180.62    -1180.62     0
1049    cccccccccccccccccc  -1036.53    -886.77     -149.76

It seems like there is an index Constraint_ID. When I try to read it in with pd.read_clipboard, this is how it gets loaded:

      Constraint                Name  TotalSP   Onpeak  Offpeak
0  Constraint_ID                 NaN      NaN      NaN      NaN
1          77127  aaaaaaaaaaaaaaaaaa -2174.50 -2027.21  -147.29
2          98333  bbbbbbbbbbbbbbbbbb -1180.62 -1180.62     0.00
3           1049  cccccccccccccccccc -1036.53  -886.77  -149.76

This is clearly wrong. How can I correct this?

cs95
  • 379,657
  • 97
  • 704
  • 746

2 Answers2

5

read_clipboard by default uses whitespace to separate the columns. The problem you see is because of the whitespace in the first column. If you specify two or more spaces as the separator, based on the table format it will figure out the index column itself:

df = pd.read_clipboard(sep='\s{2,}')

df
Out: 
                  Constraint Name  TotalSP   Onpeak  Offpeak
Constraint_ID                                               
77127          aaaaaaaaaaaaaaaaaa -2174.50 -2027.21  -147.29
98333          bbbbbbbbbbbbbbbbbb -1180.62 -1180.62     0.00
1049           cccccccccccccccccc -1036.53  -886.77  -149.76

index_col argument can also be used to tell pandas the first column is the index, in case the structure cannot be inferred from the separator alone:

df = pd.read_clipboard(index_col=0, sep='\s{2,}')
ayhan
  • 70,170
  • 20
  • 182
  • 203
  • Can you please edit your answer to include `index_cols`? I feel like sep works here, but that may not be the case always. Or... does pandas take care of this by default? – cs95 Aug 17 '17 at 02:51
  • @cᴏʟᴅsᴘᴇᴇᴅ I don't have an example that contradicts but we can pass `index_col=0` to be on the safe side of course. – ayhan Aug 17 '17 at 02:58
2

This is not as cool as @ayhan's answer, but most of the time works pretty well. Assuming you are using ipython or jupyter, just copy and paste the data into %%file:

Then do some quick edits. With multi-indexes, just move the index up a line, something like this (also shortening "Constraint ID" to "ID" to save a little space in this case):

%%file foo.txt
ID       Constraint Name    TotalSP       Onpeak    Offpeak
77127   aaaaaaaaaaaaaaaaaa  -2174.5     -2027.21    -147.29
98333   bbbbbbbbbbbbbbbbbb  -1180.62    -1180.62     0
1049    cccccccccccccccccc  -1036.53    -886.77     -149.76

pd.read_fwf('foo.txt')
Out[338]: 
      ID     Constraint Name  TotalSP   Onpeak  Offpeak
0  77127  aaaaaaaaaaaaaaaaaa -2174.50 -2027.21  -147.29
1  98333  bbbbbbbbbbbbbbbbbb -1180.62 -1180.62     0.00
2   1049  cccccccccccccccccc -1036.53  -886.77  -149.76

read_fwf generally works pretty well on tabular stuff like this, correctly dealing with spaces in column names (usually). Of course, you can also use this basic method with read_csv.

The nice thing about this method is that for small sample data you can deal with just about any of the weird ways that users post data here. And there are a lot of weird ways. ;-)

JohnE
  • 29,156
  • 8
  • 79
  • 109
  • This is actually similar to what I've been doing. I would copy to a text editor, make changes, and then call `pd.read_clipboard` again. – cs95 Aug 17 '17 at 03:56
  • I'm not sure why but `read_fwf` generally works better than `read_clipboard` in my experience. I'd expect the backend is pretty similar but maybe it's that they have different defaults? – JohnE Aug 17 '17 at 03:58