I have a poorly structured dataframe that was generated by reading tables directly from a PDF.

I am trying to manipulate some of the data before putting it into visualization tools.

A key transformation I am trying to make is to take the subheaders that ended up in a column and use them as row labels. Here is an example of the kind of dataframe I am working with:

import pandas as pd

data = {'Col1': ['AL', 'nan', 'nan', 'nan', 'WY', 'nan', 'nan', 'nan'],
        'Col2': ['nan', 1, 2, 3, 'nan', 1, 2, 3]}

df = pd.DataFrame(data)

The resulting dataframe looks like this:

  Col1 Col2
0   AL  nan
1  nan    1
2  nan    2
3  nan    3
4   WY  nan
5  nan    1
6  nan    2
7  nan    3

The entries in Col1 are mostly nan, except for those in row 0 (AL) and row 4 (WY). These were effectively subheaders in the PDF table.

I am trying to write code that takes the last valid value in Col1 (e.g., AL) and fills the rows below it until it reaches the next valid value (e.g., WY).

The correct output would look something like this:

  Col1 Col2
0   AL  nan
1   AL    1
2   AL    2
3   AL    3
4   WY  nan
5   WY    1
6   WY    2
7   WY    3

I am somewhat at a loss for how to proceed here and welcome any advice on where to start.

  • Does this answer your question? [How to replace NaNs by preceding or next values in pandas DataFrame?](https://stackoverflow.com/questions/27905295/how-to-replace-nans-by-preceding-or-next-values-in-pandas-dataframe) Do this after you replace the string `'nan'` with `pd.NA` – Pranav Hosangadi Dec 08 '22 at 18:17
  • Indeed it does. Thank you! – Shazriki Dec 08 '22 at 18:31

1 Answer

Do this:

df['Col1'] = df['Col1'].ffill()
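
Putting this together with the suggestion in the comments, here is a minimal end-to-end sketch. It assumes the missing entries are literally the string 'nan', as in the question's example, so they are first replaced with a real missing value (`np.nan` here) and then forward-filled with `ffill()`, the method form of `fillna(method='ffill')`. Col2 is left untouched because the expected output keeps its 'nan' entries.

```python
import numpy as np
import pandas as pd

# Example frame from the question: the gaps are the *string* 'nan',
# not real missing values, so a forward fill alone would skip them.
data = {'Col1': ['AL', 'nan', 'nan', 'nan', 'WY', 'nan', 'nan', 'nan'],
        'Col2': ['nan', 1, 2, 3, 'nan', 1, 2, 3]}
df = pd.DataFrame(data)

# Convert the placeholder strings to NaN, then carry the last valid
# value in Col1 down until the next valid value appears.
df['Col1'] = df['Col1'].replace('nan', np.nan).ffill()

print(df)
```

Assigning the result back to `df['Col1']` avoids relying on `inplace=True` through attribute access, which may not update the original frame in newer pandas versions.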
CodeKorn
  • Probably need to replace "nan" with `np.nan` before. – luxcem Dec 08 '22 at 18:12
  • If the answer is "use this function that's already part of your library exactly the way the docs show you how to", then the question is generally a duplicate. In such situations, please look for the duplicate and flag/vote to close instead of adding an answer – Pranav Hosangadi Dec 08 '22 at 18:20
  • This works great. I actually replaced the 'nan' with None and it worked great. Thank you! – Shazriki Dec 08 '22 at 18:31