1

So I have a irregular dataframe with unnamed columns which looks something like this:

Unnamed:0 Unnamed:1 Unnamed:2 Unnamed:3  Unnamed:4
nan       nan        nan      2022-01-01  nan
nan       nan        nan      nan         nan
nan       nan        String   Name        Currency
nan       nan        nan      nan         nan
nan       nan        nan      nan         nan
nan       nan        String   nan         nan
nan       nan        xx       A           CAD
nan       nan        yy       B           USD 
nan       nan        nan      nan         nan

Basically what I want to do is to find in which column and row the 'String' name is and start the dataframe from there, creating:

String Name Currency
String nan  nan
xx     A    CAD
yy     B    USD
nan    nan  nan

My initial thought has been to use locate_row = df.apply(lambda row: row.astype(str).str.contains('String').any(), axis=1) combined with locate_col = df.apply(lambda column: column.astype(str).str.contains('String').any(), axis=0) This gives me series with the rows with the string and column with the string. My main problem is solving this without hardcoding using for eg. iloc[6:, 2:]. Any help to get to the desired dataframe without hardcoding is of great help.

Hassan33
  • 101
  • 7

2 Answers2

1

In your example you can drop the columns that are entirely null, then drop rows with any null values. The result is the slice you are looking for. You can then promote the first row to headers.

df = df.dropna(axis=1,how='all').dropna().reset_index(drop=True)
df = df.rename(columns=df.iloc[0]).drop(df.index[0])

Output

  String Name Currency
1     xx    A      CAD
2     yy    B      USD
Chris
  • 15,819
  • 3
  • 24
  • 37
  • Yeah this makes a ton of sense, however, my example was bad as I still need some of the nan values. Have corrected the example, sorry for that – Hassan33 Oct 19 '22 at 15:28
0

Hey you would need to iterate over the dataframe and then search for equality in the strings: in this article iteration is described How to iterate over rows in a DataFrame in Pandas

with this you can check for equality if s1 == s2: print('s1 and s2 are equal.')

Barathan
  • 21
  • 1