I'm consuming an API, and some column names are too long for my MySQL database.

How can I ignore (drop) such a field in a dataframe?

I was trying this:

import pandas as pd
import numpy as np

lst =['Java', 'Python', 'C', 'C++','JavaScript', 'Swift', 'Go'] 

df = pd.DataFrame(lst)
limit = 7

for column in df.columns:
   if (pd.to_numeric(df[column].str.len())) > limit:
        df -= df[column]
        print (df)

result:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

My preference is to delete any column that is longer than my database supports.

I also tried slicing the name to shorten it, but that didn't work either.

I appreciate any help.

Corralien
  • Your dataframe has only a single column named `0` (as you didn't specify a column name). Each row of that column has the values you are looking to test for character length. So looping through dataframe columns isn't going to do anything. – JNevill Feb 23 '22 at 20:04
  • You're right about that. Of course, my problem was more extensive, but I used a wrong example and I apologize for that. Thanks, Nevill – Guaraci Falcão Feb 23 '22 at 20:35

2 Answers

Suppose you have the following dataframe:

>>> df
      col1          col2      col3        col4
0   5uqukp  g7eLDgm0vrbV     Bnssm  tRJnSQma6E
1   NDsApz        lu02dO    ogbRz5  481riI6qne
2    UEfni    YV2pCXYFbd   pyHYqDH   fghpTgItm
3  a0PvRSv      0FwxzFqk   jUHQliB      W2dBhH
4   BQgTFp       FMseKnR      ifgt     tw1j7Ld
5  1vvF2Hv   cwTyt2GtpC4    P039m2   1qR2slCmu
6  JYnABTr        oLdZVz    KYBspk      RgsCsu

To remove the columns where at least one value has a length greater than 7 characters, use:

>>> df.loc[:, df.apply(lambda x: x.str.len().max() <= 7)]
      col1     col3
0   5uqukp    Bnssm
1   NDsApz   ogbRz5
2    UEfni  pyHYqDH
3  a0PvRSv  jUHQliB
4   BQgTFp     ifgt
5  1vvF2Hv   P039m2
6  JYnABTr   KYBspk
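
If the limit applies to the column names themselves rather than to the values (the question mentions names that are too long for MySQL), a similar mask can be built from `df.columns`. This is only a sketch: the frame and its column names are made up for illustration, and `limit = 7` simply mirrors the question (MySQL's own limit for column names is 64 characters).

```python
import pandas as pd

# Hypothetical frame: one short column name and one overly long one.
df = pd.DataFrame({
    "id": [1, 2],
    "a_very_long_column_name_returned_by_the_api": ["x", "y"],
})

limit = 7  # assumed limit, taken from the question's example

# Boolean mask over the column labels, not over the cell values.
keep = df.columns.astype(str).str.len() <= limit
short_df = df.loc[:, keep]

print(short_df.columns.tolist())
```

If shortening is preferred over dropping, `df.rename(columns=lambda name: str(name)[:limit])` truncates each name, though truncated names can collide.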

To understand the error, read this post
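
In short, comparing a Series with a number yields a boolean Series, and `if` cannot decide whether several values at once are true. A minimal sketch of the usual fix, reducing the Series with `.any()` or `.all()` (the sample data here is illustrative):

```python
import pandas as pd

s = pd.Series(["Java", "Python", "JavaScript"])
limit = 7

too_long = s.str.len() > limit  # boolean Series: False, False, True

# `if too_long:` would raise the ValueError from the question, because a
# Series with several values has no single truth value. Reduce it first:
has_long_value = bool(too_long.any())  # at least one value is too long
all_long = bool(too_long.all())        # every value is too long

print(has_long_value, all_long)
```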

Corralien

As I mentioned in my comment, when you do df = pd.DataFrame(lst) you create a dataframe with a single column whose rows are populated from your one-dimensional list. Iterating over the columns of the dataframe therefore accomplishes nothing, as there is only one column.

That being said, this works to your advantage, as you can use a set-based approach to answer your question:

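import pandas as pd

lst = ['Java', 'Python', 'C', 'C++', 'JavaScript', 'Swift', 'Go']

# A flat list becomes a single column, labelled 0 by default
df = pd.DataFrame(lst)

limit = 7
# Keep only the rows whose string length exceeds the limit
print(df[df[0].str.len() > limit])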

That will print a dataframe with a single column and a single row containing "JavaScript", the only value that exceeds your character limit. If you want to keep the values that are within the limit instead, just change that > to <=.
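
As a sketch of both selections side by side, the mask can be stored once and inverted with `~` (the variable names `mask`, `over_limit`, and `within_limit` are only illustrative):

```python
import pandas as pd

df = pd.DataFrame(['Java', 'Python', 'C', 'C++', 'JavaScript', 'Swift', 'Go'])
limit = 7

mask = df[0].str.len() > limit  # True only for 'JavaScript' (10 characters)

over_limit = df[mask]     # rows exceeding the limit
within_limit = df[~mask]  # here the same as df[df[0].str.len() <= limit]

print(over_limit[0].tolist())
print(within_limit[0].tolist())
```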

JNevill