
I'm trying to create a table (to print in LaTeX afterwards) that contains all the distinct values of each column of a given DataFrame:

    dfDiff = pd.DataFrame(columns=df2.columns)
    for col in df2:
        dfDiff[col]=(df2[col].unique())

I get this error message:

    ValueError: Length of values does not match length of index

Is there a better way to do this?

enden
  • Unless you have the same number of unique values in each column, e.g. you have exactly 10 unique values in each column, this will fail due to alignment. What should the desired df look like? – EdChum Feb 08 '17 at 16:34
  • My df2 contains different columns (occupation, sex, language...), so each column has a different number of unique values; that's why I'm struggling with this problem. – enden Feb 08 '17 at 16:53
  • I'm struggling to make sense of what I believe you are trying to achieve. Can you provide example input and desired output? Are you looking for unique rows (as in combinations of columns)? Because if you are in fact looking for unique values in each column, what value does the structure of a DataFrame add versus just having lists (or Series) of unique entries in each column? It seems the relationship between values is not important to you. It may be too early in the morning for me, but I can't seem to find any uses for the requested output. – Plasma Feb 09 '17 at 08:13
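
The alignment failure described in the first comment can be reproduced with a small sketch. The toy `df2` below is hypothetical. The first assignment into the empty `dfDiff` gives it an index as long as that column's unique values, so the next column with a different number of unique values cannot fit and raises the same ValueError.

```python
import pandas as pd

# Hypothetical toy data: "sex" has 2 unique values, "language" has 3
df2 = pd.DataFrame({"sex": ["F", "M", "F"], "language": ["en", "fr", "de"]})

dfDiff = pd.DataFrame(columns=df2.columns)
# The first assignment into an empty frame sets a 2-row index
dfDiff["sex"] = df2["sex"].unique()

raised = False
try:
    # 3 values cannot be placed into a 2-row frame
    dfDiff["language"] = df2["language"].unique()
except ValueError as err:
    raised = True
    print("ValueError:", err)
```
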

3 Answers


OK, it looks like your goal is to have a second DataFrame with the same number and names of columns, where each column contains the unique values of the corresponding column in the initial DataFrame. I don't know if that's the best way to go about it, so I'll show you how I'd do it your way, and then suggest another way to print it nicely.

As mentioned, the error you're getting is because you're trying to create a DataFrame with columns of different lengths. You can do this with some finagling if you're OK with NaN values in the "empty" cells. I would approach it like this:

  1. Get your column names, and save them in a list.
  2. Create a list to hold the unique values from each column of df2 as a new Series.
  3. Iterate through each column name, storing the new Series of each column's unique values.
  4. Figure out which column is the longest, and create an empty (filled with NaNs) DataFrame based on the number of columns and the longest list of unique values.
  5. Lastly, replace what NaN values you can with actual values, and print the DF.

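    import numpy as np
    import pandas as pd
    
    colNames = df2.columns.tolist()
    uniqueValsList = []
    
    # One Series of unique values per column, named after that column
    for each in colNames:
        uniqueVals = list(df2[each].unique())
        uniqueValsList.append(pd.Series(data=uniqueVals, name=each))
    
    # Length of the longest list of unique values
    maxlen = 0
    for each in uniqueValsList:
        if len(each) > maxlen:
            maxlen = len(each)
    
    # Placeholder frame filled with NaN (np.empty would leave arbitrary values)
    fillerData = np.full((maxlen, len(colNames)), np.nan)
    dfDiff = pd.DataFrame(columns=colNames, data=fillerData)
    
    # Assignment aligns on the index, so shorter columns keep NaN at the bottom
    for i in range(len(uniqueValsList)):
        dfDiff[colNames[i]] = uniqueValsList[i]
    dfDiff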
    

This will allow you to print out a DF with your unique values, but it will look weird with all the NaN values. I would recommend doing it with HTML and the tabulate module, as in this answer. For example:

    from IPython.display import HTML, display
    import tabulate

    # One row per original column: the column name followed by its unique values
    listOfLists = []
    for i in range(len(uniqueValsList)):
        thisList = [colNames[i]]
        thisList.extend(uniqueValsList[i].tolist())
        listOfLists.append(thisList)

    display(HTML(tabulate.tabulate(listOfLists, tablefmt='html')))

I'm not familiar with LaTeX in Jupyter Notebooks, so if you've found a better way to do this I'd be interested to know! I tried messing with the tablefmt values in the display(HTML()) call, to no avail.
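
Since the question's end goal is LaTeX: tabulate also accepts `tablefmt="latex"`, but that produces LaTeX source text, so it would be printed rather than wrapped in `display(HTML(...))`; pandas also offers `DataFrame.to_latex`. As a dependency-free alternative, here is a minimal sketch that writes a `tabular` environment directly, one column per DataFrame column, with blank cells where a column runs out of unique values. The helper names `unique_values_to_latex` and `escape_latex` are hypothetical, and only the handful of special characters listed in the code are escaped (backslash, `~` and `^` are not).

```python
import pandas as pd

# Characters escaped inside a LaTeX table cell (deliberately not exhaustive)
_LATEX_SPECIAL = {"&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
                  "_": r"\_", "{": r"\{", "}": r"\}"}


def escape_latex(text):
    """Hypothetical helper: escape the characters listed in _LATEX_SPECIAL."""
    return "".join(_LATEX_SPECIAL.get(ch, ch) for ch in str(text))


def unique_values_to_latex(df):
    """Hypothetical helper: one LaTeX column per DataFrame column, listing
    that column's unique values; shorter columns get blank cells."""
    uniques = [list(df[col].unique()) for col in df.columns]
    n_rows = max((len(u) for u in uniques), default=0)
    lines = [r"\begin{tabular}{" + "l" * len(uniques) + "}", r"\hline"]
    lines.append(" & ".join(escape_latex(c) for c in df.columns) + r" \\")
    lines.append(r"\hline")
    for row in range(n_rows):
        cells = [escape_latex(u[row]) if row < len(u) else "" for u in uniques]
        lines.append(" & ".join(cells) + r" \\")
    lines.append(r"\hline")
    lines.append(r"\end{tabular}")
    return "\n".join(lines)


# Usage on hypothetical toy data
df = pd.DataFrame({"sex": ["F", "M", "F"], "language": ["en", "fr", "de"]})
tex = unique_values_to_latex(df)
print(tex)
```

The result is plain text that can be pasted into a LaTeX document (or written to a `.tex` file and pulled in with `\input`).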

DangerousDave
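    import pandas as pd

    # Unique values of each column of `tips`, in column order
    list1 = []
    for i in tips:
        list1.append(list(tips[i].unique()))

    # Size the index to the longest list up front; otherwise the first
    # assignment fixes the row count and longer columns get truncated
    maxlen = max(len(vals) for vals in list1)
    utip = pd.DataFrame(index=range(maxlen), columns=list(tips.columns))
    for i in range(len(list(utip.columns))):
        # Series assignment aligns on the index; missing rows become NaN
        utip[list(utip.columns)[i]] = pd.Series(list1[i])
    utip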
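
A more compact way to build the same NaN-padded table, offered as a sketch rather than either answerer's code: pass a dict of Series to `pd.DataFrame`, which aligns them on their default RangeIndex and pads the shorter ones with NaN. The toy `df` is hypothetical and stands in for `df2` or `tips`.

```python
import pandas as pd

# Hypothetical toy data standing in for df2 / tips
df = pd.DataFrame({"sex": ["F", "M", "F", "M"],
                   "language": ["en", "fr", "de", "en"],
                   "occupation": ["dev", "dev", "dev", "dev"]})

# One Series of unique values per column; the constructor aligns them on
# their RangeIndex, so the frame is as long as the longest column
unique_df = pd.DataFrame({col: pd.Series(df[col].unique()) for col in df.columns})
print(unique_df)
```

Chaining `.fillna("")` before printing replaces the NaN padding with empty strings, which avoids the NaN-filled cells mentioned in the first answer.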

This function takes a list that may contain repeated values and returns a list containing only the unique values, in order of first appearance. For example, if the input is ["cc","ba","aa","aa","ab","ab"], the output is ["cc","ba","aa","ab"].

    def unique(tu):
        # Build a new list holding the first occurrence of each value, in order;
        # the input list is left unchanged (values must be hashable)
        seen = set()
        t = []
        for value in tu:
            if value not in seen:
                seen.add(value)
                t.append(value)
        return t

    yy = ["bb", "ba", "aa", "aa", "aa", "ab", "ab"]
    ul = unique(yy)
    for i in range(len(ul)):
        print(ul[i])
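
For hashable values, the standard library already covers this: dict keys are unique and, since Python 3.7, keep insertion order, so `dict.fromkeys` deduplicates while preserving first-occurrence order (pandas' `pd.unique` and `Series.unique()` likewise return values in order of appearance). The name `unique_ordered` is a hypothetical label for this sketch.

```python
def unique_ordered(items):
    """Return the distinct values of `items`, keeping first-occurrence order."""
    # dict keys are unique and preserve insertion order (Python 3.7+)
    return list(dict.fromkeys(items))


print(unique_ordered(["cc", "ba", "aa", "aa", "ab", "ab"]))
```
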

  
  • Thank you for contributing an answer. Would you kindly edit your answer to include an explanation of your code? That will help future readers better understand what is going on, and especially those members of the community who are new to the language and struggling to understand the concepts. – Jeremy Caney Jul 06 '21 at 00:02
  • (code that runs when pasted into a Python interpreter would seem preferable.) – greybeard Jul 06 '21 at 02:46
  • @JeremyCaney Thank you for your comment, but I'm actually new to Stack Overflow. I was looking for a function that takes an array filled with multiple values (that will probably be repeated more than once) and returns an array that has only unique values. For example, if the input is ["cc","ba","aa","aa","ab","ab"], the output will be ["cc","ba","aa","ab"]. – Bahachairet Jul 07 '21 at 18:54