3

I have a few lists, defined as shown below, which need to be converted into a pandas dataframe. Although I have provided just 4 lists, in my problem I may have n lists, and I do not have any prior knowledge of the names of these lists, with the exception of one list, col_names.

list1 = [1,2,3,4]
list2 = [5,6,7,8]
list3 = [9,10,11,12]
.
.
.
listn = [....]

col_names = ['A', 'B', 'C', 'D']

The desired output is a pandas dataframe df that combines all n lists as rows and uses the list col_names for the column names, as shown below:

import pandas as pd

df = pd.DataFrame([list1, list2, list3,.....,listn], columns = col_names)
print(df)

   A   B   C   D
0  1   2   3   4
1  5   6   7   8
2  9  10  11  12
.  .  .   .   .
.  .  .   .   .
n  .  .   .   .

I have tried getting a list of all variables using globals(), based on the inputs from this question. But as far as I can tell, this method only returns the names of the variables as strings, stored in a dictionary. I am unable to check which of those variables are lists, so I cannot iterate over them and append all the list values to the dataframe df. Any guidance on how I can solve this would really be appreciated.

I am using Python 3.7.4 on Windows 10 (x64).

Code_Sipra
  • 3
    Where do the lists come from? Can you pre-arrange them into a list of lists? – DYZ Jul 24 '19 at 16:00
  • Unfortunately, these are lists already created in the Python environment. The problem I have been given is to check which of them are lists and convert them into a dataframe. – Code_Sipra Jul 24 '19 at 16:02
  • "***I am unable to check which of those variables are lists***" - What's wrong with `type`? – Pedro Lobito Jul 24 '19 at 16:02
  • 1
    Do the list names follow any particular pattern? Are they indeed called `list1`, `list2`, etc? – DYZ Jul 24 '19 at 16:03
  • If I have 500 variables already defined with some values, and 400 of them are lists and the other 100 are other data types, I am not sure how to filter them out, since I do not know the names of any of the variables beforehand. The only pattern is that these lists have a length of 4. – Code_Sipra Jul 24 '19 at 16:04
  • 1
    So, do you want to combine _all_ lists? Are you sure you do not have any lists that you want to skip? – DYZ Jul 24 '19 at 16:05
  • Yes, I want to combine all lists which have a length of 4 into a single dataframe. I can then do any cleanup as required once I have a dataframe. These lists have only integer values. But I do not know if this will always be the case. – Code_Sipra Jul 24 '19 at 16:07
  • If you want to combine _all_ lists, it is still possible to do. If you have other lists which should not be combined, and you have no name pattern to guide you, I'm afraid it is impossible. – Valentino Jul 24 '19 at 16:08
  • I understand. If the pattern of these lists is all of them have a length of 4 and if all values are integers, is there any way we can construct a dataframe? – Code_Sipra Jul 24 '19 at 16:10
  • 3
    If you can, you should change the source code to put all your lists into a list of lists. Having hundreds of similarly named objects that you have to reference by name is very bad practice, and you're going to find more and more issues like this as you go along. – PMende Jul 24 '19 at 16:14
  • I think you will find that a lot of folks are going to agree with @PMende and point out that you are basically trying to put a very large bandage on a gaping wound. Your real problem is caused by creating a script with 400 `list` variables that should each be appended to another `list`. At some point, this bandage will become insufficient and you will have to fix the real problem. –  Jul 24 '19 at 16:36
  • Yes, I understand that this could some day become a problem. – Code_Sipra Jul 24 '19 at 16:39
  • If possible, those variable definitions can be separated in another module or class so that you could use `vars(module_or_class_name)` to access the values rather than `globals()` to avoid interference with the global context of the current module. – GZ0 Jul 24 '19 at 16:53
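
The `vars()` suggestion in the last comment can be sketched roughly as follows. This is only an illustration: the module name `my_lists` and its attributes are hypothetical, and the module is built in memory with `types.ModuleType` purely so that the example is self-contained. In practice it would be an ordinary imported module.

```python
import types

# Hypothetical module standing in for a separate file such as my_lists.py.
data_module = types.ModuleType("my_lists")
data_module.list1 = [1, 2, 3, 4]
data_module.list2 = [5, 6, 7, 8]
data_module.threshold = 0.5

# vars() returns the module's namespace as a dictionary; keep only the lists.
found = {
    name: value
    for name, value in vars(data_module).items()
    if isinstance(value, list)
}
print(found)  # {'list1': [1, 2, 3, 4], 'list2': [5, 6, 7, 8]}
```

Because the scan is limited to one module's namespace, unrelated globals of the current session (such as IPython's own bookkeeping lists) never enter the result.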

2 Answers

4

globals() returns a dictionary of all global variables, their names and values. Go over the list of values and choose those that match your criteria (being a list and having 4 items).

a = [1,2,3,4]
b = [5,6,7]
c = [8,9,10,11]

all_lists = [y for y in globals().values()
             if isinstance(y, list) and len(y) == 4]
#[[1, 2, 3, 4], [8, 9, 10, 11]]

Alternatively, define an arbitrary criterion function:

def is_good(x):
    # Accept only lists of exactly 4 items, all of which are integers
    return (isinstance(x, list)
            and len(x) == 4
            and all(isinstance(y, int) for y in x))

and apply it to each global variable:

all_lists = [x for x in globals().values() if is_good(x)]
#[[1, 2, 3, 4], [8, 9, 10, 11]]

This also lets you check the data type of the list items.
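
As a minimal sketch of how this filter might be packaged for reuse, the function below takes the namespace as an argument instead of reading `globals()` directly. The name `collect_rows`, the sample dictionary, and the decision to reject `bool` items are all assumptions made for illustration and are not part of the answer above.

```python
def collect_rows(namespace, width):
    """Return every list in `namespace` that holds exactly `width` integers."""
    rows = []
    # Iterate over a snapshot so the scan is unaffected if the
    # namespace gains entries while we walk it.
    for value in list(namespace.values()):
        if (
            isinstance(value, list)
            and len(value) == width
            # bool is a subclass of int, so exclude it explicitly (an assumption).
            and all(isinstance(item, int) and not isinstance(item, bool)
                    for item in value)
        ):
            rows.append(value)
    return rows


# Hypothetical stand-in for globals(); in a real session pass globals() itself.
namespace = {
    "list1": [1, 2, 3, 4],
    "short": [5, 6, 7],
    "list3": [9, 10, 11, 12],
    "col_names": ["A", "B", "C", "D"],
    "flags": [True, False, True, False],
}
col_names = namespace["col_names"]
rows = collect_rows(namespace, width=len(col_names))
print(rows)  # [[1, 2, 3, 4], [9, 10, 11, 12]]
```

Because `col_names` contains strings, the integer check leaves it out of `rows`. The result can then be passed to `pd.DataFrame(rows, columns=col_names)` as in the question, with rows appearing in the order the lists were defined.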

DYZ
  • This works! Just one question. Is there any way to exclude the list `col_names` from the above list comprehension? That is the only list which I defined to provide as headers to the dataframe. – Code_Sipra Jul 24 '19 at 16:16
  • I added a condition to check for the datatype of the items. – DYZ Jul 24 '19 at 16:17
  • Thank you for helping me out! Really appreciate all the help. – Code_Sipra Jul 24 '19 at 16:18
1

If you index the dictionary returned by globals() with a variable's name, you get that variable's value. You can use type to discover its type:

In [1]: example_list = list()

In [2]: globals_dict = globals()

In [3]: type(globals_dict['example_list'])
Out[3]: list

You can then iterate over the items in globals() and find out which items are of type list:

In [6]: globals_dict = dict(globals())

In [7]: for key, value in globals_dict.items():
   ...:     if type(value) == list:
   ...:         print("%s is a list" % key)
   ...:
_ih is a list
_dh is a list
In is a list
example_list is a list

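Note that I had to make a copy of the return value of globals() because iterating over the live dictionary fails. The loop itself creates the new globals key and value, which changes the dictionary's size mid-iteration: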

In [4]: for key, value in globals_dict.items(): 
   ...:     if type(value) == list: 
   ...:         print("%s is a list", key) 
   ...:                                                                                                                                                                             
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-4-18203a48d32a> in <module>
----> 1 for key, value in globals_dict.items():
      2     if type(value) == list:
      3         print("%s is a list", key)
      4 

RuntimeError: dictionary changed size during iteration
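
The same behaviour can be reproduced with an ordinary dictionary, which may make the cause easier to see. This is a small sketch with made-up keys rather than the real `globals()` dictionary.

```python
namespace = {"example_list": [], "count": 3}

# Adding a key while iterating over the live dictionary raises RuntimeError.
try:
    for key in namespace:
        namespace["key_copy"] = key
    failed = False
except RuntimeError:
    failed = True

# Iterating over a snapshot is safe, even if the original changes meanwhile.
names = []
for key, value in list(namespace.items()):
    namespace["last_seen"] = key
    if isinstance(value, list):
        names.append(key)

print(failed, names)  # True ['example_list']
```

Both `dict(globals())` and `list(globals().items())` take such a snapshot. The loop then walks the copy, so new names created during the loop no longer affect it.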
  • Thank you @David Cullen! This solution also works. But since DYZ provided a solution first, I accepted that as the answer! – Code_Sipra Jul 24 '19 at 16:22
  • You don't have to apologize for your choice of best answer. The answer you chose is obviously more helpful for your specific case and incorporates information from the comments on your question. My goal was to address the more general problem of how to understand and use the return value of `globals()`. –  Jul 24 '19 at 16:26