How to get values from dataframe with dynamic columns

Question

Python newbie here, I am not able to create a function which can extract certain columns' values into another form. I have tried to run a loop multiple times to get the data, but I am not able to find a good pythonic way to do it. Any help or suggestion would be welcome.

PS: The column with "Loaded with" is has the information that what items are loaded into it, but you can also get this info by seeing that there are few columns with name item_1L...

I was not able to find a better way to input the data on SO, so I have created a csv file of the dataframe.

I need the LBH of the separate items in the form of

Item1=4.6x4.3x4.3 Item2=4.6x4.3x4.3 or any other easily iterable way.

EDIT: When I say I needed the answer in the form of 4.6x4.3x4.3, I really meant I needed it in the form of "4.6x4.3x4.3" i.e.not the product of the numbers. I need the string format like this :

import pandas as pd
df = pd.DataFrame({'0': ['index', 'Name', 'Loaded 
with','item_0L','item_0B','item_0H','item_1L','item_1B','item_1H'], 
                   '1': [0, 'Tata- 
417','01','4.3','4.3','4.6','4.3','4.3','4.6',]})

string format

index  Loadedwith  item_0L  item_0B  item_0H  item_1L  item_1B  item_1H    
1              01      4.6      4.3      4.3      4.6      4.3  4.3'

Here is what I have been trying:

def get_df (df):

    total_trucks = len(df)
    total_items = 0
    for i in range(len(df["Loaded with"])):
        total_items += len((df["Loaded with"].iloc[i]))



    for i in range(len(df["Loaded with"])):
        for j in range(total_items):
            for k in range(len((df["Loaded with"].iloc[i]))):
#                 pass
#                 print("value of i j k is {} {} {}".format(i,j,k))
                if(pd.isnull(Packed_trucks.loc["item_" + str(j) + "L"])):
                    display(Packed_trucks["item_" + str(j) + "L"])
#     return 0


get_df(Packed_trucks)

Hi Rohit, would would mind creating an example like [this](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) Thanks — anky, Mar 07 '19 at 17:53

anky · Accepted Answer · 2019-03-08T07:58:32.467

1

May be something like:

m=df.loc[:,df.filter(like='item').columns]
df['Item1']=m.filter(like='0').astype(float).prod(axis=1)
df['Item2']=m.filter(like='1').astype(float).prod(axis=1)

Output:

    index   Loadedwith   item_0L    item_0B item_0H  item_1L    item_1B   item_1H   Item1   Item2
        1            1       4.6        4.3     4.3      4.6         4.3      4.3   85.054  85.054

EDIT

df['Item1']=m.astype(str).filter(like='0').apply(lambda x: 'X'.join(x),axis=1)
df['Item2']=m.astype(str).filter(like='1').apply(lambda x: 'X'.join(x),axis=1)
print(df)

   index  Loadedwith  item_0L  item_0B  item_0H  item_1L  item_1B item_1H  \
0      1           1      4.6      4.3      4.3      4.6      4.3     4.3   

         Item1        Item2  
0  4.6X4.3X4.3  4.6X4.3X4.3

edited Mar 08 '19 at 07:58

answered Mar 07 '19 at 18:38

anky

74,114
11
41
70

I am sorry, but I did not want the float product of the values, but the string values in a form like 4.6x4.3x4.3 with the letter x in between. I have edited the question, apologies for the time wasted because of unclear question. – Rohit Kumar Mar 08 '19 at 07:42
1

@RohitKumar how about the section under EDIT? – anky Mar 08 '19 at 07:58
Thanks for the reply, is there any way we can make the part where you say `df['Item2']` in a dynamic way like df[Item + str(i)] or something ? – Rohit Kumar Mar 08 '19 at 10:57
1

i think you can but again, since the filters are changing, for loops may be involved. if possible for you, you can try posting a fresh question for this. :) – anky Mar 08 '19 at 10:59
Haha @anky_91 not sure if I should post a question for this, I actually did create the function for it here [colab](https://colab.research.google.com/drive/10uMny-Y6vpwp0nC08mIZ0W-cDFtSs123#scrollTo=MYzHpGi1WAYk) Thing is, I don't know how to count the number of items. Can you tell me a way how to do that ? – Rohit Kumar Mar 08 '19 at 11:05
1

@RohitKumar unforunately cannot access the link, i use a corporate laptop mat be that's why – anky Mar 08 '19 at 11:06
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/189669/discussion-between-rohit-kumar-and-anky-91). – Rohit Kumar Mar 08 '19 at 11:08

score 0 · Answer 2 · answered Mar 07 '19 at 18:04

I'm a bit confused so I apologize if this is general, but it seems like you either need to parse the data or iterate through it. I'd recommend something along the lines of these:

Parse Line

f = open(file, "r")
line = f.readline()
data = []

while len(line) != 0:
    data.append(line.strip(","))
    //other code and stuff
    line = f.readline()

f.close()

This will open a file and will read the data and form a list of lists based off of the data. In doing this it becomes very easy to iterate through the list, making a segway into iterating.

Iterating

Should you need to iterate through a list of you values, a for loop is the easiest way. If you need to quickly get the entire row or column, though, I'd recommend

data = [your data]
row = data[0][:]
column = data[:][0]

just replace the 0 with whatever index you need. NOTE: This will only work with two-dimensional lists, which is why I recommend parsing as I previously showed.

Edit: You can find more examples with this by looking into list comprehension and list splicing

Hm...thanks for the new direction, though can you guide me through where I am might be going wrong with mine? — Rohit Kumar, Mar 07 '19 at 18:09
@RohitKumar there are a few optimizations I'd make. First, try `pandas.read_csv` first to get your data and then take a look at `pandas.loc[]` next. [Pandas Doc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) — , Mar 07 '19 at 18:20

Dennis Lyubyvy · Answer 3 · 2019-03-08T15:13:17.970

This solution will leverage pd.melt function and create a table where each line would be a combination of a Truck (Index) and an Item Number

df = pd.read_csv('df.csv')

# We will operate on a subset of columns, leaving just index and columns we need
truck_level_df = df.drop(['Name', 'TruckID', 'Length', 'Breadth', 
'Height', 'Volume', 'Weight', 'Price', 'Quantity', 'Loaded with'], 
axis = 1)

truck_level_df:

       index  item_0L  item_0B  item_0H  item_1L  item_1B  item_1H
0      1      4.6      4.3      4.3      4.6      4.3      4.3

# Create table with all the items and their measures
item_measure_level_df = truck_level_df.melt(id_vars = 'index', 
var_name = 'item_id_and_measure', value_name = 'item_val')

# Remove unneeded substring
item_measure_level_df['item_id_and_measure'] = 
item_measure_level_df['item_id_and_measure'].str.replace('item_', '')

# Extract Item ID
item_measure_level_df['item_id'] = 
item_measure_level_df['item_id_and_measure']
.str.replace(r'[A-Z]*', '', case = False)

# Create df where each line is a combination 
# of a Truck and an item
item_level_df = item_measure_level_df[['index', 'item_id']].drop_duplicates()

item_level_df:

       index item_id_and_measure  item_val item_id
0      1                  0L       4.6       0
1      1                  0B       4.3       0
2      1                  0H       4.3       0
3      1                  1L       4.6       1
4      1                  1B       4.3       1
5      1                  1H       4.3       1

Final step:

item_measure_level_df['item_val'] = item_measure_level_df['item_val'].astype('str')

# Group by Item and get LxHxB string
item_level_df['volume_string'] = item_measure_level_df.sort_values(by = ['index','item_id_and_measure']).groupby(['index','item_id'])['item_val'].apply(lambda x: ' x '.join(x)).values

The output:

  index item_id    volume_string
0     1       0  4.3 x 4.3 x 4.6
3     1       1  4.3 x 4.3 x 4.6

This solution will digest as many groups of columns as you would have

Shared notebook: https://colab.research.google.com/drive/16xUCMCH7rhOOp9Jwlv2RISnnmpzK-06d#scrollTo=lRDVe6B40VsH

I can't seem to run your code, there is an error _replace() got an unexpected keyword argument 'regex'_ . Please use this [colab](https://colab.research.google.com/drive/10uMny-Y6vpwp0nC08mIZ0W-cDFtSs123#scrollTo=sSaQXripNNFC) to try your own code. — Rohit Kumar, Mar 08 '19 at 07:39
colab is using pandas 0.22.0 for some reason, regex was introduced in 0.23 — Dennis Lyubyvy, Mar 08 '19 at 14:54
This is working (I have removed regex parameter and created a regex pattern instead): https://colab.research.google.com/drive/16xUCMCH7rhOOp9Jwlv2RISnnmpzK-06d — Dennis Lyubyvy, Mar 08 '19 at 14:55
Also I have posted solution that gives you a string like this: 4.3 x 4.3 x 4.6 — Dennis Lyubyvy, Mar 08 '19 at 15:10

How to get values from dataframe with dynamic columns

3 Answers3

Parse Line

Iterating