0

I have a dataframe that looks like this.

ac_val | node1 | node2 | node3 | 
--------------------------------
val1   |       |       |       | 
val2   |       |       |       | 
val3   |       |       |       | 

from the table, I need to open up text file of each node (node1, node2, node3) and look for the ac_val (val1, val2, val3) and fill in the value into cell respectively, I would like to know if there is any efficient way to fill in the "blanks" instead of using good old iteration method.

Edit: The content of the text file looks like this (I've removed most of the codes and retain the needed one)

[node1]: xxx val 1 24
[node1]: xxx val 2 55
[node1]: xxx val 3 -44

Then the result will look like this (same goes to node2, and node 3)

ac_val | node1 | node2 | node3 | 
--------------------------------
val1   |  24   |       |       | 
val2   |  55   |       |       | 
val3   | -44   |       |       | 

I don't need any actual codes or whatsoever to fill the blank for me, I just wanted to know if there is any better way to solve this than using iterrows()

ReverseEngineer
  • 529
  • 1
  • 5
  • 18
  • I just used the iterrow() method to loop through the rows and fill in one by one, if the dataframe size is small that shouldnt be any issue, but if I have a dataframe that is huge (100 thousands of data) then the slow is very significant – ReverseEngineer Jul 02 '21 at 06:02
  • What is the result after reading a txt file? Can you provide some sample of the data? – IoaTzimas Jul 02 '21 at 06:08
  • Does this [post](https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas/55557758#55557758) answer your question? – 99_m4n Jul 02 '21 at 06:13
  • Does this answer your question? [How to iterate over rows in a DataFrame in Pandas](https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas) – 99_m4n Jul 02 '21 at 06:16
  • I'n not sure I understand what you are trying to do, but it sounds a bit what would be called a "join" if we were talking databases, in which case pulling that text file into another dataframe (assuming it is small enough to fit in memory) and using DataFrame.join() might be the way to go. – Ture Pålsson Jul 02 '21 at 06:34

1 Answers1

0

Is it what you expect:

>>> %cat nodes.txt
[node1]: xxx val 1 24
[node1]: xxx val 2 55
[node1]: xxx val 3 -44
[node2]: xxx val 1 29
[node2]: xxx val 3 13
[node3]: xxx val 3 -98
[node3]: xxx val 5 63
import re

data = [re.match(r'\[(?P<nodes>\w+)\].*(?P<ac_val>val \d+) (?P<value>-?\d+)', s)
          .groupdict() for s in open('nodes.txt')]

df = pd.DataFrame(data).astype({'value': int}) \
                       .pivot_table(index='ac_val', columns='nodes', values='value')
>>> data
[{'nodes': 'node1', 'ac_val': 'val 1', 'value': '24'},
 {'nodes': 'node1', 'ac_val': 'val 2', 'value': '55'},
 {'nodes': 'node1', 'ac_val': 'val 3', 'value': '-44'},
 {'nodes': 'node2', 'ac_val': 'val 1', 'value': '29'},
 {'nodes': 'node2', 'ac_val': 'val 3', 'value': '13'},
 {'nodes': 'node3', 'ac_val': 'val 3', 'value': '-98'},
 {'nodes': 'node3', 'ac_val': 'val 5', 'value': '63'}]

>>> df
nodes   node1  node2  node3
ac_val
val 1    24.0   29.0    NaN
val 2    55.0    NaN    NaN
val 3   -44.0   13.0  -98.0
val 5     NaN    NaN   63.0
Corralien
  • 109,409
  • 8
  • 28
  • 52